Books as a technology have been around for so long that it’s easy to forget they were once major innovations themselves: Before the book there was the scroll, and the clay tablet before that, and it wasn’t until the end of the 16th century that modern indexes were invented; they remain largely the standard way of organizing book content today. But data management technologies have now matured to the point where they can be used more widely in the book industry and make a real difference to publishers’ bottom lines.
These technologies, collectively known as semantic technology, allow for a kind of automated, intelligent remixing of content at a much lower overhead cost than manually combining and repackaging books. This isn’t a particularly new idea; however, the technology that enables it to happen at scale has reached a level of maturity at which publishers must take it seriously in order to remain competitive.
Here are a couple of ways publishers can implement data management technology to drive new revenue for their business.
Textbooks: Innovation Ground Zero
In many ways the educational publishing sector is the one best suited to technological innovation of its business model. For certain subjects, textbook content is practically identical across the globe: Everyone learns about calculus, chemical reactions, and how to use a comma. The key difference between countries’ education systems, and therefore their textbook requirements, is the national curriculum set out by each country’s education department or ministry.
Right now educational publishers spend vast amounts of money on textbook localization, which seems like a waste considering the similarity of the content they publish across different territories. Publishers are beginning to understand the need for a smart, automated system that can repackage their content for different territories.
That type of system is built on semantic technology and analysis, and is currently offered by a number of vendors. Semantic analysis can get under the skin of unstructured data by isolating and labeling entities within text, breaking content down into subject, predicate, and object, and then storing the relationships between these information points in specialized databases. Publishers who implement this technology gain a more nuanced grasp of their text assets, and it is this understanding that enables them to repackage and repurpose their content more efficiently.
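To make that more concrete, here is a minimal sketch of how a single statement from a textbook might be broken into subject-predicate-object triples and stored in a graph. It uses the open-source Python library rdflib; the content model (the `pub` namespace, the `Chapter` class, the `teaches` relationship) is invented purely for illustration, and vendor platforms will use their own ontologies and storage.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

# Hypothetical namespace for a publisher's content model (illustration only).
PUB = Namespace("http://example.org/publisher/")

g = Graph()
g.bind("pub", PUB)

# The statement "Chapter 3 teaches quadratic equations" broken into
# subject-predicate-object triples and stored in the graph.
chapter = PUB["chapter-3"]
concept = PUB["quadratic-equations"]

g.add((chapter, RDF.type, PUB.Chapter))
g.add((chapter, RDFS.label, Literal("Chapter 3: Quadratic Equations")))
g.add((chapter, PUB.teaches, concept))
g.add((concept, RDFS.label, Literal("Quadratic equations")))

# Once relationships are stored this way, they can be queried like data:
for subj, _, obj in g.triples((None, PUB.teaches, None)):
    print(f"{subj} teaches {obj}")
```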
How would this work for textbook producers? The technology helps analyze the educational content that needs to go into a textbook and match it against the government’s requirements for that specific book. For example, imagine that a well-known textbook company has been commissioned by the South African government to produce a math textbook. The company could translate the government’s requirements into a standard format, then use that format to parse its math content and identify the relevant concepts, lessons, and chapter text.
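A rough sketch of that matching step follows, assuming the curriculum has already been translated into lists of required concepts and that each unit of content carries concept tags. The requirement code, concept names, and data structures below are hypothetical, not any vendor’s actual format.

```python
# Curriculum requirements expressed as required concepts per requirement code
# (the "ZA-MATH-G10" code and concept names are invented for this example).
curriculum_requirements = {
    "ZA-MATH-G10": ["quadratic-equations", "trigonometric-ratios", "euclidean-geometry"],
}

# The publisher's content library, with each unit tagged by concept.
content_library = [
    {"id": "chapter-3", "title": "Quadratic Equations", "concepts": {"quadratic-equations"}},
    {"id": "chapter-7", "title": "Intro to Trigonometry", "concepts": {"trigonometric-ratios"}},
    {"id": "chapter-9", "title": "Probability", "concepts": {"probability"}},
]

def select_content(requirement_code: str) -> list[dict]:
    """Return content units whose concept tags satisfy a curriculum requirement."""
    required = set(curriculum_requirements[requirement_code])
    return [unit for unit in content_library if unit["concepts"] & required]

# Everything matching the South African Grade 10 math requirement:
for unit in select_content("ZA-MATH-G10"):
    print(unit["id"], "-", unit["title"])
```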
Bringing Linked Data to Non-Fiction & Reference Works
This technology can be applied just as easily in the broader non-fiction publishing sector, with similar results. The rise of ebooks has led publishers to adopt more platform-agnostic strategies. Publishers must view themselves as providers of a service -- imparting knowledge -- and need to be flexible about whether that service is delivered via a printed book, an ebook, or even a third-party website, phone, or tablet to which they have sold their content.
If non-fiction publishers link up their data, they will gain enhanced abilities to ‘slice and dice’ their content according to the financial opportunities that come knocking, and will not have to invest as heavily in rewriting or repurposing existing content. Storing linked data in a specialized database makes it easier to isolate relevant existing content for reuse.
For example, imagine you are the content manager for an encyclopaedia specializing in dogs. If Crufts, a well-known British dog show, approaches you about a partnership, you need to be able to make the most of any potential deal while minimizing your expenses. Under existing processes you’d have to use up your employees’ precious time isolating the relevant content for the Crufts project and packaging it up. That is because the content is trapped within the print medium with minimal metadata and can’t be used anywhere else. If the publisher made use of linked data, however, content would be organized around the concepts associated with dogs, such as breed, region, or price. A good example of this approach is the BBC nature website, where visitors can browse content on animals by region, behaviour, or genus.
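As a sketch of what that makes possible, the snippet below (again using rdflib, with an invented `dog` vocabulary and made-up entries) links encyclopaedia articles to the breeds and regions they cover, then pulls out everything relevant to a UK-focused partner with a single query rather than a manual trawl through page proofs.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDFS

# Hypothetical vocabulary for the dog encyclopaedia (illustration only).
DOG = Namespace("http://example.org/dogs/")

g = Graph()
g.bind("dog", DOG)

# A couple of entries, each linked to the concepts it is about.
for article_id, title, breed, region in [
    ("entry-101", "The Border Collie", "border-collie", "united-kingdom"),
    ("entry-102", "The Basenji", "basenji", "central-africa"),
]:
    entry = DOG[article_id]
    g.add((entry, RDFS.label, Literal(title)))
    g.add((entry, DOG.aboutBreed, DOG[breed]))
    g.add((entry, DOG.breedRegion, DOG[region])))

# Select every entry about breeds from the United Kingdom.
results = g.query("""
    PREFIX dog: <http://example.org/dogs/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?title WHERE {
        ?entry dog:breedRegion dog:united-kingdom ;
               rdfs:label ?title .
    }
""")
for (title,) in results:
    print(title)
```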
Semantic technology is becoming a necessity for book publishers who want to organize the content they create and maximize its value. Like the printed book itself, this technological innovation will one day become the norm.
- Categories:
- Book Distribution
- Data
- Education
- Trade

Jarred McGinnis is UK managing consultant at Ontotext, a semantic technology company. He has worked on semantic technology projects with organisations, companies, and universities, including the BBC and the Press Association. Jarred holds a PhD in Informatics from the University of Edinburgh.