Finding Fortune With Predictive Semantics
Publishers have become complacent. They have settled for plateauing growth in ebook adoption, as though accounting for 30% of total book sales were the intended and mature state. It is not. We have not put enough of the "e" into ebooks, because we have not taken advantage of metadata as a source of information. The greatest value of ebooks is not in making them sexy for readers. The greatest value happens behind the scenes.
Arguably, the largest missed opportunity for publishers in the digital revolution is in predictive analytics. Predictive analytics is an algorithm-based science of deciphering captured data to discover probable customer actions. Capturing relevant data is severely limited with printed books as compared to ebooks. Location, device, reader behavior, and demographics (attainable depending on device and app) are just some of the ebook data that can power analytics.
The value of predictive analytics has already been proven in other industries. In e-commerce it has increased sales. According to the Harvard Business Review, "Research shows that personalization can deliver five to eight times the ROI on marketing spend and lift sales 10% or more." In insurance it determines our premiums, and in sports it wins games. The Boston Red Sox won the World Series in 2004 and 2007, and perhaps even last year, thanks in large part to a change in strategy in which they assigned a predictive value to each player. Thanks to the movie Moneyball, this use case is widely known!
Predictive analytics requires three ingredients: a goal, data sets, and an algorithm. For e-commerce, the data sets are curated sales data and external variables (e.g. weather at customer location), the algorithm is based on user behavior, and the goal is to provide personalized product recommendations. For example, a teenage girl in rainy London shopping for ruby red boots may be interested in a ruby red raincoat, rather than other footwear products.
In insurance, the data set is client demographic data, the algorithm assigns life expectancy, and the goal is to tie risk to premiums. For example, a 30-year-old fighter pilot serving in Iraq should have higher life insurance premiums than a 40-year-old physician in New Jersey with no family history of disease.
In sports, the data sets are player statistics, the algorithm assigns values to a player's potential success, and the goal is to get the highest score. For example, as in Moneyball, a recruiter can tie a baseball player's potential success to his salary, ensuring the lowest price per win.
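To make the "lowest price per win" idea concrete, here is a minimal sketch in Python. The players, projected win contributions, and salaries are invented for illustration, and real sabermetric models are far more involved; the point is only that once an algorithm assigns each player a predicted value, ranking by cost becomes simple arithmetic.

```python
# Hypothetical players: projected wins contributed and salary in dollars.
# All figures are made up for illustration.
players = [
    {"name": "Player A", "projected_wins": 5.0, "salary": 10_000_000},
    {"name": "Player B", "projected_wins": 3.0, "salary": 3_000_000},
    {"name": "Player C", "projected_wins": 1.0, "salary": 4_000_000},
]

def cost_per_win(player):
    """Salary divided by the wins the model predicts the player will add."""
    return player["salary"] / player["projected_wins"]

# Cheapest predicted win first.
ranked = sorted(players, key=cost_per_win)
best_value = ranked[0]["name"]
```

In this toy data set the best value is not the player with the most projected wins but the one whose wins cost the least.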
It is time for book publishers to profit from predictive analytics as well.
In the publishing arena, however, applying predictive analytics is trickier. Unlike the aforementioned cases, our data sets are not numbers; they are words. Therefore, we need a method of translating words into numbers. Semantic analysis is that method.
This analysis requires a semantic parser. A semantic parser extracts the text and analyzes the structure of sentence meaning by mapping a natural language sentence into a logical set of defined entities. The entities may be sourced from a pre-defined taxonomy, a third-party database of concepts, people and locations, or a more general dictionary. Advanced semantic parsers are self-learning and require substantive customization. In applying such a tool, we can extract data including introduced concepts, sentiment, and document summaries. The output is what some in the industry may refer to as "Smart Content" that is primed to be split into custom chunks of text. These content chunks can be networked with other content, or even scored for relative importance and relevance (thus translating text into numbers). Combining semantic analysis with the strategies found in predictive analytics gives us "predictive semantics."
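As a rough illustration of how text can become numbers, the Python sketch below tags chunks of text against a tiny taxonomy and scores each chunk for relevance to one entity. It is a deliberately naive keyword lookup, not a real semantic parser: the taxonomy, the entity labels, and the mentions-per-word score are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical taxonomy: surface term -> entity label (illustrative only).
TAXONOMY = {
    "elephant": "Animal/Elephant",
    "elephants": "Animal/Elephant",
    "lion": "Animal/Lion",
    "lions": "Animal/Lion",
    "savanna": "Place/Savanna",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def tag_chunk(text):
    """Count the taxonomy entities mentioned in one chunk of text."""
    return Counter(TAXONOMY[w] for w in tokenize(text) if w in TAXONOMY)

def score_chunk(text, entity):
    """Relevance of a chunk to an entity, as mentions per word."""
    words = tokenize(text)
    if not words:
        return 0.0
    return tag_chunk(text)[entity] / len(words)

chunks = [
    "Elephants roam the savanna in herds.",
    "Lions hunt at dusk.",
]
tags = [tag_chunk(c) for c in chunks]
scores = [score_chunk(c, "Animal/Elephant") for c in chunks]
```

Here the first chunk receives a nonzero elephant score and the second receives zero. A production parser would also handle synonyms, context, and sentiment, which simple word matching cannot.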
Here is where the magic happens. Cross-referencing semantic data about a book about animals, for example, with external data sources, such as reading and buying patterns of that book, provides us with actionable insights. For example, we could posit that based on the reading patterns, 60% of readers started by flipping to the chapter on elephants, specifically the pictures. Based on this information (and other relevant data), we can predict that a photo book on elephants would be a valuable investment. Abracadabra!
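The cross-referencing step can be sketched in a few lines of Python. Everything here is hypothetical: the chapter-to-topic tags stand in for the output of semantic analysis, the reading log stands in for data an ebook app might report, and the 50% threshold is an arbitrary choice for the example.

```python
from collections import Counter

# Hypothetical semantic tags per chapter (the output of a tagging step).
chapter_topics = {1: "lions", 2: "elephants", 3: "giraffes"}

# Hypothetical reading log: the first chapter each of ten readers opened.
first_chapter_opened = [2, 2, 1, 2, 3, 2, 2, 1, 2, 3]

def topic_share(first_opens, topics):
    """Share of readers whose first opened chapter covers each topic."""
    counts = Counter(topics[chapter] for chapter in first_opens)
    total = len(first_opens)
    return {topic: n / total for topic, n in counts.items()}

shares = topic_share(first_chapter_opened, chapter_topics)

# Topics that draw at least half of first opens are candidates for a new title.
candidates = [topic for topic, share in shares.items() if share >= 0.5]
```

With this invented log, six of ten readers start on the elephant chapter, so "elephants" is the only topic that clears the threshold.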
Predicting what content would attract an audience is just the beginning. Semantic analysis offers insight into the sentiment relayed through text, which could further improve recommendations. For example, predictive semantics could suggest that a reader of an article about Vladimir Putin's "over-aggressive" leadership style would be interested in a book that alleges Putin is a sociopath, or a book about the atrocities committed by Stalin, rather than Putin's autobiography. Abracadabra again!
The potential of predictive semantics grows exponentially when adding external data sources. Audience data (e.g. gender, age, location, and income) significantly furthers the predictive value. For example, such data could tell me that the reader is 23 years old and away from home with limited income. Semantic analysis can then identify books that are travel guides suited for younger travelers on a budget.
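A minimal Python sketch of that matching follows. The profile fields, the hand-written rules that turn audience data into interest tags, and the three book titles are all invented for illustration; a real system would presumably learn such rules from data rather than hard-code them.

```python
def interests_from_profile(profile):
    """Map audience data to interest tags using simple hand-written rules."""
    interests = set()
    if profile["away_from_home"]:
        interests.add("travel")
    if profile["age"] < 30:
        interests.add("young-traveler")
    if profile["income"] == "limited":
        interests.add("budget")
    return interests

# Hypothetical catalog: title -> semantic tags assigned to the book.
catalog = {
    "Luxury Resorts of the World": {"travel", "luxury"},
    "Backpacking Europe on a Shoestring": {"travel", "budget", "young-traveler"},
    "A History of Rome": {"history"},
}

def recommend(profile, books):
    """Rank books by how many tags they share with the reader's interests."""
    wanted = interests_from_profile(profile)
    ranked = sorted(books, key=lambda title: len(books[title] & wanted), reverse=True)
    # Drop books with no overlap at all.
    return [title for title in ranked if books[title] & wanted]

reader = {"age": 23, "away_from_home": True, "income": "limited"}
recommendations = recommend(reader, catalog)
```

For the 23-year-old traveler on a limited income, the budget backpacking guide ranks first, the luxury guide trails it, and the history title is dropped.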
A few startups are diligently pursuing business models built on predictive analytics and semantic analysis, and are building predictive semantics solutions for publishers. For example, Aptara, the company that I work for, is investing in one iteration of predictive semantics in the discoverability arena. Our effort ties semantically tagged content to the dynamic, real-time interests of potential consumers. The result is the automatic matching of relevant books with a larger audience of book shoppers.
Applying predictive semantics supplies actionable insights that lead to revenue. It is my opinion that successful solutions will not only apply predictive semantics, but also automate the resulting actions to deliver measurable returns. As such, book publishers should expect vendors in the analytics space to do more than just "advise"; they should also automate much of the "do."
The next evolution of ebooks will not just add value for readers, but will empower book publishers, distributors, and retailers with data. Predictive semantics will leverage the "e" in ebooks to feed the data necessary to not only boost revenues but also realize the potential of the digital promise.
Pavan Arora is Aptara's chief innovation officer. Previously he was a digital innovation consultant to McGraw-Hill, World Bank Publishing, and the Library of Congress.