Publishers Must Embrace Data-First Thinking

I’ve always viewed direct-to-consumer strategies through the lens of revenue. Direct selling is a moneymaking strategy meant to bypass revenue-draining middlemen like Baker & Taylor and Amazon.

It hadn’t occurred to me, though, that with any direct-to-consumer sale or online interaction, publishers gain something as valuable as revenue: consumer data.

Much of the publishing industry is not in a data-first mindset, asserts Tom Davenport in a recent article titled “Book Publishing’s Big Data Future.” Although Davenport criticizes the publishing industry as a big data underachiever, he admits that’s in part because publishers have traditionally worked through intermediaries to reach consumers. But that dynamic is changing.

The intermediaries, like Amazon and Scribd, are becoming the content creators with their own publishing platforms and even, in Amazon’s case, original series. These companies can do this because they began as retail channels, tracking data that built a base for future business development. It’s time that publishers create their own channels and do the same.

Though the Big Five publishers have been slow to adapt, there are signs of change. Simon & Schuster recently announced the launch of its book recommendation platform, Off the Shelf. The site recommends previously published works (from all publishers) and allows users to build a “Shelf” of must-reads. Users can then buy the work through the online retail channel of their choosing. Not only can Simon & Schuster pocket a portion of those sales, but the publisher can also track which books and book lists are most popular among readers.

HarperCollins has also dabbled in creating channels for consumers with its launch of and in 2013, both of which sell the works of C.S. Lewis directly to readers. Though still very much an experiment, the sites give HarperCollins an opportunity to track reader consumption through a partnership with Bluefire and to gather data that may influence the creation of future direct sale sites.

In the education sector, variations of the direct-to-consumer approach have proliferated much more rapidly than in the trade sector. Higher education publishers like John Wiley & Sons have launched their own e-learning platforms, such as WileyPlus, to distribute their content directly to students and professors. Just yesterday, News Corp announced the launch of Amplify, an all-digital English curriculum for middle schools. Not far behind, McGraw Hill recently partnered with StudySync to provide its own middle school curriculum. These companies are embracing new startups and technology to create their own channels and work directly with consumers.

Although these experiments are encouraging, publishers still have a long way to go in harnessing big data. Like tech giants Google, Facebook, and yes, Amazon, publishers need to create workflows in which big data informs product development. No longer should books be created on hunches or on memories of what has sold before. Big data is a crucial foundation for development and imperative for building meaningful consumer relationships. Information is key for publishers, and right now retailers have a monopoly.

Ellen Harvey is the associate/digital editor of Publishing Executive. 

  • Stanislav Fritz

    I posted a response to this on my own blog, which I will repeat here:

    This article has some very interesting facts, but suffers from a number of fallacies. It is a conflation of data and facts that destroys any semblance of a conclusion. This may be rooted in the article that the blog cites, by Tom Davenport, but I would have hoped that some analysis would be included in the blog entry.

    Data, by its nature, is the past, yet there is a blast against big publishers for making bets on books based on what succeeded in the past. A retailer (such as Amazon) can use REAL TIME data, or near real time data to adjust certain things, such as pricing and what people might be interested in from a VIRTUAL inventory. A publisher has a lag time. The data will be old, even with technology, and the time to market—even rushed—will create a lag between the recognition of the data and action on it.

    Now, this does not preclude the importance of data; I am a data person from way back. But there is data and there is information. Data tends to be raw and needs interpretation. It also tends to be badly interpreted. I can demonstrate a statistical correlation between eating tomatoes and (for instance) the number of orgasms per year a person has. Yet the correlation is probably nothing but a statistical artifact. Still, I am sure someone would then market tomatoes as the next cure for your sex life if I came up with a study with that data.

    What Amazon is doing (disclaimer, I used to work there) is not just data, but using the business concept of “spinning the flywheel.” (The concept of the flywheel effect was popularized by Jim Collins in his book “Good to Great.”) The root of this is not just data, but synergies. Because of technology, Amazon is becoming both vertically and horizontally integrated wherever it spins the flywheel. The examples Ms. Harvey cites of positive actions by publishers are really more cases of vertical integration: combining two or more businesses that are in different stages of production of similar products (e.g. a farm combined with a food manufacturer combined with a supermarket). By doing this, they are indeed capturing data. But, more importantly, they are in control of multiple stages and able to respond separately to each stage as they see fit. Amazon does this incredibly well and it is all part of the flywheel.

    Unfortunately for publishers, they do not have the horizontal integration that Amazon has. This is what really spins the flywheel. While Amazon uses some data on trends and what customers want, it too has the issue of lag time when producing its own video or book imprints. What it has is the ability to largely ignore the need to guess and interpret the data by letting the market (or rather its market) determine the winners and losers, after which the system responds automatically. If you pay your authors (largely) by only a percentage, the authors self-select out of the system. This is not so much data as the kind of self-correcting system that is possible when you control a large share and are both vertically and horizontally integrated.

    To Ms. Harvey’s credit, the final section of her post captures the essence of this: workflow. When you have a vertically integrated system with a strong flywheel, you create a workflow that always adds momentum to the flywheel rather than spinning against it. Amazon is fantastic at this sort of thinking and internal development. In no other company that I have observed are the flywheel effect and the workflow that meshes with it better implemented. It has its flaws, including stifling innovation that goes against the flywheel, but it is massively successful in creating growth (which Amazon does for revenue and customers, if not for profits).