Improving Ebook Data Quality: A Frank Assessment & The Path Forward
Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction, and skillful execution; it represents the wise choice of many alternatives.
–WILLIAM A. FOSTER
Book publishers are a fortunate lot, especially compared to those in other, more besieged media markets, such as the magazine or newspaper industries. In particular, book publishers benefit from a robust digital formatting standard in EPUB. The EPUB 3 specification is a comprehensive technical standard for digital book content that includes not only a standard way of encoding book content (HTML5), but also a comprehensive set of metadata specifications that both define the structure of a book and label its subject matter.
EPUB 3 is a tremendous asset for publishers of all sizes in most market segments. Yet the level of data quality in publishers' EPUB titles continues to be both inconsistent and surprisingly low. This seems to be true regardless of whether the EPUB is generated in-house or through a service provider.
EPUB data quality involves a number of aspects, including:
- Standards Conformance: Does the EPUB file conform to the IDPF standards? Does it pass validation?
- Consistency of Data: Are a publisher’s EPUB titles implemented in a similar way or is there variability from title to title?
- Robust Metadata: Has the publisher consistently implemented Dublin Core bibliographic metadata, subject-specific terms, and structural tags?
- Quality of CSS: The Cascading Style Sheets (CSS) file in the EPUB package determines how the ebook will look on a device or in a browser. Was the CSS created with precision and with sensitivity to design best practices, as well as to the design standards of the publisher?
- Cross-browser/device behavior: Different devices and browsers may handle display in different ways. Do the CSS and other EPUB components include the requisite information to optimize the appearance of the EPUB in these varying scenarios?
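Some of these checks can be automated. As a minimal sketch of the metadata check only, the Python below reports which required metadata items are missing from an EPUB 3 package document (the OPF file). The function name and the sample package are hypothetical, invented for illustration. The sketch covers only the items EPUB 3 requires (identifier, title, language, and a last-modified date); it is no substitute for a full validator.

```python
import xml.etree.ElementTree as ET

# Namespaces used in an EPUB package document (OPF file).
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Dublin Core elements that EPUB 3 requires in every package document.
REQUIRED_DC = ("identifier", "title", "language")


def missing_required_metadata(opf_xml: str) -> list[str]:
    """Return the names of required metadata items absent from an OPF document."""
    root = ET.fromstring(opf_xml)
    metadata = root.find("opf:metadata", NS)
    if metadata is None:
        return ["metadata"]

    missing = []
    for name in REQUIRED_DC:
        element = metadata.find(f"dc:{name}", NS)
        # Treat an empty element the same as a missing one.
        if element is None or not (element.text or "").strip():
            missing.append(f"dc:{name}")

    # EPUB 3 also requires a last-modified date, expressed as a meta property.
    has_modified = any(
        meta.get("property") == "dcterms:modified"
        for meta in metadata.findall("opf:meta", NS)
    )
    if not has_modified:
        missing.append("dcterms:modified")
    return missing


# Hypothetical package document that omits the language and modified date.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Sample Title</dc:title>
  </metadata>
</package>"""

print(missing_required_metadata(SAMPLE_OPF))  # ['dc:language', 'dcterms:modified']
```

Running a check like this across a publisher's whole list is one way to see how much title-to-title variability exists.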
The strategic value a publishing organization derives from EPUB will be determined in large part by the level of data quality of the EPUBs it produces. But publishers are giving ebooks short shrift. The quality of commercial EPUBs today is far lower than the standards publishers set for their print products. While this does not come as a surprise to production managers and publishing services organizations (those involved in the nitty-gritty of data conversion), it is often a shock to senior management, who thought that EPUB creation was a problem solved long ago.
To validate my observations, I turned to my friend and colleague Joshua Tallent, chief ebook architect at eBook Architects. Joshua is one of my gurus for all things EPUB. Joshua echoed what I am seeing in terms of shaky EPUB data quality. “Publishers are either unaware of the issues of data quality in their EPUB files or unable to make changes needed to improve it. Publishers who outsource their work to vendors are at the mercy of the vendor’s practices and tools. Publishers who create their ebooks in-house run into other problems if they are relying on one-button conversion tools (as found in InDesign) without first defining well-formed manuscripts. These tools tend to be garbage in, garbage out, so the quality of the EPUB files coming out the other side is usually not very good.”
Publishers must start to raise the level of data quality of their EPUB titles. If not, they will find it difficult to hold to price points in the marketplace as customers become more savvy and their expectations rise. Further, the quality of EPUB data will increasingly be a competitive differentiator, not only for customer sales but also for author acquisition. Authors will sign with publishers that can create the best EPUB experience for their title.
The EPUB quality status quo appears untenable. Rectifying the situation starts at the individual level. “Anyone involved in the creation or distribution of ebook files needs to learn more about the formats and best practices,” says Joshua. “That is true regardless of whether they are a developer, a manager, or the person doing QA. The more you know about what EPUB can do the better your ebook files will become. That means digging in and learning. It means reading the specs, taking classes about ebook development, and testing how these things work.”
Joshua continues: “I have been impressed by some movement in the past year or so on this front. Production managers at some publishing houses are becoming more knowledgeable about the need for quality code, and are starting to implement standards for their EPUB files. Some are moving to EPUB 3 with the goal of re-thinking their processes and taking their quality to the next level.”
A Recipe for Success
Publishing organizations can improve the quality of the EPUBs they produce by addressing the following issues, in this sequence:
- People: EPUB quality rests on the quality of the publishing team. Starting with training is a great idea, as Joshua described. Joshua’s own eBook Ninjas course is a good place to start. The IDPF can provide additional options for ramping up the EPUB chops of the publishing teams.
- Standards Definition: Publishers must define and document their own EPUB standards to include the detailed specifications for CSS, subject tagging, and structural tagging. These standards can be used as acceptance criteria for external EPUB service providers. All EPUBs produced must conform to these standards.
- Defined Workflow: A well-defined, documented production workflow assists the creation of high-quality EPUBs.
- Automation Platforms: An organization that has an EPUB-savvy team, well-defined and documented EPUB standards, and a documented workflow may be in a position to use an automated approach to create many of its EPUB titles directly from source manuscripts. Those organizations that embark on automation initiatives without first putting in place a team, standards, and workflow will find themselves on a lengthier and more painful course. A caveat: as Joshua points out, there are limits to this approach, and it may not be suitable for all titles.
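Documented standards are most useful as acceptance criteria when they can be checked by machine. As a rough sketch, assuming a publisher's standard includes a fixed list of permitted CSS class names, the Python below flags any class in a content document that falls outside that list. The class names, the sample markup, and the function name are all hypothetical, invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical house standard: the only class names content documents may use.
HOUSE_CLASSES = frozenset({"chapter-title", "epigraph", "pull-quote", "footnote"})


class ClassCollector(HTMLParser):
    """Collect every class name used in a content document."""

    def __init__(self) -> None:
        super().__init__()
        self.classes: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # A class attribute may hold several space-separated names.
            if name == "class" and value:
                self.classes.update(value.split())


def nonstandard_classes(xhtml: str, allowed: frozenset = HOUSE_CLASSES) -> set[str]:
    """Return the class names in the markup that the house standard does not permit."""
    collector = ClassCollector()
    collector.feed(xhtml)
    collector.close()
    return collector.classes - allowed


# Hypothetical fragment of the kind a one-button conversion might produce.
SAMPLE = (
    '<section><h1 class="chapter-title">One</h1>'
    '<p class="MsoNormal epigraph">Text</p></section>'
)

print(sorted(nonstandard_classes(SAMPLE)))  # ['MsoNormal']
```

A report like this gives a publisher something concrete to hand back to a vendor or an in-house team when a delivered file does not meet the standard.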
For many publishing organizations, there is much to do in order to raise the bar on EPUB quality. But there is a bright side: the path forward is clear, far clearer than it was even a few years ago.
We have robust digital book standards. We have identified opportunities in the marketplace. Now is the time to put the two together.
Andrew Brenneman is founder and president of Finitiv. Finitiv partners with publishers to deliver subscription services to individuals and groups that provide access to collections of ebook titles.