Improving Ebook Data Quality: A Frank Assessment & The Path Forward
Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction, and skillful execution; it represents the wise choice of many alternatives.
–WILLIAM A. FOSTER
Book publishers are a fortunate lot, especially compared to those in other, more besieged media markets, such as the magazine or newspaper industries. In particular, book publishers benefit from a robust digital formatting standard in EPUB. The EPUB 3 specification is a comprehensive technical standard for digital book content that includes not only a standard way of encoding book content (HTML5) but also a comprehensive set of metadata specifications that both define the structure of a book and label its subject matter.
EPUB 3 is a tremendous asset for publishers of all sizes in most market segments. Yet the level of data quality in publishers' EPUB titles continues to be both inconsistent and surprisingly low. This seems to be true regardless of whether the EPUB is generated in-house or through a service provider.
EPUB data quality has several aspects, including:
- Standards Conformance: Does the EPUB file conform to the IDPF standards? Does it pass validation?
- Consistency of Data: Are a publisher’s EPUB titles implemented in a similar way or is there variability from title to title?
- Robust Metadata: Has the publisher consistently implemented Dublin Core bibliographic metadata, discipline-specific subject terms, and structural tags?
- Quality of CSS: The Cascading Style Sheets (CSS) file in the EPUB package determines how the ebook will look on a device or in a browser. Was the CSS created with precision and with sensitivity to design best practices, as well as to the design standards of the publisher?
- Cross-browser/device behavior: Different devices and browsers may handle display in different ways. Do the CSS and other EPUB components include the requisite information to optimize the appearance of the EPUB in these varying scenarios?
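The conformance and metadata questions above lend themselves to a simple automated spot check. What follows is a minimal sketch in Python, not a substitute for a full validator such as EPUBCheck. It assumes a standard EPUB container layout (a `META-INF/container.xml` file pointing to the package document) and tests only for the three Dublin Core elements that EPUB 3 requires: `dc:identifier`, `dc:title`, and `dc:language`. The function names `read_opf` and `missing_dc_metadata` are illustrative and do not come from any library.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container and package documents.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Dublin Core elements that EPUB 3 requires in the package metadata.
REQUIRED_DC = ("identifier", "title", "language")


def read_opf(epub_file):
    """Return the package document (OPF) bytes from an EPUB path or file object."""
    with zipfile.ZipFile(epub_file) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        return zf.read(rootfile.get("full-path"))


def missing_dc_metadata(opf_xml):
    """List the required Dublin Core elements that are absent or empty."""
    package = ET.fromstring(opf_xml)
    metadata = package.find("opf:metadata", NS)
    if metadata is None:
        return list(REQUIRED_DC)
    missing = []
    for name in REQUIRED_DC:
        elements = metadata.findall(f"dc:{name}", NS)
        # An element that exists but holds only whitespace counts as missing.
        if not any((el.text or "").strip() for el in elements):
            missing.append(name)
    return missing
```

Running `missing_dc_metadata(read_opf(path))` over every title in a catalog and tallying the results is one rough way to see how much the metadata varies from title to title.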
The strategic value a publishing organization derives from EPUB will be determined in large part by the data quality of the EPUBs it produces. But publishers are giving ebooks short shrift. The quality of commercial EPUBs today is far lower than the standards publishers set for their print products. While this does not come as a surprise to production managers and publishing services organizations (those involved in the nitty-gritty of data conversion), it is often a shock to senior management, who thought that EPUB creation was a problem solved long ago.