The Future of Ebook Production Is Single-Sourced
Bob Glushko, Adjunct Professor at University of California Berkeley, Graduate School of Information Management & Systems
This article is part of the Summer Issue of Book Business.
Many publishers have adopted single-source technologies and methods built on one central idea: different output programs can select, transform, and assemble content into a variety of formats by exploiting the markup in a book's source files, which streamlines the creation of multiple book formats and editions. Single-sourcing eliminates the redundancy, cost, and inconsistency of maintaining separate production paths for print and ebook editions. However, even for the most ambitious publishers, the number of formats and editions produced by single-sourcing from a content repository can be counted on one hand: a print edition or two, and maybe a couple of ebook formats.
It is possible to do vastly more with single-sourcing and provide a more valuable reading experience to the end user. Consider how a single automobile production line can support the assembly of thousands of customized variations of a car model. My research team at the University of California, Berkeley, with some consulting help, has developed analogous "production-line" techniques for single-source publishing.
Our publishing production line uses a simple configuration file to specify which markup is used and how, and by changing that configuration it can reliably automate the generation of over two thousand ebook versions of a textbook. This flexibility would enable an instructor to tailor the text for a wide range of courses in many different academic disciplines and to customize it for both graduate and undergraduate students.
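The article does not show the configuration format itself. As a minimal sketch, assume a hypothetical INI-style file, parsed with Python's standard `configparser` module, that names an edition, lists the disciplines to include, and says whether discipline-tagged notes are emitted. The function `load_edition_config` and every section and key name here are illustrative, not TDO's actual syntax.

```python
import configparser

# Hypothetical INI layout: the article does not publish TDO's actual
# configuration syntax, so the section and key names are assumptions.
SAMPLE_CONFIG = """
[edition]
name = Custom Edition
disciplines = computing, library
include_notes = yes
"""


def load_edition_config(text: str) -> dict:
    """Parse a hypothetical edition config into a build specification."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["edition"]
    # Split the comma-separated list and drop blanks, so an empty value
    # yields an empty set (core content only).
    raw = section.get("disciplines", "")
    disciplines = frozenset(d.strip() for d in raw.split(",") if d.strip())
    return {
        "name": section.get("name", "Untitled Edition"),
        "disciplines": disciplines,
        # Whether discipline-tagged notes are emitted at all.
        "include_notes": section.getboolean("include_notes", fallback=True),
    }


spec = load_edition_config(SAMPLE_CONFIG)
print(spec["name"], sorted(spec["disciplines"]), spec["include_notes"])
```

In this sketch, producing a different edition means editing only the text of the configuration, never the transformation code, which is the property that lets one production line emit many versions.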
The crucible for this work is a book called "The Discipline of Organizing" (TDO). TDO proposes a transdisciplinary synthesis of ideas from library and information science, computer science, informatics, cognitive science, business, and other disciplines that "intentionally arrange collections of resources to enable interactions with them." We wanted TDO's book structure to mirror this content structure, so we designed it as a core text supplemented by about 600 notes, each of which is tagged with its discipline in the source markup. In addition, we label about 15% of the chapter content as discipline-specific rather than core transdisciplinary content. This content model turns the source files for our book into a "family of books" whose common core is extended by discipline-specific content. We can then build any particular book by filtering the content based on its disciplinary tagging.
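Filtering by disciplinary tagging can be sketched with Python's `xml.etree.ElementTree`. The element names (`para`, `note`) and the `discipline` attribute below are assumptions, since the article does not publish TDO's schema, and `filter_by_discipline` is a hypothetical helper. The rule it implements: untagged elements are core content and are always kept, while tagged elements survive only if their discipline is selected.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names and the "discipline" attribute are
# assumptions, not TDO's actual schema.
SOURCE = """<chapter>
  <para>Core transdisciplinary text.</para>
  <para discipline="computing">A computing-specific aside.</para>
  <note discipline="library">A library-science note.</note>
  <note discipline="cognitive">A cognitive-science note.</note>
</chapter>"""


def filter_by_discipline(root: ET.Element, selected: set) -> ET.Element:
    """Remove tagged elements whose discipline is not in `selected`.

    Untagged elements are treated as core content and always kept.
    """
    # Snapshot the tree first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            tag = child.get("discipline")
            if tag is not None and tag not in selected:
                parent.remove(child)
    return root


root = filter_by_discipline(ET.fromstring(SOURCE), {"computing"})
print([el.text for el in root])
# ['Core transdisciplinary text.', 'A computing-specific aside.']
```

Passing an empty set keeps only the untagged core, and passing every tag keeps everything, so one function covers the whole range of possible editions.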
Automakers do not manufacture every possible configuration of options in advance; anything other than the most popular models must be built to order. Similarly, because each of the eleven discipline tags in TDO's source files can be either included or excluded, there are 2048 (2^11) distinct combinations of disciplines, and it is impractical to publish all of them. Instead, in August 2014, we published just two ebook versions that define the endpoints of possible disciplinary customization: a Professional Edition that contains all the content from all disciplines, and a Core Concepts Edition with none of the disciplinary endnotes or discipline-specific content. Each edition has found a niche: the Professional Edition is typically used in graduate-level courses, while the Core Concepts Edition is usually chosen for undergraduate courses and by lay readers.
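The count of 2048 follows from treating each of the eleven tags as independently included or excluded. A short sketch with `itertools.combinations` enumerates every subset; the discipline names are placeholders, since the article does not list the tags, and `all_configurations` is a hypothetical helper. The two published editions correspond to the empty set and the full set.

```python
from itertools import combinations

# Placeholder names: the article says there are eleven discipline tags
# but does not list them, so these identifiers are hypothetical.
DISCIPLINES = [f"discipline_{i}" for i in range(1, 12)]


def all_configurations(tags):
    """Yield every subset of `tags`, from the empty set to the full set."""
    for k in range(len(tags) + 1):
        for combo in combinations(tags, k):
            yield frozenset(combo)


configs = list(all_configurations(DISCIPLINES))
print(len(configs))  # 2048, i.e. 2**11

# The two published editions are the endpoints of this space.
core_concepts = frozenset()             # no discipline-specific content
professional = frozenset(DISCIPLINES)   # every discipline included
```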
Nevertheless, we were dissatisfied with constraining our powerful publishing production line to just two editions, because that does not match the diversity of contexts in which TDO is being used. More than 50 schools use TDO for courses in Information Organization, Knowledge Management, Digital Collections, Information Architecture, Information Systems Design, and other fields that emphasize different disciplines around the core idea of organizing resources. Furthermore, not all students in a particular course have the same disciplinary backgrounds and interests, and not all parts of a book require or permit the same disciplinary supplementation. We wanted readers to be able to customize the content according to their own preferences. This led us to experiment with another approach that defers the generation of a particular version of an ebook from "publishing time" to "reading time." The same algorithms apply, but now the reader decides when and how to apply them, enabling dynamic configuration of the book's content.
Most ebooks handle supplemental content either with links that take readers away from the main text or with pop-up notes. Either way, the reader learns by trial and error whether the supplemental content is helpful or relevant. We believe the reading experience can be improved by giving readers better advance information, or cues, about what could be included, along with more precise mechanisms for selective inclusion. We are experimenting with embedding interactive visualizations in ebooks that show readers the extent and nature of the available supplemental content so they can decide what to include.
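One cue such a visualization might display is how much optional content each discipline would add. The sketch below, assuming a hypothetical `discipline` attribute on tagged elements, tallies tagged elements and words per discipline with `collections.Counter`; `supplemental_summary` is an illustrative helper, not the visualization the team built.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical discipline-tagged markup; element names and the
# "discipline" attribute are assumptions.
SOURCE = """<chapter>
  <para>Core text.</para>
  <note discipline="computing">Note one.</note>
  <note discipline="computing">Note two.</note>
  <note discipline="library">Note three.</note>
</chapter>"""


def supplemental_summary(root):
    """Count tagged elements and words per discipline.

    Nested tagged elements would have their words counted at each level;
    this simple sketch does not deduplicate them.
    """
    counts = Counter()
    words = Counter()
    for el in root.iter():
        tag = el.get("discipline")
        if tag is not None:
            counts[tag] += 1
            words[tag] += len("".join(el.itertext()).split())
    return counts, words


counts, words = supplemental_summary(ET.fromstring(SOURCE))
print(dict(counts))  # {'computing': 2, 'library': 1}
print(dict(words))   # {'computing': 4, 'library': 2}
```

Numbers like these could be shown next to each discipline before the reader selects it, so the decision is informed rather than discovered by trial and error.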
The reader can interact with the visualizations to select disciplines, and the ebook then dynamically reformats to include or exclude content according to the reader's preferences. Soon readers will be able to save multiple configurations that adjust the book's content for different reading scenarios. For example, one student reading the book for the first time might want just the core content, and another might want to see everything, but both might want a different mix of content when studying for an exam. Our ultimate goal is to implement a distributed authoring and publishing system in which an instructor can author and tag new content that can be logically included and dynamically discovered in the body of source content.
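Saved configurations could be as simple as named sets of disciplines that persist between reading sessions. The `ReaderProfiles` class below is a hypothetical sketch (the article describes the feature as forthcoming, not how it is implemented); it stores each profile as a set of discipline tags and round-trips them through JSON. The profile and discipline names are illustrative.

```python
import json


class ReaderProfiles:
    """Named discipline selections a reader can save and switch between.

    Hypothetical sketch: class and method names are illustrative only.
    """

    def __init__(self):
        self._profiles = {}

    def save(self, name, disciplines):
        self._profiles[name] = frozenset(disciplines)

    def get(self, name):
        return self._profiles[name]

    def to_json(self):
        # Sort names and tags for stable, human-readable output.
        return json.dumps({n: sorted(d) for n, d in sorted(self._profiles.items())})

    @classmethod
    def from_json(cls, text):
        profiles = cls()
        for name, disciplines in json.loads(text).items():
            profiles.save(name, disciplines)
        return profiles


profiles = ReaderProfiles()
profiles.save("first reading", [])                      # core content only
profiles.save("exam review", ["computing", "library"])  # a custom mix
restored = ReaderProfiles.from_json(profiles.to_json())
```

Each stored set could then be handed to the same kind of filtering routine used at publishing time, which is one way to read the claim that the same algorithms apply at reading time.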
Our project did not require rocket science, but it did require many iterations of the source markup and transformation software to get everything to work. We have not tried to hide the XML markup task by giving authors an HTML front end, as some publishers have done, because that just makes the software under the hood more complex. However, we plan to release most of our work as an open source project within the next year, and perhaps a clever publisher can find a way to combine the expressiveness and flexibility of the rich markup in our platform with the ease of use of lightweight authoring tools.