Creating An Interoperable Publishing Ecosystem

By Bill Kasdorf

In the past, the publishing technology standards landscape has been pretty fragmented and siloed. The famous quip, "The great thing about standards is that there are so many of them," is quite true.

Why is this? It's because most standards are developed to address the needs of a specific community of interest, in a specific domain, to accomplish specific things. This is not a bad thing in itself. In fact, it's usually a pretty good thing for those individual communities.

But it's increasingly clear that this is not a good thing for the publishing ecosystem in general. Today, everything wants to be connected to everything. Disconnects are dysfunctional. People expect interoperability, portability, reliability, and adaptability.

Interoperability, because enabling systems and processes to interoperate smoothly and predictably removes friction, resulting in efficiency, economy, and flexibility.
Portability, because requiring different versions of content for different devices, systems, or platforms (including ones not yet invented) gets old real fast.
Reliability, because things need to "just work" when you move or access content between devices and systems.
Adaptability, because we should be able to take advantage of new developments without starting over from scratch.

Standards provide the oil that removes friction, the "common language" that lets systems understand each other, and the broad base of support and wide use that keeps them up to date. And the best standards are standards based on other standards.

Today's digital publishing ecosystem is largely based on, or depends upon, the group of standards developed and maintained by the World Wide Web Consortium (W3C) and referred to collectively as the Open Web Platform (OWP). Comprising over 50 standards, including the ubiquitous XML, HTML, and CSS, the OWP is the basis for marking up, managing, transforming, rendering, and disseminating publications via web-based technologies.

Of course, many standards are maintained by other standards organizations as well, and there are even de facto standards that are not formal standards at all. Many of these are the ones I referred to at the outset, developed by particular communities to address specific needs. While those standards are still essential for those communities, new standards being developed and established standards being updated are increasingly produced in a spirit of collaboration, or at least with the goal of avoiding conflict with widely used standards. We're seeing real convergence.

What is most striking is that folks who used to be firmly committed to doing things in proprietary ways (often for the perfectly reasonable goal of differentiating and competing in the marketplace) are coming to realize the benefit of conforming to standards. Even companies that are in most other respects fierce competitors are sitting down together and collaborating to work things out for the common good.

Here are some interesting examples.

Schema.org
Schema.org is a collection of vocabularies and properties that enable semantic enrichment of content in a manner that is natively recognized by web browsers and search engines. In plain English, this is what lets you "tag" a string of numbers and letters as an address, and a browser or search engine will automatically know it's an address. If your content contains recipes, schema.org tagging lets a browser or search engine know which parts are recipes, and within a recipe, which bits are ingredients. With schema.org, you can tag a word as the name of a corporation, and an acronym as a stock ticker abbreviation, and they "work" in search and online.
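To make that concrete, here is a minimal sketch in Python of what such tagging can look like. It uses JSON-LD, one of the syntaxes schema.org supports, with the schema.org types Recipe and Corporation and the properties recipeIngredient, recipeInstructions, and tickerSymbol. The function names, the sample recipe, and the company and ticker are hypothetical, made up purely for illustration.

```python
import json


def recipe_jsonld(name, ingredients, steps):
    """Describe a recipe with schema.org's Recipe type as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": name,
        # Each ingredient is tagged individually, so a search engine can
        # tell which bits of the recipe are ingredients.
        "recipeIngredient": list(ingredients),
        "recipeInstructions": [
            {"@type": "HowToStep", "text": step} for step in steps
        ],
    }


def corporation_jsonld(name, ticker):
    """Tag a name as a corporation and an acronym as its stock ticker."""
    return {
        "@context": "https://schema.org",
        "@type": "Corporation",
        "name": name,
        "tickerSymbol": ticker,
    }


def as_script_tag(data):
    """Wrap JSON-LD in the script element used to embed it in an HTML page.

    Assumption: the values contain no "</script>" text; a production
    version would need to escape that.
    """
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    print(as_script_tag(recipe_jsonld(
        "Example Pancakes",
        ["2 cups flour", "1 cup milk", "2 eggs"],
        ["Mix the ingredients.", "Fry in a hot pan."],
    )))
    print(as_script_tag(corporation_jsonld("Example Corp", "EXMP")))
```

The point is not this particular syntax. It is that the vocabulary (Recipe, recipeIngredient, tickerSymbol) is shared, so the same markup means the same thing to every consumer that recognizes schema.org.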

In Google, these are known as "rich snippets." But if this only worked in Google, how useful would it be, really? What if everybody just made up their own way of tagging these things, or what if each search engine required you to tag them their way? Yipes!

Believe it or not, schema.org is a collaboration of Bing, Google, Yahoo! and Yandex. It's an excellent and quite surprising example of fierce competitors working together for the common good. Not altruistically, I should point out: they were smart enough to realize that doing this in proprietary ways just plain wouldn't work.

Many people think schema.org is a W3C standard, but it's not. To put it bluntly, it works because those few big players made it work. But even though it's not a formal standard, it is so useful and becoming so ubiquitous that schema.org has been incorporated by reference in HTML5. After years in development, HTML5 was finally issued as a formal W3C Recommendation on October 28, 2014. It is already in wide use in all major browsers and e-readers, and it's a foundation of the Open Web Platform and of EPUB 3.

EPUB and Readium
At its core, EPUB 3 is "packaged web content." Its content documents are purely HTML5, expressed as XML (XHTML5). EPUB is firmly committed to maintaining alignment with HTML and web standards as they evolve. As an example, one of the recent updates in EPUB 3.0.1 is to accommodate schema.org, because that became an official part of HTML5. (Now you see how interlocking these things are becoming.)
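As a rough illustration of "packaged web content," the Python sketch below assembles a bare-bones EPUB 3 container: a ZIP archive whose first entry is an uncompressed mimetype file, a META-INF/container.xml that points to the package document, and ordinary XHTML files as the content. The file names under EPUB/, the identifier, the titles, and the function name build_epub_skeleton are my own placeholders, and I have not run the result through a validator, so treat it as a sketch of the structure rather than a guaranteed-conformant publication.

```python
import io
import zipfile

# Tells a reading system where the package document lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="EPUB/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Package document: metadata, a manifest of resources, and the reading order.
# The identifier, title, and date are placeholder values.
PACKAGE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Example Publication</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2014-12-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""

# The navigation document is itself just XHTML.
NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Contents</title></head>
  <body>
    <nav epub:type="toc">
      <ol><li><a href="chapter1.xhtml">Chapter 1</a></li></ol>
    </nav>
  </body>
</html>
"""

# A content document: HTML5 expressed as XML (XHTML5).
CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><p>Hello, EPUB.</p></body>
</html>
"""


def build_epub_skeleton() -> bytes:
    """Return the bytes of a minimal EPUB-style ZIP container."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML)
        archive.writestr("EPUB/package.opf", PACKAGE_OPF)
        archive.writestr("EPUB/nav.xhtml", NAV_XHTML)
        archive.writestr("EPUB/chapter1.xhtml", CHAPTER_XHTML)
    return buffer.getvalue()


if __name__ == "__main__":
    with open("example.epub", "wb") as out:
        out.write(build_epub_skeleton())
```

Unzip the result and what you find inside is web content: XHTML files that a browser could open directly, plus a small amount of packaging metadata.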

The convergence around EPUB 3 in the ereader ecosystem is practically universal now. Even Amazon, the perennial outlier, prefers to get EPUB 3 for its KindleGen process. And as an example of things "just working" when folks align with standards, because Apple implemented its pop-up footnote feature in iBooks based on the EPUB 3 spec, any properly coded footnote (using the EPUB-based attribute @epub:type="footnote") just works. How cool is that!
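For readers who want to see what "properly coded" means, here is a small Python sketch. The sample markup follows a common pattern: a link tagged epub:type="noteref" pointing at an aside tagged epub:type="footnote". The sample text, the id fn1, and the helper names footnotes and dangling_noterefs are assumptions for illustration. They are not taken from Apple's documentation or from any particular publisher's files.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"   # namespace of the epub:type attribute
EPUB_TYPE = "{%s}type" % EPUB_NS

# Hypothetical content document with one footnote reference and one footnote.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Footnote sample</title></head>
  <body>
    <p>Standards converge.<a epub:type="noteref" href="#fn1">1</a></p>
    <aside epub:type="footnote" id="fn1"><p>Hypothetical note text.</p></aside>
  </body>
</html>"""


def footnotes(xhtml):
    """Map each footnote's id to its text content."""
    root = ET.fromstring(xhtml)
    notes = {}
    for element in root.iter():
        # epub:type holds a space-separated list of values.
        types = element.get(EPUB_TYPE, "").split()
        if "footnote" in types and element.get("id"):
            notes[element.get("id")] = "".join(element.itertext()).strip()
    return notes


def dangling_noterefs(xhtml):
    """List noteref links whose target id is missing from this document.

    Only same-document links of the form "#id" are understood here; a link
    into another file would be reported as dangling.
    """
    root = ET.fromstring(xhtml)
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    missing = []
    for link in root.iter("{%s}a" % XHTML_NS):
        if "noteref" in link.get(EPUB_TYPE, "").split():
            href = link.get("href", "")
            if href.lstrip("#") not in ids:
                missing.append(href)
    return missing


if __name__ == "__main__":
    print(footnotes(SAMPLE))
    print(dangling_noterefs(SAMPLE))
```

Because the semantics live in a standard attribute rather than in a vendor-specific class name, a check like this works on any publisher's file, and so can any reading system's pop-up feature.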

The EPUB 3 standard itself is an example of competitors cooperating. The EPUB 3 Working Group included Apple and B&N and Kobo, Google and Microsoft, Adobe and Monotype, Pearson and Hachette, Penguin and Wiley, and scores of others: publishers, technology companies, service providers, libraries, and more from across the publishing ecosystem.

That level of collaboration on a standard is great, but what I think is even more notable is the development of a free, open-source implementation of EPUB 3 that was done by a similarly collaborative group of erstwhile competitors: Readium.

This has evolved into what is now known as the Readium Foundation, an independent nonprofit corporation with a mission to develop product-quality open source technology to advance EPUB and the Open Web Platform for publishing. Its members are a diverse and international group that includes, for example, Hachette, Gallimard, Adobe, Google, Kobo, Ingram/VitalSource, Deutsche Telekom, IBM, the New York Public Library, Intel, Penguin Random House, and many others.

There are currently three main initiatives from Readium:

Readium SDK, an EPUB 3-conformant rendering engine for native mobile and device apps.
Readium JS, an EPUB 3-conformant rendering engine for cloud and browser-based apps.
Readium LCP, a lightweight DRM system.

These initiatives have had a big impact on accelerating the adoption and implementation of EPUB 3. There are already 14 vendors shipping apps based on Readium SDK, and the Readium JS-based Readium for Google Chrome has over 300,000 users.

And remember, under the hood this is all OWP-based. This is real convergence.

EDUPUB
Building on top of EPUB is EDUPUB, the profile of EPUB 3 for educational content. While the formal development of the EDUPUB specification is being done as an activity of the IDPF EPUB 3 Working Group, this activity was prompted by and is integrated with the work of a loose collaboration of organizations known as the EDUPUB Alliance.

Initially launched a year ago by IDPF, IMS Global (an organization governing many key educational standards), and the W3C, the EDUPUB Alliance set out not to create a new standard. Instead, the concept was to enable existing and widely used standards to become interoperable in a way that enhances all of them.

For example, IMS Global governs three important standards used in the interchange of educational content and data: Question and Test Interoperability (QTI), Learning Tools Interoperability (LTI), and Caliper Analytics. As the EDUPUB profile of EPUB is being developed, it is being engineered to accommodate these standards, but it does not change them. (The document "Using IMS Caliper Analytics, Question and Test Interoperability, and Learning Tools Interoperability with EPUB3: EDUPUB Best Practices" is available for free download at http://www.imsglobal.org.)

The EDUPUB profile of EPUB is not, fundamentally, a separate standard from EPUB. Any publication or document conforming to the EDUPUB spec is, by definition, a conformant EPUB 3 publication or document. What EDUPUB does is specify how to use EPUB in a way that is optimized for educational content. It actually adds few additional requirements. Instead, it provides features like recommended metadata (e.g., designating "teacher edition" content, and providing accessibility information) and structural vocabulary for things that are important in educational content, like assessments (quizzes, tests, exercises, etc.).

And an interesting byproduct of the development of the EDUPUB spec was that it led to advances to the EPUB spec in general. For example, educational content often needs "widgets" for interactive features (like those assessments). This led to the development of a more general specification for "Scriptable Components" in EPUB that is useful for any type of content, not just educational content. The same thing happened with "Distributable Objects": chunks of content within a publication that can be pulled out and distributed separately, from a chapter to a video to an exercise.

So this is an example not only of convergence but of expansion as well, all within the same standards-based, interoperable sphere.

The latest draft of the EDUPUB profile of EPUB, published by the IDPF on November 27, 2014, is available at http://www.idpf.org/epub/profiles/edu/spec/edupub-20141127.html.

EPUB-WEB
Perhaps the most exciting and visionary development in the collaboration and convergence of standards is the recently announced concept of "EPUB-WEB." First broached in a presentation at the October 2014 Books in Browsers conference, it was then published on November 21, 2014, as a white paper titled "Advancing Portable Documents for the Open Web Platform: EPUB-WEB," explicitly labeled an "Unofficial Draft" and jointly authored by Markus Gylling (CTO of the IDPF and the leader of all EPUB development) and Ivan Herman of the W3C.

This white paper sets out a long-range vision, expected to take years to accomplish, in which an EPUB and a website are ultimately two distinct "states" of the same thing. That is, instead of requiring one set of content and resources for a website and an almost identical set packaged as an EPUB for offline access and distribution, there will ideally be no difference between them. Opening such a document in a browser over the web would deliver a virtually identical experience to opening it on a phone, tablet, or ereader, or on future platforms or devices not yet even dreamed of.

To those of us in the publishing technology space, this is Nirvana. In no way should you expect this vision to be realized anytime soon; in fact there is a chance it will never be realized at all. But it is realistic and concrete. The vision is solid and well articulated. And there is no better example of the benefits of growing collaboration and convergence in the publishing technology standards landscape.

Bill Kasdorf is vice president of Apex Content Solutions.
