Glossary of Metadata Terms
Metadata is data that describes a book’s content and format and provides the information needed to buy and sell books. Here’s a handy glossary with some important terms and organizations so you too can speak fluent Metadata! These terms are selected from the glossary of The Metadata Handbook.
Barcode - an optical, machine-readable representation of data. The Bookland EAN is the most widely used international barcode format for publishing and encodes the ISBN and price. The Bookland EAN barcode is required for print books by many book retailers and wholesalers and is used for automated tracking of sales information and inventory control.
BISAC Subject Headings - the North American standard for categorizing books based on topical content or genre. They help determine where physical books are shelved and are very important for online bookselling in metadata organization and search. Bookseller databases use them to create lists of titles by subject and in algorithms that suggest similar titles to readers.
Book Industry Study Group (BISG) - an American, not-for-profit membership organization that supports the book publishing industry through the development of standards, best practices, research, and events. BookNet Canada performs a similar role in Canada. BISG and BookNet Canada also administer Metadata Certification programs. These programs evaluate the quality of a publisher's or vendor's metadata files based on structure, content, and adherence to best practices. Expert advisers evaluate and rank compliance with industry standards and award levels of certification, which may be shared with trading partners to indicate proficiency in metadata creation and adherence to standards.
Classification - categorization and organization based on shared qualities or characteristics. In book metadata, this involves categorizing books using defined codes or subject lists (controlled vocabularies) to describe a book's content.
Controlled Vocabulary - a selected list of words and phrases used to reduce ambiguity and ensure consistency in description. BISAC Subject Headings, ONIX Code Lists, and Thema are examples of controlled vocabularies. Thema provides global standards for subject heading codes and was released in April 2013. Controlled vocabularies also provide consistency in the interpretation of metadata transmitted and processed electronically. Words and phrases may also be represented by codes that further reduce error in machine interpretation. For example, the BISAC Subject Headings are also represented by codes, such as BUS090010, which has a literal translation of BUSINESS and ECONOMICS/E-Commerce/Internet Marketing. The ONIX Contributor Code A01 indicates that the contributor name provided is the author of the book. Other codes indicate editor, illustrator, and other contributor roles. Controlled vocabularies make it more likely that someone seeking information will retrieve a relevant set of items in a search.
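The lookup behavior described in this entry can be sketched in a few lines of Python. This is only an illustration: the two tables contain just the codes quoted above, and the function name `resolve` is hypothetical rather than part of any standard.

```python
# Tiny excerpts of two controlled vocabularies, holding only the codes
# quoted in the glossary entry. Real code lists are far larger.
BISAC_SUBJECTS = {
    "BUS090010": "BUSINESS and ECONOMICS/E-Commerce/Internet Marketing",
}
ONIX_CONTRIBUTOR_ROLES = {
    "A01": "author",
}


def resolve(code, vocabulary):
    """Return the label for a code, or raise if the code is not in the list."""
    try:
        return vocabulary[code]
    except KeyError:
        # Free-text values such as "writer" are rejected instead of guessed at.
        raise ValueError(f"{code!r} is not in the controlled vocabulary") from None


print(resolve("A01", ONIX_CONTRIBUTOR_ROLES))
print(resolve("BUS090010", BISAC_SUBJECTS))
```

Rejecting anything outside the list is the point of the exercise: a receiving system never has to decide whether "writer", "Author", and "by" mean the same thing.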
Electronic Data Interchange (EDI) - a broad term for the online exchange of structured data relating to commerce. EDI standards, such as EDIFACT and EDItX, were developed to carry information about commercial transactions. Unlike ONIX, they were not designed to carry product metadata with the fullness and form suitable for public display. EDIFACT or EDItX is used when the information supplied is for business-to-business transactions only and will not be displayed to the public as descriptive information about a book.
EDItEUR - the international group coordinating development of the Thema and ONIX standards for books, ebooks, and serials. They provide free ONIX documentation and support for ONIX implementation. They also maintain the EDIFACT and EDItX standards used for electronic communication of business-to-business transaction information.
GTIN (Global Trade Item Number/GTIN-13) - a universal product identifier system for products that are bought and sold in the marketplace. GTINs may be 8, 12, 13, or 14 digits long. The expansion of the ISBN to thirteen digits created conformance to the GTIN-13 standard, making ISBN consistent with other non-book products. All book and serial publications sold internationally are expected to carry GTIN-13.
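As a rough illustration of how the GTIN lengths listed above relate to one another, the following Python sketch checks the final (check) digit of a GTIN of any permitted length and pads shorter codes to 14 digits. The function names are hypothetical, and the sample codes are arbitrary values that satisfy the checksum; they are not taken from this glossary.

```python
def gtin_check_digit(body: str) -> int:
    """Compute the check digit for the digits that precede it."""
    total = 0
    # Working from the rightmost digit, weights alternate 3, 1, 3, 1, ...
    for position, char in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """Check the length and check digit of a GTIN-8, -12, -13, or -14."""
    if not (code.isascii() and code.isdigit()):
        return False
    if len(code) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])


def to_gtin14(code: str) -> str:
    """Left-pad a shorter GTIN with zeros to the 14-digit form."""
    return code.zfill(14)


print(is_valid_gtin("9780306406157"))             # a 13-digit sample
print(is_valid_gtin("036000291452"))              # a 12-digit sample
print(is_valid_gtin(to_gtin14("9780306406157")))  # padding keeps it valid
```

Because the weights are counted from the right, adding leading zeros does not change the check digit, which is why codes of different lengths can share one data structure.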
Identifier - a language-independent label that uniquely "names" an object within an identification scheme. Language-independent means that the identifier is a numeric or alphanumeric code (rather than a book title or a personal name, for example) that always refers to the same thing. Identification schemes define the rules for constructing the identifier, including how many characters it contains and what those characters stand for. ISBN, ISSN, ISNI, and ISTC are examples of standard, publishing industry-approved identifiers. The use of the ISBN in publishing, for example, allows accurate communication about a particular book product without needing to state the title, publisher, binding, price, and other version-specific information in every transaction. It helps ensure that the correct version of a book is delivered to the customer and that sales information is accurately captured.
Individual vendors may assign proprietary identifiers (Amazon ASINs or vendor SKUs, for example) to their products, but these are useful only within the vendors' systems and are not internationally recognized or controlled. A proprietary product number should not take the place of an industry-approved standard identifier, although both numbers may certainly co-exist within a bookseller's system. Identifiers that are accepted and controlled globally, such as ISBN, ISSN, ISTC, and ISNI, are recognized and interpreted across multiple systems and bookseller e-commerce sites.
International Standard Name Identifier (ISNI) - The 16-digit ISNI is an ISO standard (see the ISO entry below) for the identification of "Public Identities." ISNIs are assigned to the "Public Identities" of parties that participate in the creation, production, management, or distribution of cultural goods. The party can be a person, such as a book author, or a legal entity, such as a record label. The ISNI provides a tool for pulling together different forms of a name (such as linking pseudonyms to the appropriate identity) or for disambiguating multiple identities with the same name (John Smith, for example). Consistent use of ISNI as part of book metadata about contributor names makes it much easier for bookseller sites to correctly display all titles by the same author, even if the author's name is identical or similar to many other author names.
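A system receiving ISNIs can catch many typing errors before trying to match names. The sketch below assumes the ISO 7064 MOD 11-2 checksum that ISNI is generally described as using, in which the sixteenth character is a check character that may be the letter X. The function names are hypothetical, and the sample strings were chosen only because they satisfy the checksum.

```python
def isni_check_character(first_15: str) -> str:
    """Compute the final character from the first 15 digits (MOD 11-2)."""
    total = 0
    for char in first_15:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_isni(isni: str) -> bool:
    """Accept ISNIs written with or without spaces or hyphens."""
    compact = isni.replace(" ", "").replace("-", "").upper()
    if len(compact) != 16:
        return False
    body = compact[:15]
    if not (body.isascii() and body.isdigit()):
        return False
    return isni_check_character(body) == compact[15]


print(is_valid_isni("0000 0001 2281 955X"))
print(is_valid_isni("0000 0002 1825 0097"))
print(is_valid_isni("0000 0002 1825 0098"))  # last character altered
```

A passing checksum shows only that the string is well formed. It does not show that the ISNI has been assigned, or to whom.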
International Standard Text Code (ISTC) - a numbering system that uniquely identifies text-based works. It was published as an ISO standard in 2009. Unlike the ISBN, the ISTC identifies a "work" rather than a "product" and allows publications with the same basic content to be linked together. For example, Sense and Sensibility is a "work" that is available as many different "products" that are packaged in many formats (paperback, hardcover, ebook, etc.) by various publishers. Each of these products should have a unique ISBN but should carry the same ISTC, allowing the entire range of choices to be easily retrieved and displayed to consumers on bookseller websites.
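The work-versus-product relationship can be pictured as a simple grouping step. In this Python sketch, every identifier string is a made-up placeholder (not a real ISTC or ISBN), and the field names are assumptions chosen for illustration.

```python
from collections import defaultdict

# Placeholder records: "WORK-A" stands in for a work-level identifier such
# as an ISTC, and "PRODUCT-n" for a product-level identifier such as an ISBN.
products = [
    {"work_id": "WORK-A", "product_id": "PRODUCT-1", "format": "paperback"},
    {"work_id": "WORK-A", "product_id": "PRODUCT-2", "format": "hardcover"},
    {"work_id": "WORK-A", "product_id": "PRODUCT-3", "format": "ebook"},
    {"work_id": "WORK-B", "product_id": "PRODUCT-4", "format": "paperback"},
]


def group_by_work(records):
    """Collect the product identifiers that share each work identifier."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["work_id"]].append(record["product_id"])
    return dict(grouped)


for work_id, product_ids in group_by_work(products).items():
    print(work_id, product_ids)
```

With a shared work identifier in the metadata, a bookseller site could list every available format of a title from a single lookup instead of matching on title strings.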
International Standard Book Number (ISBN) - the ISO-approved global standard for identifying a book product. ISBN use facilitates commerce activities across countries and systems by providing a unique identifier that always refers to the same product. The identifier provides reliable and unambiguous machine matching to a specific version of a book for buying and selling activities. Since January 2007, all ISBNs issued conform to a 13-digit standard. The expansion increased capacity and also allowed ISBN to merge with the GTIN (Global Trade Item Number) data structure. The GTIN family includes the UPC (Universal Product Code) that is used in creating machine-readable barcodes. The ISBN is now consistent with the international product identification system used by multiple industries to buy, sell, and track non-book products.
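The move from ten to thirteen digits can be shown with a short Python sketch. It assumes the usual conversion rule: prefix the first nine digits of the old ISBN with 978, discard the old check digit, and compute a new one. The function names are hypothetical, the sample ISBNs are arbitrary, and the old check digit is not verified here.

```python
def isbn13_check_digit(first_12: str) -> int:
    """Weights alternate 1, 3, 1, 3, ... from the leftmost digit."""
    total = sum(
        int(char) * (1 if index % 2 == 0 else 3)
        for index, char in enumerate(first_12)
    )
    return (10 - total % 10) % 10


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-character ISBN to its 13-digit form."""
    compact = isbn10.replace("-", "").replace(" ", "")
    if len(compact) != 10:
        raise ValueError("expected a 10-character ISBN")
    # The old check character (possibly "X") is dropped, not converted.
    body = "978" + compact[:9]
    if not (body.isascii() and body.isdigit()):
        raise ValueError("the first nine characters must be digits")
    return body + str(isbn13_check_digit(body))


print(isbn10_to_isbn13("0-306-40615-2"))
print(isbn10_to_isbn13("0-8044-2957-X"))
```

The new check digit is calculated over all twelve preceding digits, including the 978 prefix, so it generally differs from the old one.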
International Organization for Standardization (ISO) - an independent, non-governmental organization founded in 1947 "to facilitate the international coordination and unification of industrial standards." It is the largest developer of voluntary international standards, and the organization has published more than 19,000 international standards covering most aspects of technology and business. Organizational membership is made up of national standards bodies in approximately 130 countries. ISO-approved standards, such as ISBN, have been developed and tested by an expert technical committee consensus process to ensure their efficacy for international business transactions. When a standard has been ISO-approved, industries have assurance that electronic information conforming to the standard will be consistently received and understood by business partners worldwide.
International Standard Serial Number (ISSN) - a unique 8-digit code that serves as the international identifier for print and electronic serial publications such as newspapers, magazines, and other continuing resources of all kinds. As opposed to stand-alone books (monographs), serials are issued on a regular basis and have the same title (The New York Times, for example) but different content. The ISSN allows all the different issues of a serial publication to be reliably tracked, retrieved, and displayed. The ISSN International Centre was created in Paris in 1976 under the terms of an agreement between UNESCO and France, the host country. The ISSN International Centre coordinates the ISSN assignment and management activities of 88 member countries.
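The eighth character of an ISSN is a check character, which lets software reject many mistyped codes. The Python sketch below assumes the standard modulus-11 calculation with weights 8 down to 2, in which a result of 10 is written as X. The function names are hypothetical, and the sample codes were picked only because they pass the checksum.

```python
def issn_check_character(first_7: str) -> str:
    """Compute the eighth character from the first seven digits."""
    # Weights run 8, 7, 6, 5, 4, 3, 2 from left to right.
    total = sum(int(char) * weight
                for char, weight in zip(first_7, range(8, 1, -1)))
    remainder = total % 11
    if remainder == 0:
        return "0"
    value = 11 - remainder
    return "X" if value == 10 else str(value)


def is_valid_issn(issn: str) -> bool:
    """Accept the usual NNNN-NNNC form, with or without the hyphen."""
    compact = issn.replace("-", "").upper()
    if len(compact) != 8:
        return False
    body = compact[:7]
    if not (body.isascii() and body.isdigit()):
        return False
    return issn_check_character(body) == compact[7]


print(is_valid_issn("0378-5955"))
print(is_valid_issn("2434-561X"))
print(is_valid_issn("0378-5954"))  # last character altered
```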
Metadata Best Practices - A best practice is a technique or method that consistently shows superior results and that is therefore used as an industry benchmark. Adoption of best practices is generally voluntary, but is encouraged by industries to promote efficiency and consistency in business and technology practices across multiple participants, systems, and business transactions. The North American publishing industry looks to Best Practices for Product Metadata, developed by the BISG Metadata Committee in coordination with BookNet Canada, for book metadata guidelines and recommendations.
ONIX - The ONIX family includes XML-based standards for Books, Serials, and Licensing Terms & Rights Information. ONIX for Books is the international standard intended to support computer-to-computer communication of book industry product information. EDItEUR coordinates development of ONIX standards.
Standard Address Number (SAN) - a unique seven-digit identifier signifying addresses of organizations involved in the publishing industry. SANs are used in electronic communications to accurately identify participants in commercial transactions.
Stock Keeping Unit (SKU) - a number or code used to identify each distinct product or service for sale, allowing businesses to track inventory and product availability. SKUs are often used to refer to different versions of the same product. Unlike ISBN and other nationally and internationally standardized identifiers, SKUs are usually assigned at the merchant level.
Structured Data - resides in fixed fields within a record or file. The metadata in ONIX files and records is structured data. Although data in XML files are not fixed in location like traditional database records, they are nevertheless structured because the data are tagged and can be accurately identified. In contrast, unstructured data is generally free-form text such as that found in word processing documents, web pages, and email messages.
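To illustrate why tagged data counts as structured, the following Python sketch reads a small XML record with the standard library. The element names are simplified and only loosely modeled on ONIX; the fragment is not a valid ONIX message, and `read_record` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A simplified, ONIX-like record written for illustration only.
RECORD = """
<Product>
  <TitleText>Sense and Sensibility</TitleText>
  <Contributor>
    <ContributorRole>A01</ContributorRole>
    <PersonName>Jane Austen</PersonName>
  </Contributor>
</Product>
"""

# The same data with the elements in a different order.
REORDERED = """
<Product>
  <Contributor>
    <PersonName>Jane Austen</PersonName>
    <ContributorRole>A01</ContributorRole>
  </Contributor>
  <TitleText>Sense and Sensibility</TitleText>
</Product>
"""


def read_record(xml_text):
    """Pull named fields out of the record by tag, not by position."""
    root = ET.fromstring(xml_text.strip())
    return {
        "title": root.findtext("TitleText"),
        "role": root.findtext("Contributor/ContributorRole"),
        "name": root.findtext("Contributor/PersonName"),
    }


print(read_record(RECORD))
print(read_record(RECORD) == read_record(REORDERED))
```

Both versions yield the same fields because each value is located by its tag. The same lookup is not possible with free-form text, where nothing marks which words are the title and which are the author.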
Universal Product Code (UPC, UPC-12) - a unique numerical identifier for machine-readable encoding, currently used exclusively in barcodes. UPC is used mainly for media such as music, movies, and video games. ISBN is the required identifier encoded in book barcodes, but mass-market paperbacks may also carry a barcode encoded with UPC because they are commonly sold in outlets such as supermarkets, drugstores, and big-box chain stores.
XML (Extensible Markup Language) - XML was designed to structure, store, and transport data electronically, whereas HTML (Hypertext Markup Language) was designed to facilitate web-based display of information. XML was designed to promote usability over the internet and is one of the most common tools for data transmission between applications. For example, XML-based formats have become the default for many office productivity tools, such as Microsoft Word and Apple's iWork. Information carried in standards created using XML format, such as ONIX, can be understood by most business partners and systems involved in publishing and bookselling.