Let’s Take “Search Inside the Book” to a Whole New Level

By Joe Wikert

Do you remember when Amazon introduced both “Look Inside” and “Search Inside” functionality for books? They were such simple yet revolutionary features at the time. Before Look/Search Inside, online shoppers had no way to do the simple flip test they could do in a brick-and-mortar store.

Fast-forward to today, when we take Look/Search Inside features so much for granted that there’s been virtually no innovation on this front. I believe there’s a real opportunity here, though, to help consumers find what they’re looking for and to significantly improve the overall content discovery and evaluation process.

Let’s start with a simple question: Why are Search and Look Inside both limited to individual books? What if my first problem is to figure out which book has the most in-depth coverage of topic xyz? Let’s say I want to do some research on the Pittsburgh Pirates, specifically looking for coverage of a former player named Dave Parker. How do I find the book with the most in-depth coverage of Parker?

The typical approach is to search on Amazon. The search results there are initially sorted by relevance, and you might think that’s the end of the story. But all Amazon is really doing is searching the metadata associated with each book; it isn’t searching the actual contents of the books to push the most relevant titles to the top of the results. As a result, books with the search name or phrase in their title often get pushed to the top.

Take a closer look at those search results and you’ll quickly appreciate just how ineffective Amazon’s current solution is. You’ll need to skip past the first four results because they’re not books at all; I requested “books” only, but the results reflect the trouble Amazon has with its internal product types and definitions. Those are followed by a couple of titles that have nothing to do with Dave Parker the former baseball player; they just happen to be written by a different Dave Parker. This shows how heavily Amazon’s search weights a book’s metadata: there are probably very few references to “Dave Parker” inside those books, but they float toward the top of the results simply because of the author name. Next is a book about Dave Winfield, another former baseball player, which looks promising. The problem is that it made the first page of results only because its co-author is Tom Parker, so when Amazon sees “Dave Winfield” and “Tom Parker” next to each other it registers a hit, matching the former’s first name and the latter’s last name. Ugh.
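To see how a query for “Dave Parker” can match a book by Dave Winfield and Tom Parker, compare a search that only requires every query word to appear somewhere in a record with one that requires the words to appear together, in order, as a phrase. This is a minimal sketch under our own assumptions: we don’t know how Amazon’s search is actually implemented, and the function names and the flattened author string below are hypothetical.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def matches_all_terms(query: str, record: str) -> bool:
    """Bag-of-words match: every query word appears somewhere in the record."""
    words = set(tokenize(record))
    return all(term in words for term in tokenize(query))

def matches_phrase(query: str, record: str) -> bool:
    """Phrase match: the query words appear consecutively and in order."""
    terms = tokenize(query)
    words = tokenize(record)
    n = len(terms)
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))

# Hypothetical metadata: the co-authors flattened into one searchable string.
authors = "Dave Winfield, Tom Parker"

print(matches_all_terms("Dave Parker", authors))  # "dave" and "parker" both occur
print(matches_phrase("Dave Parker", authors))     # but never next to each other
```

The first check succeeds and the second fails, which is exactly the kind of false positive described above; requiring a phrase match is one simple way to avoid it.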

At this point you might think the solution is to go to Google Book Search. Take a look at Google’s results and I think you’ll agree I’m no closer to finding the right book than I was at the start. To be fair, Google Book Search is a better solution than Amazon’s search, but it still has some enormous holes. For example, although Google’s service searches the book contents, its results are still highly biased by the metadata. Just look at the author names of the first several titles in those results and you’ll see what I mean. Google is also severely limited because its search is tightly connected to its book preview service. That means Google will show you only some of the pages with hits, hiding many others and then cutting off your view entirely once you reach a certain threshold.

What we really need is something like Google Book Search across an entire library, with full visibility into all the content and an algorithm smart enough to focus on true relevance rather than being thrown off by metadata. The results would show two or three lines of the text surrounding each hit so the reader can see every mention in context.
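Here is a minimal sketch of the core idea, assuming each book’s full text is available as a list of lines: rank books purely by how many lines of body text mention the phrase (titles and other metadata are never consulted), and return each hit with a line of context on either side. Counting matching lines is only a crude stand-in for “true relevance”; a production service would more likely use a proper search index and a ranking function such as BM25. The `Book` and `Hit` types, the `search_library` function, and the sample library are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Book:
    title: str        # metadata: deliberately ignored when ranking
    lines: list[str]  # full text, one entry per line

@dataclass
class Hit:
    line_no: int        # 0-based index of the matching line
    snippet: list[str]  # the matching line plus surrounding context lines

def find_hits(book: Book, phrase: str, context: int = 1) -> list[Hit]:
    """Return every line containing the phrase (case-insensitive) with context."""
    needle = phrase.lower()
    hits = []
    for i, line in enumerate(book.lines):
        if needle in line.lower():
            start = max(0, i - context)
            end = min(len(book.lines), i + context + 1)
            hits.append(Hit(i, book.lines[start:end]))
    return hits

def search_library(books: list[Book], phrase: str,
                   context: int = 1) -> list[tuple[Book, list[Hit]]]:
    """Rank books by the number of body-text lines mentioning the phrase."""
    results = [(book, find_hits(book, phrase, context)) for book in books]
    results = [r for r in results if r[1]]  # drop books with no text hits
    results.sort(key=lambda r: len(r[1]), reverse=True)
    return results

# Hypothetical sample library.
library = [
    Book("Book A", ["Chapter 3", "Dave Parker came up in the ninth.", "The crowd stood."]),
    Book("Book B", ["Dave Parker took the field.", "The fans cheered.",
                    "Later, Dave Parker spoke to reporters."]),
    Book("Dave Parker on Gardening", ["Prune roses in early spring."]),
]

for book, hits in search_library(library, "Dave Parker"):
    print(f"{book.title}: {len(hits)} hit(s)")
    for hit in hits:
        print("   ...", " / ".join(hit.snippet))
```

In this run Book B ranks above Book A, and the gardening title is excluded entirely even though its title contains the name, because only the body text counts.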

This uber-search would be powerful for some types of books and totally useless for others. For example, there’s absolutely no need for it in the fiction space, but think about how useful it would be in non-fiction areas like business, science, technology, biography, cooking, etc. I see this as a service a publisher could offer on its website, dramatically improving on the metadata-only search results you typically find there.

In fact, this uber-search vision is a service my OSV colleagues and I are currently exploring with a third-party developer. Before we get too far along, we wanted to describe it to the publishing community to see if anyone knows of a better solution that already exists. We haven’t found one yet, but as we roll ours out we’ll describe the process here so other publishers can learn from our experience and potentially adopt our solution as well.

Joe Wikert is Publishing President at Our Sunday Visitor (www.osv.com). Before joining OSV, Joe was Director of Strategy and Business Development at Olive Software. Prior to Olive Software he was General Manager, Publisher, & Chair of the Tools of Change (TOC) conference at O’Reilly Media, Inc., where he managed each of the editorial groups at O’Reilly as well as the Microsoft Press team and the retail sales organization. Before joining O’Reilly, Joe was Vice President and Executive Publisher at John Wiley & Sons, Inc., in their P/T division.