We know that the era of "big data" has already fomented great change in book publishing. But it's also making waves in book scholarship. Academics are exploring new and fascinating ways of analyzing literature not as specific works but as corpora: huge bodies of works spanning decades and even centuries.
In his new book, Macroanalysis: Digital Methods & Literary History (University of Illinois Press), Matthew L. Jockers, a University of Nebraska-Lincoln Assistant Professor of English, takes readers into what he modestly calls "this thing I'm doing." "To call it a field is perhaps premature," he says.
Key to understanding macroanalysis is noting the difference between close reading and distant reading. Close reading is the careful study of a single work, Moby Dick, perhaps, while distant reading is more of an aggregate survey of all the text in all the books written in, say, the 19th Century, or in 19th Century Ireland, or in 19th Century Ireland by women. "The primary goal of my work is to study literature in a much larger context than we've been accustomed to doing," says Jockers, "to get away from the study of landmark texts and look at the very big picture."
Following in the scientific, and not uncontroversial, spirit of his colleague, Italian literary scholar Franco Moretti, Jockers has focused on the corpus of 19th Century literature available for digital analysis. At present, 20th Century works are fraught with copyright headaches, and 18th Century works, which are often degraded and printed with different font conventions, tend to confuse optical character recognition (OCR) technology.
Scholars like Jockers and Moretti hope, through their distant-reading methods, to puzzle out how elements such as style and theme evolved over time. The available 19th Century corpus is far from complete, since what survives from that era hardly represents all that was written. "I come to the best conclusions that I can derive given the material that I have," says Jockers.
As a scholar of Irish literature, Jockers has a particular interest in how nationality and place influence theme and attitude. He can, for example, "look at the way that Ireland gets expressed in literature. And I can break that down further to how a male Irish author writes about Ireland compared to the way a female author writes about Ireland. … [Or] when writers have set their fiction in Rome and they're talking about Catholicism, what is the attitude being expressed toward that, and how does that change when the setting is no longer Rome but Dublin, Ireland?"
He's adamant that macroanalysis does not endeavor, nor is it equipped, to identify quality in a work, only trends and similarities, such as the ways that themes and subject matter, whaling for instance, are propagated through the corpora. "Moby Dick has a lot in common with Edgar Allan Poe's The Narrative of Arthur Gordon Pym of Nantucket," says Jockers. "It seems likely that Melville borrowed some of his material from Poe."
Jockers is hopeful that a new partnership with Book Lamp, home of the Book Genome Project (think Pandora.com), will unlock the 20th Century corpus, and that work being done at the Stanford Literary Lab to make the 18th Century corpus available will enable him to extend the trend arcs that he has identified in the 19th Century.
"What if we could write an algorithm that could ID the major characters in a book?" asks Jockers almost giddily. "How many are there in 1809 on average vs. 1904? How are they portrayed? Is there a relationship between male and female characters and the author's gender?" The possibilities seem endless.