October 5, 2012

Perseus Gives Big Humanities Data Wings

Ian Armas Foster

“How do we think about the human record when our brains are not capable of processing all the data in isolation?” asked Professor Gregory Crane of students in a lecture hall at the University of Kansas.

But when he posed this question, Crane wasn't addressing modern big data before a room of computer science majors. Instead, he was discussing data drawn from ancient texts with a group of humanities students (and one computer science major).

Crane, a professor of classics, adjunct professor of computer science, and chair of Technology and Entrepreneurship at Tufts University, spoke about the Perseus Project, whose goals include storing and analyzing ancient texts with an eye toward building a global humanities model.

The next step for the humanities is to create what Crane calls "a dialogue among civilizations." In practice, that means connecting those who study classical Greek with those who study classical Latin, Arabic, and even Chinese. Just as physicists want to model the universe, Crane wants to model the progression of intelligence and art on a global scale throughout human history.

“We’ve looked at the light under linguistic lampposts in isolation. But the challenge is seeing this as a whole system of interacting cultures.”

The humanities, which include history, literature, and to a lesser extent sociology, have struggled somewhat over the last several years. According to Crane, the National Endowment for the Humanities' annual budget totals $150 million. By contrast, the National Science Foundation's budget exceeds $6.5 billion. Further, Harvard University saw 27% growth in STEM majors (Science, Technology, Engineering, and Mathematics) from 2005 to 2010. Put simply, the funding, the jobs, and the economic growth lie in science and technology.

As a result, humanities departments worldwide need to undergo a significant shift. Those who specialize narrowly, in the study of Classical Greece, for example, may be intelligent, but their usefulness is dwindling. Instead, per Crane, cross-disciplinary knowledge is becoming increasingly valuable, particularly among library professionals. "What defines library professionals is thinking across disciplines."

Building the quantum computer, arguably the next great leap in scientific computing, will require remarkable advances across physics and computer science. Likewise, the next leap in the humanities will have to come from working across the various classical disciplines. It will also require humanists to abandon the attitude many of them hold toward science and to embrace the digital age. Crane's Perseus Project may be a significant means to that end.

Crane summed up the Perseus Project's mission this way: "I'm not interested in the physical books. I'm interested in the words inside the books. I want a catalogue of every word, of every version, of every text ever produced." Physical books may offer nostalgic value, but the words carry the intellectual value.

Surprisingly, the biggest barrier is not the amount of storage the ancient texts occupy but the language barrier. The Perseus Project currently covers over a trillion words, but those words are spread across 400 languages. As a specific example, Crane presented a 12th century Arabic document. It was pristine and easily readable, at least to anyone who can read ancient Arabic.

It is markedly difficult to teach a computer the semantics of one language, let alone 400. Machine translation platforms exist, but they remain imprecise even for modern languages and do nothing for ancient ones. Even so, analytics systems can point researchers toward promising directions, which leaves those in the humanities an important role in interpretation.
