July 31, 2014

DARPA Seeks to Leapfrog Big Data with ‘Big Mechanisms’

George Leopold

A Defense Department research program seeks to “leapfrog” advanced big data analytics by developing automated search technologies that could help explain the causes and effects that drive complex systems.

The Defense Advanced Research Projects Agency (DARPA), which is credited with developing the Internet’s basic architecture, launched its “big mechanism” initiative earlier this year. The program aims to develop automated tools that could uncover causal models hidden in big data.

The classic example of a big mechanism is the 1854 map of London showing the association between a cholera outbreak and a polluted public water pump. This early example of big data has since been swamped by relentless waves of scientific data that make it nearly impossible to bridge the gap between tracking associated data points and discovering the cause-and-effect mechanisms behind big data.

“Having big data about complicated economic, biological, neural and climate systems isn’t the same as understanding the dense webs of causes and effects – what we call ‘big mechanisms’ – in these systems,” DARPA Program Manager Paul Cohen said in launching the research effort in February.

“Unfortunately, what we know about big mechanisms is contained in enormous, fragmentary and sometimes contradictory literatures and databases, so no single human can understand a really complicated system in its entirety,” Cohen added. “So computers must help us.”

DARPA’s Information Innovation Office released a preliminary request for proposals earlier this year to help develop technologies that could be used, for example, to scour research papers to extract details that could eventually be used to explain cause-and-effect relationships.

The DARPA office plans to initially use big mechanism tools to study the complex molecular interactions that cause cells to become cancerous. The proposed methodology includes using computers to scan research papers on cancer biology to extract data on cancer pathways. The data fragments could then be assembled into complete pathways of “unprecedented scale and accuracy,” the agency claimed, to determine how pathways interact.

In the last step, automation tools could help determine causes and effects that could be manipulated to develop potential cancer treatments.

“The language of molecular biology and the cancer literature emphasizes mechanisms,” Cohen said. “Papers describe how proteins affect the expression of other proteins, and how these effects have biological consequences. Computers should be able to identify causes and effects in cancer biology papers.”
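
The three steps DARPA describes (reading papers, assembling fragments into pathways, and explaining causes and effects) can be illustrated with a toy Python sketch. It is not DARPA's method, only a minimal illustration under stated assumptions: the sentences, the protein names and the "activates/inhibits" pattern are all hypothetical stand-ins for what a real reading system would have to handle.

```python
import re
from collections import defaultdict, deque

# Hypothetical sentences standing in for text pulled from research papers.
SENTENCES = [
    "ProteinA activates ProteinB.",
    "ProteinB inhibits ProteinC.",
    "ProteinC activates ProteinD.",
    "ProteinA is widely studied.",  # carries no causal claim
]

# Assumed toy pattern: "<cause> activates|inhibits <effect>".
PATTERN = re.compile(r"(\w+) (activates|inhibits) (\w+)")


def extract_fragments(sentences):
    """Step 1 (reading): pull (cause, relation, effect) triples from text."""
    fragments = []
    for sentence in sentences:
        match = PATTERN.search(sentence)
        if match:
            fragments.append(match.groups())
    return fragments


def assemble(fragments):
    """Step 2 (assembly): join the fragments into one directed graph."""
    graph = defaultdict(list)
    for cause, relation, effect in fragments:
        graph[cause].append((relation, effect))
    return graph


def downstream(graph, start):
    """Step 3 (explanation): list everything causally downstream of start."""
    seen, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _, effect in graph.get(node, []):
            if effect not in seen:
                seen.add(effect)
                order.append(effect)
                queue.append(effect)
    return order


fragments = extract_fragments(SENTENCES)
pathways = assemble(fragments)
print(downstream(pathways, "ProteinA"))
```

Real scientific prose is far messier than a single regular expression can capture, which is the gap the program's automated reading research is meant to close.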

More broadly, big mechanism tools could help scientists understand complicated systems while aiding researchers struggling to keep up with a relentless stream of data generated by scientific journals. Researchers who are forced to specialize in narrow areas of science could use big mechanism tools to expand their perspective.

Under a proposed DARPA scheme, scientific journals would become part of a big mechanism database. “Every aspect of a big mechanism would be tied to the data that supports it or contradicts it,” the agency said.

“By emphasizing causal models and explanation, big mechanism may be the future of science,” Cohen asserted.

Since its creation in 1958, DARPA has always tried to think “big.” Earlier this year it launched an effort to take big data analytics to the next level through a “big code” project. That effort focuses on improving overall software reliability through a large-scale repository of software that drives big data.
