May 8, 2013

Bringing Dark Data to Light

Isaac Lopez

One of the chief challenges of big data, especially in established science and research, is bringing "dark data" (the data hidden in the tables and text of massive stores of scientific journals) to light. A research team is tackling this problem by combining several research areas into a potentially powerful tool.

The tool, called GeoDeepDive, is being built by researchers from several groups: the Condor High Throughput Computing Group, the Hazy research group at the University of Wisconsin-Madison, and the group led by Shanan Peters that created a geological utility called Macrostrat.

Starting from Macrostrat, whose graphical interface covers more than 33,000 rock formations, the researchers applied the Hazy group's machine reading system to pore through 36,000 geological research documents, cataloging more than 500,000 mentions of formation units. These mentions are broken down into individual pieces of reference data and linked to the corresponding formations on the Macrostrat map. Together, these components make up GeoDeepDive, which lets researchers work across space and time without having to hunt through tomes of journal literature.

The result is a research index, geographically searchable through a map GUI, that brings "dark data" to light. Although the GeoDeepDive implementation is specific to geologic research, the implications of this technology mash-up extend beyond geoscience, as the project's list of sponsors suggests: DARPA, Google, Microsoft, the NSF, Raytheon, and American Family Insurance.


