December 7, 2011

Pervasive Lends Supercomputing Center Analytical Might

Datanami Staff

Using supercomputing resources to solve computationally and data-intensive problems in fields like astronomy and cosmology is nothing new. However, with an increase in both data sizes and application complexity, even the top supercomputing dogs are hitting a few walls.

This problem of big data in cosmology, even with the powerful Longhorn Cluster at the Texas Advanced Computing Center (TACC) at one’s disposal, presented researchers seeking clues about the big bang theory with some serious challenges on all fronts—programming, crunching, visualizing and beyond.

According to Paul Navratil, a research associate at TACC and collaborator on a recent project to run a massive dark matter simulation that traces billions of particles through time and evolution, “The massive amount of data produced by these simulations overwhelms traditional three-dimensional visualizations.”

To overcome these challenges, the researchers drew on expertise within their supercomputing center, the local university, and Pervasive Software. In addition to having a solid history in analytics software and services, the 20-year-old company is located within a stone’s throw of TACC and has been consistently involved in a number of University of Texas at Austin projects, including an interesting research case involving Netflix data.

Pervasive Software, TACC researchers and those from the nearby University of Texas at Austin helped the research team develop a new way of data mining scientific simulations by leveraging MapReduce and Hadoop to split the problem across the many nodes available on the Longhorn cluster. According to Aaron Dubrow from TACC, when adapted to their problem “the researchers developed a search mechanism by which they could identify regions of interest in the midst of chaotic visualizations.”

The team was able to target the dark matter “halos” at the center of their research question using the Longhorn visualization cluster at TACC. With Longhorn configured with Hadoop storage drives, the halo project programmatically identified the location and size of dense particle regions that indicate galaxy locations in the simulation. These locations were then used to guide visualizations of the halo regions.
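
To make the idea concrete, here is a minimal single-machine sketch of how a MapReduce-style job might flag dense particle regions. It is not the researchers' actual method, which the article does not detail. It assumes a simple grid-counting approach: the map step assigns each particle to a grid cell, the reduce step sums the particles per cell, and cells above a cutoff are reported as candidate dense regions. The names `CELL_SIZE`, `DENSITY_THRESHOLD`, `map_particle`, `reduce_cell`, `find_dense_cells` and `cell_center` are all hypothetical, and the in-memory grouping stands in for the shuffle that Hadoop would perform across nodes.

```python
from collections import defaultdict

# Hypothetical parameters, chosen only for illustration.
CELL_SIZE = 1.0          # edge length of one grid cell, in simulation units
DENSITY_THRESHOLD = 3    # minimum particles for a cell to count as "dense"


def map_particle(particle):
    """Map step: emit (grid cell, 1) for a single (x, y, z) particle."""
    x, y, z = particle
    cell = (int(x // CELL_SIZE), int(y // CELL_SIZE), int(z // CELL_SIZE))
    yield cell, 1


def reduce_cell(cell, counts):
    """Reduce step: total the particle counts emitted for one cell."""
    return cell, sum(counts)


def find_dense_cells(particles, threshold=DENSITY_THRESHOLD):
    """Run map, group by key, then reduce; keep cells at or above threshold."""
    grouped = defaultdict(list)  # stands in for the distributed shuffle phase
    for particle in particles:
        for cell, count in map_particle(particle):
            grouped[cell].append(count)

    dense = {}
    for cell, counts in grouped.items():
        key, total = reduce_cell(cell, counts)
        if total >= threshold:
            dense[key] = total
    return dense


def cell_center(cell):
    """Return the spatial center of a grid cell, e.g. to aim a visualization."""
    return tuple((index + 0.5) * CELL_SIZE for index in cell)


if __name__ == "__main__":
    sample = [(0.1, 0.2, 0.3), (0.5, 0.5, 0.5), (0.9, 0.1, 0.4), (5.2, 5.1, 5.9)]
    for cell, total in find_dense_cells(sample).items():
        print(cell_center(cell), total)
```

Because each particle is mapped independently and each cell is reduced independently, the work can in principle be spread over many nodes, which is the property the Longhorn project relied on. The cell centers and counts play the role of the "location and size" used to guide visualization.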

As Dubrow noted this week, facilities like TACC are finding it necessary to dedicate expert staff and systems “to explore data-driven science with the goal of finding needles of insight in digital haystacks.” He says that “in traditional computational fields like astrophysics, as well as in emerging applications like smart grids and genomics, early projects are showing the benefits of using advanced computing to find meaning in massive datasets.”