January 7, 2016

New Databases Help Astronomers Probe the Universe

The observable universe became a bit more manageable this week when international astronomers moved to get their arms around their massive datasets through a partnership with the developers of the integrated Rule-Oriented Data System, or iRODS.

Data scientists who developed the open source iRODS framework said it would provide astronomers studying the origins of our galaxy and mysterious dark matter with a better way to query their huge data volumes, store and retrieve data and metadata as well as transport files and images.

The iRODS framework, developed by data scientists at the Renaissance Computing Institute on the campus of the University of North Carolina at Chapel Hill, will provide online databases for two astronomical (both in terms of subject matter and sheer volume) datasets: the Resolved Spectroscopy of a Local Volume, or RESOLVE, survey and the Environmental Context (ECO) catalog.

Consortium officials said both databases would use iRODS’ data management software to store and retrieve information on galactic observations, along with Flexible Image Transport System (FITS) image files and corresponding metadata.

The RESOLVE and ECO databases were unveiled this week at the annual meeting of the American Astronomical Society in Kissimmee, Fla.

The RESOLVE survey combines new optical and radio spectroscopy with archival multi-wavelength photometry used to perform a “census” of gas, dark matter and stars in galaxies, researchers said. The survey spans nearly five orders of magnitude in spatial scale. The ECO catalog is designed to complement RESOLVE by providing a similar, but purely archival census within a much larger volume.

With the new databases, “We went from having raw data with no ability to query it to databases that are searchable, expandable and flexible with version control,” Sheila Kannappan, an associate professor of physics and astronomy at UNC-Chapel Hill and principal investigator for the RESOLVE project, noted in a statement.

The iRODS Consortium supports development of the open source data management software for data discovery, workflow automation, secure collaboration and storage virtualization. The membership organization also provides a production-ready iRODS distribution along with support and integration services. International researchers in life sciences, geosciences, and information management use iRODS to organize their data and make it accessible.

Among the consortium’s corporate members are DataDirect Networks, IBM (NYSE: IBM), EMC Corp. (NYSE: EMC) and Seagate Technology (NASDAQ: STX). Other members include the Atmospheric Science Data Center at NASA’s Langley Research Center and the U.K.-based Wellcome Trust Sanger Institute, which focuses on genome sequencing.

IBM joined the consortium in July 2015. “We see iRODS as a tool that can help our customers take advantage of all that data—80 percent of which is unstructured and omitted from traditional analysis,” Dave Turek, IBM’s vice president of HPC market engagement, noted in a statement announcing the company’s iRODS membership.

The Renaissance Computing Institute and the Data Intensive Cyber Environment Center at UNC-Chapel Hill formed the iRODS Consortium in 2013.
