July 21, 2015

EU Project Looks to Scale Earth Data

George Leopold

Planet Earth is the ultimate source of big data, from weather to natural resources to the activities of the 7 billion or so human beings wandering the surface of our precious blue home.

A European Union big data project called EarthServer was launched in 2011 to establish a standards-based analytics framework for Earth science, or geospatial, data. The goal was to apply an integrated query language to crunch geospatial data, with data volumes scaling to exabyte levels.

During the first phase, project scientists demonstrated data portals with more than 230 terabytes of “spatio-temporal” data.

Now, EarthServer is launching a second phase extending through April 2018 that will seek to scale the initiative to 1-petabyte datacenters handling 3-D and 4-D data cubes. The data cube approach, which organizes data as multidimensional arrays of values, is premised on the idea that "a data cube is worth more than a million images."
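To make the data cube idea concrete, here is a minimal sketch using NumPy: Earth observation data modeled as a single 4-D array indexed by time, altitude, latitude and longitude. The dimension sizes and variable names are illustrative assumptions, not part of the EarthServer design.

```python
import numpy as np

# Hypothetical 4-D cube: 12 time steps, 5 altitude levels, 180x360 lat/lon grid
cube = np.zeros((12, 5, 180, 360))
cube[3, 0] = 1.0  # e.g., surface-level readings for the fourth time step

# One query can slice across all dimensions at once -- the operation that
# motivates treating a cube as "worth more than a million images":
surface_series = cube[:, 0, 90, 180]  # time series at a single surface point
print(surface_series.shape)           # -> (12,)
```

The point of the model is that a temporal slice, a spatial subset, or a vertical profile are all the same kind of operation on one object, rather than lookups across millions of separate image files.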

“This power of data handling will be wrapped into direct visual interaction based on multidimensional visualization techniques,” the group emphasizes. A key tool to be leveraged by EarthServer in phase two is NASA’s World Wind 3-D engine, which allows users to zoom in from satellite altitude to any spot on Earth. The tool combines high-resolution Landsat imagery with elevation data to create 3-D visualizations.

By combining “agile analytics” with petabyte-size data cubes, the initiative expects to create a commodity “Big Earth Data service” that would reduce big data files to a “few whatever-size data cubes.”

EarthServer consortium members include Jacobs University, rasdaman GmbH of Germany, Plymouth Marine Laboratory in the U.K., the European Centre for Medium-Range Weather Forecasts, MEEO of Italy and CITE S.A. of Greece. NASA and Australia’s National Computational Infrastructure also participate in the EarthServer project.

Jacobs University of Bremen, Germany, coordinates the EarthServer project. The group is working with international standards groups to promote analytics technology that could enable real-time scaling of petabyte-sized data cubes.

Perhaps nowhere else are data volumes exploding as fast as in the Earth sciences. Hence, researchers at Jacobs University proposed a standard last year that would extend the SQL database language to accommodate multidimensional data cubes. The proposed spec was submitted to the International Organization for Standardization (ISO) after the German researchers determined that standard SQL is unable to find, filter and process multidimensional arrays, so such arrays must currently be maintained in separate databases.

Proponents of the new standard, dubbed SQL/MDA, for “multidimensional array,” said it is needed because big data in science and engineering is structured differently from, for example, business data often structured as simple tables. Multidimensional big data includes sensor outputs, imagery, simulations and statistical data.
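The gap the proposal targets can be sketched in a few lines of Python. Storing a 2-D grid in a flat relational table means one row per cell, so even a simple spatial average becomes a filter over tuples, whereas an array model expresses it as a single subset operation. All names here are invented for illustration, and the pseudo-SQL in the comment is not the actual SQL/MDA syntax.

```python
import numpy as np

grid = np.arange(16, dtype=float).reshape(4, 4)  # 4x4 grid of sensor values

# Relational workaround: one (row, col, value) tuple per grid cell
table = [(r, c, grid[r, c]) for r in range(4) for c in range(4)]
# SELECT AVG(value) FROM grid_cells WHERE row < 2 AND col < 2   (pseudo-SQL)
avg_table = sum(v for r, c, v in table if r < 2 and c < 2) / 4

# Array model: the same query is a direct subset-and-aggregate
avg_array = grid[:2, :2].mean()

print(avg_table, avg_array)  # both give the same average
```

For a 4x4 grid the difference is cosmetic; at satellite-imagery scale, the row-per-cell representation multiplies storage and forces the database to reassemble array structure on every query, which is what an array-aware SQL extension would avoid.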

The new ISO specification extending SQL to incorporate multidimensional arrays is currently under development.
