Big Data Forcing Update of SQL Standard
A proposed standard would extend the SQL database language to accommodate multidimensional “data cubes” generated by scientists and engineers. The effort promises to launch a new technology dubbed “array databases.”
Proponents of the new standard, dubbed SQL/MDA for “multidimensional array,” said it is needed because big data in science and engineering is structured differently from, say, business data, which is often organized as simple tables. Multidimensional big data includes sensor outputs, imagery, simulations and statistical data.
The proposed International Organization for Standardization (ISO) spec would allow SQL databases to handle multidimensional data cubes consisting of one-dimensional sensor data, two-dimensional satellite imagery and three-dimensional geophysical data. The proposed standard could even accommodate four-dimensional weather data as well as large astrophysics simulations of the known universe.
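To make the idea of a data cube concrete, here is a minimal sketch in plain Python of a three-dimensional array (time by latitude by longitude) and the two operations array queries typically rely on: slicing (fixing one axis) and trimming (keeping a sub-box). The function names, the axis order and the synthetic values are assumptions made for illustration; they are not taken from the SQL/MDA proposal or from any particular database.

```python
from statistics import mean

def make_cube(n_time, n_lat, n_lon):
    """Build a small 3-D cube indexed as cube[t][y][x] with synthetic values."""
    return [[[t * 100 + y * 10 + x for x in range(n_lon)]
             for y in range(n_lat)]
            for t in range(n_time)]

def slice_time(cube, t):
    """Fix the time axis; the result is a 2-D (lat x lon) array."""
    return cube[t]

def trim(cube, t_range, y_range, x_range):
    """Keep a sub-box; each range is a (start, stop) pair, stop exclusive."""
    (t0, t1), (y0, y1), (x0, x1) = t_range, y_range, x_range
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in cube[t0:t1]]

def cell_mean(cube):
    """Average over every cell of a (non-empty) 3-D cube."""
    return mean(v for plane in cube for row in plane for v in row)

if __name__ == "__main__":
    cube = make_cube(n_time=2, n_lat=3, n_lon=4)
    image = slice_time(cube, 1)                      # one 2-D "image"
    box = trim(cube, (0, 1), (1, 3), (0, 2))         # a 1 x 2 x 2 sub-cube
    print(image[2][3], box, cell_mean(box))
```

A table-oriented query language has no native way to express "cells 1 to 3 along latitude, at time step 0"; operations of this kind are what an array extension would have to add.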
Researchers at Jacobs University in Bremen, Germany, determined that SQL is unable to find, filter and process multidimensional arrays. As a result, the arrays are maintained in separate databases.
The group, led by computer science professor Peter Baumann, has been looking for ways to extend SQL databases. It said the solution is “array databases.”
In a recent demonstration, the researchers said more than 1,000 computers were linked via the cloud to jointly crunch the result of a single database query. The hope is that “distributed query processing” can be used to parse multi-petabyte data cubes, the researchers said. The approach could help answer science and engineering problems once considered unsolvable due to a lack of database tools.
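The general pattern behind distributed query processing can be sketched in a few lines of Python: the array is cut into tiles, each worker computes a small partial result for its tile, and a coordinator merges the partial results into one answer. This is an illustrative sketch only, not the researchers' implementation. A thread pool stands in for the networked computers, and the function names and tile size are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_tiles(values, tile_size):
    """Cut a flat list of cell values into consecutive tiles."""
    return [values[i:i + tile_size] for i in range(0, len(values), tile_size)]

def partial_aggregate(tile):
    """Work done on one node: reduce a tile to a small summary."""
    return {"sum": sum(tile), "count": len(tile), "max": max(tile)}

def merge(partials):
    """Work done by the coordinator: combine the per-tile summaries."""
    total = sum(p["sum"] for p in partials)
    count = sum(p["count"] for p in partials)
    return {"mean": total / count,
            "max": max(p["max"] for p in partials),
            "count": count}

def distributed_query(values, tile_size, workers=4):
    """Answer one query (mean and max) over a non-empty list of values."""
    tiles = split_into_tiles(values, tile_size)
    # Threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, tiles))
    return merge(partials)

if __name__ == "__main__":
    print(distributed_query(list(range(10)), tile_size=3))
```

Only the small summaries travel back to the coordinator, not the tiles themselves, which is what makes this pattern attractive for very large arrays.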
International datacenters will next use the distributed approach to gain insights into geospatial and temporal data cubes. Rasdaman, or “raster data manager,” database management systems have been installed at NASA, the European Space Agency, the British Geological Survey and other research institutions to put the system through its paces.
As the number of remote and image sensors grows, it’s likely that big data generated by geoscience and engineering research will eventually outgrow the proposed SQL multidimensional array database. For example, NASA is currently planning new missions expected to stream more than 24 terabytes of data a day.
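For a sense of scale, the 24-terabytes-a-day figure can be converted into a sustained data rate and a yearly volume. The short calculation below assumes decimal units (1 terabyte = 10^12 bytes) and a constant stream around the clock; both are simplifying assumptions, not details given by NASA.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
BYTES_PER_TB = 10 ** 12          # decimal terabyte (assumption)

def bytes_per_second(tb_per_day):
    """Average sustained rate needed to absorb a daily volume."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY

def gigabits_per_second(tb_per_day):
    """The same rate expressed in gigabits per second."""
    return bytes_per_second(tb_per_day) * 8 / 10 ** 9

def petabytes_per_year(tb_per_day, days=365):
    """Volume accumulated over a year, in decimal petabytes."""
    return tb_per_day * days / 1000

if __name__ == "__main__":
    print(round(bytes_per_second(24) / 10 ** 6, 1), "MB/s")
    print(round(gigabits_per_second(24), 2), "Gbit/s")
    print(round(petabytes_per_year(24), 2), "PB/year")
```

Under those assumptions, 24 terabytes a day works out to roughly 278 megabytes per second, or close to 9 petabytes a year from those missions alone.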
The torrent of science data is growing exponentially as new space communication networks come online. NASA recently demonstrated a laser communications network that could eventually beam real-time data back to Earth.
For now, a NASA official grappling with the agency’s big data challenges warned recently, “We regularly engage in missions where data is continually streaming from spacecraft on Earth and in space, faster than we can store, manage, and interpret it.”
Following a recent meeting in Beijing, an SQL working group within ISO agreed on the importance of revising the SQL database standard to accommodate multidimensional arrays. It accepted a proposal by the German researcher Baumann that will be used to forge a new standard. If approved, the standard will be called ISO 9075 SQL/MDA, organizers said.
The current ISO 9075 information technology and SQL database language spec defines the data structures and basic operations of SQL data and specifies the syntax and semantics of the database language.