September 12, 2012

Pushing Parallel Barriers Skyward


However much data exists on planet Earth, the stars and the planets that surround them hold astronomically more. As we discussed earlier, Peter Nugent and the Palomar Transient Factory are using a form of parallel processing to identify astronomical phenomena.

Some researchers believe that parallel processing will not be enough to meet the huge data requirements of future massive-scale astronomical surveys. Specifically, researchers from the Korea Institute of Science and Technology Information, including Jaegyoon Hahm, along with Yonsei University’s Yong-Ik Byun and the University of Michigan’s Min-Su Shin, have written a paper arguing that the future of astronomical big data research lies with cloud computing rather than parallel processing.

Parallel processing is holding its own at the moment. However, when these sky-mapping and phenomena-chasing projects grow significantly more ambitious by the year 2020, parallel processing will have no hope of keeping up.

How ambitious are these future projects? According to the paper, the Large Synoptic Survey Telescope (LSST) will generate 75 petabytes of raw and catalogued data over its ten years of operation, or about 20 terabytes a night. That pales in comparison to the Square Kilometer Array, which is projected to archive, in a single year, 250 times the amount of information that exists on the planet today.

“The total data volume after processing (the LSST) will be several hundred PB, processed using 150 TFlops of computing power. Square Kilometer Array (SKA), which will be the largest in the world radio telescope in 2020, is projected to generate 10-100PB raw data per hour and archive data up to 1EB every year.”

It may seem slightly absurd from a computing standpoint to plan for a project that does not start for another eight years. Eight years ago, the telecommunications world was still a couple of years away from the smartphone. Now smartphones talk to us. The big data universe grows even faster, possibly as fast as the actual universe.

It is never a bad idea to identify possible paths to future success. Eight years from now, quantum computing may come around and knock all of these processing methods out of the big data arena. However, if that does not happen, cloud computing could potentially advance to the point where it can support these galactic ambitions.

“We implement virtual infrastructure service,” wrote Hahm et al in explaining their cloud’s test infrastructure, “on a commodity computing cluster using OpenNebula, a well-known open source virtualization tool offering basic functionalities to have IaaS cloud. We design and implement the virtual cluster service on top of OpenNebula to provide various virtual cluster instances for large data analysis applications.”
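The paper itself contains no code, but the virtual cluster idea can be sketched roughly. OpenNebula exposes an XML-RPC front-end whose one.vm.allocate call boots a VM from a template; in the minimal sketch below, the endpoint, credentials, image and network names, and cluster size are all illustrative placeholders, not values from the study.

```python
# Rough sketch: booting worker VMs for a virtual analysis cluster through
# OpenNebula's XML-RPC front-end. Endpoint, credentials, image and network
# names are hypothetical placeholders, not taken from the paper.
import xmlrpc.client

ONE_ENDPOINT = "http://opennebula-frontend.example.org:2633/RPC2"  # hypothetical
ONE_SESSION = "oneadmin:secret"                                    # hypothetical

WORKER_TEMPLATE = """
NAME   = "lightcurve-worker"
CPU    = 2
MEMORY = 4096
DISK   = [ IMAGE = "analysis-worker-image" ]
NIC    = [ NETWORK = "virtual-cluster-net" ]
"""

server = xmlrpc.client.ServerProxy(ONE_ENDPOINT)

# Ask the front-end for an eight-node virtual cluster (size chosen arbitrarily).
for i in range(8):
    response = server.one.vm.allocate(ONE_SESSION, WORKER_TEMPLATE, False)
    ok, result = response[0], response[1]  # [success, vm_id_or_error_msg, ...]
    if ok:
        print(f"started worker VM {result}")
    else:
        print(f"could not start worker {i}: {result}")
```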

According to Hahm et al, the advantage essentially comes from using the cloud’s computing power as one large computing entity, rather than carefully splitting the task across parallel threads. It is akin to evaluating an integral of a function over its limits of integration rather than adding up each individual slice of height times a small change in x.

“This massive data analysis application requires many computing time to process about 16 million data files. Because the application is a typical high throughput computing job, in which one program code processes all the files independently, it can gain great scalability from distributed computing environment. This is the great advantage when it comes with cloud computing, which can provide large number of independent computing servers to the application.”
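In practical terms, every light-curve file can be handed to any free worker because no file depends on another. A minimal sketch of that high-throughput pattern follows, using Python’s standard process pool as a stand-in for a fleet of cloud servers; the analyze_light_curve function and the input directory are placeholders, not code from the paper.

```python
# Minimal sketch of the high-throughput pattern: one program, many
# independent input files, no communication between tasks.
# analyze_light_curve is a placeholder for the real per-file analysis.
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def analyze_light_curve(path: Path) -> tuple[str, int]:
    """Placeholder analysis: count the samples in one light-curve file."""
    n_samples = sum(1 for _ in path.open())
    return path.name, n_samples

if __name__ == "__main__":
    files = sorted(Path("lightcurves").glob("*.dat"))  # hypothetical input dir

    # Every file is processed independently, so the same job scales out to
    # however many workers (local cores or cloud VMs) happen to be available.
    with ProcessPoolExecutor() as pool:
        for name, n in pool.map(analyze_light_curve, files):
            print(f"{name}: {n} samples")
```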

To test this, the group analyzed data from SuperWASP, a UK-based astronomical project with observatories in Spain and South Africa. Specifically, they examined 16 million light curves, which are used to locate extrasolar planets from dips in the light emitted by a potential planet’s host star. According to Hahm et al, “In this experiment we can learn that the larger and less input data files are more efficient than many small files when we design the analysis on large volume of data.”
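That observation is about per-file overhead: opening, scheduling, and transferring 16 million tiny files costs more than streaming the same bytes from a handful of large ones. The sketch below illustrates the idea by concatenating many small light-curve files into a few large chunks before analysis; the directory names and the files-per-chunk value are arbitrary choices, not parameters from the paper.

```python
# Illustrative sketch: merge many small input files into larger chunks so
# each task reads one big file instead of thousands of small ones.
# Directory names and FILES_PER_CHUNK are arbitrary, hypothetical values.
from pathlib import Path

FILES_PER_CHUNK = 10_000

def merge_into_chunks(src_dir: str, dst_dir: str) -> None:
    small_files = sorted(Path(src_dir).glob("*.dat"))
    out = Path(dst_dir)
    out.mkdir(exist_ok=True)

    for start in range(0, len(small_files), FILES_PER_CHUNK):
        chunk_path = out / f"chunk_{start // FILES_PER_CHUNK:05d}.dat"
        with chunk_path.open("w") as merged:
            for small in small_files[start:start + FILES_PER_CHUNK]:
                # Prefix each record with its origin so curves stay separable.
                for line in small.open():
                    merged.write(f"{small.stem} {line}")

if __name__ == "__main__":
    merge_into_chunks("lightcurves", "merged_chunks")
```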

Looking at the graphs, the only place the merged big files and the many small files match up is in system CPU time. Cloud computing retains an advantage in user CPU time and ‘wall-clock time,’ but those differences appear small enough that cloud computing may not be the significant improvement Hahm et al hope it is.

“With the successful result of whole SuperWASP data analysis on cloud computing,” Hahm et al conclude, “data-intensive sciences having trouble with large data problem can take great advantages from cloud computing.” Perhaps cloud computing has the advantage at the petabyte scale. But it seems likely that something completely different will have to be developed between now and 2020 before an exabyte can be processed in a year.

