No stranger to massive data in the HPC context, Cray points to one area that it’s had its paws in for a number of years. Climate data has always presented data processing and movement challenges, and according to the supercomputing company, the data sets used for international climate projects are “almost 70 times larger today than they were less than 20 years ago.” This constant expansion is creating storage, archival and sharing challenges for the data collectors.
Cray refers to this as “Big IO,” and believes that the work that’s being done now at the petascale level will have lasting impacts in science and business, especially where real-time data collection and processing is a requirement. Cray indicates that it is particularly proud of the work it has been doing with the Korean Meteorological Administration (KMA).
“Big data” is nothing new to the KMA, which has been using supercomputing tools and sensor data for decades to do everything from weather modeling and climate prediction to air pollution monitoring and earthquake and marine meteorology. In 2005, Cray and the KMA joined forces to launch the Earth Systems Research Center (ESRC), aimed at maximizing the utility of advanced high performance computing facilities. Since its inception, the partnership boasts involvement in funding 23 projects with the Korean academic community in such areas as severe weather and high resolution global modeling.
“Each operating [weather prediction and climate modeling] center usually has large sets of data (terabytes and petabytes) that need to move fast, both in terms of IO and throughput – and reliability,” says Cray in discussing its ideas on “Big IO.” Cray says that Big IO solutions are required wherever real-time analysis is needed.
“These efforts are increasingly necessary as the global climate changes and many government bodies begin to search for ways to adapt to the changing weather conditions,” says Cray. Real-time weather modeling is both data and processor intensive, with anywhere from terabytes to petabytes of data having to move in and out of the processor quickly in order to create the model.
But many enterprises jumping into big data today don’t really need the intense and persistent processing power of a weather modeler, right? That may be shortsighted, suggest some data scientists. We may soon see a day when enterprises use automated, data-driven decision-making to model their own business climates, with machines making decisions based on the data.
In the meantime, Cray has jumped on the Hadoop bandwagon, announcing that its Xtreme line of supercomputers will include the Intel Distribution for Apache Hadoop software, joining the company’s other big data solutions, including the Cray Sonexion storage system and YarcData’s uRiKa appliance for graph analytics.