April 30, 2012

Inside Europe’s “Super Data Cluster”

Datanami Staff

Climate and atmospheric data is the key to solving complex global warming mysteries, understanding weather and global climate trends, and interpreting atmospheric events and their implications.

As data sources grow in number and complexity, new data-intensive systems are required to make the best use of the ever-increasing information.

Not long ago the “JASMIN super-data-cluster” in Europe emerged to manage the flood of climate and atmospheric data and aid research. The super-data-cluster will be used by researchers studying climate change and processing satellite data.

The researchers behind JASMIN (short for the painfully unwieldy title ‘Joint Analysis System Meeting Infrastructure Needs’) describe the resource as a “petascale fast disk connected via low latency networks to significant amounts of data-analysis compute.”

It has been purpose-built to meet the needs of a distributed research team in the data-heavy climate research space—one that requires “flexible access to high volume and complex data as well as processing services for the climate and earth observation communities.”


Inside Climate’s Data-Intensive Environment

JASMIN consists of one core system—the JASMIN super-data-cluster—and three satellite systems at Bristol, Leeds and Reading universities. Each of the satellite systems consists of significant disk (150, 100 and 500 TB respectively) and compute resources.
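As a quick check on the figures above, the satellite disk capacities can be added up. This is only an illustrative sketch: the mapping of capacities to sites assumes "respectively" follows the order Bristol, Leeds, Reading, and the variable names are made up for the example.

```python
# Disk capacity per satellite site in terabytes, as quoted in the article.
# The site-to-capacity mapping assumes the order given in the text.
satellite_disk_tb = {"Bristol": 150, "Leeds": 100, "Reading": 500}

# Total disk held outside the core super-data-cluster.
total_tb = sum(satellite_disk_tb.values())

print(f"Satellite disk total: {total_tb} TB")
```

On these numbers the three satellite systems together hold 750 TB.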

SGI and Panasas were at the heart of the installation of the super-cluster, which included resources from the Climate and Environmental Monitoring from Space (CEMS) project.

The CEMS project is part of a larger effort by the International Space Innovation Center (ISIC) Facility to increase collaboration between industry, academia and government. The aim is to actively promote ways in which other market sectors can use space-derived data and technologies to develop new products and services to enhance their businesses.

The CEMS service has already enabled four consortia to win funding for ‘fast track’ projects from the recent ‘Space for Growth’ competition within the National Space Technology program.

The JASMIN+CEMS cluster combines two machines into a single £4.5 million, 10-ton hardware system (75% JASMIN, 25% CEMS).

JASMIN+CEMS will replace the existing computing resources and NFS storage for 1.2 Petabytes of data in the Centre for Environmental Data Archival (CEDA), allowing data to be managed by fewer people and accessed faster, as part of a single efficient scientific data center. It will also support efficient data analysis by the UK and European climate and earth system science communities, basic processing of earth observation data, and new ways of supporting flexible access to high volume and complex climate and earth observation data.

The JASMIN+CEMS cluster includes 4.6 Petabytes of usable fast access Panasas parallel file storage. The important aspects of the data storage design are the 1 Tb/s aggregate bandwidth from data to processors, which supports the processing of very large data volumes, and a lower total cost of ownership than competing solutions, because operators need to intervene less often to manage and expand the system.
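To put the bandwidth figure in perspective, a rough calculation shows how long it would take to stream the entire 4.6 Petabyte store once at the full aggregate rate. This is a back-of-the-envelope sketch, not a measured result: it assumes decimal petabytes, reads "Tb/s" as terabits per second, and ignores any real-world overhead. The function name is hypothetical.

```python
# Back-of-the-envelope estimate; the unit conventions below are assumptions,
# not something the article states.
PETABYTE_BYTES = 10**15   # assuming decimal petabytes
TERABIT_BITS = 10**12     # assuming "Tb/s" means terabits per second


def full_scan_seconds(capacity_pb: float, bandwidth_tbps: float) -> float:
    """Seconds needed to stream capacity_pb petabytes at bandwidth_tbps Tb/s."""
    total_bits = capacity_pb * PETABYTE_BYTES * 8
    return total_bits / (bandwidth_tbps * TERABIT_BITS)


seconds = full_scan_seconds(4.6, 1.0)
print(f"{seconds:.0f} s, about {seconds / 3600:.1f} hours")
```

Under those assumptions a single pass over the whole store would take roughly ten hours, which gives a sense of why aggregate bandwidth matters as much as raw capacity for analysis at this scale.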


Hardware for “Big Climate” Research

The 1133 data blades constitute the second-largest configuration that Panasas have provided to a single installation.

The largest configuration was 1166 blades for the Roadrunner supercomputer at the Los Alamos National Laboratory in New Mexico, USA, which headed the list of the world’s top 500 computers in 2008. However, since storage technology has moved on in the last four years, JASMIN+CEMS has a larger total storage capacity than Roadrunner had then.

For processing, the JASMIN+CEMS cluster also includes 27 × 12-core and 1 × 48-core processors. The processors and data storage are configured for a combination of data serving and number crunching, and can be switched from one to the other depending on demand.
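The total core count implied by the processor figures above can be tallied in a few lines. This is a simple illustrative sketch; it takes the quoted figures to mean 27 units of 12 cores plus one unit of 48 cores, and the names used are made up for the example.

```python
# Processor groups as (count, cores_each), read from the article's figures.
# Interpreting them as 27 twelve-core units plus one 48-core unit is an assumption.
node_groups = [(27, 12), (1, 48)]


def total_cores(groups):
    """Sum count * cores over every processor group."""
    return sum(count * cores for count, cores in groups)


print(total_cores(node_groups))
```

On that reading the cluster offers 372 cores in total, a modest figure next to the storage, which fits the description of a system built around data rather than raw compute.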

The funding for both JASMIN and CEMS comes to STFC from the UK government investment of £145 million in e-infrastructure, announced on 3 October 2011 by the Department for Business, Innovation and Skills. JASMIN’s funding comes via NERC, and CEMS’s via ISIC and the UK Space Agency.

*Thanks to this slideshow for the charts*
