December 8, 2011

Bar Set for Data-Intensive Supercomputing

Nicole Hemsoth

The San Diego Supercomputer Center (SDSC) formally introduced its new data-intensive supercomputer, Gordon, this week. The new machine, built by Appro, will come online in January, giving researchers the resources they need to tackle some of today’s most data-heavy scientific and technical problems.

According to a statement from the SDSC, Gordon will set to work on “the most vexing data-intensive challenges, from mapping genomes for personalized medicine to rapidly calculating thousands of ‘what if’ scenarios affecting everything from traffic patterns to climate change.”

The key to the new supercomputer’s big data capabilities is its 300 terabytes of flash memory. According to SDSC officials, flash will be useful in research areas such as large-scale studies of human genomes. The flash system can hold more than 100,000 entire human genomes, far more than have even been sequenced to date, which positions Gordon well to meet growing demand in this area.
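As a rough sanity check on that genome figure, the arithmetic works out if one assumes roughly 3 GB of storage per human genome (about one byte per base pair; an assumption here, not a figure quoted by SDSC). A minimal sketch:

```python
# Rough sanity check (assumption: ~3 GB of storage per human genome,
# roughly one byte per base pair; not a figure quoted by SDSC).
FLASH_TB = 300      # Gordon's total flash capacity
GENOME_GB = 3       # assumed storage per genome

genomes_in_flash = FLASH_TB * 1000 / GENOME_GB
print(f"Genomes that fit in 300 TB of flash: ~{genomes_in_flash:,.0f}")  # ~100,000
```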

SDSC claims that Gordon (which also ranks among the top 50 fastest systems on the Top500 list) will be able to tackle huge datasets up to 100 times faster than standard hard disk systems, at least for some queries. With 4 petabytes of disk storage, 64 TB of RAM, and 280 teraflops of compute, it’s easy to see why the center is touting this as a data-intensive powerhouse. Recent validation testing yielded an unheard-of 36 million IOPS.
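The “up to 100 times faster” claim tracks the gap in random-access latency between spinning disks and flash. A minimal sketch of that comparison, using assumed ballpark latencies rather than measurements taken on Gordon:

```python
# Illustration of where an "up to 100x" claim can come from
# (assumed ballpark latencies, not measurements from Gordon:
#  ~10 ms per random read on a 7,200 RPM disk, ~0.1 ms on flash).
HDD_LATENCY_MS = 10.0
FLASH_LATENCY_MS = 0.1

speedup = HDD_LATENCY_MS / FLASH_LATENCY_MS
print(f"Random-access advantage of flash over disk: ~{speedup:.0f}x")  # ~100x

# The 36 million IOPS figure is an aggregate for the whole machine;
# a single spinning disk typically sustains on the order of 100-200 random IOPS.
```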

As Michael Feldman reported at HPCwire this week, another feature makes this system noteworthy: the shared memory capability added by ScaleMP’s vSMP technology, which pushes the machine over the top in terms of big data performance. Feldman writes that this “allows users to run large-memory applications on what they call a ‘supernode’ — an aggregation of 32 Gordon servers and two I/O servers, providing access to 512 cores, 2 TB of RAM and 9.6 TB of flash. To a program running on a supernode, the hardware behaves as a big cache coherent server. As many as 32 of these supernodes can be carved from the machine at one time. According to ScaleMP founder and CEO Shai Fultheim, Gordon is the largest system in the world that is deployed with its technology.”
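Those supernode totals also imply the per-server building blocks. A short sketch that derives them by simple division (the per-server figures are inferred here, not official Appro or SDSC specifications):

```python
# Per-server figures implied by the supernode totals quoted above,
# derived by simple division (inferred here, not official specs).
COMPUTE_SERVERS = 32      # compute servers per supernode
IO_SERVERS = 2            # I/O servers per supernode
TOTAL_CORES = 512
TOTAL_RAM_TB = 2
TOTAL_FLASH_TB = 9.6

print(f"Cores per compute server: {TOTAL_CORES // COMPUTE_SERVERS}")             # 16
print(f"RAM per compute server:   {TOTAL_RAM_TB * 1024 // COMPUTE_SERVERS} GB")  # 64 GB
print(f"Flash per I/O server:     {TOTAL_FLASH_TB / IO_SERVERS} TB")             # 4.8 TB
```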

To put flash memory at this scale in context, the center described Gordon as something like the world’s largest thumb drive, capable of ingesting about 200 movies per second from Netflix, or of holding Netflix’s entire catalog of about 100,000 movies with room for another 200,000 titles.
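Worked backwards, the analogy also implies an average title size and an aggregate ingest rate. A quick sketch, assuming roughly 1 GB per movie (an assumption consistent with the catalog numbers quoted, not a figure from SDSC or Netflix):

```python
# The thumb-drive analogy worked backwards (assumes an average movie
# size of ~1 GB, consistent with 300 TB holding ~300,000 titles;
# not a figure from SDSC or Netflix).
FLASH_TB = 300
MOVIE_GB = 1
MOVIES_PER_SECOND = 200

titles_total = FLASH_TB * 1000 / MOVIE_GB
print(f"Titles that fit in flash: ~{titles_total:,.0f}")      # ~300,000

ingest_gb_per_s = MOVIES_PER_SECOND * MOVIE_GB
print(f"Implied ingest rate:      ~{ingest_gb_per_s} GB/s")   # ~200 GB/s
```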

In the words of Michael Norman, director of SDSC, the induction of Gordon into the ranks of America’s elite supercomputers marks the beginning of “the era of data intensive supercomputing.” He says the amount of data being generated at the center has doubled, leaving SDSC overwhelmed and in need of the $20 million investment.

Related Stories

Interview: Cray CEO Sees Big Future in Big Data

Pervasive Lends Supercomputing Center Analytical Might

Live from SC11: Data Intensive System Showdown

Q&A: Appro Gets Ready for 16-core AMD “Interlagos” Processors
