World’s Top Data-Intensive Systems Unveiled
This week the top data-intensive systems were called out at the International Supercomputing Conference (ISC), which just rounded out another stellar year in Germany.
The list of the top machines for crunching big data problems in scientific and technical computing application areas, called the Graph 500, is updated twice per year: once at ISC and again at the annual Supercomputing Conference, which will take place in Salt Lake City for 2012.
The list got its start at ISC in 2010 as a counterpart to the speed-driven results of the Top500. The Graph 500 benchmark evaluates machine performance against data-intensive analytics applications and evaluates the machine’s communications capabilities and computational power.
The challenge draws its name from the root of the problems it solves, which are called “graph-type” calculations—algorithms that are a core part of many analytics workloads in applications, such as those for cyber security, medical informatics, and data enrichment.
According to officials from the Graph 500 team, which is backed by over 50 HPC experts from around the world, data-intensive supercomputer applications are increasingly important for HPC workloads, but are ill-suited for platforms designed for 3D physics simulations.
They claim that “current benchmarks and performance metrics do not provide useful information on the suitability of supercomputing systems for data intensive applications. A new set of benchmarks is needed in order to guide the design of hardware architectures and software systems intended to support such applications and to help procurements. Graph algorithms are a core part of many analytics workloads.”
Given its roots in the HPC community and its relationship to the Top500 list, the Graph 500 does not feature enterprise systems. It also overlaps considerably with the performance-driven list, since many supercomputers designed for speed are also capable of some impressive graph crunching.
The top two data-intensive systems tied for first place and were definite front runners ahead of the other systems on the list. While the Graph 500 is still in its infancy (when compared to the legacy the Top500 list has), one can expect more teams to submit their systems for consideration against the still-evolving Graph 500 benchmark.
Let’s get started with the top two systems, which tied for the title of most powerful data-intensive system on the planet:
#1 (Tie) Mira (Argonne National Lab)
In addition to coming in at number three on the Top500 supercomputer list this year, the Mira super also took the top slot on the Graph 500.
The Top500’s ranking measures how well a machine performs dense linear algebra (compute-intensive floating-point) calculations. Argonne says that Mira was designed principally to deliver very high performance on such calculations, since they are highly correlated with science and engineering applications. The Graph 500, by contrast, measures performance on graph problems.
Computers are routinely used to solve small graph problems (mapping optimal routes for a fleet of delivery trucks, for instance). Graph problems can become intractable, however, as they are scaled up to ever larger datasets (the difference between estimating the effect of climate on the economy of cities in a small region versus that of the entire country).
“Having machines that do well on these sorts of calculations will allow this very useful category of computing techniques to be applied to ever more areas,” said Kalyan Kumaran, who manages ALCF’s applications performance engineering team. “The range of problems that Mira can tackle is much wider than large-scale scientific simulation.”
Located at the Argonne Leadership Computing Facility (ALCF), the IBM Blue Gene/Q system is capable of 10 quadrillion calculations per second. Argonne says its goal is to have Mira handle over 5 billion compute hours per year when the system is in full production.
Vesta, Mira’s testing and development rack, placed 6th on the Graph 500 list. Mira’s predecessor at Argonne, an IBM Blue Gene/P, placed 16th.
#1 (Tie) Sequoia (Lawrence Livermore National Lab)
The big news this week around Sequoia is that it is now the world’s fastest supercomputer, coming in at the head of the Top500 pack. However, LLNL stresses that it’s not just built for speed but for real-world data-intensive applications.
Mira, located at the Argonne Leadership Computing Facility (ALCF), and Sequoia, located at Lawrence Livermore National Laboratory, each achieved a score of over 3,500 GTEPS (giga traversed edges per second).
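To make the GTEPS unit concrete, here is a minimal Python sketch of how a traversed-edges-per-second figure can be derived from a breadth-first search, the kind of graph traversal the Graph 500 benchmark is built around. The function names (`bfs_parents`, `teps`, `gteps`) and the toy graph are illustrative assumptions, not part of any official benchmark code, and the edge count here is a simplification: it tallies every adjacency entry examined, whereas the real benchmark has its own rules for which edges count.

```python
from collections import deque


def bfs_parents(adjacency, root):
    """Breadth-first search from root.

    Returns (parents, edges_scanned), where parents maps each reached
    vertex to the vertex it was discovered from, and edges_scanned is
    the number of adjacency entries examined (a simplified edge count).
    """
    parents = {root: root}
    frontier = deque([root])
    edges_scanned = 0
    while frontier:
        vertex = frontier.popleft()
        for neighbor in adjacency[vertex]:
            edges_scanned += 1
            if neighbor not in parents:
                parents[neighbor] = vertex
                frontier.append(neighbor)
    return parents, edges_scanned


def teps(edges_traversed, seconds):
    """Traversed edges per second for one search."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return edges_traversed / seconds


def gteps(edges_traversed, seconds):
    """Same rate expressed in giga (10^9) traversed edges per second."""
    return teps(edges_traversed, seconds) / 1e9


# Hypothetical four-vertex undirected graph, stored as adjacency lists.
TOY_GRAPH = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

if __name__ == "__main__":
    parents, scanned = bfs_parents(TOY_GRAPH, 0)
    print("parents:", parents)
    print("edges scanned:", scanned)
    # A machine that traverses 3.5 trillion edges in one second runs at 3,500 GTEPS.
    print("GTEPS:", gteps(3.5e12, 1.0))
```

In this framing, a score of over 3,500 GTEPS corresponds to more than 3.5 trillion edge traversals every second.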
LLNL had multiple entries on this year’s Graph 500 list, including several that used SSD storage arrays to hold the graphs. The system Leviathan, with a single 40-core node, 1TB of memory, and 12TB of flash storage, is another of the lab’s star big data supercomputers.
NNSA Administrator Thomas D’Agostino told the Lawrence Livermore Independent that, “While Sequoia may be the fastest, the underlying computing capabilities it provides give us increased confidence in the nation’s nuclear deterrent as the weapons stockpile changes under treaty agreements, a critical part of President Obama’s nuclear security agenda. Sequoia also represents continued American leadership in high performance computing, key to the technology innovation that drives high-quality jobs and economic prosperity.”
Sequoia will provide a more complete understanding of weapons performance, notably hydrodynamics and properties of materials at extreme pressures and temperatures. In particular, the system will enable suites of highly resolved uncertainty quantification calculations to support the effort to extend the life of aging weapons systems.
#2 DARPA Trial Subset with IBM Development Engineering
A DARPA system designed in conjunction with IBM Development Engineering has come in second, just under the two tied systems at the national laboratories, despite being far smaller than either of them (compare 32,768 nodes to just 1,024).
We were not able to uncover much about this system, other than to discover that it is a Power 775 (Power7 8C) machine that clocks in at around 3.836 GHz.
IBM Development Engineering had a number of other systems that were submitted for Top500 consideration this year, including two iDataPlex systems, one that ranked at 175 (with 10,128 cores) and another that ranked at 213 (with 7,248 cores).
While details are thin about this big data system, one can only imagine the types of data-intensive military and defense applications the IBM machine has been designed to crunch. One high-ranking system we do have details on, however, is the one that hit the #3 slot, this one outside of U.S. borders…
#3 Oakleaf (University of Tokyo)
Fujitsu’s Oakleaf-FX system at the Information Technology Center at the University of Tokyo has been deemed one of the most capable data-intensive machines.
The Oakleaf-FX is one of the company’s PRIMEHPC FX10 supercomputers, which was designed for peak floating-point performance as well as energy efficiency with a reported 1.4 MW consumption for the entire system.
The university is using the supercomputer for a number of data-intensive science research projects, including those in areas as diverse as biology, materials science and astrophysics. The university said last year that it selected the supercomputer for its compatibility with the K supercomputer from Fujitsu (one of which is in the #2 slot on the Top500 list).
“The PRIMEHPC FX10 supercomputer system (Oakleaf-FX) will contribute to advancements in various types of research and development activities by users from both academia and industry,” Kengo Nakajima, Ph.D., Director of the Supercomputing Division at the IT Center at the University of Tokyo, told ServerWatch. “Oakleaf-FX will be used for the HPC education program in the Graduate School of the University of Tokyo for future computational scientists. Oakleaf-FX will be operated so that priority is given to larger-scale jobs.”
Tokyo is also home to the fourth system on the Graph 500 list, the HP Cluster Platform at the GSIC Center at the Tokyo Institute of Technology. This system is a data crunching powerhouse that gets a boost from three Tesla cards for each of the 1,366 nodes.
To see more of the list, including the top ten, which includes more Blue Genes and supercomputing powerhouses at Argonne, Brookhaven and Lawrence Berkeley National Lab, pop on over to the main listing.