Live from SC11: Data Intensive System Showdown
This week at the annual supercomputing conference (SC11), the focus wasn’t just on the famous Top500 list, which ranks the world’s fastest supercomputers. A great deal of buzz revolved around another benchmark landing on the radar, one that puts the performance of data-intensive systems to the test.
While the list of top data-intensive systems isn’t quite approaching the 500 target in its name, there were more systems in the running this year. At last year’s announcement of the top systems, only 39 entries were gathered; this year the benchmark drew several more participants, bringing the total up to 50.
While that’s not a huge number (not to mention that some of the true “big data” systems that would perform well on this benchmark are tucked away in enterprise secrecy), there is clear value in offering a way to compare HPC systems on their performance against data-intensive problems.
Unlike the metric behind the Top500, which ranks systems by maximum floating point operations per second, the Graph500 is built around graph algorithms, an important element of many deep analytics problems, and reports performance in traversed edges per second (TEPS) rather than FLOPS.
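To make the distinction concrete, here is a minimal, illustrative sketch in Python of the basic idea: traverse a graph breadth-first and report traversed edges per second. This is not the official Graph500 reference code, which generates an enormous synthetic graph and runs the search in parallel across many nodes.

```python
# Minimal, illustrative sketch of the Graph500 idea: run a breadth-first
# search over a graph and report traversed edges per second (TEPS).
# Not the official reference code, which builds a large synthetic graph
# and distributes the search across many nodes.
import time
from collections import deque

def bfs_teps(adjacency, source):
    """Breadth-first search from `source`; returns (parent map, TEPS)."""
    parents = {source: source}
    queue = deque([source])
    edges_traversed = 0
    start = time.perf_counter()
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            edges_traversed += 1
            if w not in parents:          # first visit to w
                parents[w] = v
                queue.append(w)
    elapsed = time.perf_counter() - start
    return parents, edges_traversed / elapsed

# Toy graph as adjacency lists; real runs traverse billions of edges.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
parents, teps = bfs_teps(graph, source=0)
print(f"reached {len(parents)} vertices at {teps:,.0f} TEPS")
```

The point of the metric is that a run like this is bound by memory access and communication patterns rather than by arithmetic throughput, which is why a machine’s TEPS ranking can look quite different from its FLOPS ranking.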
The top performers are in the list below (with more details at Graph500.org). As a side note, the United States was well represented in the benchmark, comprising 74 percent of all entries, though other countries, including Japan and Russia, had quite a showing as well. IBM walked away with the most systems at the top of the list, with other vendors, including Russian HPC vendor T-Platforms, also well represented.
NNSA/SC Blue Gene/Q Prototype II (4096 nodes / 65,536 cores)
Lomonosov (4096 nodes / 32,768 cores)
TSUBAME (2732 processors / 1366 nodes / 16,392 CPU cores)
Jugene (65,536 nodes)
Intrepid (32,768 nodes / 131,072 cores)
Endeavor (Westmere) (320 nodes / 640 sockets / 3840 cores)
IBM BlueGene/Q (512 nodes)
Hopper (1800 nodes / 43,200 cores)
Franklin (4000 nodes / 16,000 cores)
Ultraviolet (4 processors / 32 cores)
The Blue Gene/Q at Lawrence Livermore National Laboratory tends to rank high on a slew of other lists. Some benchmarking efforts, including the Green500, put it at the top of the pile for energy efficiency, and of course, in addition to its data-intensive prowess, it is one powerful supercomputer in terms of overall horsepower.
Far from just offering national labs and vendors a way to showcase their dual expertise in compute- and data-intensive machines, the organizers behind Graph500.org say the benchmark will help guide the design of hardware architectures and software approaches for data-intensive applications, and better equip buyers with a sense of graph performance across vendor choices. The organizers also admit that alterations need to be made to the existing benchmark based on the feedback and experience gathered since the initial list last year at ISC ’10, and they expect a more robust ranking process to add validity to the rankings.
Among the improvements to the original benchmark are elements that address some of the common complaints about the limitations of the test. For instance, the team is working to address three application kernels in the next incarnation of the list: concurrent search, optimization (focused on a single-source shortest path approach), and an edge-oriented kernel associated with finding a maximal independent set.
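As a rough illustration of the second of those kernels, the sketch below (again in Python, and purely illustrative rather than anything drawn from the benchmark specification) computes single-source shortest paths with a serial Dijkstra routine; the actual kernel would run in parallel over a far larger, weighted graph.

```python
# Illustrative single-source shortest path (Dijkstra) on a weighted graph.
# A serial sketch of the kind of computation the proposed "optimization"
# kernel exercises; the benchmark itself targets distributed, parallel runs.
import heapq

def sssp(adjacency, source):
    """Return shortest-path distances from `source` to every reachable vertex."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue                      # stale heap entry, already improved
        for w, weight in adjacency[v]:
            candidate = d + weight
            if candidate < dist.get(w, float("inf")):
                dist[w] = candidate
                heapq.heappush(heap, (candidate, w))
    return dist

# Toy weighted graph: vertex -> list of (neighbor, edge weight)
graph = {0: [(1, 2.0), (2, 5.0)], 1: [(2, 1.0), (3, 4.0)], 2: [(3, 1.0)], 3: []}
print(sssp(graph, source=0))   # {0: 0.0, 1: 2.0, 2: 3.0, 3: 4.0}
```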
Of even greater interest is the way they hope to tailor the benchmark to directly address specific verticals, some of which fall a bit outside of what one might call HPC. These vertical focus areas include cybersecurity, medical informatics, data enrichment, social networks and symbolic networks.
For those who wish to keep an eye on these developments, we’ll check back to see which systems and vendors respond to the updated benchmark when the next list appears in June 2012.