Understanding Data Intensive Analysis on Large-Scale HPC Compute Systems
Data intensive computing is an important and growing sector of scientific and commercial computing, and it places unique demands on computer architectures. While these demands continue to grow, most present systems, and even planned future systems, might not meet them very effectively. The majority of the world’s most powerful supercomputers are designed to run numerically intensive applications that can make efficient use of distributed memory. A number of factors limit the utility of these systems for knowledge discovery and data mining.
Competitive pressures, exploding data volumes and the ever-increasing need to integrate large-scale data often leave supercomputing research centers with few choices. The question is not whether or when to build large data-intensive applications, but how quickly to make them available. Building them to accommodate exponentially growing data within a finite amount of time requires accelerated research analysis to meet deadlines, a difficult situation even for leading supercomputing centers.
Appro, Intel and SDSC Join Forces to Build a Major Data Intensive Supercomputer
Appro, Intel and the San Diego Supercomputer Center (SDSC) at the University of California, San Diego worked very closely together for three years (2009-2011) on a major data intensive supercomputing design. The collaboration enabled Appro to skip an Intel® processor generation and move directly to the next-generation architecture for the Appro Xtreme-X™ Data Intensive Supercomputer, called “Gordon” by SDSC. The partnership offered early access to future technology roadmaps for processors, flash memory, interconnect networks and system configuration planning, which contributed to the Gordon design innovations and built expertise before the system was deployed. This early preparation resulted in a grant from the National Science Foundation (NSF) to build the system and make it available as a powerful supercomputing resource dedicated to solving critical science and societal problems using forward-looking HPC technology.
Reliability, availability, manageability and system configuration compatibility were all essential to building this data intensive supercomputer. Beyond those features, Gordon employs a vast amount of flash memory to help speed solutions now hamstrung by slower spinning disk technology. In addition, new “supernodes” exploit virtual shared-memory software to create large shared-memory systems that reduce solution times and yield results for applications that now tax even the most advanced supercomputers.
The Gordon supercomputer is available today. It delivers over 200 TFlops of peak performance based on the latest Intel® Xeon® processor E5 product family and achieves up to 35M IOPS (I/O operations per second) from 300 TB of Intel® Solid-State Drive 710 Series storage. IOPS is an important measure for data intensive computing because it indicates the ability of a storage system to perform I/O operations on small transfers of randomly organized data, an access pattern prevalent in database and data mining applications. Scientific applications can now benefit from fast interaction with, and manipulation of, large volumes of structured data. The Gordon system and its smaller prototype, Dash, were specifically designed to handle these types of data intensive problems. Their unique architectural features supply the “missing link” in the memory hierarchy and address the needs of an emerging class of applications whose working sets approach 1 petabyte in size.
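To make the IOPS figure concrete, the short Python sketch below converts an IOPS rate into an aggregate data rate and into the time needed to service a batch of small random requests. The 35 million IOPS value comes from the text; the 4 KiB transfer size and the one-billion-request workload are illustrative assumptions, not Gordon measurements, and the function names are hypothetical.

```python
def throughput_bytes_per_s(iops: float, transfer_bytes: int) -> float:
    """Aggregate data rate when every operation moves transfer_bytes."""
    return iops * transfer_bytes


def seconds_for_requests(num_requests: int, iops: float) -> float:
    """Time to service num_requests small random I/Os at a sustained IOPS rate."""
    return num_requests / iops


GORDON_PEAK_IOPS = 35_000_000      # "up to 35M IOPS", from the text
ASSUMED_TRANSFER_BYTES = 4 * 1024  # assumption: 4 KiB random reads
ASSUMED_REQUESTS = 1_000_000_000   # assumption: one billion lookups

if __name__ == "__main__":
    rate = throughput_bytes_per_s(GORDON_PEAK_IOPS, ASSUMED_TRANSFER_BYTES)
    elapsed = seconds_for_requests(ASSUMED_REQUESTS, GORDON_PEAK_IOPS)
    print(f"{rate / 1e9:.1f} GB/s at 4 KiB per operation")
    print(f"{elapsed:.1f} s for one billion random requests")
```

Because 35M IOPS is quoted as an upper bound, a real workload may see lower rates; the sketch only shows how the metric relates to small-transfer throughput.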
SDSC is proud to share that Gordon made its debut as the 48th fastest supercomputer in the world in November 2011, according to the Top500 list. Gordon’s Top500 result is notable in that its Tflop/s ranking was achieved using about half the number of cores of most other systems. That is because Gordon is among the first systems, and the first one commissioned by the NSF, to use the Intel® Xeon® processor E5 family, which performs twice as many operations per clock (eight versus four) as the processors in any other system currently in use.
The Gordon system is composed of 32 supernodes, each consisting of 32 compute nodes and two I/O nodes. A compute node contains 64 GB of DDR3 memory and two 8-core Intel® Xeon® processors from the E5 product family, with each core capable of eight floating-point operations per cycle. The aggregate peak performance of Gordon is in excess of 200 TFlops. Groups of 16 compute nodes can access a 4 PB parallel file system through the I/O nodes. The system delivers over 300 trillion bytes of high-performance flash storage on Intel® SSD 710 Series solid-state drives, served by 64 dual-socket Intel® Xeon® processor 5600 Series I/O nodes. The system is configured with a 3D torus interconnect topology, coupled with a dual-rail QDR InfiniBand network, to provide a cost-effective, power-efficient and fault-tolerant interconnect.
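The configuration figures above can be cross-checked with a little arithmetic. The Python sketch below is only an illustration: the node, socket, core and memory counts are taken from the text, while the class and method names are hypothetical. The text does not state the processor clock frequency, so it is left as a parameter, and the 2.6 GHz used in the example run is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GordonConfig:
    # All defaults below are taken from the text.
    supernodes: int = 32
    compute_nodes_per_supernode: int = 32
    io_nodes_per_supernode: int = 2
    sockets_per_compute_node: int = 2
    cores_per_socket: int = 8
    flops_per_cycle_per_core: int = 8
    memory_gb_per_compute_node: int = 64

    @property
    def compute_nodes(self) -> int:
        return self.supernodes * self.compute_nodes_per_supernode

    @property
    def io_nodes(self) -> int:
        return self.supernodes * self.io_nodes_per_supernode

    @property
    def compute_cores(self) -> int:
        return (self.compute_nodes * self.sockets_per_compute_node
                * self.cores_per_socket)

    @property
    def total_memory_gb(self) -> int:
        return self.compute_nodes * self.memory_gb_per_compute_node

    def peak_tflops(self, clock_ghz: float) -> float:
        """Peak = cores x flops per cycle x clock (GHz gives GFlops; /1000 gives TFlops)."""
        return self.compute_cores * self.flops_per_cycle_per_core * clock_ghz / 1000.0

    def min_clock_ghz_for(self, target_tflops: float) -> float:
        """Lowest clock at which the peak reaches target_tflops."""
        return target_tflops * 1000.0 / (self.compute_cores * self.flops_per_cycle_per_core)


if __name__ == "__main__":
    cfg = GordonConfig()
    print(cfg.compute_nodes, cfg.io_nodes, cfg.compute_cores, cfg.total_memory_gb)
    # 2.6 GHz is an illustrative assumption; the text does not state the clock.
    print(f"{cfg.peak_tflops(2.6):.1f} TFlops at an assumed 2.6 GHz")
    print(f"{cfg.min_clock_ghz_for(200):.2f} GHz needed to reach 200 TFlops")
```

With the figures from the text this gives 1,024 compute nodes, 64 I/O nodes (matching the 64 I/O nodes quoted above), 16,384 compute cores and 65,536 GB of DRAM. Any clock above roughly 1.53 GHz is enough to exceed the stated 200 TFlops peak.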
While Gordon is well suited for traditional supercomputing workloads, the system offers features that make it an exceptional resource for data intensive problems. Each I/O node, for example, is configured with 16 enterprise flash drives with a combined capacity of 4 TB. These drives have latencies roughly two orders of magnitude lower than those of hard disks and are also designed to provide higher sequential read/write bandwidth. Staging large data sets on the flash drives has been shown to greatly reduce run times for a number of data intensive applications. In a multi-user production environment, the drives will likely be configured into four software RAID 0 devices to strike a balance between maximizing performance and limiting contention for resources. By contrast, the flash drives in I/O nodes dedicated to a single application can be set up as a single large RAID device.
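The two drive layouts described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not SDSC's actual configuration tooling: the drive names are hypothetical, the drives are assumed to be of equal size and evenly divided among the RAID 0 devices, and the striping function uses the textbook round-robin mapping of chunks to members.

```python
from typing import List, Tuple

DRIVES_PER_IO_NODE = 16    # from the text
IO_NODE_CAPACITY_TB = 4.0  # combined capacity per I/O node, from the text


def raid0_groups(drives: List[str], num_devices: int) -> List[List[str]]:
    """Split drives into num_devices equally sized RAID 0 member sets."""
    if num_devices <= 0 or len(drives) % num_devices != 0:
        raise ValueError("drive count must divide evenly among devices")
    size = len(drives) // num_devices
    return [drives[i * size:(i + 1) * size] for i in range(num_devices)]


def raid0_capacity_tb(members: List[str], per_drive_tb: float) -> float:
    """RAID 0 has no redundancy, so capacity is the sum of its equal members."""
    return len(members) * per_drive_tb


def stripe_location(chunk_index: int, num_members: int) -> Tuple[int, int]:
    """Map a logical chunk to (member index, chunk offset on that member)."""
    return chunk_index % num_members, chunk_index // num_members


if __name__ == "__main__":
    drives = [f"ssd{i:02d}" for i in range(DRIVES_PER_IO_NODE)]  # hypothetical names
    per_drive_tb = IO_NODE_CAPACITY_TB / DRIVES_PER_IO_NODE

    shared = raid0_groups(drives, 4)     # multi-user layout: four devices
    dedicated = raid0_groups(drives, 1)  # single-application layout: one device

    for group in shared:
        print(group, raid0_capacity_tb(group, per_drive_tb), "TB")
    print(len(dedicated[0]), "drives,",
          raid0_capacity_tb(dedicated[0], per_drive_tb), "TB")
```

In this model the multi-user layout yields four independent 1 TB devices of four drives each, so separate jobs do not compete for the same drives, while the dedicated layout stripes one application's data across all 16 drives of a 4 TB device.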
Gordon is specifically designed to support data-intensive applications, including predictive analytics, on data from genomics, climate science, astronomy, energy, biomedical informatics and healthcare, social networks and many other fields. Gordon is now a key part of the network of next-generation high-performance computing (HPC) systems available to the research community through XSEDE, the National Science Foundation’s next-generation program for an open-access national computing grid. To learn more, go to http://www.sdsc.edu/supercomputing/gordon/
To learn more about the Appro Xtreme-X™ Supercomputer, visit: http://www.appro.com/products