Understanding Data Intensive Analysis on Large-Scale HPC Compute Systems

June 11, 2012


Data-intensive computing is an important and growing sector of scientific and commercial computing, and it places unique demands on computer architectures. While those demands continue to grow, most present systems, and even planned future systems, may not meet these computing needs effectively. The majority of the world's most powerful supercomputers are designed to run numerically intensive applications that make efficient use of distributed memory, and a number of factors limit the utility of such systems for knowledge discovery and data mining.

Competitive pressures, exploding data volumes and the ever-increasing need to integrate large-scale data leave supercomputing research centers with few choices. The question is not if or when to build large data-intensive capabilities, but how quickly they can be made available. Accommodating exponentially growing data within a finite amount of time demands accelerated analysis to meet deadlines, a difficult situation even for leading supercomputing centers.

 

Appro, Intel and SDSC Join Forces to Build a Major Data-Intensive Supercomputer

Appro, Intel and the San Diego Supercomputer Center (SDSC) at the University of California, San Diego worked closely together for three years (2009-2011) on a major data-intensive supercomputer design. The collaboration enabled Appro to skip an Intel® processor generation and move to a next-generation architecture, the Appro Xtreme-X™ Data Intensive Supercomputer, which SDSC named "Gordon." The partnership provided early access to future technology roadmaps for processors, flash memory, the interconnect network and system configuration planning, contributing design innovations and expertise well before the system was deployed. This early preparation resulted in a grant from the National Science Foundation (NSF) that allowed the system to be built in advance and made available as a powerful supercomputing resource dedicated to solving critical science and societal problems with forward-looking HPC technology.

Beyond the reliability, availability, manageability and system configuration compatibility essential to building a successful data-intensive supercomputer, this innovative system employs a vast amount of flash memory to help speed solutions now hamstrung by slower spinning-disk technology. In addition, new "supernodes" exploit virtual shared-memory software to create large shared-memory systems that reduce solution times for applications that tax even the most advanced supercomputers.

The Gordon supercomputer is available today and delivers over 200 TFlops of peak performance based on the latest Intel® Xeon® processor E5 product family, achieving up to 35 million IOPS from 300 TB of Intel® Solid-State Drive 710 Series storage. IOPS is an important measure for data-intensive computing because it indicates a storage system's ability to perform I/O operations on small transfers of randomly organized data, an access pattern prevalent in database and data-mining applications. Scientific applications can now benefit from fast interaction with and manipulation of large volumes of structured data. The Gordon system and its smaller prototype, Dash, were specifically designed to handle these types of data-intensive problems. Their unique architectural features bridge the "missing link" in the memory hierarchy and address the needs of an emerging class of applications whose working sets approach 1 petabyte in size.
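To put the IOPS figure in perspective, here is a minimal back-of-the-envelope sketch (not from the article) that converts an IOPS rate into effective throughput for small random transfers. The 4 KB transfer size and the per-disk comparison are assumptions for illustration only; the article does not state the block size behind the 35 million IOPS figure.

```python
# Rough illustration (assumed figures): how small random transfers
# translate an IOPS rate into effective bandwidth.
def iops_to_bandwidth_gb_s(iops, transfer_size_bytes):
    """Effective throughput in GB/s for a given IOPS rate and transfer size."""
    return iops * transfer_size_bytes / 1e9

gordon_iops = 35_000_000      # up to 35 million IOPS, as cited for Gordon's flash
block = 4 * 1024              # assumed 4 KB random transfers, typical of data mining

print(f"{iops_to_bandwidth_gb_s(gordon_iops, block):.0f} GB/s effective at 4 KB transfers")
# A spinning disk delivering on the order of 150 random IOPS would need
# roughly 200,000 drives to sustain the same random-access rate.
```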

SDSC is proud to share that Gordon made its debut as the 48th fastest supercomputer in the world on the November 2011 Top500 list. Gordon's result is notable in that its Tflop/s ranking was achieved with about half the number of cores of most comparable systems. That is because Gordon is among the first systems, and the first commissioned by the NSF, to use the Intel® Xeon® processor E5 family, whose cores perform twice as many floating-point operations per clock (eight versus four) as those of systems currently in use.

 

Appro Gordon Supercomputer at SDSC (credit: Alan Decker)


The Gordon system is composed of 32 supernodes, each consisting of 32 compute nodes and two I/O nodes. Each compute node contains 64 GB of DDR3 memory and two 8-core Intel® Xeon® processor E5 family CPUs, with each core capable of eight floating-point operations per cycle. The aggregate peak performance of Gordon is in excess of 200 TFlops. Groups of 16 compute nodes access a 4 PB parallel file system through the I/O nodes. The system delivers over 300 terabytes of high-performance Intel® SSD 710 Series flash solid-state storage via 64 dual-socket Intel® Xeon® processor 5600 series I/O nodes. The system is configured with a 3D torus interconnect topology, coupled with a dual-rail QDR network, to provide a cost-effective, power-efficient and fault-tolerant interconnect.
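As a rough cross-check of the stated peak, the short sketch below multiplies out the node and core counts given above. The clock frequency is an assumed value for illustration, since the article does not specify one.

```python
# Back-of-the-envelope peak-FLOPS estimate from the node counts given above.
# The 2.6 GHz clock is an assumption; the article does not state the frequency.
supernodes      = 32
nodes_per_super = 32
cores_per_node  = 2 * 8              # two 8-core Xeon E5 sockets per compute node
flops_per_cycle = 8                  # per core, as stated in the article
clock_hz        = 2.6e9              # assumed clock frequency

compute_nodes = supernodes * nodes_per_super                      # 1024 nodes
peak_flops    = compute_nodes * cores_per_node * flops_per_cycle * clock_hz

print(f"{compute_nodes} compute nodes, peak ~ {peak_flops / 1e12:.0f} TFlops")
# ~340 TFlops at the assumed clock, comfortably "in excess of 200 TFlops".
```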

While Gordon is well suited to traditional supercomputing workloads, the system offers features that make it an exceptional resource for data-intensive problems, such as the configuration of each I/O node with 16 enterprise flash drives providing a combined capacity of 4 TB. These drives have access latencies roughly two orders of magnitude lower than those of hard disks and are also designed to provide higher sequential read/write bandwidths. Staging large data sets on the flash drives has been shown to greatly reduce run times for a number of data-intensive applications. In a multi-user production environment, the drives will likely be configured as four software RAID 0 devices to strike a balance between maximizing performance and limiting contention for resources. By contrast, the flash drives in I/O nodes dedicated to a single application can be set up as a single large RAID device.
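The sketch below illustrates the capacity-versus-contention trade-off between those two layouts using the drive counts above. The per-drive sequential bandwidth is an assumed figure, and real RAID 0 scaling will fall somewhat short of this ideal.

```python
# Illustrative comparison of the two flash layouts described above.
# RAID 0 stripes data across its member drives, so capacity and bandwidth
# scale roughly with the number of drives in the set.
drives_per_io_node = 16
total_capacity_tb  = 4.0
per_drive_bw_mb_s  = 270          # assumed sequential bandwidth per drive

def raid0_set(num_drives):
    """Approximate capacity (TB) and aggregate bandwidth (GB/s) of one RAID 0 set."""
    capacity  = total_capacity_tb / drives_per_io_node * num_drives
    bandwidth = num_drives * per_drive_bw_mb_s / 1000
    return capacity, bandwidth

# Shared (multi-user) layout: four RAID 0 sets of four drives each.
print("4 x RAID 0 sets of 4 drives:", raid0_set(4))    # ~1.0 TB, ~1.1 GB/s per set
# Dedicated layout: one RAID 0 set spanning all 16 drives.
print("1 x RAID 0 set of 16 drives:", raid0_set(16))   # ~4.0 TB, ~4.3 GB/s
```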

Gordon is specifically designed for data-intensive and predictive-analytics applications working with data from genomics, climate science, astronomy, energy, biomedical informatics and healthcare, social networks and many other domains. Gordon is now a key part of the network of next-generation high-performance computing (HPC) resources available to the research community through XSEDE, the National Science Foundation's next-generation program for an open-access national computing grid. To learn more, go to http://www.sdsc.edu/supercomputing/gordon/

To learn more about the Appro Xtreme-X™ Supercomputer, visit http://www.appro.com/products
