July 11, 2012

Midwest Growing Hub for HPC, Big Data

Gary Stiehr

With the announcement of a new particle discovery from the Large Hadron Collider (LHC), quite possibly the long-sought Higgs boson, we get a glimpse of how advanced networking combines with High Performance Computing (HPC) in the Midwest to form a critical hub for discovery and data dissemination.

The massive data sets from the LHC’s CMS experiment flow into the U.S. over 10 Gb/s trans-Atlantic network links from CERN to the StarLight high-performance network exchange facility in Chicago. From there, the data is processed by thousands of cores at Fermi National Accelerator Laboratory in Batavia, IL, before being stored in large-scale storage systems. Over advanced networks such as ESnet and Internet2, the data is then transferred to more than 1,700 U.S. scientists at 94 institutions for further analysis. The Midwest is leveraging its HPC, supercomputing and advanced networking expertise to build momentum in tackling Big Data challenges.
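
To put those link speeds in perspective, a quick back-of-the-envelope calculation helps. A minimal sketch in Python, assuming an illustrative 1 PB data set and 80 percent effective link utilization (the 10 Gb/s rate comes from the links described above; the other figures are assumptions, not CMS numbers):

```python
# Back-of-the-envelope: time to move a large data set over a 10 Gb/s link.
# The 10 Gb/s rate is from the article; the 1 PB data set size and 80%
# effective utilization are illustrative assumptions only.

LINK_RATE_BPS = 10e9     # 10 Gb/s trans-Atlantic link
UTILIZATION = 0.8        # assumed effective throughput after protocol overhead
DATASET_BYTES = 1e15     # assumed 1 PB data set

seconds = (DATASET_BYTES * 8) / (LINK_RATE_BPS * UTILIZATION)
print(f"~{seconds / 86400:.1f} days")   # ~11.6 days for 1 PB, ~1.2 days per 100 TB
```

Even a well-utilized 10 Gb/s link needs more than a week to move a petabyte, which is why exchange points like StarLight and research networks like ESnet and Internet2 are so central to this workflow.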

The Midwest has long had a strong foundation of supercomputing and HPC expertise supporting world-class research. The Argonne Leadership Computing Facility near Chicago and the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign (UIUC), to name just two examples, help power some of the nation’s toughest computational research.

GPU expertise has been coming out of the Midwest for years: UIUC was the world’s first NVIDIA CUDA Center of Excellence. And with FPGAs emerging on the Graph 500 as a power-efficient, high-performance technology for certain Big Data workloads, FPGA R&D at Washington University in St. Louis will continue to yield valuable expertise and technologies.
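
For context, the Graph 500 ranks systems on data-intensive graph traversal rather than floating-point speed, and its core timed kernel is a breadth-first search (BFS) over a very large synthetic graph. Here is a minimal, toy-scale sketch of that kernel in Python; the benchmark itself runs tuned parallel implementations over graphs with billions of edges:

```python
from collections import deque

def bfs(adj, source):
    """Breadth-first search: the core kernel timed by the Graph 500.

    adj maps each vertex to a list of neighbor vertices (toy scale here;
    the real benchmark traverses synthetic graphs with billions of edges).
    Returns a dict of parent pointers forming the BFS tree.
    """
    parent = {source: source}
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        for w in adj[v]:
            if w not in parent:      # first visit sets the BFS parent
                parent[w] = v
                frontier.append(w)
    return parent

# Toy usage: a five-vertex graph.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs(adj, 0))   # {0: 0, 1: 0, 2: 0, 3: 1, 4: 3}
```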

As a result of the continued growth of and investment in Big Data, a new conference, StampedeCon, is launching in the Midwest. The region has a concentration of organizations that are creating HPC and Big Data technologies. For example, the St. Louis-based companies Global Velocity, Exegy and VelociData build FPGA-based technologies that handle massive amounts of streaming data for information security, financial market data and text mining. Also based in St. Louis, Appistry develops its Ayrris platform, designed for resilient, high-performance Big Data analytics in genomics, defense, logistics and other industries; the company has recently made big advances in a number of these markets.

Many organizations operating in the Midwest also treat HPC and Big Data as strategic assets. For example, Ohio-based Procter & Gamble has enhanced product innovation and reduced costs by making HPC resources, such as those at the Ohio Supercomputer Center and NCSA’s Private Sector Program, a core part of its modeling and simulation projects. Ford and other Midwest-based advanced manufacturers likewise rely on HPC-based modeling and simulation, and Ford has discussed how it may also apply Big Data technologies to its rich vehicle sensor data to further improve vehicle design.

Another Midwest strength, life sciences research, offers a great example of strategic use of HPC to mine Big Data for clues to cancer. In 2008, The Genome Institute at Washington University in St. Louis was the first to sequence and analyze a cancer genome, comparing the DNA of a cancer patient’s healthy cells with that of the cancerous cells, a milestone that kicked off a new era in cancer research.

In fact, the New York Times recently published a front-page article detailing the recovery of a cancer patient who had his cancer genome analyzed at The Genome Institute in St. Louis.  

Enabling discoveries from these large data sets, however, has required High Performance Computing technologies. Some large-scale cancer genomics projects involve trillions of data points moving through sophisticated bioinformatics pipelines and require petabytes of high-performance storage and thousands of CPU cores. This HPC expertise was one important factor behind The Genome Institute’s role in efforts like the Pediatric Cancer Genome Project, which aims to decode the genomes of more than 600 childhood cancer patients.
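
As a highly simplified illustration of the tumor-versus-normal comparison described above: production pipelines align billions of sequencing reads and apply statistical variant callers, but at its core the analysis looks for positions where a patient’s tumor DNA differs from the matched healthy DNA. A toy sketch in Python (the short sequences and the literal character diff are illustrative assumptions only):

```python
def somatic_differences(normal, tumor):
    """Toy tumor-vs-normal comparison: report positions where aligned
    sequences differ.

    Real cancer-genomics pipelines work from billions of aligned reads
    and statistical callers, not literal string comparison; this only
    illustrates the idea of diffing healthy vs. cancerous DNA.
    """
    assert len(normal) == len(tumor), "sequences must be aligned"
    return [(i, n, t) for i, (n, t) in enumerate(zip(normal, tumor)) if n != t]

# Toy usage: two short aligned sequences with one single-base difference.
print(somatic_differences("ACGTACGT", "ACGTTCGT"))   # [(4, 'A', 'T')]
```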

The energy and opportunities emerging from Midwest communities are keeping the region moving forward as HPC and supercomputing evolve to include increasingly data-intensive Big Data workloads. For example, the St. Louis High Performance Computing group, STLhpc.net, was founded in March 2011; in its first year, more than 300 members from dozens of St. Louis organizations connected online and at local HPC and Big Data events.

With Google Fiber bringing ultra-high-speed broadband to Kansas City and the Loop Media Hub project aiming to do the same in St. Louis, the Midwest is even weaving high-performance networking into its communities. Startups in the region also have HPC resources readily available through programs like Innovative Technology Enterprises at the University of Missouri–St. Louis and Blue Collar Computing at the Ohio Supercomputer Center.

As the data available to organizations continues to grow in quantity and complexity, the need for Big Data technologies and expertise is growing quickly. Given the concentration of expertise in the Midwest, a new national Big Data conference series, StampedeCon, will launch in St. Louis on August 1, 2012. The conference will focus on the role of Big Data, its business value, potential cost savings, and Big Data use cases at Facebook, Nokia, Kraft Foods, Monsanto and more.

Gary Stiehr has over twelve years of experience managing and operating High Performance Computing (HPC) environments. He leads the Information Systems group at The Genome Institute at Washington University in St. Louis, which manages the HPC systems used to analyze the genetic basis of cancer and other diseases. He recently started the StampedeCon Big Data conference series, which connects those interested in better understanding and leveraging Big Data and in discussing what’s next for this emerging field. He is also the founder of STLhpc.net, which aims to contribute to growth in St. Louis through the application of High Performance Computing. Prior to joining The Genome Institute, Gary worked at Fermi National Accelerator Laboratory, building and supporting the U.S. CMS Tier-1 regional computing center for the Worldwide LHC Computing Grid.
