November 5, 2013

LLNL Introduces Big Data Catalyst

Isaac Lopez

The Lawrence Livermore National Laboratory (LLNL) this week announced its newest high performance computing (HPC) endeavor: a 150 teraflop/s, 7,776-core cluster named Catalyst, aimed at big data workloads.

Based on the Cray CS300 cluster supercomputer, Catalyst packs an impressive amount of computing power. Billed as a big data supercomputer, Catalyst, LLNL says, represents a departure from the classic simulation-focused architectures that have served as the blueprint for HPC over the last few decades.

Per the LLNL release:

The 150 teraflop/s (trillion floating-point operations per second) Catalyst cluster has 324 nodes, 7,776 cores and employs the latest-generation 12-core Intel Xeon E5-2695v2 processors. Catalyst runs the NNSA-funded Tri-lab Open Source Software (TOSS) that provides a common user environment across NNSA Tri-lab clusters. Catalyst features include 128 gigabytes (GB) of dynamic random access memory (DRAM) per node, 800 GB of non-volatile memory (NVRAM) per compute node, 3.2 terabytes (TB) of NVRAM per Lustre router node, and improved cluster networking with dual rail Quad Data Rate (QDR-80) Intel TrueScale fabrics. The addition of an expanded node local NVRAM storage tier based on PCIe high-bandwidth Intel solid state drives (SSD) allows for the exploration of new approaches to application checkpointing, in-situ visualization, out-of-core algorithms, and big data analytics.
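As a quick sanity check, the published figures are internally consistent, and a couple of derived numbers fall out of them. The sketch below (plain arithmetic, no official LLNL figures beyond those quoted above) shows that 7,776 cores across 324 nodes works out to 24 cores per node, i.e. dual-socket 12-core Xeons, and that the per-node DRAM implies roughly 40 TB of memory across the cluster:

```python
# Aggregate figures implied by the published Catalyst specs.
# Only the quoted numbers (324 nodes, 7,776 cores, 128 GB DRAM/node)
# are used; everything derived here is simple arithmetic, not an
# official LLNL figure.

NODES = 324
TOTAL_CORES = 7776
DRAM_PER_NODE_GB = 128

cores_per_node = TOTAL_CORES // NODES            # 24 -> two 12-core Xeon E5-2695v2 per node
total_dram_tb = NODES * DRAM_PER_NODE_GB / 1024  # 40.5 TB of DRAM cluster-wide

print(cores_per_node)  # 24
print(total_dram_tb)   # 40.5
```

The release breaks NVRAM out by node role (800 GB per compute node, 3.2 TB per Lustre router node) without giving the count of each role, so an aggregate NVRAM total is deliberately not computed here.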

“Big Data unlocks an entirely new method of discovery by deriving the solution to a problem from the massive sets of data itself. To research new ways of translating Big Data into knowledge, we had to design a one-of-a-kind system,” said Raj Hazra, Intel vice president and general manager of the Technical Computing Group, in a statement. “Equipped with the most powerful Intel processors, fabrics, and SSDs, the Catalyst cluster will become a critical tool, providing insights into the technologies required to fuel innovation for the next decade.”

The new system is being called a “proving ground” due to its unique storage-heavy architecture combined with expanded DRAM and persistent NVRAM, which the national lab says will be well suited for big data problems, such as bioinformatics, business analytics, machine learning, and natural language processing. The new architecture, says the lab, “opens new opportunities for exploring the potential of combining floating-point-focused capability with data analysis in one environment.”

While the system sounds impressive for those who get to use it, LLNL says it expects insights provided by Catalyst to become a basis for future commodity technology procurements, which could give enterprise observers a blueprint for future copycat systems of their own.

Catalyst will be managed through LLNL’s High Performance Computing Innovation Center (HPCIC), an outreach organization that aims to provide its clients with powerful computing resources in the interest of increasing American competitiveness. LLNL says that the Catalyst supercomputer will be shared among the three collaborating partners, with access rights based on level of investment. The HPCIC says its immediate aim is to offer access to Catalyst through its ongoing collaborations with American companies and research institutions.

While the system is open for limited use this month, general availability is slated for December.

Related items:

SGI Aims to Carve Space in Commodity Big Data Market 

Why Cray is Clamoring for Your Code 

PSC Receives Grant for Data Exacell