November 22, 2011

Interview: Cray CEO Sees Big Future in Big Data

Nicole Hemsoth

In many ways, this was Cray’s year, in part because of an uptick in their “traditional” supercomputing business, but also because of their expanding emphasis on data-intensive computing.

Cray is one of the founding companies behind high performance computing as we know it, so it is rather striking that it made news on its home turf during last week’s annual Supercomputing Conference (SC11) both on the expected HPC front and in the big data arena.

We caught up with Cray CEO Peter Ungaro at the show this year to get his take on the closing gap between some of the company’s work in high performance computing and data-intensive computing. Overall, as you’ll see below, Ungaro seems confident about the new approach, which might help the stalwart supercomputing icon move into new verticals.

As Ungaro noted during the interview, the company is planning a three-pronged approach to target new verticals that rely on purpose-built systems. Going forward, one can expect to see the company beef up its partnership strategy as well as expand on some of the hardware innovations that are making shared memory systems like the XMT important for big data applications.

Aside from key announcements about the Blue Waters deal and news related to its Sonexion storage play, Cray’s emphasis on data-intensive science and verticals definitely took center stage. At the heart of these conversations is, as mentioned earlier, the Seattle-based crew’s XMT, their super-scalable shared memory system that has been purpose-built for data mining and analytics at massive scale.

The company generated some buzz around the XMT and its overall big data focus this year with a string of announcements and case studies in tandem with Pacific Northwest National Laboratory. The two set about proving the value of shared memory systems like the XMT on a range of big data problems, including advanced contingency analyses and operations on very large social media data sets. While there is a pretty detailed list of case studies here, the lab was instrumental in serving as a proving ground for the value of supercomputing technologies turned on their side to meet the unique needs of data-intensive science and other problems.

In a nutshell, the massively multithreaded XMT is the company’s big way in with the big data set. According to Cray, the XMT is “purpose-built for parallel applications that are dynamically changing, require random access to shared memory and typically do not run well on conventional systems.” The company says the foundation of this big data readiness also lies in the multithreading capability, which is well matched to complex graph-oriented databases and to tasks that include graph analysis, pattern matching and anomaly detection.

Such a system has great appeal for the Web 3.0 customers who are analyzing massive amounts of social and customer data. For a company with deep networks in academia and government, it also aligns nicely with major national security, fraud detection and other efforts that might open new inroads, or at least expand existing ones.

The XMT takes advantage of the Cray XT infrastructure, with the ability to scale from 16 to over 8,000 processors to provide over a million concurrent threads and a grand total of 64 terabytes of shared memory. They point to the separation between compute, service and I/O nodes as a key to efficiency as well as the boost provided by their own Threadstorm multithreaded processor, which hinges on the AMD Torrenza Innovation Socket technology. At the risk of rattling off too many figures, it might be best to just hand over the spec sheet.
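For readers who want to sanity-check those figures before opening the spec sheet, here is a rough back-of-the-envelope sketch. It assumes two things the numbers above do not state outright: that “over 8,000” means 8,192 processors, and that each Threadstorm processor holds 128 hardware threads. The function names are illustrative only.

```python
# Back-of-the-envelope check of the XMT scaling figures quoted above.
# ASSUMPTIONS (not stated in the article): "over 8,000" is read as 8,192
# processors, and each Threadstorm processor holds 128 hardware threads.
MAX_PROCESSORS = 8192          # assumed reading of "over 8,000"
MIN_PROCESSORS = 16            # from the article
THREADS_PER_PROCESSOR = 128    # assumed per-processor hardware thread count
SHARED_MEMORY_TB = 64          # from the article


def concurrent_threads(processors, threads_per_processor=THREADS_PER_PROCESSOR):
    """Total hardware threads available across the whole machine."""
    return processors * threads_per_processor


def memory_per_processor_gb(total_tb, processors):
    """Average share of the global memory per processor, in gigabytes."""
    return total_tb * 1024 / processors


if __name__ == "__main__":
    print("threads at minimum size:", concurrent_threads(MIN_PROCESSORS))
    print("threads at maximum size:", concurrent_threads(MAX_PROCESSORS))
    print("GB per processor at maximum size:",
          memory_per_processor_gb(SHARED_MEMORY_TB, MAX_PROCESSORS))
```

Under those assumptions, the top configuration works out to just over a million threads, which squares with the “over a million concurrent threads” claim, and to roughly 8 GB of shared memory per processor.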

While the system is said to be tailored to complex graph-based problems, one missing element here was a strong showing on this year’s burgeoning Graph 500 list. IBM definitely stole that show this year, with notable placements from SGI and others with a shared-memory, big data-ready focus. While the XMT at Pacific Northwest National Laboratory did place in the middle of the list, it will be interesting to see who picks the system up and submits in time for the June 2012 list.

Needless to say, Cray has a big data play wrapped up neatly in the XMT. However, they’ve been moving ahead with their big data ambitions in an effort to target new applications in other ways this year. For instance, back in May, the company announced a partnership with Sandia to create the Supercomputing Institute for Learning and Knowledge Systems (SILKS), which has a specific emphasis on creating the next generation of data-intensive supercomputers.

According to Sandia, the goal is to “make sense of huge collections of data rather than carry out more traditional modeling and simulation of scientific problems.” Under the agreement, the team behind the project is using Cray technologies, specifically the company’s XMT, to efficiently tackle some of the most data-intensive problems in science and engineering.

Between news of the Blue Waters deal, the company’s swift move into super storage, use cases around the XMT, and of course the SILKS project, Cray is unquestionably walking the line between HPC and big data. As Ungaro notes in the above video, there are big changes coming, and with them a big chance for Cray to enter new markets and find itself yet again.