November 10, 2011

SGI Spins Ready-to-Roll Hadoop Clusters

Datanami Staff

SGI made a showing at this year’s Hadoop World conference in New York, touting an enterprise-ready approach to offering turnkey Hadoop clusters. The company announced their Hadoop Cluster Reference Implementation (RI) today, their first offering following their recent partnership with Cloudera.

Like other SGI products, the SGI Hadoop Cluster is inspired by the company's roots in high performance computing. SGI says the cluster is tuned for high performance analytics and aimed at customers in speed-conscious, data-heavy verticals like financial services.

SGI points to the efficiency of their ready-made Hadoop cluster, noting that customers who have to sift through vast amounts of unstructured data require systems that scale well, not only for performance but also because of skyrocketing datacenter energy costs. They claim that the key to their power-awareness lies both in their management framework (SGI Management Center) and in their use of the Xeon-based servers that make up their popular Rackable and Cloud Rack server lines. According to SGI, customers can fine-tune operations for maximum efficiency based on the performance needs of their workloads.

According to Praveen K. Mandal, senior VP of engineering at SGI, the company has already produced “tens of thousands” of custom-made Hadoop nodes for their customers. He says that the SGI Hadoop Cluster RI will help new customers avoid the design and configuration bottlenecks of getting such systems up and running, with full integration out of the box.

Among the Hadoop clusters SGI built before today's formal announcement was one configuration that earned the company the top ranking in the Terasort data processing and analysis benchmark.

Using one of their Hadoop clusters (backed by the Cloudera distribution, given their partnership), SGI claimed to beat the previous record with a 20-node Rackable C2005-TY6 cluster with E5630-series Xeons, 48 GB of memory and 4x 1TB SATA HDDs. This system was able to chew through a 100GB Terasort job in 130 seconds. The Terasort benchmark exercises the system, MapReduce and HDFS layers together, and SGI says its result was 81% better than that of Oracle's Sun X2270 cluster of the same size.
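For a rough sense of scale, the quoted Terasort figures can be turned into a sort throughput. The sketch below is back-of-the-envelope arithmetic only: it uses the 100GB, 130-second and 20-node numbers cited above, assumes decimal units (1 GB = 1000 MB), and assumes the work was spread evenly across nodes. Neither assumption comes from SGI.

```python
# Figures quoted in the article.
DATA_GB = 100      # size of the Terasort job
ELAPSED_S = 130    # reported run time in seconds
NODES = 20         # nodes in the cluster

def throughput_mb_per_s(data_gb, elapsed_s):
    """Average sort throughput in MB/s, assuming 1 GB = 1000 MB."""
    return data_gb * 1000 / elapsed_s

# Aggregate rate for the whole cluster, then an even per-node split
# (the even split is an assumption, not a reported measurement).
aggregate = throughput_mb_per_s(DATA_GB, ELAPSED_S)
per_node = aggregate / NODES

print(f"aggregate: {aggregate:.0f} MB/s, per node: {per_node:.1f} MB/s")
```

Under those assumptions the run works out to roughly 769 MB/s across the cluster, or about 38.5 MB/s per node.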

With the hardware piece of the puzzle in place, SGI says that their ability to lean on their partnerships with a number of big data analytics vendors creates a win-win for enterprise customers. They point to their existing relationships with Kitenga, Datameer, Pentaho and Quantum4D, all of which deliver variations on business intelligence and analytics software that are uniquely suited to particular business types.

Integration with, and turnkey access to, the analytics capabilities of both Pentaho and Datameer are compelling selling points for SGI’s Hadoop-flavored cluster offering. Datameer, for instance, is designed specifically to provide BI capabilities directly on top of Hadoop, with full data integration and visualization provided via an uncluttered UI. Pentaho is another prime partnership since it provides advanced BI tools in a graphical ETL environment that is well suited to handling MapReduce jobs. Both are designed to leverage Hadoop on a well-integrated cluster.

On the integrated software front, their partnership with Kitenga, a small company that doesn’t tend to get a lot of mainstream “big data” coverage, is a key addition for customers that rely heavily on advanced visualization of large datasets. In many ways, the same is true of their ability to offer Quantum4D, another advanced visualization and data modeling software suite.

The annual Supercomputing conference (SC11) is just around the corner in Seattle. At this “Superbowl for Supercomputing” there is a good chance we will see other vendors with a foot firmly in the high performance computing door spinning out similarly integrated offerings for big data enterprise customers.