A number of high performance computing (HPC) vendors have stepped up on the Hadoop platform with the intent of riding that train into more enterprise environments. From SGI to Convey, to more specialized system approaches from companies like Cray, the supercomputing folks are seeing oversized elephants at the head of their charge into big business datacenters.
Between the ingestion, storage, processing and distribution of large, multi-structured data volumes, there is bound to be complexity, especially when it’s happening within a Hadoop cluster.
However, according to HPC storage company Data Direct Networks (DDN), all it takes to remedy the hassles is a Hadoop appliance that seamlessly knits together all the elements of such a cluster and puts them under the blanket of a single management interface. And naturally, given the company’s history, a high performance focus can’t hurt either.
While that may be a dramatic oversimplification of the hassle-hacking, the fact is that even with scads of new and tested distros and plenty of packaged appliances, there is still no real way to get rolling with Hadoop without at least a few major headaches. For larger-scale installations, headaches translate directly to cost, since support is usually needed and time to results suffers.
DDN pointed to the challenges hindering Hadoop adoption in its argument for an appliance, in this case one with HPC horsepower. The new hScaler, which the company debuted at Strata, taps its performance-oriented SFA12K storage system, InfiniBand interconnects, a hardwired ETL engine that pulls from over 200 common sources, and the Hortonworks Hadoop distro at the core.
hScaler marks DDN’s first foray into the Hadoop game, an offering the company hopes will have pull outside of its traditional HPC hive. With its emphasis on simplifying deployment and management, the system has some unique elements, including the separation of storage and compute for more variable workload options. The real appeal beyond functionality, though, contends DDN’s Uday Mohan, with whom we spoke last week during the Strata conference, is the reduction in rack space via the dense configuration, a design that DDN says offers a 4x power and cooling advantage over a standard Hadoop cluster.
Practically speaking, one element of any appliance that might have some curb appeal to users is the single management interface. According to Mohan, users running Hadoop clusters are stuck with multiple management tools for different parts of the stack (network, storage, software, etc.). To reduce the general management headache, DDN has stitched together a GUI for the entire stack, and says that having DDN as the single point of contact throughout saves incredible hassle, since users would otherwise have to call different vendors to discover or correct problems.
On that note, DDN’s Rajiv Gorg says that four major challenges are preventing Hadoop from making its way into more enterprise environments. While enterprises looked to the platform to solve scaling challenges, Gorg claims they traded that capability for a different set of problems: the difficulty of building systems in the first place, inflexible configuration, difficult management, and overall TCO.
While all of these are important points, the one most pertinent for many enterprise use cases falls under the inflexible configuration banner. According to Gorg, users can only scale along a rigid line that offers the same “fixed ratio of performance to capacity.” He argues that without the flexibility to address the very real need for different workloads operating across the same system, users are stuck along a single continuum, unable to scale performance for one workload while independently opting for more capacity on another. For some users, the ability to separate storage from compute could allow far more freedom in designing for specific workloads.
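A minimal back-of-the-envelope sketch makes the fixed-ratio complaint concrete. The node specs below (12 TB and 16 cores per node) are made-up numbers for illustration, not DDN or Hortonworks figures: when storage and compute can only be bought together, a capacity-heavy workload forces the purchase of compute that sits idle.

```python
import math

# Hypothetical per-node specs for a conventional Hadoop cluster,
# where each node adds a fixed slice of both capacity and compute.
NODE_CAPACITY_TB = 12
NODE_COMPUTE_CORES = 16

def coupled_nodes_needed(capacity_tb, cores):
    """Nodes required when storage and compute scale together:
    you must satisfy whichever requirement is larger."""
    return max(math.ceil(capacity_tb / NODE_CAPACITY_TB),
               math.ceil(cores / NODE_COMPUTE_CORES))

# A capacity-hungry archival workload: 600 TB of data, but only
# 64 cores' worth of processing.
nodes = coupled_nodes_needed(600, 64)
idle_cores = nodes * NODE_COMPUTE_CORES - 64
print(nodes, idle_cores)  # 50 nodes, 736 idle cores bought for capacity alone
```

With storage decoupled from compute, the same workload could in principle be served by the 600 TB of storage plus only the four nodes of compute it actually needs.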
On the TCO side, all of the challenges he cited skew the balance between acquisition and operational costs, with a stated “60% operational cost over acquisition cost.” That figure factors in system density, the possibility of downtime and failure, and multiple support people and agreements, all of which, he argues, could be remedied under a single unified appliance.
It’s also kind of cool to see InfiniBand whittling its way into the Hadoop hardware market. While there are some interesting debates about how much it can boost the traditional batchy workloads of a Hadoop system, there have been a number of announcements along these lines from HPC folks like Colfax, for instance, designed to appeal to the latency-driven crew.
And so the merger of HPC and big data continues. We have a feeling that the upcoming International Supercomputing Conference in Leipzig (ISC 13) will be buzzing with more news of this variety, and that the SC show in the States will be another big data show by the time it comes around again in November.
Never a dull moment.