HPCwire HPC in the Cloud Digital Manufacturing Report Green Computing Report


March 04, 2013

DDN Casts Hadoop with HPC Hook


A number of high performance computing (HPC) vendors have stepped up on the Hadoop platform with the intent of riding that train into more enterprise environments. From SGI to Convey, to more specialized system approaches from companies like Cray, the supercomputing folks are seeing oversized elephants at the head of their charge into big business datacenters.

Between the ingestion, storage, processing and distribution of large, multi-structured data volumes, there is bound to be complexity, especially when it’s happening within a Hadoop cluster.

However, according to HPC storage company DataDirect Networks (DDN), all it takes to remedy the hassles is a Hadoop appliance that seamlessly knits together all the elements of such a cluster and puts them under the blanket of a single management interface. And naturally, given the company's history, a high performance focus can't hurt either.

While that may be a dramatic oversimplification of the hassle-hacking, the fact is that even with scads of new and tested distros and plenty of packaged appliances, there are still no real ways to get rolling with Hadoop without at least a few major headaches. For larger-scale installations, headaches translate directly to cost, since support is usually needed and time to results suffers.

DDN pointed to the challenges that are hindering Hadoop adoption in its argument for an appliance, in this case one with HPC horsepower. The new hScaler, which the company debuted at Strata, combines its performance-oriented SFA12K storage system, InfiniBand interconnects, a hardwired ETL engine that taps into over 200 common sources, and the Hortonworks Hadoop distro at the core.

hScaler marks DDN’s first foray into the Hadoop game, with an offering that the company hopes will have pull outside of its traditional HPC hive. With an emphasis on simplifying deployment and management, the system has some unique elements, including separation of storage and compute for more variable workload options. The real appeal outside of functionality, though, contends Uday Mohan, whom we spoke with last week during the Strata conference, is the reduction in rack space via the dense configuration, a design that DDN says improves power and cooling efficiency by 4x over a standard Hadoop cluster.

Practically speaking, one element of any appliance that might have some curb appeal for users is a single management interface. According to Mohan, users running Hadoop clusters are stuck with multiple management tools for different parts of the stack (network, storage, software, etc.). To reduce the general management headache, DDN has stitched together a GUI for the entire stack, and says that having DDN as the single point of contact throughout saves considerable hassle, since users would otherwise have to call different vendors to discover or correct problems.

On that note, DDN’s Rajiv Gorg says that there are four major challenges preventing Hadoop from making its way into more enterprise environments. While users looked to the platform to solve scaling challenges, Gorg claims they traded that capability for a different set of problems: the difficulty of building systems in the first place, inflexible configuration, difficult management and overall TCO.

While all of these are important points, the one that is most pertinent for many enterprise use cases falls under the inflexible configuration banner. According to Gorg, users can only scale along a rigid line that offers the same “fixed ratio of performance to capacity.” He argues that without the flexibility to address the very real need of different workloads operating across the same system, users are stuck along a single continuum, unable to scale performance for one workload while independently adding capacity for another. For some users, the ability to separate the storage from the compute could allow far more freedom in designing specific workloads.
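To make the “fixed ratio” argument concrete, here is a minimal back-of-the-envelope sketch in Python. Every number and name in it is a hypothetical assumption chosen for illustration (the per-node performance units, the 24 TB per node, the workload targets); none of it comes from DDN or describes hScaler's actual sizing. It only shows why a cluster that must add performance and capacity together tends to overprovision one of the two.

```python
import math

# Hypothetical building blocks (illustrative assumptions, not vendor figures).
PERF_PER_NODE = 1.0   # arbitrary performance units delivered by one node
CAP_PER_NODE = 24.0   # terabytes of storage delivered by one node


def coupled_nodes(perf_needed, cap_needed):
    """Nodes required when every node adds a fixed bundle of performance
    and capacity: the larger of the two requirements dictates the count."""
    return max(math.ceil(perf_needed / PERF_PER_NODE),
               math.ceil(cap_needed / CAP_PER_NODE))


def decoupled_units(perf_needed, cap_needed):
    """Compute and storage units required when the two scale independently."""
    return (math.ceil(perf_needed / PERF_PER_NODE),
            math.ceil(cap_needed / CAP_PER_NODE))


# Capacity-heavy workload: modest performance, lots of data.
nodes = coupled_nodes(perf_needed=10, cap_needed=960)
compute, storage = decoupled_units(perf_needed=10, cap_needed=960)
surplus_compute = nodes - compute  # compute bought only to get the disks

# Performance-heavy workload: lots of compute, modest data.
hot_nodes = coupled_nodes(perf_needed=50, cap_needed=240)
stranded_tb = hot_nodes * CAP_PER_NODE - 240  # capacity bought only to get CPUs

print(f"capacity-heavy: {nodes} coupled nodes vs "
      f"{compute} compute + {storage} storage units "
      f"({surplus_compute} surplus compute nodes)")
print(f"performance-heavy: {hot_nodes} coupled nodes, "
      f"{stranded_tb:.0f} TB of stranded capacity")
```

Under these assumed figures, the coupled design buys whichever resource is not the bottleneck simply to reach the target on the other, which is the tradeoff Gorg describes.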

On the TCO side, all of the challenges he cited leave the balance between acquisition and operational costs badly skewed, with a stated “60% operational cost over acquisition cost.” This figure factors in system density, the possibility of downtime and failure, and multiple support people and agreements, all of which DDN says could be remedied under a single unified appliance.

It’s also kind of cool to see InfiniBand whittling its way into the Hadoop hardware market. While there are some interesting debates about how much it can boost the traditional batchy workloads of a Hadoop system, there have been a number of announcements along these lines from HPC folks, Colfax for instance, designed to appeal to the latency-driven crew.

And so the merger of HPC and big data continues. I have a feeling that the upcoming International Supercomputing Conference in Leipzig (ISC 13) will be buzzing with more news of this variety, and that the SC show in the States will be another big data show by the time it comes around again in November.

Never a dull moment.

Discussion

There is 1 discussion item posted.

Network Congestion
Submitted by wmartinusa on Mar 12, 2013 @ 2:26 PM EDT


There is a wealth of papers on dl.acm.org identifying both cluster and node failures in Hadoop data centers. (Wish I'd better bookmarked them, sorry.) Most notable among them, I thought, was the scenario where network queuing factors and node resources at a HIGHLY visible (distributed to) node can degrade performance markedly.

Does the architecture in this article address cascading errors or the peak effects of requests that are routed to the same map-reduce node head?

"For some users, having the ability to separate the storage from the compute corners could allow far more freedom in designing specific workloads." uses the term "could" ... not exactly even a commitment to anything in the works.
