March 02, 2013

Is Hadoop All Grown Up Now?


In the most recent conversations about Hadoop, the attractive part of the story—finding more use cases for the platform in large-scale, mission-critical enterprise settings—is easy to tell. However, part of what enables that story involves...well, something of a less sexy angle to the tale.

We’ve reached the phase in Hadoop’s evolution where some of the discovery and wonder has given way to some rather dry, albeit demanding, details. In other words, Hadoop is all grown up now—or at least it’s arrived at the human equivalent of say, getting an MBA without completely knowing how much it will all pay off in the end.

While the issues that the post-adolescent platform is now growing into might not be as interesting as the wild youthful days of experimentation, they are core to the long-term viability and future growth of the still-evolving space. To bring back the human metaphor, Hadoop is still struggling to take the book learning of that MBA into the real world—although it’s starting to piece things together enough to find its way into the middle management level at a Fortune 500.

This week every vendor in the distro game, including Hadoop newcomer Intel, had commentary and release info on tweaks meant to ready Hadoop for the world of big business. The goal at each company (and within the Apache community itself) is to make the platform more enterprise-robust and compliance-ready.

One of the chapters in Hadoop’s higher learning tome that is applicable to actual business environments revolves around the (again, rather unsexy) world of data governance. Throw in a little disaster recovery and compliance and you have all the makings of a boring (apologies to those of you who are over-the-moon excited about governance), but ultimately useful resource.

According to distro giant Cloudera, which itself sought to enhance the platform’s enterprise viability, enterprise readiness is no longer a distant goal or a long wait for Hadoop’s graduation: its pet elephant has already been working hard at several companies in a mission-critical role.

In the opinion of Cloudera’s Charles Zedlewski, the missing pieces around data governance kept the platform out of regulated and highly policy-driven industries until recently. Data governance, he notes, “is a bigger issue now than ever for Hadoop, in part because these big systems are holding many disparate datasets. So what you had to do before was segregate all that data, which goes against the real value of Hadoop, which is meant to consolidate these.”

Further, most large businesses in regulated industries have extensive reporting, auditing and compliance requirements that Cloudera says had not been tackled before its string of releases this week. Beyond those, on a practical operations level, most of the large-scale enterprise users the company is targeting have stringent business continuity requirements, which make disaster recovery a necessity.

To this end, the company announced updates to its core components, including CDH, which now supports rolling upgrades (part of this continuity benefit), and Impala, which has been continuously improved on the performance side to help certain industries meet their SLAs around data processing speeds.

At the heart of the enterprise-focused announcements, though, is the larger concept of data governance, which, along with disaster recovery and rolling upgrades, provides what Cloudera says are industry firsts (although MapR would take issue with that; more on that next week). Through the new Cloudera Navigator, which is directed at governance, Zedlewski says large companies can bring all the data in their cluster under stringent audit and access management controls.

“We’ve been improving security for some time,” he admits. “But the audit and access stories really haven’t been that strong.” He says that in addition to continuing to boost Impala (which is slated for GA around April), this marks another major area of investment. It’s all about getting users to trust more workloads to Hadoop, something that has hitherto been off-limits for regulated and policy-heavy industries like financial services and life sciences.

Zedlewski pointed to one recent example of these needs: Cloudera customer Monsanto, which was part of the impetus behind the move to extend these data governance and compliance features.

The agriculture behemoth was using Hadoop to store and analyze genomic information for their seed division in an effort to identify seeds resistant to drought, pests and diseases. However, due to their strict internal policies on governing that important data, they had particular ways the info needed to be handled, which put a damper on their Hadoop plans before Cloudera was able to step in with some solutions. Much of the work for Monsanto found its way into the foundations for Navigator, according to Zedlewski.

For other businesses, however, it’s more than just a matter of remaining compliant with laws or internal regulations. Some have continuity requirements that govern disaster recovery, which Cloudera addressed with its BDR offering, an automated disaster recovery system built into Hadoop.

What’s worth noting here is that many large-scale customers already have sophisticated compliance and governance systems in place. Zedlewski says that Cloudera has no interest in replacing these (for example, Oracle’s DataGuard or Audit Vault), nor does it want to take on integrators (like Informatica) with its Navigator or CDH products. For Cloudera, not to mention competitors including MapR, Hortonworks, and even Intel with its long list of partners, the focus is on the platform rather than on taking over datacenter management entirely, at least for now.

While conversations this week with actual users and developers I met at Strata led me to believe that Hadoop is far from enterprise-ready and is still in the experimental phase at most companies, these types of core fixes are going a long way toward extending its reputation, if not its capability.
