October 30, 2013

Cutting Sees Hadoop Receding Into Background

Hadoop’s future lies in serving as the integrative center of the data universe, “Father of Hadoop” Doug Cutting explained this week at the Strata Conference + Hadoop World in New York.

Cutting, the original architect of Hadoop, spun the project off from his Nutch project after reading Google’s now-famous and highly consequential whitepaper on MapReduce. In an interview this week, he explained that he sees Hadoop receding into the background while programs that use the system as a resource-managing conduit take center stage.

“I really see it becoming this point of integration for lots of different tools and applications,” explained Cutting. “It gives you a methodology to bring different tools to the same set of data so that they can share it and operate much more effectively, so that the focus is really around resource management – around security and these sorts of things…”

“I don’t see it being used independently much at all,” he continued. “Initially it was a standalone thing, and since then we’ve seen these tools grow around it, and I really think that in the future [tools] are going to be the dominant thing – the tools around Hadoop, and not Hadoop directly. Rather [Hadoop] provides that glue that holds it all together.”

Cutting, who is Chief Architect at Hadoop distributor Cloudera as well as a director of the Apache Software Foundation (ASF), is voicing a view that has gained momentum in Hadoop circles, especially since the release of YARN in Hadoop 2.0.

“When we set out to build Hadoop 2.0, we wanted to fundamentally re-architect Hadoop to be able to run multiple applications against relevant data sets,” explained Hortonworks founder and architect Arun Murthy, a Hadoop contributor, earlier this year. “And do so in a way where multiple types of applications can operate efficiently and predictably within the same cluster – this is really the reason behind Apache YARN, which is foundational to Hadoop 2.0. By managing the resource requests across a cluster, YARN turns Hadoop from a single application system to a multi-application operating system.”

This take on Hadoop, however, is not without its critics, who say that projects like YARN will take years to provide the same features and level of scalability and maturity that are already available, and that the current Hadoop trajectory is in danger of reinventing the wheel. Despite this, the Hadoop community barrels on with its plans to transform the framework from a batch system to a real-time resource manager.

During the Strata event this week, Cutting’s company, Cloudera, built on its claim from earlier this summer that Hadoop is the new center of gravity in the datacenter. It articulated that vision further by launching, within Cloudera Enterprise 5, a version of Hadoop that it is packaging as an “Enterprise Data Hub.”

Per Datanami’s Alex Woodie:

“Cloudera says the new Enterprise Data Hub approach will give users ‘the flexibility to run a variety of enterprise workloads – including batch processing, interactive SQL, enterprise search and advanced analytics’ – all on one infrastructure. Keeping it all centralized will not only eliminate redundancy, but it will also help in the areas of integration, business continuity, security, and governance, the company says.”

For his part, Cutting says he believes that, beyond mere first-to-market advantage, Cloudera’s vision is what separates it from the rest of the industry assembly line cranking out Hadoop code. “I think we have a really strong vision for where we can take this both as a technology and as a business, and are building a company that will last a long time, along with a technological platform which will not only service, but enjoy great utilization and help lots of other industries thrive.”

Related items:

Hadoop Version 2: One Step Closer to the Big Data Goal

Cloudera Articulates a ‘Data Hub’ Future for Hadoop 

YARN to Spin Hadoop into Big Data Operating System
