November 06, 2013

OLTP Clearly in Hadoop's Future, Cutting Says


Think Hadoop is just for analytics? Think again, says Hadoop creator Doug Cutting, who last week predicted that, in the future, organizations will run all sorts of workloads on their Hadoop clusters, even online transaction processing (OLTP) workloads, the last bastion of the relational legacy.

Cutting didn't don a wig or fancy robe when he made his predictions about the future of Hadoop during a speech at the Strata + Hadoop World conference last week. He didn't wave a magic wand or use a crystal ball. Instead, the plain-speaking technophile made his points by tapping into his own vast repository of knowledge on the topic. Oh, and PowerPoint slides.

"I don't have a time machine. I can't see the future any better than you can," Cutting said. "I'm a guy who, in the past, looked at the present, looked at facts, and decided what to do next. I'm not attempting to look too far down the road."

But as chief architect for the leading Hadoop distributor Cloudera, it's in Cutting's job description to have some idea where Hadoop is headed. Besides, it was Cutting himself who set this ball in motion 10 years ago when he started writing the software that's having such a big impact on the IT industry and, arguably, the world at large. Naturally, the guy has an opinion on the matter, and that opinion clearly matters.

The basic facts, as Cutting sees them, are pretty clear. It all starts with Moore's Law, which has given us a continuous exponential increase in computing power for close to 50 years. "I wouldn't bet against it continuing to improve," he said. "We'll be able to store and process more data in the future than we can today."

Much of that data will be stored and processed in Hadoop, if Cutting's prediction that Hadoop will become an operating system kernel for a data-centric platform turns out to be accurate. Obviously, Hadoop can't be a kernel in the same sense that Linux or Windows has a kernel. What Cutting means is that Hadoop will become the de facto standard platform on which developers build future applications.

What started out as a limited, insecure, and unreliable system for processing Java workloads has matured into a scalable, secure, and reliable platform for running all sorts of applications, Cutting said. "We saw initially higher level languages, Pig and Hive, that removed the requirement that you be a Java programmer to make use of this," he said. "Then we started to see, in parallel, the addition of real-time components. First HBase providing a NoSQL API, then Impala with interactive SQL, and more recently, search."

Hadoop is clearly just getting started, as this slide from Cutting's presentation demonstrates.

It doesn't take a data scientist to extrapolate from recent events around Hadoop and see that it's going somewhere. "More and more types of workloads will be supported on top of Hadoop," Cutting said. "It's a clear trend. In the near future, we're seeing Spark in-memory streaming, graph--all kinds of new processing metaphors moving to this platform, providing you with new tools to combine, view, analyze, understand your data. And that, we can expect to continue."

If this sounds a lot like the "Enterprise Data Hub" future for Hadoop that Cloudera CEO Mike Olson shared with the world last week, that's because it is. "How far can we go with this? What's the limit here?" Cutting asked. "My belief is the sky is the limit. It's hard to imagine a kind of a workload that you can't move to this platform."

Obviously, there have to be limits, even if we can't see them. But according to Cutting, who had the foresight to see that a new software platform would be needed to solve the problems of the future, those limits do not rule out OLTP. There's no reason why OLTP can't run on Hadoop, he said.

That in itself is a change of tune for the highly scalable pachyderm. "Transactions are something that were long thought to be something out of scope for this style of platform," he said. "It's an important class of workload that is currently well served, but not by the Hadoop platform."

That will change, he predicted. In particular, Cutting cited the work that Google is doing in this regard. Google published a paper a year ago that described an internal system it built on its platform "that's very similar to Hadoop," and that can run OLTP. The paper "demonstrates that it's possible to bring OLTP to this style of platform," he said.

"In the past, when we've seen that it's possible, within a few years, it happens," he said. "The prediction we can make here is it's inevitable that we'll see just about every kind of workload move to this platform, even online transaction processing."

To be sure, there are vendors looking to build transaction processing on the Hadoop backbone. Just this week, we covered Splice Machine's plan to bring standard, SQL-compliant transactional capabilities to the NoSQL HBase database that resides atop Hadoop, but there are others.

Cutting cuts an unlikely figure for an IT superhero, but he wears his fame well. In a parallel universe, Hadoop's rise to prominence might never have come to pass. It's all very fatalistic, and, in a way, out of Cutting's hands. "In the early days, I expected there to be multiple systems like Hadoop, competing to potentially become a platform," he said. "And really nothing else has emerged. Hadoop has come to dominate the big data space, and it's becoming really the kernel of the de facto standard operating system for big data."

It may be a stretch to say that Hadoop single-handedly started the big data revolution. After all, organizations have been pushing the limits of their data storage and data utilization capabilities for decades. But the idea that, with Hadoop, you never have to throw data away, ever, has had a fundamental impact on how we think about data, and on how we can use data.

"We're in the middle of a revolution in data processing," Cutting concluded. "Revolutions are scary times. Folks aren't sure what's going to come next. They're not sure what allegiances to make, what path there is to follow. Hadoop I think provides a clear path that will endure into the future supporting wide varieties of workload and I think you can be comfortable adopting Hadoop for your data needs."

At least until the next big thing comes along.

