OLTP Clearly in Hadoop’s Future, Cutting Says
Think Hadoop is just for analytics? Think again, says Hadoop creator Doug Cutting, who last week predicted that, in the future, organizations will run all sorts of workloads on their Hadoop clusters, even online transaction processing (OLTP) workloads, the last bastion of the relational legacy.
Cutting didn’t don a wig or fancy robe when he made his predictions about the future of Hadoop during a speech at the Strata + Hadoop World conference last week. He didn’t wave a magic wand or use a crystal ball. Instead, the plain-speaking technophile made his points by tapping into his own vast repository of knowledge on the topic. Oh, and PowerPoints.
“I don’t have a time machine. I can’t see the future any better than you can,” Cutting said. “I’m a guy who, in the past, looked at the present, looked at facts, and decided what to do next. I’m not attempting to look too far down the road.”
But as chief architect for the leading Hadoop distributor Cloudera, it’s in Cutting’s job description to have some idea where Hadoop is headed. Besides, it was Cutting himself who set this ball in motion 10 years ago when he started writing the software that’s having such a big impact on the IT industry and, arguably, the world at large. Clearly, the guy has an opinion on the matter, and that opinion clearly matters.
The basic facts, as Cutting sees them, are pretty clear. It all starts with Moore’s Law, which has given us continuous exponential increase in computing power for close to 50 years. “I wouldn’t bet against it continuing to improve,” he said. “We’ll be able to store and process more data in the future than we can today.”
Much of that data will be stored and processed in Hadoop, if Cutting’s predictions about Hadoop turning into an operating system kernel for a data-centric platform turn out to be accurate. Obviously, Hadoop can’t be a kernel in the same sense that Linux has a kernel or that Windows has a kernel. What Cutting means is that Hadoop will become the de facto standard on which developers will build applications in the future.
What started out as a limited, unsecure, and unreliable system for processing Java workloads has matured into a scalable, secure, and reliable platform for running all sorts of applications, Cutting said. “We saw initially higher level languages, Pig and Hive, that removed the requirement that you be a Java programmer to make use of this,” he said. “Then we started to see, in parallel, the addition of real-time components. First HBase providing a NoSQL API, then Impala with interactive SQL, and more recently, search.”
Hadoop is clearly just getting started, as this slide from Cutting’s presentation demonstrates.
It doesn’t take a data scientist to do a basic extrapolation of recent events around Hadoop, and see that it’s going somewhere. “More and more types of workloads will be supported on top of Hadoop,” Cutting said. “It’s a clear trend. In the near future, we’re seeing Spark in-memory streaming, graph–all kinds of new processing metaphors moving to this platform, providing you with new tools to combine, view, analyze, understand your data. And that, we can expect to continue.”
If this sounds a lot like the “Enterprise Data Hub” future for Hadoop that Cloudera CEO Mike Olson shared with the world last week, that’s because it is. “How far can we go with this? What’s the limit here?” Cutting asked. “My belief is the sky is the limit. It’s hard to imagine a kind of a workload that you can’t move to this platform.”
Obviously, there have to be limits, even if we can’t see them. But according to Cutting–who had the foresight to see that a new software platform would be needed to solve the problems of the future–OLTP does not fall outside those limits. There’s no reason why OLTP can’t run on Hadoop, he said.
That in itself is a change of tune for the highly scalable pachyderm. “Transactions are something that were long thought to be something out of scope for this style of platform,” he said. “It’s an important class of workload that is currently well served, but not by the Hadoop platform.”
That will change, he predicted. In particular, Cutting cited the work that Google is doing in this regard. Google published a paper a year ago that described an internal system it built on its own platform “that’s very similar to Hadoop,” and that can run OLTP. The paper “demonstrates that it’s possible to bring OLTP to this style of platform,” he said.
“In the past, when we’ve seen that it’s possible, within a few years, it happens,” he said. “The prediction we can make here is it’s inevitable that we’ll see just about every kind of workload move to this platform, even online transaction processing.”
To be sure, there are vendors looking to build transaction processing on the Hadoop backbone. Just this week, we covered Splice Machine’s plan to bring standard, SQL-compliant transactional capabilities to the NoSQL HBase database that resides atop Hadoop, but there are others.
Cutting cuts an unlikely figure for an IT superhero, but he wears his fame well. In a parallel universe, Hadoop’s rise to prominence may have never come to pass. It’s all very fatalistic, and, in a way, out of Cutting’s hands. “In the early days, I expected there to be multiple systems like Hadoop, competing to potentially become a platform,” he said. “And really nothing else has emerged. Hadoop has come to dominate the big data space, and it’s becoming really the kernel of the de facto standard operating system for big data.”
It may be a stretch to say that Hadoop single-handedly started the big data revolution. After all, organizations have been pushing the limits of their data storage and data utilization capabilities for decades. But the idea that, with Hadoop, you never have to throw data away, ever, has had a fundamental impact on how we think about data, and on how we can use data.
“We’re in the middle of a revolution in data processing,” Cutting concluded. “Revolutions are scary times. Folks aren’t sure what’s going to come next. They’re not sure what allegiances to make, what path there is to follow. Hadoop I think provides a clear path that will endure into the future supporting wide varieties of workload and I think you can be comfortable adopting Hadoop for your data needs.”
At least until the next big thing comes along.