Hitting the Reset Button on Hadoop
Hadoop has seen better days. The recent struggles of Cloudera and MapR, the two remaining independent distributors of Hadoop software, are proof of that. After years of Hadoop failing to meet expectations, some customers are calling it quits and moving on. But others in the industry think Hadoop will survive into the next decade, albeit with lower expectations.
Hadoop began over a decade ago as part of Nutch, a distributed search engine developed by Doug Cutting and Mike Cafarella, and was eventually spun out and put into service at Yahoo. Running on a network of x86 servers equipped with hard drives, Hadoop disrupted the math on storage, helped usher in the big data era, and brought parallel computing into the mainstream.
Use cases grew, and eventually Cloudera was positioning Hadoop as the preeminent platform for enterprise data analytics. Companies would be able to extract value from massive collections of data by using Hadoop as an open and adaptable platform, the thinking went. This centralized approach would enable customers to eliminate standalone analytics systems and use a common set of tools and techniques to build a variety of applications, like fraud detection, customer 360, and product recommendations, all atop a single, all-encompassing data platform. As Tom Petty wrote, the sky was the limit.
We watched as Hadoop technology evolved at an incredible rate. Hadoop version 1, with its batch-oriented MapReduce development paradigm, quickly gave way in 2013 to Hadoop 2.0, which introduced the YARN resource manager and made it possible to run multiple workloads simultaneously. The ecosystem flourished, as open source projects with names like Hive, Impala, Storm, Giraph, Spark, and Tez arose to tackle specific big data challenges and opportunities. Machine learning, real-time analytics, graph analytics – all of it would run under one platform. The yellow pachyderm was truly soaring.
Trouble in Hadoop-Land
However, cracks began to appear around 2015, when customers started complaining about software that wasn’t integrated and projects that never entered production. Distributions were shipping with more than 30 different sub-projects, and keeping all of this software integrated and in sync became a major challenge.
Also troubling was the fact that it was becoming exceedingly difficult to find folks with the high levels of technical skills required to build finished applications with Hadoop. Eventually Hortonworks, which had emerged as the number two distributor behind Cloudera, realized the software was moving too quickly, broke Hadoop down into “core” elements, and slowed its release cadence. But the perception that Hadoop was complex and difficult to use stuck.
Meanwhile, public cloud vendors emerged with their own big data solutions, which used much of the same underlying technology as mainstream Hadoop implementations, but in an easier-to-digest package. Together, Amazon Web Services, Microsoft Azure, and Google Cloud have captured a large share of big data storage and processing.
When Cloudera finished its acquisition of Hortonworks earlier this year, executives described the combined company’s strategy for managing big data storage and workloads across hybrid environments that span on-prem data centers and multiple clouds. But poor first-quarter sales for MapR and Cloudera – plus doubts about Cloudera’s converged product roadmap – spooked investors.
That brings us up to the present. But what happens from here? Datanami talked to a few industry experts, and the consensus seems to be this: Hadoop isn’t dead yet, but it’s not likely to regain its former position any time soon. The technology will live on, they say, but users should not expect Hadoop to be the silver bullet for all their big data needs. Basically, Hadoop expectations need a haircut.
Infoworks CEO Buno Pati takes a nuanced view of Hadoop technology and where it fits into the grand analytics scheme of things. Pati, who is also a partner at Centerview Capital, acknowledges that Hadoop has not lived up to the hype. But he isn’t ready to throw in the towel quite yet.
“I don’t attribute what’s happening recently to what’s going on in cloud. I attribute it to a knee-jerk reaction to expectations that were poorly set and not met with Hadoop,” he says.
Pati likens Hadoop to a specialized operating system for distributed data storage and analytic compute workloads. Operating systems are complex beasts, and developing applications for them is usually left to the experts. However, Hadoop was largely sold as a platform that already contained those applications, which contributed to the mismatch between reality and expectations.
“With an operating system, you have a great deal of complexity. And the applications that ride on top of that operating system to make it usable by the common man is somewhat missing,” he says. “Yet it was presented as the solution to all problems, not recognizing that operating system nature of what Hadoop really is. And we all know that writing to an operating system directly is a complicated job and it’s a job for specialists.”
Cloudera actually exceeded revenue expectations in its most recent quarter. But what nailed the company to the cross – and what possibly ushered Tom Reilly and Mike Olson out the door – was its estimate of future revenue. However, that doesn’t change the fact that some companies are pulling out of Hadoop in favor of the cloud, whether for sound economic reasons or just a case of fear, uncertainty, and doubt (FUD).
“Hadoop is absolutely going away with cloud capabilities,” says Oliver Ratzesberger, CEO of Teradata, which was stung by the early Hadoop hype. “You don’t need HDFS. It was an inferior file system from the get-go. There’s now things like S3, which is absolutely superior to that.”
Ratzesberger was an early adopter of Hadoop, having used the technology while building software at eBay in the 2007-2008 timeframe. “We knew what it was good for and we knew what it was absolutely never built for,” he continues. “We now have customers – big customers just recently – in Europe who told me recently, the $250 million in Hadoop investments, they’re writing off, completely writing off, tearing it out of their data centers, because they’re going all cloud.”
Chris Lynch, the CEO of AtScale, tells a similar story about the crashing of Hadoop’s inflated expectations.
“I joined AtScale a year ago March,” he says. “The company was almost exclusively focused on doing what we do, but doing it for Hadoop users, and mostly working with Cloudera. I shifted the company’s focus into virtualizing all data stores, with an emphasis on the $120 billion legacy OLAP market. When we did that, the company went from struggling to [success]….The company couldn’t get funded, and then in eight weeks, I get $50 million led by Morgan Stanley, and we’ve had the most successful three quarters in the company’s history, subsequent to the change.”
Today, 75% of AtScale’s business comes from the cloud, while about 25% comes from on-prem Hadoop systems, Lynch says. The future is all about managing data in a hybrid manner, split between multiple clouds and on-prem, he says.
“We’re a microcosm of the market,” he says. “Customers want to get to the cloud. There’s been a lot of friction up until AtScale to do that. But our virtualization layer eliminates any disruption to get to the cloud.”
Hype Bubble Burst
Iguazio CEO Asaf Somekh knew something was amiss when New York Times columnist Tom Friedman dedicated a significant portion of his 2016 book “Thank You for Being Late” to Hadoop. “There’s a chapter about Hadoop and how Hadoop is going to solve all the problems of the world,” Somekh says. “This is when I said, Tom, you went too far.”
For Somekh, Hadoop still has value and the software is good. But instead of positioning Hadoop as the solution for all data problems, Cloudera and others should go back to focusing on one core area: data warehousing. Straying beyond that area is where the Hadoop distributors really got out of their lane.
“Instead of just focusing on replacing the data warehouse and keeping Hadoop for that focused area, which is huge on its own, they were actually trying to take it to applications that are more online and real time, which is a sexy area, but why do that if you already have an area that can actually do well?” he says.
However, there are other aspects to Cloudera’s struggles, things that have nothing to do with the Hadoop project or the surrounding tech ecosystem, that are also worth mentioning. For instance, the acquisition of Hortonworks was never going to be easy for Cloudera, says Patrick Osborne, vice president and general manager of big data for Hewlett Packard Enterprise.
“I think you need to separate the market from the way those companies are run, in my opinion,” Osborne says. “Bringing Cloudera and Hortonworks together – any merger of that size is challenging from a cost perspective.”
Near-Term Roadmap Challenges
Osborne says a little perspective should be brought to the discussion about the death of Hadoop. “Maybe that’s sensationalizing it a little,” he says. “At the end of the day, it’s about the time arc. Whether we like to say it or not, we have a lot of customers who are running mission-critical general ledgers on mainframes and NonStop systems and Unix. Those technologies are being bought, are being serviced. They make up part of the ecosystems. So people who tend to view these things in black and white and completely greenfield – it’s not super practical.”
With around 2,000 customers, Cloudera still has a solid installed base that will rely on it to support software that’s already deployed and, in many cases, is relied upon for day-to-day management of the business. While some Hadoop customers will pull the plug, the idea that Hadoop clusters are going to be ripped out en masse doesn’t square with how computing platforms have historically lived and died.
“The need for Hadoop hasn’t gone away,” Infoworks’ Pati says. “I think the need is greater than it ever was, and it’s actually accelerating. And while the cloud is also accelerating, on-prem systems are not going away. Large companies, particularly in regulated industries, will have on-prem, hybrid multi-cloud environments that will require something, whether it’s Hadoop or something like Hadoop, to manage data and analytics in a distributed environment.”
As the face of Hadoop, Cloudera faces its biggest challenge right now in the uncertainty around the converged platform that will combine elements of CDH and HDP. Told that a new converged platform was forthcoming by the end of 2019, existing Hortonworks and Cloudera customers pulled back on incremental investments. That pullback led to the poor first quarter, which in turn led to the ouster of Reilly and Olson.
Convincing investors that it has a viable strategy will be tough, but it’s not impossible, says Lynch, who previously was CEO of Vertica and also has spent time as a venture capitalist.
“Having an idea and having the ability to implement it are two different things,” Lynch says. “The market doesn’t believe they have the ability to implement it. It doesn’t mean that they won’t and it doesn’t mean they can’t change the company with new leadership and an infusion of different DNA. But to date if you look at their moves, if you think of all the things they could have bought, and they bought Hortonworks. That’s probably what put these guys into retirement.”
A few years ago, at the height of inflated expectations around Hadoop, the market wondered if we had reached “peak Hadoop.” The question now, with props to Gartner and its Hype Cycle, is whether Hadoop has hit bottom yet. Once the hype has been fully purged, Hadoop will be free to start the long climb back to respectability.
In the Wild West of IT, the best technologies don’t always win, and sometimes mediocre products win greater market share on the basis of other factors. The question for big data architects trying to peer into the future is whether the overall market has the patience to wait for Hadoop Part Two, or whether Hadoop’s remarkable streak has run out of gas.
Pati is optimistic about the potential for Hadoop, but realistic about the timeline and the overall chance of success. “I think it’s going to stay there for a bit. But in the long run there’s going to be a bounce back,” Pati says.
“The one thing we all know is that once you fail to meet expectations, you’re in the penalty box for a while,” he continues. “I used to run a public company and that was always my greatest fear. It takes a long time to build up value. And it takes almost nothing to tear it down. Then you stay in that penalty box for a while. I think that is unfortunately what Cloudera has to work its way through.”
Editor’s note: Cloudera claims to have 2,000 customers, not 4,000 customers, as the story previously stated. Datanami regrets the error.