December 13, 2013

Reaping the Fruits of Hadoop Labor in 2014

Alex Woodie

There’s been a lot of work poured into Hadoop over the last few years, culminating in the launch of Hadoop version 2 in October. As we head into 2014, commercial Hadoop vendors like Hortonworks and Cloudera will continue to invest in R&D, but you can also expect to see a stronger emphasis on converting that past investment into sales and profits. Going forward, however, the business models of these top two Hadoop vendors are diverging.

On the technical front, 2013 has been a year of maturation for Hadoop. The launch of Hadoop version 2 brings us YARN and the possibility of running more interactive workloads in what had been MapReduce’s batch-oriented paradigm. We saw new releases of established Hadoop components like Pig, Hive, HBase, and ZooKeeper, and the promise of bringing other Apache projects, such as Storm, Tez, and Accumulo, deeper under the Hadoop umbrella.

Next year, you can expect Cloudera (the biggest Hadoop vendor in size and market share) and Hortonworks (the open source Hadoop leader) to capitalize on the extensive R&D done in previous years. Both vendors expect numerous small Hadoop proofs of concept (POCs) to be converted into production clusters that real companies rely on to give them a competitive edge in their field. These are exciting times for the Hadoop community, and many are eagerly looking forward to enjoying the fruits of their labors.

But as Hadoop solidifies its role in the enterprise, there are cracks appearing in the foundation. It’s becoming increasingly clear that Hortonworks and Cloudera have different philosophies when it comes to core elements of a Hadoop business strategy. What’s the best way to build an enterprise Hadoop business? How can one encourage the Hadoop ecosystem to continue to grow? And how can you do all this while respecting Hadoop’s open source roots?

Cloudera has gone further than other Hadoop vendors in articulating a business-oriented strategy for converting Hadoop R&D into a profitable business model. The company unveiled its “enterprise data hub” strategy at the Strata + Hadoop World conference in October, in which it envisions Hadoop at the center of a new data-focused architecture. Every type of data, whether it’s analytical or transactional in nature, goes through Hadoop on its way to somewhere else. (Hortonworks, MapR Technologies, and Pivotal, for what it’s worth, have similar strategies in play, but Cloudera has jumped out in front by articulating the marketing message most cleanly.)

As part of the shift to the “enterprise data hub” message, Cloudera CEO Tom Reilly surprised the Hadoop community a bit when he said that he sees Cloudera competing chiefly not against other Hadoop vendors, like Hortonworks and MapR, but against the likes of IBM, EMC Pivotal, and other tier-one IT vendors for enterprise IT dollars.

It’s a bold business move. Cloudera in effect has opened a two-front war. The first front, against Hortonworks and MapR and other smaller pure-play Hadoop startups, is a war that it believes it can easily win. After all, Cloudera already holds a dominant position in the enterprise Hadoop vendor field. According to a November report from IDC, Cloudera has about 32 percent of the Hadoop market, a market that we know from other sources represents about $200 million in software license and subscription sales in 2013. That market, by the way, is growing at an extremely healthy 60 percent annualized rate, according to IDC, which would make the total enterprise Hadoop software pie worth about $800 million by the end of 2016.
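For readers who want to check the growth arithmetic, here is a minimal sketch. It assumes the roughly $200 million figure is the 2013 total for the enterprise Hadoop software market and that the 60 percent annualized rate holds for each of the three years from 2014 through 2016. The function name `project_market` is ours, for illustration only.

```python
def project_market(base_musd: float, annual_growth: float, years: int) -> float:
    """Compound a starting market size (in $ millions) at a fixed annual growth rate."""
    return base_musd * (1.0 + annual_growth) ** years


# Assumptions (not stated this precisely in the article): a $200 million base
# in 2013 and three full years of 60 percent growth (2014, 2015, 2016).
projected_2016 = project_market(200.0, 0.60, 3)
print(f"Projected 2016 market: ${projected_2016:.0f} million")
```

Under those assumptions the projection comes to about $819 million, which is consistent with the "about $800 million" figure cited above.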

That $800 million is still a small piece of the overall big data market, which Wikibon analyst Jeff Kelly predicted would hit $18 billion in spending this year and grow to $47 billion by 2017. These are the bigger stakes that Cloudera clearly is reaching for with its broad “enterprise data hub” messaging and its strategy of competing directly against the megavendors. Size absolutely matters in the IT business, where competition is cutthroat and winners get bigger and losers get bought. Cloudera’s business strategy charts a path to getting as big as it can as fast as it can (to become the first $1 billion Hadoop vendor, basically), so it can be big enough to survive attacks from the megavendors and possibly become a megavendor of its own, gobbling up a larger share of not only the $47 billion big data market, but the broader $2-trillion IT market as a whole.

But Cloudera faces some tricky parts in navigating this path. Chief among them are not alienating the open source community that is the foundation of Hadoop, and encouraging the growth of a broader Hadoop ecosystem, including the commercial vendors that will build the next-generation applications and analytical tools that will deliver on Hadoop’s promise.

In a blog post, Hortonworks vice president of corporate strategy Shaun Connolly recently commented on Cloudera’s strategy and its relationship to the overall Hadoop community. He wrote that “the desire for more control and immediate profits…clearly discounts the power of community-driven open source innovation and its ability to outpace any single vendor’s agenda.”

Connolly hit a nerve in the Hadoop community. In an interview with Datanami, Connolly sounded surprised that there was such a response to his post. “I’m not sure if my words were overly sharp,” he said. “Some interpreted it as sharp. There was a fair amount of Cloudera folks who said they completely agree. I found that interesting.”

Connolly elaborated on his views, explaining that he sees two ways to chase the Hadoop opportunity. The first is to solidify the technology that underpins the market, and try to make the market as big as possible. The other is to create differentiated commercial technology, and compete with it against a bigger, broader ecosystem. Hortonworks is taking the former approach, while Cloudera is taking the latter.

“[Cloudera] wants the biggest slice of the market, and they’re going to aggressively go after that, whereas I want to enable the market to be as big as possible,” he says. “I think [Cloudera’s approach] will shrink the pie, because it gets people worried about whether are you going to compete with me. [Cloudera is saying] ‘I don’t care if the pie is as big as it can be, but if my slice of the pie is the majority of the pie, then I’m happy.’”

Connolly says Hortonworks is taking a slower and more prudent “rising tide floats all boats” approach. “Fighting a two-front war, if you will, against the megavendors but also against the enterprise Hadoop players is harder. I would rather just focus on making enterprise Hadoop what it is,” he says.

Trying to differentiate a proprietary Hadoop offering makes sense only when there aren’t solid open source projects in place, he says. Connolly looks to Red Hat as a model for what Hadoop can be. It took years for Linux to mature as an operating system. But maintaining a commitment to the open source operating system eventually succeeded for Red Hat, which is now a $1.2 billion business and embarking on a “hockey stick” growth curve.

With projects like Apache Hive, Apache HBase, and Apache Ambari, Hadoop is following in Linux’s open source footsteps, at both the kernel and the vendor community levels. “There’s a whole ecosystem that’s formed around data processing, operational management of clusters, as well as the core platform itself,” he says. “There’s easily another two to three years of work there minimum.”

Hadoop’s story up to this point is astonishing, quite frankly. Who could have predicted that an obscure technical paper written by Google nearly a decade ago would become the foundation of technology that has the capacity to change the industry? The big data industry is, of course, rife with hype. But underlying that hype is the reality that Hadoop’s core precept, that it’s better to move the application code instead of the data, is probably the right approach.

Hadoop has already proved itself to be a disruptive technology that’s shaking up the economics of data storage. Going forward, it’s anybody’s guess how it plays out. These are early innings still. But the moves that companies make now will foretell how Hadoop fares over the next five years.
