Making Sense of the ODP—Where Does Hadoop Go From Here?
It was no coincidence that Hortonworks and Pivotal unveiled the Open Data Platform last week at the start of Strata + Hadoop World, which is Cloudera’s semi-annual parade to everything Hadoop. But now that the dust has settled on that bombshell, let’s look a little closer at the ODP, the organization’s key members, and what it means for the Hadoop stack and ecosystem going forward.
To recap: the ODP was unveiled one week ago by Pivotal, Hortonworks, IBM, and 12 other large companies with the stated goal of creating a single standard, called the ODP Core, for Hadoop and related products.
The current way of developing Hadoop applications is too slow and fragmented and it’s plagued by duplicated efforts, the group says. By centralizing the effort and creating a “test once” standard, it will “take the guesswork” out of developing software for Hadoop, thereby freeing up enterprises to build business apps on the platform instead of endlessly stitching code together and fixing it when it breaks.
Sounds great, right? Not to Cloudera’s chief strategy officer and co-founder Mike Olson, who slammed the ODP in a blog post. “Pivotal and Hortonworks claim that the ODP is driven by an industry-wide longing for standardization in the Apache Hadoop ecosystem,” he writes. “I don’t believe them.”
“I have an engineer’s disdain for industry consortia in general, and for vendor-driven consortia in particular,” Olson says. “Far too often, these organizations aim not at promoting, but rather at slowing, innovation in the technology industry.”
While the ODP claims to champion open source code and open source development models, the consortium is actually not open at all, Olson says.
“As a vendor-driven consortium, membership is only for enterprises with serious money–it ought to be called the ‘Only Dollars Play’ alliance,” he writes.
Amr Awadallah, Cloudera’s CTO and a co-founder, also had some choice words for the ODP (although they don’t cut quite as deep as Olson’s). “We were recently invited to join a consortium called the Open Data Platform,” Awadallah said during his Strata keynote last week. “We thought about that and after some deep thinking, we realized that consortium already exists. It is the Apache Software Foundation.”
If you want to develop Hadoop software in the open, then the ASF is your best bet, says Awadallah.
“The ethos of Apache is…you join Apache by contributing code, by contributing to the platform and creating new innovations, which is the right way,” he says. “So we at Cloudera will continue to do that, continue to contribute to Apache–not just by code. We are taking whatever money we would have paid to that consortium and investing that in Apache as sponsor, and we continue to hire the great engineers working in the Apache ecosystem.”
ODP, Pivotal HD, and HDP
On the same day that the ODP was unveiled last week, Pivotal also announced a major change in its Hadoop strategy. That included electing to open source key components of its Big Data Suite–including the HAWQ SQL-on-Hadoop query engine, the GemFire NoSQL database, and the Greenplum massively parallel processing (MPP) data warehouse.
Pivotal also announced a close strategic partnership with Hortonworks, which is now providing level 2 and level 3 technical support for customers of Pivotal’s Hadoop distribution, called Pivotal HD. The two former competitors will also work together to integrate key parts of Pivotal’s technology, such as GemFire and HAWQ, into open source Hadoop.
Pivotal HD will continue to form part of its Big Data Suite, but don’t expect to see much innovation there. The next release of Pivotal HD is not expected to look much different from the Hadoop distribution of its new partner, Hortonworks. Pivotal (which laid off about 60 employees working in the big data end of the house, including its top Hadoop developers) is essentially saying it will no longer attempt to innovate at that level. Instead, the company will be looking to differentiate itself based on its Cloud Foundry business model.
Hortonworks, of course, has long promoted itself as the “purest” Hadoop distributor, with the purest Hadoop distributions. It touts the fact that it employs the most committers to Apache Hadoop and related sub-projects at the ASF. While Hortonworks appears to have scored a major coup in rallying a group of large vendors and enterprises against its primary competitor, Cloudera, it’s not clear yet how Hortonworks plans to navigate any potential conflicts between the ASF and the ODP.
Definitely Not Down with ODP
If the ODP is not about openness, then what is it about? Some in the industry think that part of the reason for creating the consortium was to provide a shield to allow Pivotal to bow out of the Hadoop business with some of its pride still intact.
“It’s probably the best exit from a marketplace I’ve ever seen,” says Jack Norris, chief marketing officer at MapR Technologies, a Hadoop distributor that is not part of the ODP. “If you look at who’s left [at Pivotal], the CTO, the main Hadoop talent, they all left. There’s not a lot of people there.”
Norris seemed to lean toward the take that Gartner analysts Merv Adrian and Nick Heudecker expressed in a blog post last week. “What’s the real reason behind this? A governance body?” Norris said at Strata last week. “It seems to me Apache is working fine…. My guess is there will be this announcement and we will not see a lot in the future. I don’t think it’s going to be a factor.”
A similar view was espoused by Mike Hoskins, CTO at Actian, another big data software vendor that isn’t in the ODP. “I read Mike Olson’s blog post and he’s 100 percent right,” Hoskins told Datanami during a briefing at the Strata show. “I’ve been around longer than you. Who needs another vendor-led open consortium? They come, they go. They’re all irrelevant. And the truth is…the Apache Software Foundation is going to play this role. It is the open data platform. It already exists.”
Hoskins then points to the other names on the ODP masthead, like IBM, SAS, Teradata, and EMC. “They’re kind of nose-at-the-glass, looking in,” Hoskins says. “We’re fighting to get our optimized YARN-certified engine [adopted by customers] because we play. The rest of these guys don’t play at all. So they want relevance. That’s the unfair analysis.”
Hold On Just a Second There
So Pivotal used the ODP to gracefully bow out, Hortonworks used the ODP to consolidate its position, and the rest of the companies joined the ODP in a desperate grasp for relevance. Seems pretty straightforward, right? Unfortunately, things are rarely simple and clear cut, especially in the enterprise software space.
While it’s true that the ASF is supposed to be the place where Hadoop innovation occurs in the open, there is some question as to whether it’s actually succeeding at that job. That leads us back to Hoskins.
“The fair analysis is Apache isn’t yet defining the box in a suitable way, so let a couple of adults get in the room–IBM, Hortonworks, EMC, and Pivotal–and let’s accelerate that process of defining a more common box,” he says. “Apache is young and immature and they haven’t yet said ‘These are the 17 building blocks [that you need to worry about]. We’ll test these, instead of testing these 400 individual blocks.’”
According to Matt Schumpert, the director of product management for Datameer, the ODP or something like it is critical to attract independent software vendors (ISVs) to the Hadoop platform.
“I’m a little surprised to see Cloudera pan that idea,” Schumpert told Datanami during a briefing at Strata last week. “I’m surprised they don’t recognize that there’s a strong need for that. If you’re an ISV building on Hadoop right now, it’s a real pain because there’s no standards. Things keep breaking and changing.”
While there is a widespread perception that the Apache Software Foundation provides a standard on which to build and test one’s applications, that perception doesn’t necessarily mesh with reality.
“They provide a release version, which anyone else can build upon and embed and freely release whatever they want, which is what all the vendors do,” Schumpert says. “They’re free to modify anything, and that makes the interfaces to the ISVs shifting sands.”