May 20, 2016

Apache Foundation Keeps Eyes Wide Open with ODPi

If you’re looking for controversy in the Apache Hadoop community, you need look no further than the 2015 launch of the Open Data Platform Initiative (ODPi), which some perceived as an attempt to wrest control of Apache Hadoop from its open source roots. In fact, some Apache Software Foundation (ASF) leaders see potential good coming out of the ODPi, although there are valid concerns about negatives too.

Jim Jagielski, a founding member of the ASF and a member of its board of directors, shared his views on the ODPi during a candid Q&A session with the ODPi’s director of program management, John Mertic, at the recent Apache Big Data conference in Vancouver, British Columbia. Jagielski said ODPi has the potential to be both beneficial and detrimental to the overall Apache Hadoop and big data ecosystem.

On the plus side, Jagielski said that the ODPi’s goal to simplify the adoption of Hadoop and related projects by creating a single consistent specification that vendors in the ecosystem can write to is, on balance, a good thing.

ASF founder and director Jim Jagielski

“Quite frankly, the ASF isn’t really organized to be able to do a lot of that, for a lack of a better word, handholding,” Jagielski said. “So that’s the kind of area that ODPi can really shine, to be able to provide that sort of help for us.”

The ASF has always been pro-business, and its open source license was written in a way to encourage the adoption of Apache software in the enterprise. Anything that makes Apache software more amenable to enterprises is a good thing, provided it doesn’t impact how the software is developed, Jagielski said.

“These are some things, that in an ideal world, that we wish we could do, but we just can’t,” he said. “We either lack the resource to it or it’s not part of how we operate. Having entities such as the ODPi there to be able to fill in those holes is great.”

Last week, the ODPi announced that it has become a gold sponsor of the ASF. That level of sponsorship, which entails a $40,000 donation to the ASF, doesn’t get the ODPi any special access to the ASF or leverage to influence any projects. There’s a strict firewall between donations by corporate or even non-profit entities like the ODPi and the actual projects. (In fact, there’s no telling where that $40,000 donation will be spent, because the ASF doesn’t allow earmarking of funds that way.)

While Jagielski clearly sees some areas where ODPi can have an impact, he cautioned that the group must refrain from overstepping its bounds. To that end, he shared some of his concerns about potential downsides of the ODPi.

The fact that ODPi is an industry consortium that seeks to wield the combined influence of a number of Hadoop ecosystem vendors is one of the biggest potential problems.

“Right now ODPi is like this superorganism,” Jagielski said. “So instead of there being small territorial wars between contributors inside the various projects, what you see is what could be, worst case scenario, a concerted effort from a single entity….to basically create a Hadoop ecosystem that ODPi wants and maybe not what the community wants. That quite frankly is one of the biggest concerns that we have.”

So far, the ODPi and its members have not actually done anything to raise alarm bells. The group, which was founded by Hortonworks (NASDAQ: HDP), Pivotal, IBM (NYSE: IBM) and others, reportedly already represents a majority of the committers for Apache Hadoop and related projects.

The ASF will keep its eyes open regarding the potential for danger in the ODPi, Jagielski said. “Having entities such as the ODPi there to be able to fill in those holes is great. The problem is that when there is elbowing in holes that are already filled very well,” he said. “I don’t think it’s going to happen and people are aware that that’s the concern, and we’ll do what we can to be aware of it and avoid it.”

Fig. A: ASF engagement, as shared by ODPi’s John Mertic

The ODPi’s Mertic agreed that there’s a potential for the organization to evolve into something different in the future, that it could be hijacked to serve the needs of a group of special interests, but that it hasn’t happened yet. “That’s a challenge that I don’t think we know what the answer is, because we’ve not gotten to that point where it’s reared its ugly head,” he said.

While both the ASF and the ODPi are cognizant of the threat that ODPi could “go rogue” and the need to closely monitor the group, there are other areas where there is less agreement between the ASF and the ODPi. One of these involves the creation of what Jagielski called “an undue layer” between Hadoop developers and Hadoop users.

“Even though I think it’s great that there’s an organization, an entity like ODPi to help people like you and to have that sort of information,” he said, “open source in general thrives by a very tight feedback loop between the developers of the code and the project and the end users of that code. One of the concerns that we have…is making sure that that feedback loop is not curtailed in any way, that there’s a direct path.”

Mertic was a little less conciliatory here (even though he said he agreed). “The challenge is that the farther you get from the cluster, the engagement of the ASF drops,” he said, referring to a chart he showed earlier in the day during his keynote address (Fig. A). “What happens inside this cluster from an ODPi perspective, we really don’t care. That’s the distro vendors’ problem. If they want to supplement this with that or whatever, we’re not too concerned about that.

ODPi program director John Mertic

“What our main concern is–and I think as our specs start to evolve this will come through,” he continued, “is we want to make sure there’s a consistent experience for that SAS or Tableau or even open source vendors who do similar work, that they have a consistent baseline experience of their engagement with the cluster, that it will work the same, that it’ll act the same, that it will give back the same feedback.”

Mertic reiterated that the ODPi is not interested in developing or maintaining code. If one of its members modifies Apache Hadoop or a related project, ODPi bylaws require those changes to be incorporated into the core Apache project. ODPi is primarily about setting baseline specs for integrated Hadoop, narrowing the exposure to fast-moving releases of sub-projects (see Fig. B), doing integration testing against those baselines, and introducing new use cases to the ASF.

This view, in fact, jibes nicely with how the ASF views its role in the world, which is very clearly defined. “The very fact that ODPi exists isn’t in itself a bad or negative thing,” said Jagielski, whose day job is at Capital One. “In a lot of ways it’s a very good thing, because it’s doing exactly what we want. It’s providing a valuable service or niche for people out there.”

Since coming under the control of The Linux Foundation, the ODPi has set a more conciliatory tone and is reaching out to the ASF and the Apache Hadoop community to be clearer about its goals and to defuse any notions that it has something hidden up its sleeve. That was the impetus for the gold sponsorship of the ASF and the show of community at Apache Big Data and ApacheCon last week, two conferences the ASF hires the Linux Foundation to produce.

Fig. B: ODPi says Hadoop distributions vary greatly in terms of which sub-project releases they contain

Up to this point, the discussion has been mostly one-sided, with ODPi trying to explain itself, and ASF doing a lot of quiet nodding. While many ASF members have raised questions about ODPi, including Hadoop co-creator and Cloudera architect Doug Cutting, who told Datanami last week that the group “could be dangerous,” the ASF proper (if one dares speak about such a diverse and dispersed group as a singular entity) has been mostly neutral when it comes to the ODPi.

The ODPi has a lot more to gain by not angering the ASF than the ASF has to gain by not angering ODPi, but so far, the ASF has treated ODPi lightly. The ASF has a lot of “street cred,” Jagielski said, and could essentially kill the ODPi movement with a single press release or announcement expressing displeasure with ODPi. It also has “the nuclear option” available to it if ODPi does something particularly nefarious: withdrawing commit access for members of Apache Hadoop and related projects.

So far, it hasn’t had to do anything, and as things stand now, it won’t do anything about the ODPi. “As far as what ODPi does, as long as it really doesn’t negatively impact or harm the native Apache communities, it really doesn’t matter,” Jagielski said. “What they’re going to come up with, I’m not sure. It doesn’t really matter to the ASF, as long as it doesn’t harm the community.”

Related Items:

Apache’s Wacky But Winning Recipe for Big Data Development

ODPi Offers Olive Branch to Apache Software Foundation

An Open Source Tour de Force at Apache: Big Data 2016
