ODPi Defines Hadoop Runtime Spec; Operations Up Next
Today the ODPi issued the first set of documents that describes a standard distribution of basic runtime components for Hadoop, including YARN, HDFS, and MapReduce. Going forward, the organization is preparing a management specification for Hadoop as it considers which Hadoop problem area it will tackle next.
The ODPi was founded a year ago on the eve of the Spring Strata + Hadoop World conference as the Open Data Platform initiative to help rein in some of the complexity that’s impacting Hadoop distributors, software vendors, and users. The problems stem primarily from the rapid proliferation of Hadoop ecosystem components and the ongoing development of existing ones.
With upwards of two dozen separate components making up a distribution, developers and quality assurance (QA) testers were struggling to ensure compatibility from one Hadoop distribution to another, among the various Hadoop components, and with third-party products. One major Hadoop distributor reputedly employs 40 people just to ensure compatibility among products. By standardizing the Hadoop stack, the ODPi hopes to boost compatibility, cut down on complexity, and reduce the need for testing; these are becoming big problems that threaten to slow adoption of the platform.
The ODPi Runtime Specification issued today comprises three components: a document describing the standard, a reference implementation of Hadoop based on version 2.7 from the Apache Software Foundation, and a validation and test suite that customers and vendors can use to ensure their software is compatible with the new spec.
The big goal with this release was to cover some of the basic stuff around Hadoop, says John Mertic, senior program manager for the ODPi. “We said, ‘Let’s get some of the simple stuff out of the way,’” he tells Datanami. “As you look at it on the surface, it doesn’t look like a ton of meat. But if you look at the issues it addresses, it’s actually fairly useful.”
The spec defines a standard way that Hadoop should be set up and configured, covering details such as the naming of JAR files, the locations of files, and the presence of standard APIs.
“It seems like it’s fairly obvious, but these are actually really big pain points that ISVs have been running into, and it helps [save] many, many development and QA hours,” Mertic says. “Let’s ensure here that vendors aren’t changing public APIs in wild fashion. Let’s ensure that an ISV can look at a Hadoop cluster and be able to tell [which] vendor provided it, and a number of other simple things.”
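To make the idea concrete, here is a minimal sketch of the kind of check a validation suite might automate; the specific naming convention shown (versioned JARs paired with unversioned aliases) is an illustrative assumption, not quoted from the ODPi spec:

```python
import re

# Hypothetical convention for this sketch: every versioned artifact such as
# hadoop-common-2.7.1.jar should ship alongside an unversioned alias
# (hadoop-common.jar) that ISV code can link against across distributions.
VERSIONED_JAR = re.compile(r"^(hadoop-[a-z-]+)-\d+\.\d+\.\d+.*\.jar$")

def check_jar_names(jar_files):
    """Return versioned JARs that lack an unversioned counterpart."""
    names = set(jar_files)
    missing = []
    for jar in jar_files:
        match = VERSIONED_JAR.match(jar)
        if match and f"{match.group(1)}.jar" not in names:
            missing.append(jar)
    return missing

# hadoop-hdfs has no unversioned alias here, so it gets flagged.
jars = ["hadoop-common-2.7.1.jar", "hadoop-common.jar", "hadoop-hdfs-2.7.1.jar"]
print(check_jar_names(jars))  # -> ['hadoop-hdfs-2.7.1.jar']
```

A real compliance suite would scan an installed cluster’s directories rather than a hard-coded list, but the principle is the same: mechanical checks replace the manual QA hours Mertic describes.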
Later this year, the ODPi plans to issue its Operations Specification, which digs a little deeper into other parts of the Hadoop stack, in particular Apache Ambari, the management interface used by administrators to provision, manage, and monitor Hadoop clusters. The Operations Spec will define a standard way that Hadoop should be configured for security, for high availability, and for cloud or on-premises deployments, says Roman Shaposhnik, director of open source at Pivotal, one of the founding members of ODPi.
ODPi had initially planned to include Ambari in its initial release, but decided to wait. “We’re taking a little bit more time to get it right,” Shaposhnik says. “So far everybody has been focused on operating Hadoop in a data center, but there is a tremendous amount of need to standardize how Hadoop gets managed in the cloud. And that actually again is what we’ll be trying to address in the management spec, and that’s why it’s taking more time for us and pushing it into the second release.”
Hortonworks (NASDAQ: HDP), one of the founding members of ODPi, relies heavily on the open source Ambari tool, while Cloudera, which is not an ODPi member, relies on its own Cloudera Manager software.
The ODPi expects to adopt additional Hadoop-related projects into its specification program with every release or every other release, Mertic says. It’s up to the ODPi members to vote on which open source projects make it into the release, he says. “We’re starting to see rumblings of areas to focus on,” he says. “Clearly Spark, HBase, and Hive come up. Those are things that I’ve heard and seen starting to get thrown around.”
Slightly different project names came up at an ODPi meeting held last year, where attendees were encouraged to cast informal votes for the projects they’d most like to see in the next release.
“The component that bubbled up to the top was Kafka,” Shaposhnik says. “Then of course everybody was talking about Spark. That’s obvious, I guess. But people were also saying it would sure be nice if we had a really nice compliant SQL on Hadoop solution.”
The ODPi currently has 25 members. Software companies must be paying members of ODPi to display the ODPi seal of approval on their products, but anybody can download and test their software or cluster for compatibility.