ODPi Tackles Hive with Latest Hadoop Runtime Spec
ODPi today unveiled the second major release of its Runtime Specification, which is aimed at setting a standard for Hadoop components to ensure greater interoperability among distributions and third-party products. New additions to the spec include Apache Hive and the Hadoop Compatible File System (HCFS). ODPi also announced that more ISVs have committed to interoperability testing.
Hadoop isn’t a relational database. But the familiarity that many business analysts have with SQL is helping to drive the popularity of SQL-on-Hadoop solutions, such as Hive. While every major vendor has its own flavor of SQL for Hadoop, Apache Hive, as the oldest relational data store for Hadoop, is arguably the most widely deployed, and (for better or for worse) continues to be the standard by which other SQL-on-Hadoop solutions are measured.
So it was no great surprise to see ODPi tackle Apache Hive consistency with its Runtime Specification 2.0, which it announced today in advance of the Strata + Hadoop World show taking place this week in New York City.
The Hive spec released by ODPi is based on Apache Hive version 1.2, which is the latest release of the distributed relational data store for HDFS. The organization says the standard will “reduce SQL query inconsistencies across Hadoop Platforms” and ensure that core Hive functionality will continue to behave in a predictable way as future versions of Hive are released.
Meanwhile, the addition of HCFS to the ODPi Runtime Specification is seen as boosting interoperability for Hadoop distributors, other software vendors, and cloud service providers that want to use file systems other than HDFS in their Hadoop clusters.
HCFS was established by the Apache Hadoop project to define how other file systems can work with Hadoop components, such as MapReduce and Hive. According to the Apache Hadoop wiki, active development in the HCFS project currently includes GlusterFS, OrangeFS, SwiftFS, and GridGain. Other file systems that are involved in HCFS include Windows Azure BLOB Storage, the CassandraFS, CephFS, CleverSafe Object Store, Google Cloud Storage Connector, Lustre, the MapR FileSystem, Quantcast File System, and the Veritas Cluster File System.
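The interoperability HCFS enables comes from a single idea: Hadoop components code against one abstract file-system interface, and each backend supplies its own implementation, resolved by URI scheme. The sketch below illustrates that pluggable pattern in miniature; the class and function names are hypothetical illustrations, not Hadoop's actual Java API.

```python
from abc import ABC, abstractmethod

# Illustrative sketch of the pluggable-filesystem pattern HCFS relies on.
# Components call the abstract interface; backends (HDFS, S3, GlusterFS, ...)
# each provide a concrete implementation. Names here are hypothetical.

class FileSystem(ABC):
    @abstractmethod
    def open(self, path: str) -> bytes:
        """Read the contents of a file."""

    @abstractmethod
    def scheme(self) -> str:
        """URI scheme this backend handles (e.g. 'hdfs', 's3')."""

class InMemoryFS(FileSystem):
    """Toy backend standing in for a real storage system."""
    def __init__(self):
        self._files = {}

    def put(self, path: str, data: bytes):
        self._files[path] = data

    def open(self, path: str) -> bytes:
        return self._files[path]

    def scheme(self) -> str:
        return "mem"

# A registry maps URI schemes to backends, analogous to how Hadoop
# resolves "hdfs://" or "s3a://" URIs to concrete FileSystem classes.
REGISTRY = {}

def register(fs: FileSystem):
    REGISTRY[fs.scheme()] = fs

def get_filesystem(uri: str) -> FileSystem:
    scheme = uri.split("://", 1)[0]
    return REGISTRY[scheme]

fs = InMemoryFS()
fs.put("/data/part-0", b"hello")
register(fs)
print(get_filesystem("mem:///data/part-0").open("/data/part-0"))  # b'hello'
```

A MapReduce job written against the abstract interface never needs to know which backend serves its input, which is why a spec-level contract for the interface matters to vendors.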
While HDFS is the primary file system used in Hadoop clusters, it’s by no means the only one. MapR has extended HDFS to be compatible with NFS via its proprietary MapR File System, Amazon (NASDAQ: AMZN) uses its S3 object store as the backend for its Elastic MapReduce service, and IBM (NYSE: IBM) supports GPFS with its BigInsights distribution of Hadoop.
“The trend we are seeing amongst those who provide Hadoop platforms is that a key piece of differentiation is the underlying filesystem,” says John Mertic, the Director of Program Management for the ODPi. “This is especially true for cloud vendors. It makes little sense for them to optimize for HDFS when they have a block/object store available that is much better to leverage for their infrastructure.”
Setting a standard implementation for HCFS will help storage and cloud vendors leverage their native storage solutions as part of an ODPi Runtime Compliant Hadoop Platform, and thereby reduce the incompatibilities that end-users face, ODPi says.
Meanwhile, ODPi announced that more big data software vendors have committed to running their products through the ODPi Interoperable Compliance Program. The new vendors committing to submit their products for compliance testing include DataTorrent, Pivotal, SAS, Syncsort, WanDisco, Xavient, and Zettaset. The Apache Hadoop platforms from Altiscale, ArenaData, Hortonworks, IBM, and Infosys are currently ODPi Runtime Compliant, the organization says.
Hortonworks, Pivotal, and IBM were among the founding members of the Open Data Platform (ODP) when it launched just before the Strata + Hadoop World show in February 2015. The organization’s goal was to fight the increasing complexity in the Hadoop stack by providing a set of standards for core Hadoop components. Vendors would benefit by getting a “test once, use everywhere” standard.
While ODPi has issued two releases of the Runtime Specification, the group is still planning to release its first Operations Specification this year, Mertic says. Apache Ambari, which distributors like Hortonworks (NASDAQ: HDP) use as the main operations console for Hadoop, will be part of that spec, but not as the cornerstone piece, Mertic says.
“We spent much of the summer moving our focus from building a spec around Ambari to helping lay out the best practices of installing, configuring, and managing applications on a Hadoop platform,” he tells Datanami via email. “Early feedback has been quite positive with this shift, but it has resulted in us going back to the drawing board to a large degree. Coupled with the delay in the Ambari 2.4 release, we are definitely behind our initial planned release schedule, but the results should have a greater impact on the Hadoop/Big Data ecosystem.”