September 6, 2013

Stinger Looking to Tez to Cross 100x Performance Line for Hive

Isaac Lopez

In an effort to put some real-time sting into Apache Hive, a coalition of developers announced project “Stinger” earlier this year, an initiative aimed at a 100x increase in Hive’s performance. Now in phase two, the group says it has made significant progress, with more on the way.

Earlier this year, we reported on the preliminary results of the project, which is staffed by collaborating developers from SAP, Yahoo!, Microsoft, Twitter, Facebook and Hortonworks. At the time, the group said it had achieved 35x to 45x performance improvements for common analytical queries using Hive. In a recent article on the Hortonworks web site, developer Carter Shanklin gave an update on the project, explaining that the team is nearing the end of phase two and is preparing for the Tez-aided push that it expects will take it over the 100x mark.

When Hortonworks launched its HDP 2.0 beta this summer, YARN got all of the accolades and attention. Hiding in the release behind the YARN hoopla, however, was a second round of Stinger improvements, which in some ways is more notable than the YARN changes, given the pervasiveness of the Hive querying tool.

Among these additions was a preview of a new vectorized query engine, which Shanklin says makes the map stages far more efficient, boosting performance by another 5x to 10x. According to Shanklin, on TPC-DS Query 95, a complex query that includes a 3-way fact table join, the team achieved a 60% speedup moving from Hive 0.10 to Hive 0.11, with a further 4x speedup in HDP 2.0, on 200 GB of data. Not bad, but it’s still far from the 100x that they’re promising. That bump, says Shanklin, will come by way of Apache Tez.

While the group has made great progress, Shanklin indicated that a major milestone it is aiming for is running Hive on Apache Tez. Launched into incubation at the same time as the Stinger initiative, Apache Tez is an application framework built on YARN that executes directed acyclic graphs (DAGs) of tasks. As developer Arun Murthy explained, Tez uses DAGs to generalize the MapReduce paradigm into a more powerful framework, enabling projects such as Apache Hive, Pig, and Cascading to meet requirements for human-interactive response times and extreme throughput at petabyte scale.
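To make the DAG idea concrete, here is a minimal Python sketch of a three-table query expressed as a graph of tasks. It is an illustration only and does not use Tez's API: the task names, the toy tables, and the `run_dag` helper are all hypothetical. The point it tries to show is that each task runs as soon as its upstream tasks have finished and receives their output directly, rather than the query being cut into a chain of separate map and reduce jobs.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, deps):
    """Run every task after all of its upstream tasks have finished.

    tasks: task name -> function taking a list of upstream outputs
    deps:  task name -> list of upstream task names (order is preserved)
    Returns the outputs by task name and the order the tasks ran in.
    """
    results, order = {}, []
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[upstream] for upstream in deps[name]]
        results[name] = tasks[name](inputs)
        order.append(name)
    return results, order


# Toy tables (hypothetical data, not from the article).
SALES = [(1, 10, 5.0), (2, 11, 7.5), (1, 11, 2.5)]  # (cust_id, item_id, amount)
CUSTOMERS = {1: "east", 2: "west"}                  # cust_id -> region
ITEMS = {10: "book", 11: "toy"}                     # item_id -> item name


def join_customers(inputs):
    rows, region_of = inputs
    return [(region_of[cust], item, amount) for cust, item, amount in rows]


def join_items(inputs):
    rows, name_of = inputs
    return [(region, name_of[item], amount) for region, item, amount in rows]


def aggregate(inputs):
    (rows,) = inputs
    totals = {}
    for region, _item, amount in rows:
        totals[region] = totals.get(region, 0.0) + amount
    return totals


tasks = {
    "scan_sales": lambda _inputs: SALES,
    "scan_customers": lambda _inputs: CUSTOMERS,
    "scan_items": lambda _inputs: ITEMS,
    "join_customers": join_customers,
    "join_items": join_items,
    "aggregate": aggregate,
}

# Edges of the graph: each task lists the tasks whose output it consumes.
deps = {
    "scan_sales": [],
    "scan_customers": [],
    "scan_items": [],
    "join_customers": ["scan_sales", "scan_customers"],
    "join_items": ["join_customers", "scan_items"],
    "aggregate": ["join_items"],
}

results, order = run_dag(tasks, deps)
print(order)
print(results["aggregate"])
```

The sketch runs in a single process and in one valid topological order; a real engine would schedule independent tasks, such as the three scans here, in parallel across a cluster.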

Shanklin says that Tez is where the group believes the 100x threshold for Hive will ultimately be crossed, turning Hive into a query framework that responds more in line with “human time” (i.e., queries in the 5-30 second range) without any change to the HiveQL interface.

While Hive on Tez is not yet ready for prime time, Shanklin says the team is inching closer and expects to release this next phase of the project in beta form soon. That is welcome news for developers stuck waiting for their queries to come through while their list of discovery questions piles up.

In the meantime, we’ll continue to follow the progress being made, and look forward to hearing about how these performance improvements make a difference in future applications.

Applications: Research Analytics

Technologies: Frameworks

Sectors: Other

Vendors: Hortonworks

Tags: Hadoop, Hive, Stinger
