February 25, 2016

SQL-on-Hadoop Test: Each Engine Has ‘Sweet Spots’

George Leopold

(Bakhtiar Zein/Shutterstock)

Business intelligence has emerged as the top workload for Hadoop, ahead of data science and ETL. That has prompted benchmarkers to zero in on the performance of leading SQL-on-Hadoop engines for BI use cases.

AtScale, a BI-on-Hadoop specialist, released new benchmark results this week for leading SQL-on-Hadoop engines, claiming its results reveal which engines are best suited to particular BI scenarios.

Overall, the benchmark survey found that the leading engines varied “depending on the type of query, size of data and other factors.” Each engine has its own “sweet spot,” and “enterprises will find that a blended usage of all engines might fit their company’s goals best,” according to AtScale, which is based in San Mateo, Calif.

The benchmark tested workloads running on Hive, Cloudera Impala and Spark. No engine came out ahead across the board; the results varied by scenario. For example, AtScale said Hive, which is widely viewed as the default for SQL on Hadoop, did not by itself provide the fastest performance in all scenarios.

Meanwhile, the benchmark provided another boost for Spark, which has been making significant inroads in the enterprise. AtScale found that recent upgrades to the cluster computing engine boosted performance on smaller datasets. “We were surprised to find significant performance improvements between Spark 1.5 and 1.6,” AtScale said.

Industry analysts note that increased Hadoop adoption is focused on storage and scale-out capabilities. A shift to analytical workloads on Hadoop requires a deeper understanding of SQL-on-Hadoop tools, they add, particularly as Hadoop is used to tackle BI workloads.

The benchmark tests found that each of the SQL-on-Hadoop engines is sufficiently stable to support BI workloads. Performance results varied depending on the size of datasets and the number of concurrent users.

Spark SQL and Impala performed best on smaller datasets consisting of tables with as many as several million rows of data. Meanwhile, Impala outperformed Hive and Spark SQL in concurrent user testing. Hence, AtScale said enterprises planning to connect large numbers of business intelligence users to their Hadoop platforms should consider Impala as the primary processing engine.

The benchmarker attributed the growing ability of SQL-on-Hadoop engines to handle BI workloads to flourishing open-source innovation. That level of innovation will likely grow as companies like Cloudera make good on plans to donate projects such as Impala to the Apache Software Foundation. Impala is currently listed as an Apache incubator project.

In a blog post earlier this month, Cloudera said its Impala team has boosted the engine’s scale and stability, enabling deployment of Impala clusters with hundreds of nodes and running millions of queries while pushing “concurrency to thousands of users.” The team also introduced new features like nested data types and tighter security.

Cloudera engineers also confirmed AtScale’s assertion that one engine does not fit all analytics scenarios. “Despite Impala’s significant performance lead as an analytic database, Hive and Spark SQL continue to provide important capabilities for other use cases and users alongside Impala,” Cloudera acknowledged.

The Hadoop benchmark study is available here.

Recent items:

Picking the Right SQL-on-Hadoop for the Job

Spreading Spark Enterprise-Wide

Applications: Enterprise Analytics

Technologies: Frameworks, Processors

Sectors: Financial Services, Healthcare, Retail

Tags: benchmarking, Cloudera Impala, Hadoop, Hive, processing engines, Spark, SQL on Hadoop
