November 19, 2015

Cloudera Targets Hadoop SQL Workloads with CDH 5.5

Alex Woodie

Cloudera is aiming to improve how SQL workloads run on Hadoop with today’s release of Cloudera Enterprise 5.5, which brings support for Spark SQL, support for JSON data types in Impala, better security on Impala and Hive, and the beta of a new SQL workload optimization tool.

SQL has been the lingua franca for accessing and manipulating data within databases for decades, so it should come as no surprise that SQL is big on Hadoop, even though Hadoop is not a database per se. The prevalence of SQL skills and SQL interfaces in existing products makes it a logical choice for Hadoop, even if it can't do everything. That's why you have machine learning and graph analytics engines, too.

Cloudera has put a lot of time and money into developing Impala, which essentially provides a Hadoop-based implementation of the type of powerful SQL engines found in massively parallel processing (MPP) databases from the likes of Teradata, Greenplum, and Netezza. The fact that Cloudera is now contributing Impala to the Apache Software Foundation (ASF) shows that it’s serious about driving adoption.

The importance of Impala to Cloudera is clear, which is why you see Cloudera adding support for nested data types, such as JSON, to Impala. But not all SQL engines are equal, which is why Cloudera is also bringing support for Spark SQL to the platform, along with Spark's MLlib machine learning library. That's also why you see the company continuing to improve Hive, the Hadoop project's original SQL engine. (Hive and Impala both get column-level access controls with CDH 5.5.)

With the launch of Cloudera Enterprise 5.5 (which includes CDH 5.5, Cloudera Manager 5.5, and Cloudera Navigator 2.4), Cloudera is making an extra effort to help users understand which tools are best suited for which workloads. "Hadoop doesn't need to limit users to one tool that does everything," says Cloudera product marketing manager Alexandra Gutow. "In fact, we discourage that. One tool is never going to do everything well."

Having so many SQL tools makes it somewhat difficult to figure out which one to use. With today's release of Cloudera Enterprise 5.5, Cloudera is introducing a beta of a new service, called Cloudera Navigator Optimizer, that's designed to help customers better understand the SQL workloads running on other systems, and which SQL engine to use if they move those workloads to Hadoop.

Gutow describes Navigator Optimizer as a cloud-based service that generates optimization strategies for Hadoop. Users upload SQL logs from other applications into the tool, and the software, which is based on technology Cloudera obtained through its acquisition of Xplain.io, identifies inefficiencies in the workloads.

Several CDH customers participated in a closed alpha of Navigator Optimizer that revealed some interesting usage patterns. For instance, ETL workloads tended to run in the wee hours of the morning, followed by traditional BI queries. In the afternoon, the pattern was dominated by heavy ad-hoc querying of EDWs by data analysts and data scientists, while complex hand-written queries often dominated the hours before midnight.

"We have a lot of customers who are looking to get started with Hadoop, and right now there's not a lot of visibility into what the existing workloads are in the system," Gutow says. "This has been what's driving the Navigator Optimizer tool to build workload optimization strategies to address this."

While it's not a "query optimizer" in the classic sense, Navigator Optimizer can help a company devise a plan for migrating some workloads, such as ETL and ad-hoc query processing, off "legacy" systems and onto new Hadoop clusters.

“It will identify where the complexities may lie…and ultimately provide the recommendations for which of the workloads are going to run the best, consume the least development time, and going to give you the best results for Hadoop,” Gutow says.

