April 18, 2018

Presto Use Surges, Qubole Finds

Alex Woodie

Don’t look now, but Presto, the SQL engine developed by Facebook as a follow-on to Hive, is starting to catch on in a big way. According to a new survey of big data-as-a-service customers by Qubole, Presto logged impressive usage gains during 2017, and outgrew Hive and Spark across many metrics.

Qubole sees big data workloads growing across the board. And Hadoop/Hive remains the big daddy of big data engines. But Presto notched bigger gains than the other two big data engines across several measurements, according to Qubole’s “2018 Big Data Activation Report,” which reflects data collected from 200 Qubole users.

Here is some data from Qubole’s report, which was released yesterday:

Presto grew 420% in terms of compute hours on Qubole’s cloud platform from January 2017 to January 2018. That beat Hadoop/Hive, which grew 102%, and Apache Spark, which grew 298%.

The number of Presto users also increased. Qubole found that the number of users who ran commands on Presto rose 255% year over year, compared to 171% for Spark and 136% for Hadoop/Hive.

The number of commands run by users also went up considerably across the board, to 58 million commands last year. The number of Spark commands issued on the Qubole cloud increased by 439%, while Presto commands grew by 365% and Hadoop/Hive commands increased 129%.

In aggregate, customers are running 24x more commands per hour in Presto than in Spark, Qubole found, and 6x more commands per hour in Presto than in Hadoop/Hive. This data reflects the different things that each engine is good at, Qubole says.

“The impressive growth in Presto suggests greater efficiency and self-sufficiency, as it is a powerful tool for use cases with interactive and ad hoc SQL analytics where joins and highly concurrent simple queries are common,” Qubole writes in its report. “This engine doesn’t require a user to constantly tune cluster configurations, thereby reducing drastically the time-to-insights.”

Presto is a distributed in-memory SQL query engine originally developed by Facebook and released in 2013 to be a faster and more flexible alternative to Apache Hive (Qubole co-founders Ashish Thusoo and Joydeep Sen Sarma created Hive while managing Facebook’s Data Service Team). While it’s not tied to Hadoop, Presto can run on a Hadoop cluster, if needed. And in addition to processing data residing in HDFS, Presto has the capability to work with data stored in relational databases (Facebook is a big MySQL user) as well as object stores like Amazon S3.

Presto has other advantages, including arguably more complete coverage of the ANSI SQL standard, according to Justin Borgman, CEO of Starburst, the commercial open source company behind Presto that was recently spun out of Teradata. A new query optimizer that’s in the works is expected to make Presto even faster.

Qubole noted several other trends that emerged from its users’ anonymized data, including:

A strong correlation between the size of its customers and the relative self-sufficiency of their users, measured as the ratio of administrators to users;
Customers’ use of Amazon EC2 Spot Instances grew nearly 5x across the three engines, suggesting greater customer sophistication at minimizing cloud costs;
Growing adoption of new big data tools, including Apache Airflow, which saw a nearly 30% increase in usage, as well as TensorFlow, XGBoost, Pandas, and MLlib.

You can register for a copy of the report here.
