September 23, 2019

Presto Moves Under Linux Umbrella

An SQL query engine developed by Facebook and moved earlier this year to a non-profit development group is now being hosted by the Linux Foundation.

The new Presto Foundation is seen as a way to scale the popular distributed SQL query engine launched by Facebook engineers in 2012 as a follow-on to Apache Hive. Along with Facebook (NASDAQ: FB), founding members of the Presto Foundation are Alibaba (NYSE: BABA), Twitter (NYSE: TWTR) and Uber (NYSE: UBER).

Presto was released as open source under the Apache license in 2013 and has since attracted a growing list of backers that also includes LinkedIn and Netflix (NASDAQ: NFLX). The architecture allows users to query a range of data sources, among them MySQL, PostgreSQL and Apache Kafka.

Backers tout Presto as an exabyte-scale data processor that scales to large clusters of machines while accessing a variety of data repositories. Facebook uses Presto for interactive queries against several internal data stores, including its 300-petabyte data warehouse. The social media giant estimates that more than 1,000 employees use it daily to run more than 30,000 queries scanning more than a petabyte of data per day.

The query engine is being used for both batch and interactive tasks, according to Nezih Yigitbasi, Presto engineering manager at Facebook.

The Linux Foundation said Monday (Sept. 23) the Presto group would operate under a “neutral governance model” that would diversify the underlying development community to accelerate scaling. Each of the four founding member companies will oversee the new Presto Foundation.

Along with the ability to scale to large clusters, the distributed system can query data where it is stored, including in Hive, Cassandra, relational databases and proprietary data stores, thereby reducing data movement and latency. Presto’s in-memory, distributed query processing is said to deliver query latencies ranging from sub-second to minutes.

A survey of big data service users released last year by Qubole found that Presto use is growing faster than that of Hive and Spark. While both remain the leaders among big data engines, Presto notched bigger gains across several key metrics, according to the survey.

The original goal of the Presto project was creating a dependable query engine capable of scaling to the exabyte range. “From the beginning, we stressed the importance of code quality, architectural extensibility and open collaboration with the community,” said Martin Traverso, Presto’s co-creator.

The new community under the Linux Foundation umbrella will help scale efforts “to solve the increasing problem of massive distributed data processing at Internet scale,” said Michael Dolan, the foundation’s vice president of strategic programs.

Recent items:

Presto Backers Bolster Its Open Source Origins

Presto Use Surges, Qubole Finds