Will the Presto Community Ever Be United Again?
If you haven’t noticed yet, there’s drama in Presto-land. There are now two versions of the open source SQL query engine, PrestoDB and PrestoSQL, and each of them has a different software foundation behind it. The folks involved with both projects say they want to unite behind a single Presto, but there are significant differences in how they would get there.
To quickly recap, Presto is a distributed SQL query engine that emerged out of Facebook in 2013. Whereas Apache Hive (also out of Facebook) excelled at batch analytics, Presto’s forte is running ad hoc analytics against an assortment of data stores. Since it’s just a query engine, as opposed to a full database, it excels in federated environments, which means the data to be queried remains where it’s stored (HDFS, S3, PostgreSQL, etc.) and the query is brought to the data.
This description matches both Presto versions that are now at odds with each other. On one side we have PrestoDB, which is the original version of Presto, as Facebook originally designed it. It is backed by the Presto Foundation, which launched in September 2019 under the auspices of The Linux Foundation. A company called Ahana was recently founded to offer professional technical support for PrestoDB and to develop the PrestoDB ecosystem with support from the Presto Foundation.
On the other side is PrestoSQL, which is a fork of the original PrestoDB tree. PrestoSQL is governed by the Presto Software Foundation, and it’s backed by Starburst, the company that was spun out of Teradata to develop and support PrestoSQL. Starburst employs the three original creators of Presto, who left Facebook to create PrestoSQL, which Starburst’s co-founder says is the community version of Presto.
Datanami recently talked with Starburst and Ahana to get their takes on the situation.
According to Justin Borgman, the co-founder and CEO of Starburst, the three creators of Presto (Martin Traverso, Dain Sundstrom, and David Phillips) decided to leave Facebook in 2018 to ensure that the Presto project was well managed independent of Facebook.
“These three creators really cared about a specific way of how the community would be governed. They wanted to make sure that it was purely meritocratic and that there’s an independent structure outside of Facebook,” Borgman said. “That was the only reason for the split. It wasn’t really technical difference necessarily, other than governance.”
Borgman says there are some technical differences between PrestoSQL and PrestoDB, but they’re not major differences. All three Presto creators are board members at the Presto Software Foundation, a non-profit organization that was created in January 2019.
“Facebook has their own version. That’s really the whole thing,” Borgman said. “They’re obviously at insane scale. They make their own hardware. They’re perfectly capable of running Presto themselves. We try to serve the rest of the market.”
PrestoDB, meanwhile, continues the original version of Presto as it was created at Facebook, according to the co-founders of Ahana, Steven Mih and Dipti Borkar. Mih, who is the CEO of Ahana, says the goal of the Presto Foundation (where he is a board member) is to promote the development of PrestoDB in a transparent and open manner.
“Facebook has a whole set of developers, and so does Uber, that are part of the Presto Foundation,” Mih tells Datanami. “They make up the technical steering committee now and they’re driving that forward in an open way. They don’t help each other right now [under PrestoSQL and the Presto Software Foundation] and we would like to see that start to work out in that fashion.”
Mih says the Presto Software Foundation lacks the sort of transparency that developers demand of open source projects today. He says he helped ratify the charter of the Presto Foundation, which was created in September 2019, to adhere to the open source guidelines provided by The Linux Foundation.
“What are the goals?” he asks. “It’s pretty simple, three things. It’s having an open, neutral, and unified Presto community.”
In addition to eliminating the confusion over the dueling project names and foundations (not to mention the nearly identical-looking foundation websites), Mih hopes to foster an “open and transparent” community that is not dominated by a single vendor.
By comparison, the Presto Software Foundation is dominated by one company, Starburst, says Mih, who is afraid that Starburst will implement a second license, in addition to the Apache 2.0 license, as Starburst’s investors have done at Confluent.
Mih says he hopes that “there’s not one company or individual who can just say we’re going to change this license.” “They really want to make sure it’s in the community.”
Ahana emerged from stealth in early June with $2.25 million in seed funding from GV (formerly Google Ventures). Last week the company announced its first products: a PrestoDB Amazon Machine Image (AMI) on the AWS Marketplace and a PrestoDB container on Docker Hub. These products are free, and Ahana will provide technical support for a fee.
“These are the only free and completely open source Presto offerings on AWS,” says Borkar, Ahana’s chief product officer, who previously worked at Alluxio with Mih. “In addition to that, we’re also pushing out sandboxes. Presto is essentially the distributed federated query engines. But along with that you need a catalog and a couple of other things. So we’ve bundled the Hive metastore. We’ve bundled a few other data sources, like TPC-DS, etc. as part of the sandbox. It’s very easy for users to get started.”
There is a high level of complexity with PrestoDB, Borkar says, and Ahana is trying to simplify it, but without deviating from the original branch of Presto. “Big data is a very complex environment. Presto was kind of born in that environment and there’s a lot of adjacent components to that, which to put together for end users is still fairly complicated,” she says. “Our mission is to simplify that for users of PrestoDB.”
Starburst’s Borgman doesn’t dispute that Presto can be complex, but he questions Ahana’s motivation in joining the Presto community at this point in time.
“So now the goal here is to bring everybody back together under one foundation,” Borgman says. “I think Ahana probably was trying to capitalize on that confusion or whatever you want to call it. Again, I think there’s nothing there, and I think coming back together sort of shows that. That’s supported by the highest levels of Facebook and we’ve been having these discussions for a long time.”
Borgman says members of the broader Presto community, including representatives at Facebook, are working behind the scenes to reconcile the differences and create a unified Presto community, and that an announcement could come soon.
“That’s definitely what we’re working toward. We’re ironing out the details,” Borgman says. “Quite frankly, none of this would have been a discussion if Ahana hadn’t attempted to make it one, I guess I would say. But certainly, that’s what we’re all building toward. I think you’ll see something along those lines.”
On June 1, the day before Ahana came out of stealth, Starburst actually joined the Presto Foundation, the foundation behind the original PrestoDB version that is backed by Ahana, Mih says.
Mih claims that Starburst has made no effort to participate in the community. Being a member of the Presto Foundation “requires them to participate and merge back PrestoSQL or start to give ownership back to the foundation, because you can’t have a company driven project and a community driven project under [The] Linux [Foundation],” he says.
“It’s not so much as they said they wanted to join and they’re not. They’ve just been absent,” Mih says. “There’s been technical steering committee meetings. There’s contributions happening. But we haven’t seen any contribution to the original PrestoDB project [from Starburst]. We see instead the continued development of the fork.
“We would love it to come together,” Mih continues. “We would be glad. And we hope that happens. But unfortunately there’s still more confusion.”
Starburst asserts that Mih’s claim is false. According to a company spokesperson, Starburst was present at the one technical steering committee meeting that has been held since it joined the Presto Foundation on June 1.
The Boston, Massachusetts-based company also says that Mih’s characterization of the Presto Software Foundation as controlled by a single company or vendor is false. “PrestoSQL is not driven by one company,” the Starburst spokesperson says. “Many companies are active on PrestoSQL, including Netflix, LinkedIn, Lyft, Varada, Qubole, Salesforce, and Treasure Data to name just a few.”
It doesn’t seem likely that Starburst, which announced its own $42 million Series B round of venture capital financing last month, will go along with Ahana and Presto Foundation requests.
“They’re actually not even members of the Presto community,” Borgman says of Ahana, referring to the fact that Ahana doesn’t employ any committers to either Presto project (which Mih says will change soon). “They’ve never written a line of code. They don’t have any contributors. I think they were trying to be opportunists on the success of Presto.”
Disputes over competing projects are nothing new in open source. Cloudera and Hortonworks were legendary rivals in the Apache Hadoop community before the two companies merged in early 2019. It seems that every popular open source project in the big data ecosystem, from Elasticsearch to MongoDB to Apache Kafka and Apache Cassandra, has its enthusiastic supporters, as well as competitors eager to replicate that success.
Now it appears to be Presto’s turn for open source community drama. As the project grows in popularity and companies invest in Presto deployments, it is natural that more people would gravitate to it.
Whether Presto can survive a split like the one we’re currently seeing between PrestoDB and PrestoSQL, however, remains to be seen.
Editor’s note: This story was corrected and updated. Starburst has not implemented a dual license, as the story previously stated. Datanami regrets the error.