Will the Presto Community Ever Be United Again?
If you haven’t noticed yet, there’s drama in Presto-land. There are now two versions of the open source SQL query engine, PrestoDB and PrestoSQL, and each has a different software foundation behind it. The folks involved with both projects say they want to unite behind a single Presto, but they differ significantly on how to get there.
To quickly recap, Presto is a distributed SQL query engine that emerged out of Facebook in 2013. Whereas Apache Hive (also out of Facebook) excelled at batch analytics, Presto’s forte is running ad hoc analytics against an assortment of data stores. Since it’s just a query engine, as opposed to a full database, it excels in federated environments, which means the data to be queried remains where it’s stored (HDFS, S3, PostgreSQL, etc.) and the query is brought to the data.
This description matches both Presto versions that are now at odds with each other. On the one side we have PrestoDB, which is the original version of Presto, as Facebook designed it. It is backed by the Presto Foundation, which launched in September 2019 under the auspices of The Linux Foundation. A company called Ahana was recently founded to offer professional technical support for PrestoDB and develop the PrestoDB ecosystem with support from the Presto Foundation.
On the other side is PrestoSQL, which is a fork of the original PrestoDB tree. PrestoSQL is governed by the Presto Software Foundation, and it’s backed by Starburst, the company that was spun out of Teradata to develop and support PrestoSQL. Starburst employs the three original creators of Presto, who left Facebook to create PrestoSQL, which Starburst’s co-founder says is the community version of Presto.
Datanami recently talked with Starburst and Ahana to get their takes on the situation.
According to Justin Borgman, the co-founder and CEO of Starburst, the three creators of Presto–Martin Traverso, Dain Sundstrom, and David Phillips–decided to leave Facebook in 2018 to ensure that the Presto project was well managed independent of Facebook.
“These three creators really cared about a specific way of how the community would be governed. They wanted to make sure that it was purely meritocratic and that there’s an independent structure outside of Facebook,” Borgman said. “That was the only reason for the split. It wasn’t really technical difference necessarily, other than governance.”
Borgman says there are some technical differences between PrestoSQL and PrestoDB, but they’re not major differences. All three Presto creators are board members at the Presto Software Foundation, a non-profit organization that was created in January 2019.
“Facebook has their own version. That’s really the whole thing,” Borgman said. “They’re obviously at insane scale. They make their own hardware. They’re perfectly capable of running Presto themselves. We try to serve the rest of the market.”
PrestoDB, meanwhile, continues the original version of Presto as it was created at Facebook, according to the co-founders of Ahana, Steven Mih and Dipti Borkar. Mih, who is the CEO of Ahana, says the goal of the Presto Foundation (where he is a board member) is to promote the development of PrestoDB in a transparent and open manner.
“Facebook has a whole set of developers, and so does Uber, that are part of the Presto Foundation,” Mih tells Datanami. “They make up the technical steering committee now and they’re driving that forward in an open way. They don’t help each other right now [under PrestoSQL and the Presto Software Foundation] and we would like to see that start to work out in that fashion.”
Mih says the Presto Software Foundation lacks the sort of transparency that developers demand of open source projects today. He says he helped ratify the charter of the Presto Foundation, which was created in September 2019, to adhere to the open source guidelines provided by The Linux Foundation.
“What are the goals?” he asks. “It’s pretty simple, three things. It’s having an open, neutral, and unified Presto community.”
In addition to eliminating the confusion over the dueling project names and foundations (not to mention the nearly identical-looking foundation websites), Mih hopes to foster an “open and transparent” community that is not dominated by a single vendor.
By comparison, the Presto Software Foundation is dominated by one company, Starburst, says Mih, who is afraid that Starburst will implement a second license, in addition to the Apache 2.0 license, as Starburst’s investors have done at Confluent.
Mih’s hope is that “there’s not one company or individual who can just say we’re going to change this license,” he says. “They really want to make sure it’s in the community.”
Ahana emerged from stealth in early June with $2.25 million in seed funding from GV (formerly Google Ventures). Last week the company announced its first products: the PrestoDB Amazon Machine Image (AMI) on the Amazon Marketplace and a PrestoDB container on DockerHub. These products are free, and Ahana will provide technical support for a fee.
“These are the only free and completely open source Presto offerings on AWS,” says Borkar, Ahana’s chief product officer, who previously worked at Alluxio with Mih. “In addition to that, we’re also pushing out sandboxes. Presto is essentially the distributed federated query engines. But along with that you need a catalog and a couple of other things. So we’ve bundled the Hive metastore. We’ve bundled a few other data sources, like TPC-DS, etc. as part of the sandbox. It’s very easy for users to get started.”
There is a high level of complexity with PrestoDB, Borkar says, and Ahana is trying to simplify it, but without deviating from the original branch of Presto. “Big data is a very complex environment. Presto was kind of born in that environment and there’s a lot of adjacent components to that, which to put together for end users is still fairly complicated,” she says. “Our mission is to simplify that for users of PrestoDB.”
Starburst’s Borgman doesn’t dispute that Presto can be complex, but he questions Ahana’s motivation in joining the Presto community at this point in time.
“So now the goal here is to bring everybody back together under one foundation,” Borgman says. “I think Ahana probably was trying to capitalize on that confusion or whatever you want to call it. Again, I think there’s nothing there, and I think coming back together sort of shows that. That’s supported by the highest levels of Facebook and we’ve been having these discussions for a long time.”
Borgman says members of the broader Presto community, including representatives at Facebook, are working behind the scenes to reconcile the differences and create a unified Presto community, and that an announcement could come soon.
“That’s definitely what we’re working toward. We’re ironing out the details,” Borgman says. “Quite frankly, none of this would have been a discussion if Ahana hadn’t attempted to make it one, I guess I would say. But certainly, that’s what we’re all building toward. I think you’ll see something along those lines.”
On June 1, the day before Ahana came out of stealth, Starburst actually joined the Presto Foundation, the foundation behind the original PrestoDB version that is backed by Ahana, Mih says.
Mih claims that Starburst has made no effort to participate in the community. Being a member of the Presto Foundation “requires them to participate and merge back PrestoSQL or start to give ownership back to the foundation, because you can’t have a company driven project and a community driven project under [The] Linux [Foundation],” he says.
“It’s not so much as they said they wanted to join and they’re not. They’ve just been absent,” Mih says. “There’s been technical steering committee meetings. There’s contributions happening. But we haven’t seen any contribution to the original PrestoDB project [from Starburst]. We see instead the continued development of the fork.
“We would love it to come together,” Mih continues. “We would be glad. And we hope that happens. But unfortunately there’s still more confusion.”
Starburst asserts that Mih’s claim is false. Only one technical steering committee meeting has been held since the company joined the Presto Foundation on June 1, and Starburst was present at it, a company spokesperson says.
The Boston, Massachusetts company also claims that Mih’s characterization of the Presto Software Foundation as controlled by one company or vendor is false. “PrestoSQL is not driven by one company,” the Starburst spokesperson says. “Many companies are active on PrestoSQL, including Netflix, LinkedIn, Lyft, Varada, Qubole, Salesforce, and Treasure Data to name just a few.”
It doesn’t seem likely that Starburst, which announced its own $42 million Series B round of venture capital financing last month, will go along with Ahana and Presto Foundation requests.
“They’re actually not even members of the Presto community,” Borgman says of Ahana, referring to the fact that Ahana doesn’t employ any committers to either Presto project (which Mih says will change soon). “They’ve never written a line of code. They don’t have any contributors. I think they were trying to be opportunists on the success of Presto.”
Disputes over competing projects are nothing new in open source. Cloudera and Hortonworks were legendary rivals in the Apache Hadoop community before the two companies merged in early 2019. It seems that every popular open source project in the big data ecosystem – from Elasticsearch to MongoDB to Apache Kafka and Apache Cassandra – has its enthusiastic supporters, as well as competitors eager to replicate that success.
Now it appears that it’s Presto’s turn for open source community drama. As the project increases in popularity and companies invest in Presto deployments, it seems natural that people would gravitate to the project.
Whether or not Presto can survive a split like we’re currently seeing with PrestoDB and PrestoSQL, however, remains to be seen.
Editor’s note: This story was corrected and updated. Starburst has not implemented a dual license, as the story previously stated. Datanami regrets the error.