Will the Presto Community Ever Be United Again?
If you haven’t noticed yet, there’s drama in Presto-land. There are now two versions of the open source SQL query engine, PrestoDB and PrestoSQL, and each has a different software foundation behind it. The folks involved with both projects say they want to unite behind a single Presto, but they differ significantly on how to get there.
To quickly recap, Presto is a distributed SQL query engine that emerged out of Facebook in 2013. Whereas Apache Hive (also out of Facebook) excelled at batch analytics, Presto’s forte is running ad hoc analytics against an assortment of data stores. Since it’s just a query engine, as opposed to a full database, it excels in federated environments, which means the data to be queried remains where it’s stored (HDFS, S3, PostgreSQL, etc.) and the query is brought to the data.
This description matches both Presto versions that are now at odds with each other. On the one side we have PrestoDB, which is the original version of Presto, as Facebook originally designed it. It is backed by the Presto Foundation, which launched in September 2019 under the auspices of The Linux Foundation. A company called Ahana was recently founded to offer professional technical support for PrestoDB and develop the PrestoDB ecosystem with support from the Presto Foundation.
On the other side is PrestoSQL, which is a fork of the original PrestoDB tree. PrestoSQL is governed by the Presto Software Foundation, and it’s backed by Starburst, the company that was spun out of Teradata to develop and support PrestoSQL. Starburst employs the three original creators of Presto, who left Facebook to create PrestoSQL, which Starburst’s co-founder says is the community version of Presto.
Datanami recently talked with Starburst and Ahana to get their takes on the situation.
According to Justin Borgman, the co-founder and CEO of Starburst, the three creators of Presto–Martin Traverso, Dain Sundstrom, and David Phillips–decided to leave Facebook in 2018 to ensure that the Presto project was well managed independent of Facebook.
“These three creators really cared about a specific way of how the community would be governed. They wanted to make sure that it was purely meritocratic and that there’s an independent structure outside of Facebook,” Borgman said. “That was the only reason for the split. It wasn’t really technical difference necessarily, other than governance.”
Borgman says there are some technical differences between PrestoSQL and PrestoDB, but they’re not major differences. All three Presto creators are board members at the Presto Software Foundation, a non-profit organization that was created in January 2019.
“Facebook has their own version. That’s really the whole thing,” Borgman said. “They’re obviously at insane scale. They make their own hardware. They’re perfectly capable of running Presto themselves. We try to serve the rest of the market.”
PrestoDB, meanwhile, continues the original version of Presto as it was created at Facebook, according to the co-founders of Ahana, Steven Mih and Dipti Borkar. Mih, who is the CEO of Ahana, says the goal of the Presto Foundation (where he is a board member) is to promote the development of PrestoDB in a transparent and open manner.
“Facebook has a whole set of developers, and so does Uber, that are part of the Presto Foundation,” Mih tells Datanami. “They make up the technical steering committee now and they’re driving that forward in an open way. They don’t help each other right now [under PrestoSQL and the Presto Software Foundation] and we would like to see that start to work out in that fashion.”
Mih says the Presto Software Foundation lacks the sort of transparency that developers demand of open source projects today. He says he helped ratify the charter of the Presto Foundation, which was created in September 2019, to adhere to the open source guidelines provided by The Linux Foundation.
“What are the goals?” he asks. “It’s pretty simple, three things. It’s having an open, neutral, and unified Presto community.”
In addition to eliminating the confusion over the dueling project names and foundations (not to mention the nearly identical-looking foundation websites), Mih hopes to foster an “open and transparent” community that is not dominated by a single vendor.
By comparison, the Presto Software Foundation is dominated by one company, Starburst, says Mih, who is afraid that Starburst will implement a second license, in addition to the Apache 2.0 license, as Starburst’s investors have done at Confluent.
Mih’s hope is that “there’s not one company or individual who can just say we’re going to change this license,” he says. “They really want to make sure it’s in the community.”
Ahana emerged from stealth in early June with $2.25 million in seed funding from GV (formerly Google Ventures). Last week the company announced its first products: the PrestoDB Amazon Machine Image (AMI) on the Amazon Marketplace and a PrestoDB container on DockerHub. These products are free, and Ahana will provide technical support for a fee.
“These are the only free and completely open source Presto offerings on AWS,” says Borkar, Ahana’s chief product officer, who previously worked at Alluxio with Mih. “In addition to that, we’re also pushing out sandboxes. Presto is essentially the distributed federated query engines. But along with that you need a catalog and a couple of other things. So we’ve bundled the Hive metastore. We’ve bundled a few other data sources, like TPC-DS, etc. as part of the sandbox. It’s very easy for users to get started.”
There is a high level of complexity with PrestoDB, Borkar says, and Ahana is trying to simplify it, but without deviating from the original branch of Presto. “Big data is a very complex environment. Presto was kind of born in that environment and there’s a lot of adjacent components to that, which to put together for end users is still fairly complicated,” she says. “Our mission is to simplify that for users of PrestoDB.”
Starburst’s Borgman doesn’t dispute that Presto can be complex, but he questions Ahana’s motivation in joining the Presto community at this point in time.
“So now the goal here is to bring everybody back together under one foundation,” Borgman says. “I think Ahana probably was trying to capitalize on that confusion or whatever you want to call it. Again, I think there’s nothing there, and I think coming back together sort of shows that. That’s supported by the highest levels of Facebook and we’ve been having these discussions for a long time.”
Borgman says members of the broader Presto community, including representatives at Facebook, are working behind the scenes to reconcile the differences and create a unified Presto community, and that an announcement could come soon.
“That’s definitely what we’re working toward. We’re ironing out the details,” Borgman says. “Quite frankly, none of this would have been a discussion if Ahana hadn’t attempted to make it one, I guess I would say. But certainly, that’s what we’re all building toward. I think you’ll see something along those lines.”
On June 1, the day before Ahana came out of stealth, Starburst actually joined the Presto Foundation, the foundation behind the original PrestoDB version that is backed by Ahana, Mih says.
Mih claims that Starburst has made no effort to participate in the community. Being a member of the Presto Foundation “requires them to participate and merge back PrestoSQL or start to give ownership back to the foundation, because you can’t have a company driven project and a community driven project under [The] Linux [Foundation],” he says.
“It’s not so much as they said they wanted to join and they’re not. They’ve just been absent,” Mih says. “There’s been technical steering committee meetings. There’s contributions happening. But we haven’t seen any contribution to the original PrestoDB project [from Starburst]. We see instead the continued development of the fork.
“We would love it to come together,” Mih continues. “We would be glad. And we hope that happens. But unfortunately there’s still more confusion.”
Starburst asserts that Mih’s claim is false. The company attended the one technical steering committee meeting that has been held since it joined the Presto Foundation on June 1, a company spokesperson says.
The Boston, Massachusetts-based company also claims that Mih’s characterization of the Presto Software Foundation as controlled by one company or vendor is false. “PrestoSQL is not driven by one company,” the Starburst spokesperson says. “Many companies are active on PrestoSQL, including Netflix, LinkedIn, Lyft, Varada, Qubole, Salesforce, and Treasure Data to name just a few.”
It doesn’t seem likely that Starburst, which announced its own $42 million Series B round of venture capital financing last month, will go along with Ahana and Presto Foundation requests.
“They’re actually not even members of the Presto community,” Borgman says of Ahana, referring to the fact that Ahana doesn’t employ any committers to either Presto project (which Mih says will change soon). “They’ve never written a line of code. They don’t have any contributors. I think they were trying to be opportunists on the success of Presto.”
Disputes over competing projects are nothing new in open source. Cloudera and Hortonworks were legendary rivals in the Apache Hadoop community before the two companies merged in early 2019. It seems that every popular open source project in the big data ecosystem – from Elasticsearch to MongoDB to Apache Kafka and Apache Cassandra – has its enthusiastic supporters, as well as competitors eager to replicate that success.
Now it appears that it’s Presto’s turn for open source community drama. As the project increases in popularity and companies invest in Presto deployments, it seems natural that more people would gravitate to the project.
Whether or not Presto can survive a split like we’re currently seeing with PrestoDB and PrestoSQL, however, remains to be seen.
Editor’s note: This story was corrected and updated. Starburst has not implemented a dual license, as the story previously stated. Datanami regrets the error.