Will the Presto Community Ever Be United Again?
If you haven’t noticed yet, there’s drama in Presto-land. There are now two versions of the open source SQL query engine, PrestoDB and PrestoSQL, and each of them has a different software foundation behind it. The folks involved with both projects say they want to unite behind a single Presto, but they differ significantly on how to get there.
To quickly recap, Presto is a distributed SQL query engine that emerged out of Facebook in 2013. Whereas Apache Hive (also out of Facebook) excelled at batch analytics, Presto’s forte is running ad hoc analytics against an assortment of data stores. Since it’s just a query engine, as opposed to a full database, it excels in federated environments, which means the data to be queried remains where it’s stored (HDFS, S3, PostgreSQL, etc.) and the query is brought to the data.
This description matches both Presto versions that are now at odds with each other. On one side is PrestoDB, the original version of Presto as Facebook designed it. It is backed by the Presto Foundation, which launched in September 2019 under the auspices of The Linux Foundation. A company called Ahana was recently founded to offer professional technical support for PrestoDB and to develop the PrestoDB ecosystem with support from the Presto Foundation.
On the other side is PrestoSQL, which is a fork of the original PrestoDB tree. PrestoSQL is governed by the Presto Software Foundation, and it’s backed by Starburst, the company that was spun out of Teradata to develop and support PrestoSQL. Starburst employs the three original creators of Presto, who left Facebook to create PrestoSQL, which Starburst’s co-founder says is the community version of Presto.
Datanami recently talked with Starburst and Ahana to get their takes on the situation.
According to Justin Borgman, the co-founder and CEO of Starburst, the three creators of Presto–Martin Traverso, Dain Sundstrom, and David Phillips–decided to leave Facebook in 2018 to ensure that the Presto project was well managed independent of Facebook.
“These three creators really cared about a specific way of how the community would be governed. They wanted to make sure that it was purely meritocratic and that there’s an independent structure outside of Facebook,” Borgman said. “That was the only reason for the split. It wasn’t really technical difference necessarily, other than governance.”
Borgman says there are some technical differences between PrestoSQL and PrestoDB, but they’re not major differences. All three Presto creators are board members at the Presto Software Foundation, a non-profit organization that was created in January 2019.
“Facebook has their own version. That’s really the whole thing,” Borgman said. “They’re obviously at insane scale. They make their own hardware. They’re perfectly capable of running Presto themselves. We try to serve the rest of the market.”
PrestoDB, meanwhile, continues the original version of Presto as it was created at Facebook, according to the co-founders of Ahana, Steven Mih and Dipti Borkar. Mih, who is the CEO of Ahana, says the goal of the Presto Foundation (where he is a board member) is to promote the development of PrestoDB in a transparent and open manner.
“Facebook has a whole set of developers, and so does Uber, that are part of the Presto Foundation,” Mih tells Datanami. “They make up the technical steering committee now and they’re driving that forward in an open way. They don’t help each other right now [under PrestoSQL and the Presto Software Foundation] and we would like to see that start to work out in that fashion.”
Mih says the Presto Software Foundation lacks the sort of transparency that developers demand of open source projects today. He says he helped ratify the charter of the Presto Foundation, which was created in September 2019, to adhere to the open source guidelines provided by The Linux Foundation.
“What are the goals?” he asks. “It’s pretty simple, three things. It’s having an open, neutral, and unified Presto community.”
In addition to eliminating the confusion over the dueling project names and foundations (not to mention the nearly identical-looking foundation websites), Mih hopes to foster an “open and transparent” community that is not dominated by a single vendor.
By comparison, the Presto Software Foundation is dominated by one company, Starburst, says Mih, who is afraid that Starburst will implement a second license, in addition to the Apache 2.0 license, as Starburst’s investors have done at Confluent.
Mih’s hope is that “there’s not one company or individual who can just say we’re going to change this license,” he says. “They really want to make sure it’s in the community.”
Ahana emerged from stealth in early June with $2.25 million in seed funding from GV (formerly Google Ventures). Last week the company announced its first products: a PrestoDB Amazon Machine Image (AMI) on the AWS Marketplace and a PrestoDB container on Docker Hub. These products are free, and Ahana will provide technical support for a fee.
“These are the only free and completely open source Presto offerings on AWS,” says Borkar, Ahana’s chief product officer, who previously worked at Alluxio with Mih. “In addition to that, we’re also pushing out sandboxes. Presto is essentially the distributed federated query engines. But along with that you need a catalog and a couple of other things. So we’ve bundled the Hive metastore. We’ve bundled a few other data sources, like TPC-DS, etc. as part of the sandbox. It’s very easy for users to get started.”
There is a high level of complexity with PrestoDB, Borkar says, and Ahana is trying to simplify it, but without deviating from the original branch of Presto. “Big data is a very complex environment. Presto was kind of born in that environment and there’s a lot of adjacent components to that, which to put together for end users is still fairly complicated,” she says. “Our mission is to simplify that for users of PrestoDB.”
Starburst’s Borgman doesn’t dispute that Presto can be complex, but he questions Ahana’s motivation in joining the Presto community at this point in time.
“So now the goal here is to bring everybody back together under one foundation,” Borgman says. “I think Ahana probably was trying to capitalize on that confusion or whatever you want to call it. Again, I think there’s nothing there, and I think coming back together sort of shows that. That’s supported by the highest levels of Facebook and we’ve been having these discussions for a long time.”
Borgman says members of the broader Presto community, including representatives at Facebook, are working behind the scenes to reconcile the differences and create a unified Presto community, and that an announcement could come soon.
“That’s definitely what we’re working toward. We’re ironing out the details,” Borgman says. “Quite frankly, none of this would have been a discussion if Ahana hadn’t attempted to make it one, I guess I would say. But certainly, that’s what we’re all building toward. I think you’ll see something along those lines.”
On June 1, the day before Ahana came out of stealth, Starburst actually joined the Presto Foundation, the foundation behind the original PrestoDB version that is backed by Ahana, Mih says.
Mih claims that Starburst has made no effort to participate in the community. Being a member of the Presto Foundation “requires them to participate and merge back PrestoSQL or start to give ownership back to the foundation, because you can’t have a company driven project and a community driven project under [The] Linux [Foundation],” he says.
“It’s not so much as they said they wanted to join and they’re not. They’ve just been absent,” Mih says. “There’s been technical steering committee meetings. There’s contributions happening. But we haven’t seen any contribution to the original PrestoDB project [from Starburst]. We see instead the continued development of the fork.
“We would love it to come together,” Mih continues. “We would be glad. And we hope that happens. But unfortunately there’s still more confusion.”
Starburst asserts that Mih’s claim is false. According to a company spokesperson, Starburst attended the one technical steering committee meeting that has been held since it joined the Presto Foundation on June 1.
The Boston, Massachusetts company also disputes Mih’s characterization of the Presto Software Foundation as controlled by a single company or vendor. “PrestoSQL is not driven by one company,” the Starburst spokesperson says. “Many companies are active on PrestoSQL, including Netflix, LinkedIn, Lyft, Varada, Qubole, Salesforce, and Treasure Data to name just a few.”
It doesn’t seem likely that Starburst, which announced its own $42 million Series B round of venture capital financing last month, will go along with Ahana and Presto Foundation requests.
“They’re actually not even members of the Presto community,” Borgman says of Ahana, referring to the fact that Ahana doesn’t employ any committers to either Presto project (which Mih says will change soon). “They’ve never written a line of code. They don’t have any contributors. I think they were trying to be opportunists on the success of Presto.”
Disputes over competing projects are nothing new in open source. Cloudera and Hortonworks were legendary rivals in the Apache Hadoop community before the two companies merged in early 2019. It seems that every popular open source project in the big data ecosystem – from Elasticsearch to MongoDB to Apache Kafka and Apache Cassandra – has its enthusiastic supporters, as well as competitors eager to replicate that success.
Now it appears to be Presto’s turn for open source community drama. As the project grows in popularity and companies invest in Presto deployments, it seems natural that competing interests would gravitate to it.
Whether or not Presto can survive a split like we’re currently seeing with PrestoDB and PrestoSQL, however, remains to be seen.
Editor’s note: This story was corrected and updated. Starburst has not implemented a dual license, as the story previously stated. Datanami regrets the error.