Will the Presto Community Ever Be United Again?
If you haven’t noticed yet, there’s drama in Presto-land. There are now two versions of the open source SQL query engine, PrestoDB and PrestoSQL, and each has its own software foundation behind it. The folks involved with both projects say they want to unite behind a single Presto, but they differ significantly on how to get there.
To quickly recap, Presto is a distributed SQL query engine that emerged out of Facebook in 2013. Whereas Apache Hive (also out of Facebook) excelled at batch analytics, Presto’s forte is running ad hoc analytics against an assortment of data stores. Since it’s just a query engine, as opposed to a full database, it excels in federated environments, which means the data to be queried remains where it’s stored (HDFS, S3, PostgreSQL, etc.) and the query is brought to the data.
This description matches both Presto versions that are now at odds with each other. On one side is PrestoDB, the original version of Presto as Facebook designed it. It is backed by the Presto Foundation, which launched in September 2019 under the auspices of The Linux Foundation. A company called Ahana was recently founded to offer professional technical support for PrestoDB and to develop the PrestoDB ecosystem with support from the Presto Foundation.
On the other side is PrestoSQL, which is a fork of the original PrestoDB tree. PrestoSQL is governed by the Presto Software Foundation, and it’s backed by Starburst, the company that was spun out of Teradata to develop and support PrestoSQL. Starburst employs the three original creators of Presto, who left Facebook to create PrestoSQL, which Starburst’s co-founder says is the community version of Presto.
Datanami recently talked with Starburst and Ahana to get their takes on the situation.
According to Justin Borgman, the co-founder and CEO of Starburst, the three creators of Presto (Martin Traverso, Dain Sundstrom, and David Phillips) decided to leave Facebook in 2018 to ensure that the Presto project was well managed independent of Facebook.
“These three creators really cared about a specific way of how the community would be governed. They wanted to make sure that it was purely meritocratic and that there’s an independent structure outside of Facebook,” Borgman said. “That was the only reason for the split. It wasn’t really technical difference necessarily, other than governance.”
Borgman says there are some technical differences between PrestoSQL and PrestoDB, but they’re not major differences. All three Presto creators are board members at the Presto Software Foundation, a non-profit organization that was created in January 2019.
“Facebook has their own version. That’s really the whole thing,” Borgman said. “They’re obviously at insane scale. They make their own hardware. They’re perfectly capable of running Presto themselves. We try to serve the rest of the market.”
PrestoDB, meanwhile, continues the original version of Presto as it was created at Facebook, according to the co-founders of Ahana, Steven Mih and Dipti Borkar. Mih, who is the CEO of Ahana, says the goal of the Presto Foundation (where he is a board member) is to promote the development of PrestoDB in a transparent and open manner.
“Facebook has a whole set of developers, and so does Uber, that are part of the Presto Foundation,” Mih tells Datanami. “They make up the technical steering committee now and they’re driving that forward in an open way. They don’t help each other right now [under PrestoSQL and the Presto Software Foundation] and we would like to see that start to work out in that fashion.”
Mih says the Presto Software Foundation lacks the sort of transparency that developers demand of open source projects today. He says he helped ratify the charter of the Presto Foundation, which was created in September 2019, to adhere to the open source guidelines provided by The Linux Foundation.
“What are the goals?” he asks. “It’s pretty simple, three things. It’s having an open, neutral, and unified Presto community.”
In addition to eliminating the confusion over the dueling project names and foundations (not to mention the nearly identical-looking foundation websites), Mih hopes to foster an “open and transparent” community that is not dominated by a single vendor.
By comparison, the Presto Software Foundation is dominated by one company, Starburst, says Mih, who fears that Starburst will add a second license alongside the Apache 2.0 license, as Starburst’s investors have done at Confluent.
Mih hopes that “there’s not one company or individual who can just say we’re going to change this license.” “They really want to make sure it’s in the community,” he says.
Ahana emerged from stealth in early June with $2.25 million in seed funding from GV (formerly Google Ventures). Last week the company announced its first products: a PrestoDB Amazon Machine Image (AMI) on the AWS Marketplace and a PrestoDB container on Docker Hub. These products are free, and Ahana will provide technical support for a fee.
“These are the only free and completely open source Presto offerings on AWS,” says Borkar, Ahana’s chief product officer, who previously worked at Alluxio with Mih. “In addition to that, we’re also pushing out sandboxes. Presto is essentially the distributed federated query engines. But along with that you need a catalog and a couple of other things. So we’ve bundled the Hive metastore. We’ve bundled a few other data sources, like TPC-DS, etc. as part of the sandbox. It’s very easy for users to get started.”
There is a high level of complexity with PrestoDB, Borkar says, and Ahana is trying to simplify it, but without deviating from the original branch of Presto. “Big data is a very complex environment. Presto was kind of born in that environment and there’s a lot of adjacent components to that, which to put together for end users is still fairly complicated,” she says. “Our mission is to simplify that for users of PrestoDB.”
Starburst’s Borgman doesn’t dispute that Presto can be complex, but he questions Ahana’s motivation in joining the Presto community at this point in time.
“So now the goal here is to bring everybody back together under one foundation,” Borgman says. “I think Ahana probably was trying to capitalize on that confusion or whatever you want to call it. Again, I think there’s nothing there, and I think coming back together sort of shows that. That’s supported by the highest levels of Facebook and we’ve been having these discussions for a long time.”
Borgman says members of the broader Presto community, including representatives at Facebook, are working behind the scenes to reconcile the differences and create a unified Presto community, and that an announcement could come soon.
“That’s definitely what we’re working toward. We’re ironing out the details,” Borgman says. “Quite frankly, none of this would have been a discussion if Ahana hadn’t attempted to make it one, I guess I would say. But certainly, that’s what we’re all building toward. I think you’ll see something along those lines.”
On June 1, the day before Ahana came out of stealth, Starburst actually joined the Presto Foundation, the foundation behind the original PrestoDB that Ahana backs, Mih says.
Mih claims that Starburst has made no effort to participate in the community. Being a member of the Presto Foundation “requires them to participate and merge back PrestoSQL or start to give ownership back to the foundation, because you can’t have a company driven project and a community driven project under [The] Linux [Foundation],” he says.
“It’s not so much as they said they wanted to join and they’re not. They’ve just been absent,” Mih says. “There’s been technical steering committee meetings. There’s contributions happening. But we haven’t seen any contribution to the original PrestoDB project [from Starburst]. We see instead the continued development of the fork.
“We would love it to come together,” Mih continues. “We would be glad. And we hope that happens. But unfortunately there’s still more confusion.”
Starburst asserts that Mih’s claim is false. The company was present at the one technical steering committee meeting held since it joined the Presto Foundation on June 1, a company spokesperson says.
The Boston, Massachusetts company also says that Mih’s characterization of the Presto Software Foundation as controlled by one company or vendor is false. “PrestoSQL is not driven by one company,” the Starburst spokesperson says. “Many companies are active on PrestoSQL, including Netflix, LinkedIn, Lyft, Varada, Qubole, Salesforce, and Treasure Data to name just a few.”
It doesn’t seem likely that Starburst, which announced its own $42 million Series B round of venture capital financing last month, will go along with Ahana and Presto Foundation requests.
“They’re actually not even members of the Presto community,” Borgman says of Ahana, referring to the fact that Ahana doesn’t employ any committers to either Presto project (which Mih says will change soon). “They’ve never written a line of code. They don’t have any contributors. I think they were trying to be opportunists on the success of Presto.”
Disputes over competing projects are nothing new in open source. Cloudera and Hortonworks were legendary rivals in the Apache Hadoop community before the two companies merged in early 2019. It seems that every popular open source project in the big data ecosystem, from Elasticsearch to MongoDB to Apache Kafka and Apache Cassandra, has its enthusiastic supporters, as well as competitors eager to replicate that success.
Now it appears to be Presto’s turn for open source community drama. As the project grows in popularity and companies invest in Presto deployments, it is only natural that more people would gravitate to it.
Whether or not Presto can survive a split like we’re currently seeing with PrestoDB and PrestoSQL, however, remains to be seen.
Editor’s note: This story was corrected and updated. Starburst has not implemented a dual license, as the story previously stated. Datanami regrets the error.