The Delta Lake Project Turns to Linux Foundation to Become the Open Standard for Data Lakes
AMSTERDAM and SAN FRANCISCO, Oct. 16, 2019 — The Linux Foundation, the nonprofit organization enabling mass innovation through open source, today announced that it will host Delta Lake, a project focusing on improving the reliability, quality and performance of data lakes. Delta Lake, announced by Databricks earlier this year, has been adopted by thousands of organizations and has a thriving ecosystem of supporters, including Intel, Alibaba and Booz Allen Hamilton. To further drive adoption and contributions, Delta Lake will become a Linux Foundation project and use an open governance model.
Every organization aspires to get more value from data through data science, machine learning and analytics, but many are severely hindered by the lack of data reliability within data lakes. Delta Lake addresses data reliability challenges by making transactions ACID compliant, which enables concurrent reads and writes. Its schema enforcement capability helps to ensure that the data lake is free of corrupt and non-conformant data. Since its launch in October 2017, Delta Lake has been adopted by over 4,000 organizations and processes over two exabytes of data each month.
“Bringing Delta Lake under the neutral home of the Linux Foundation will help the open source community dependent on the project develop the technology addressing how big data is stored and processed, both on-prem and in the cloud,” said Michael Dolan, VP of Strategic Programs at the Linux Foundation. “The Linux Foundation helps open source communities leverage an open governance model to enable broad industry contribution and consensus building, which will improve the state of the art for data storage and reliability.”
Databricks’ cofounders are the original creators of the open source Apache Spark project, the unified analytics engine that has become the de facto standard for large-scale data processing. Databricks’ CEO and cofounder Ali Ghodsi expressed excitement about going through this journey again with the Delta Lake project. “Our team has continued to create and contribute to open source projects because we know it is the fastest, most comprehensive way to innovate. To address organizations’ data challenges we want to ensure this project is open source in the truest form. Through the strength of the Linux Foundation community and contributions, we’re confident that Delta Lake will quickly become the standard for data storage in data lakes.”
Delta Lake will have an open governance model that encourages participation and technical contribution and will provide a framework for long-term stewardship by an ecosystem invested in Delta Lake’s success.
Although initially designed to work with Apache Spark, Delta Lake has developed a thriving community that is adding support for other open source data systems.
“As a major cloud provider, Alibaba has been a leader, contributor, consumer, and supporter for various open source initiatives, especially in the big data and AI area. We have been working with Databricks on a native Hive connector for Delta Lake on the open source front, and we are thrilled to see the project joining the Linux Foundation. We will continue to foster and contribute to the open source community,” said Yangqing Jia, VP of Big Data / AI at Alibaba.
“Intel and Databricks have a long history of working together to advance Apache Spark technology with innovative data analytics and AI solutions and to enable enterprise readiness. Databricks’ Delta Lake contribution to the Linux Foundation is an important open source storage technology that can help the ecosystem improve reliability for data lakes. We look forward to joining in the Delta Lake project and continuing our collaboration with Databricks and the Apache community,” said Wei Li, Vice President, Intel Architecture, Graphics and Software and General Manager, Machine Learning Performance.
“The Starburst team is excited about the development of Delta Lake and have already developed a native connector for Presto that is currently in beta testing. We believe this will enable companies creating or migrating their data lakes to the cloud the ability to finally realize the value that they were promised years ago and perform interactive SQL analytics on data lakes directly,” said Justin Borgman, CEO, Starburst.
“Booz Allen Hamilton is very excited about the potential of Delta Lake technology, especially its promise to provide an open, scalable data platform to enable a broad range of analytics – SQL analytics that powers reporting and dashboarding to data science and machine learning with R & Python. We are looking forward to making significant contributions to the Delta Lake project. We are starting with native integrations of Apache NiFi with Delta Lake,” said Dan Tucker, VP at Booz Allen Hamilton.
About The Linux Foundation
Founded in 2000, the Linux Foundation is supported by more than 1,000 members and is the world’s leading home for collaboration on open source software, open standards, open data, and open hardware. The Linux Foundation’s projects, including Linux, Kubernetes, Node.js, and more, are critical to the world’s infrastructure. The Linux Foundation’s methodology focuses on leveraging best practices and addressing the needs of contributors, users and solution providers to create sustainable models for open collaboration. For more information, please visit us at linuxfoundation.org.
Source: The Linux Foundation