June 4, 2024

Databricks Nabs Iceberg-Maker Tabular to Spawn Table Uniformity

Databricks today announced the acquisition of Tabular, the commercial outfit behind the Apache Iceberg table format, which competes with Databricks’ own Delta format. The move paves the way for Databricks customers to enjoy more uniformity and fewer incompatibilities in their data lakehouse environments. The deal was valued at more than $1 billion, Databricks confirmed.

Open table formats have become the new battleground for control of data lakehouses, those data platforms that blend the scalability and flexibility of data lakes with the ACID transactionality and reliability of traditional data warehouses.

Apache Hudi, Apache Iceberg, and Databricks’ Delta have been locked in a three-way race for dominance among open table formats. Hudi was developed at Uber, while Netflix is mostly credited with the development of Iceberg, along with Apple.

Ryan Blue, who co-created Iceberg with Dan Weeks while at Netflix, co-founded Tabular in 2021 with Weeks and another former Netflix colleague, Jason Reid, to automate data lakehouse management in an Iceberg environment. The company raised $26 million last year as it brought its cloud lakehouse service to market.

Merging the teams behind Iceberg and Delta will deliver benefits to customers in the form of greater choice and fewer incompatibilities, say executives at Databricks, which announced the acquisition today in a blog post.

“As one, we are going to lead the way with data compatibility so that you are no longer limited by which lakehouse format your data is in,” write Ali Ghodsi, Arsalan Tavakoli-Shiraji, Reynold Xin, and Adam Conway. “We look forward to welcoming the team once the transaction closes and we are excited to work with them towards our joint vision of the open lakehouse.”

Databricks confirmed to Datanami that the deal was valued at more than $1 billion. It is expected to close by the end of the company’s second quarter, which ends July 31.

Databricks executives explained their rationale for acquiring a company competing with their preferred table format:

“These two projects have emerged as the two leading open source standards for Lakehouse formats. Unfortunately, even though both of these formats are based on Apache Parquet and share similar goals and designs, they became incompatible due to their independent development,” they wrote.

“Over time, a number of other open source and proprietary engines adopted these formats. However, they usually adopted only one of the standards, and more often than not, only part of that standard. This has effectively fragmented and siloed enterprise data, undermining the value of the lakehouse architecture.”

Achieving data interoperability will require the Iceberg and Delta Lake communities to come together, the executives wrote.

“We intend to work closely with the Iceberg and Delta Lake communities to bring interoperability to the formats themselves,” they wrote. “This is a long journey, one that will likely take several years to achieve in those communities. That’s why we introduced Delta Lake UniForm to the world last year.”

Iceberg has emerged as the leading open table format in recent months on the back of strong support from independent software vendors. Among those is Snowflake, which competes directly with Databricks for data analytics and AI workloads. Snowflake today announced general availability of its support for Iceberg tables, but the Databricks-Tabular deal may put a damper on the celebration.

A potential unification of Delta and Iceberg, if it comes to pass, would leave Apache Hudi as the lone remaining independent table format. Onehouse, the company behind Hudi, is backing a new open source project called Apache XTable, which is an open interchange format that provides read-write compatibility for Hudi, Delta, and Iceberg, potentially making the differences between the formats moot.

Editor’s note: This article was corrected. The deal for Tabular will be complete by the end of the second quarter, which ends July 31, not June 30. Datanami regrets the error.