June 12, 2024

Databricks Open Sources Unity Catalog, Creating the Industry’s Only Universal Catalog for Data and AI

SAN FRANCISCO, June 12, 2024 — Databricks, the Data and AI company, today announced that it is open sourcing Unity Catalog, the industry’s only unified solution for data and artificial intelligence (AI) governance across clouds, data formats and data platforms.

This initiative builds on Databricks’ commitment to open ecosystems, ensuring customers have the flexibility and control they need without vendor lock-in. Databricks is ushering in a new era for open catalog standards for data and AI with support from Amazon Web Services (AWS), Google Cloud, Microsoft, NVIDIA, Salesforce, and more.

Unity Catalog OSS offers a universal interface that supports any data format and compute engine, including the ability to read tables with Delta Lake, Apache Iceberg, and Apache Hudi clients via Delta Lake UniForm. It also supports the Iceberg REST Catalog and Hive Metastore (HMS) interface standards. Additionally, Unity Catalog OSS provides unified governance across tabular data, non-tabular data, and AI assets, such as machine learning (ML) models and generative AI tools, letting organizations simplify management at scale.

Unity Catalog: The Leading Data and AI Catalog

Databricks introduced Unity Catalog in 2021 to meet customer demand: organizations need an interoperable catalog for their data and AI workloads. Historically, organizations relied on multiple single-purpose solutions, creating silos between platforms and between data and AI assets. These silos made it difficult to build modern data and AI applications, which combine tabular data in multiple table formats, unstructured data, ML models, vector indices, and AI tools.

Customers created complex webs to manage metadata silos, copied data into different places or different formats to enable access by various engines, or maintained DIY solutions to sync metadata between catalogs. Ultimately, this led to increased costs and complexity, as well as weak governance and fragmented access control. Unity Catalog breaks down those silos for over 10,000 organizations.

“Our customers love Unity Catalog. It lets them manage all their data objects — tabular data, unstructured data, and AI and ML assets — in a single source of truth within the Databricks Data Intelligence Platform, versus gluing together multiple single-purpose solutions,” said Ali Ghodsi, Co-founder and CEO at Databricks. “Our platform is the only major data platform in the industry where all data is in an open format by default — now, metadata and governance are open as well, giving enterprises the governance solution they need in today’s data and AI landscape. We’re excited to open source Unity Catalog and release the code. We’ll continue to evolve the open standard in close collaboration with our partners.”

Unity Catalog OSS is the industry’s only universal catalog for data and AI. Key features include:

  • Interoperability: Unity Catalog OSS offers a universal interface that supports any data format and compute engine, including the ability to read tables with Delta Lake, Apache Iceberg, and Apache Hudi clients via Delta Lake UniForm. It also supports the Iceberg REST Catalog and Hive Metastore (HMS) interface standards. Unity Catalog OSS is interoperable with all major cloud platforms, including Microsoft Azure, AWS, GCP, and Salesforce; compute engines like Apache Spark, Presto, Trino, DuckDB, Daft, PuppyGraph, and StarRocks; and data and AI platforms including dbt Labs, Confluent, Eventual, Fivetran, Granica, Immuta, Informatica, LanceDB, LangChain, Tecton, and Unstructured.
  • Unified governance: Unity Catalog OSS enables unified governance across tabular data, non-tabular data, and AI assets, such as ML models and generative AI tools, letting organizations simplify management, discovery and development at scale.
  • Openness: With its open APIs and Apache 2.0 licensed open source server, Unity Catalog OSS maximizes flexibility and customer choice by enabling broad interoperability across various engines, tools, and platforms.

“AT&T is committed to making our data interoperable with our platforms. With the announcement of Unity Catalog’s open sourcing, we are encouraged by Databricks’ step to make lakehouse governance and metadata management possible through open standards. The flexibility to utilize interoperable tools with our data and AI assets, with consistent governance, is core to the AT&T data platform strategy,” said Matt Dugan, VP Data Platforms, AT&T.

“Nasdaq is proud to leverage Databricks’ Unity Catalog as part of our holistic data management strategy,” said Lenny Rosenfeld, Vice President, Capital Access Platforms, Nasdaq. “Databricks’ decision to open source Unity Catalog provides a solution that helps eliminate data silos and we look forward to further scaling our platform, enhancing our governance, and modernizing our data applications as we continue to deliver for our clients.”

“At Rivian, the adoption of the Databricks Data Intelligence Platform has given us the ability to use Data and AI in building our next-gen EAVs. We are excited about Databricks open sourcing Unity Catalog and releasing Open APIs to bring interoperability across our data landscape without any concerns of vendor lock-in. Combined with support for all our data assets — structured and unstructured data, ML models, and Gen AI tools — it was an easy decision to standardize on Unity Catalog,” said Jason Shiverick, Director of AI Platforms, Rivian.

Availability

Unity Catalog OSS will be available at the Data + AI Summit.

About Databricks

Databricks is the Data and AI company. More than 10,000 organizations worldwide — including Block, Comcast, Condé Nast, Rivian, Shell and over 60% of the Fortune 500 — rely on the Databricks Data Intelligence Platform to take control of their data and put it to work with AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse, Apache Spark, Delta Lake and MLflow.


Source: Databricks
