May 4, 2023

Databricks Enhances Lakehouse Governance with Okera Acquisition and Immuta Investment


Databricks is bolstering the data governance capabilities of its lakehouse platform with a new acquisition and investment. The company has entered into a definitive agreement to acquire the data governance firm Okera for an undisclosed amount. Additionally, its investment arm, Databricks Ventures, announced it has invested in data security specialist Immuta.

“Okera solves data privacy and governance challenges across the spectrum of data and AI. It simplifies data visibility and transparency, helping organizations understand their data, which is essential in the age of LLMs and to address concerns about their biases,” the company wrote in a blog post.

The current AI gold rush is creating governance challenges. Databricks says data governance technologies have traditionally relied on enforcing control at a narrow layer, such as in SQL-based access control for cloud data warehouses. Efficiency depends on workloads fitting into this “walled garden,” and the rise of machine learning and LLMs is making this approach insufficient, says Databricks. For AI-specific governance concerns like provenance and bias, these traditional data governance platforms fall short.

“First, the number of data assets an enterprise has to govern increases exponentially because many data sources used in AI are machine-generated instead of human-generated. Second, given the rapid pace of development of the AI landscape, no single company is capable of creating a walled garden expressive enough to capture the state-of-the-art. A vendor can enforce access control for its own SQL-based data warehouse engine but wouldn’t be able to change every single open source library to make sure they adhere to the particular control of a walled garden,” the company wrote.

AI-enabled automation has significantly reduced the time needed for data discovery, classification, and policy writing. Okera’s platform automatically discovers and tags sensitive data. The tags enable the creation of no-code access policies, and a self-service portal allows users to audit sensitive data usage and track data usage patterns. Additionally, Okera has a new isolation technology, currently in private preview with more details coming soon, that can support arbitrary workloads while enforcing governance control without sacrificing performance, Databricks asserts.
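
To make the tag-then-policy idea concrete, here is a minimal Python sketch of how discovered tags could drive access rules. It is not Okera's implementation: the patterns, the `POLICY` table, and the function names `tag_columns` and `apply_policy` are all hypothetical, and a real classifier would be far more thorough than two regular expressions.

```python
import re

# Hypothetical detection patterns, for illustration only.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

# Hypothetical policy: tag -> roles allowed to see the raw value.
POLICY = {"email": {"support"}, "ssn": {"compliance"}}


def tag_columns(sample_rows):
    """Tag a column when every sampled value matches a pattern."""
    tags = {}
    columns = sample_rows[0].keys() if sample_rows else []
    for col in columns:
        values = [str(row[col]) for row in sample_rows]
        for tag, pattern in PATTERNS.items():
            if all(pattern.match(v) for v in values):
                tags.setdefault(col, set()).add(tag)
    return tags


def apply_policy(row, tags, role):
    """Mask any column carrying a tag that the role is not cleared for."""
    result = {}
    for col, value in row.items():
        col_tags = tags.get(col, set())
        # Untagged columns pass through; tagged ones need every tag cleared.
        allowed = all(role in POLICY.get(t, set()) for t in col_tags)
        result[col] = value if allowed else "***"
    return result


rows = [
    {"name": "Ada", "email": "ada@example.com", "ssn": "123-45-6789"},
    {"name": "Bob", "email": "bob@example.com", "ssn": "987-65-4321"},
]
tags = tag_columns(rows)
support_view = apply_policy(rows[0], tags, "support")
```

In this sketch the policy refers only to tags, never to column names, so a newly discovered sensitive column would be covered as soon as it is tagged.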

Databricks says it will integrate Okera’s AI-centric governance technologies with its Unity Catalog lakehouse governance layer, noting that customers will have a holistic view of their data across clouds and can use a single permission model to define access policies.

Okera was founded in 2016 and has raised over $29 million in funding. “We founded Okera to help modern, data-driven enterprises accelerate legitimate data access while minimizing data security risks and delivering regulatory compliance. As data continues to grow in volume, velocity, and variety across different applications, CIOs, CDOs, and CEOs across the board have to balance those two often conflicting initiatives – not to mention that historically, managing access policies across multiple clouds has been painful and time-consuming,” said Nong Li, co-founder and CEO of Okera.

Li is known for creating the open source storage format Apache Parquet and is a Databricks alumnus: he led the company’s vectorized Parquet effort and the codegen effort that resulted in Apache Spark 2.0’s 10x performance improvement, according to Databricks.

Circling back to Immuta, Databricks did not disclose the amount it invested in the deal. It seems the company is hedging its bets, as Immuta is a direct competitor to Okera. In a release, Immuta says this investment builds on a longstanding partnership between the two companies and will go towards product innovation to strengthen the integration between both platforms.

Immuta centralizes policy enforcement across interactive clusters and Databricks SQL, CEO Matt Carroll says. (Source: Immuta)

Immuta CEO Matthew Carroll explained in a blog post that Databricks adapted its Unity Catalog compute platform to manage metadata in a single metastore across all workspaces, and Immuta has taken that single metastore and centralized policy enforcement across its interactive clusters and Databricks SQL.

In the latest “GigaOm Radar for Data Governance,” Andrew J. Brust explains that two market factors have reshaped data governance into a facet of data management predominantly focused on access control. He says the first is the need for reduced time to insight from data processes which is driving decreased latency in BI, analytics, and transactional systems. Another factor is higher expectations from customers that their data is safe from breaches and that organizations can reliably and accountably control data access. “Contemporary governance solutions must provide flexible, nuanced, secure access to data in such a way that it is auditable and dependable for all stakeholders—including business users and customers,” he says.

The GigaOm report notes that vendors differ in their approaches to achieving data governance objectives, which offer varying degrees of effectiveness for specific enterprise use cases, roles, and departments. Brust says Okera’s strengths lie in the robust uniformity of the user experience it provides for access management, no-code policy writing, audit logging, deployment flexibility, and query brokering.

Immuta’s strength is in its access controls that operate at both the attribute and role-based level, Brust reports. Like Okera, the company also offers a no-code policy builder aimed at non-technical and technical users alike. Immuta also has policy-as-code support with a command-line interface for DevOps teams.
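
The difference between role-based and attribute-based control can be illustrated with a short Python sketch. This is not Immuta's API or policy language: `ROLE_GRANTS`, the `region` attribute, and the function names are hypothetical and chosen only for the example. The role check decides whether a user can reach a table at all, and the attribute check then filters individual rows.

```python
from dataclasses import dataclass, field

# Hypothetical role grants: role -> tables that role may query.
ROLE_GRANTS = {"analyst": {"sales"}, "admin": {"sales", "hr"}}


@dataclass
class User:
    name: str
    roles: set
    attributes: dict = field(default_factory=dict)


def role_allows(user, table):
    """Role-based check: does any of the user's roles grant the table?"""
    return any(table in ROLE_GRANTS.get(role, set()) for role in user.roles)


def attribute_allows(user, row):
    """Attribute-based check: the user's region must match the row's region."""
    return user.attributes.get("region") == row.get("region")


def visible_rows(user, table, rows):
    """Apply the role check to the table, then the attribute check per row."""
    if not role_allows(user, table):
        return []
    return [row for row in rows if attribute_allows(user, row)]


sales = [
    {"order": 1, "region": "EU"},
    {"order": 2, "region": "US"},
]
eu_analyst = User("dana", {"analyst"}, {"region": "EU"})
eu_view = visible_rows(eu_analyst, "sales", sales)
```

Combining the two levels keeps the number of roles small: one analyst role plus a region attribute replaces a separate role for each region.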


“Immuta is a trusted data security partner,” said Ali Ghodsi, CEO and co-founder of Databricks, in a statement. “Over the last six years, we’ve been successfully collaborating to serve global enterprise customers like ADP, Swedbank, and many others. By integrating directly with Databricks Unity Catalog, Immuta provides a seamless way for our joint customers to protect their data in the Databricks Lakehouse.”
