March 23, 2023

Kobai Rides Atop the Lakehouse with Semantic Graph Engine

Kobai this week announced the launch of Saturn, a new offering designed to bring the power of the knowledge graph to data already stored in lakehouses. By creating a semantic layer that sits atop Snowflake and Databricks lakehouses, customers gain the capability to run SPARQL queries against that data, giving them a powerful new way to glean insights but without the complexity of a full-blown graph database project.

Kobai was founded five years ago by two former software engineers at General Electric, Ryan Oattes and Parag Goradia. The pair struggled to help GE’s industrial customers build data systems that could track and query the state of things–such as all the parts in an airplane–over an extended period of time.

“We were really interested in the kind of complexity that comes when you need 10 experts to solve the really hard problems, such as the life of some part that was designed, manufactured, serviced over 25 years,” says Oattes, Kobai’s CTO. “There are so many people you need to have collaborate around ‘How do we structure data?’ and ‘What are the analytics we want to do on it?’ If you’re handing it off to developers, that just makes it harder.”

An RDF-based knowledge graph–with a triple store built around subject, predicate, and object–was an ideal way to organize this sort of information. But instead of starting at the database level and building up, Oattes and Goradia opted to begin at the other end: the user interface. Their first offering, Kobai Studio, had the end user front and center.
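
To make the triple idea concrete, here is a minimal sketch in Python of facts stored as subject-predicate-object triples and matched against a pattern. The part, engine, and aircraft identifiers and the `match` helper are hypothetical, invented for illustration; they are not taken from Kobai's products.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical facts about an aircraft part (illustrative data only).
TRIPLES = [
    Triple("part:blade-17", "installedOn", "engine:E42"),
    Triple("engine:E42", "mountedOn", "aircraft:N123"),
    Triple("part:blade-17", "servicedIn", "2019"),
]


def match(triples, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


# Everything known about one part, regardless of predicate.
blade_facts = match(TRIPLES, s="part:blade-17")
```

Because every fact has the same three-part shape, new kinds of relationships can be added without changing a schema, which is part of the appeal for long-lived, many-sided records like a part's service history.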

“We don’t start talking about data in our engagements. We start with business users, experts who say ‘Let’s talk about your first five or 10 questions. What are the pieces of your business that you need to know about?’” Oattes tells Datanami. “One of the first employees was a UX designer. How was a user going to interact with this?  How do we make it like two or three guys collaborating around the whiteboard that then becomes actionable?”

Most Kobai implementations leveraged Virtuoso, an open source graph database developed by OpenLink Software (although anything with a SPARQL endpoint will work). While Virtuoso has served many Kobai clients well, it’s based on Postgres and therefore has limited scalability. That has proven to be a dealbreaker for bigger deployments by Kobai’s larger customers.

So in late 2021, the folks at Kobai started developing Saturn. The goal with Saturn essentially is to build a knowledge graph that leverages the scale that Databricks and Snowflake have already developed in their lakehouse offerings.

Saturn functions as a virtualization layer between Kobai Studio and the underlying Snowflake or Databricks lakehouse. This layer takes the SPARQL queries generated by Kobai Studio and translates them into SQL, which Snowflake and Databricks expect. Saturn then returns the query results to the Kobai Studio user, where they can power a dashboard or be used in another way.
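
The general idea of translating a graph query into SQL can be sketched in a few lines of Python. This is a toy illustration, not Kobai's method: it assumes a single hypothetical `triples(s, p, o)` table, handles only simple triple patterns (variables start with `?`), and uses SQLite as a stand-in for a lakehouse SQL engine. Saturn's actual schema and translation logic are not described in the article.

```python
import sqlite3


def bgp_to_sql(patterns, select_vars):
    """Turn triple patterns into one SQL query, one table alias per pattern."""
    first_seen = {}          # variable name -> first column it was bound to
    where, params = [], []
    for i, pattern in enumerate(patterns):
        for col, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{col}"
            if term.startswith("?"):
                if term in first_seen:
                    # A repeated variable becomes a join condition.
                    where.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                # A constant becomes a parameterized filter.
                where.append(f"{ref} = ?")
                params.append(term)
    select = ", ".join(f"{first_seen[v]} AS {v[1:]}" for v in select_vars)
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    sql = f"SELECT {select} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


# Hypothetical data, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("part:blade-17", "installedOn", "engine:E42"),
    ("engine:E42", "mountedOn", "aircraft:N123"),
    ("part:blade-99", "installedOn", "engine:E7"),
])

# Roughly: SELECT ?part WHERE { ?part installedOn ?engine .
#                               ?engine mountedOn aircraft:N123 }
patterns = [
    ("?part", "installedOn", "?engine"),
    ("?engine", "mountedOn", "aircraft:N123"),
]
sql, params = bgp_to_sql(patterns, ["?part"])
rows = conn.execute(sql, params).fetchall()
```

Each additional triple pattern adds another self-join, which hints at why the physical layout of the underlying tables matters for graph-style queries at scale.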

“We have some IP around how do we structure a relational schema and maintain it over time in Databricks or Snowflake that is amenable to performant graph queries being run across it,” Oattes says.  “So we are ingesting the data into a schema in Databricks or Snowflake that is organized by Kobai. It’s the Saturn schema within Databricks or Snowflake, and then from there, when you’re running queries against it, you’re running them through sort of a Kobai service on top.”

Kobai was attracted to Snowflake and Databricks for several reasons. The most obvious is that the two lakehouse providers are attracting a considerable amount of data. Thousands of customers are parking their data in these two lakehouses, and investing resources to manage their data there. By implementing a knowledge graph service layer atop the lakehouse, Kobai can leverage those existing customer investments.

“We are writing queries that are optimized for Photon in the Databricks case,” Oattes says, adding that they’ve done the same for Snowflake. “It’s tuned to those environments. So we get to ride on top of that investment that Databricks and Snowflake have made in terms of the investment they have made.”

The lakehouses also provide a lower cost of ownership for Kobai customers, as well as workload isolation, which becomes an issue with larger implementations. If a Kobai Studio user kicks off an intensive graph query that touches the same data that Snowflake or Databricks is querying to power a dashboard, the Kobai user doesn’t have to worry about the performance issues that would result if those workloads collided.

“With a lakehouse, we need one copy of the data and I can isolate the compute for both of those use cases,” Oattes says.

By pairing Kobai Studio with Saturn in the cloud, the Pleasanton, California company now has the scale to meet some of its bigger customers’ graph needs. The company’s biggest client is a Fortune 100 aviation conglomerate with the previously mentioned parts-tracking challenge. It’s also working with customers that have computational challenges in financial services and workforce management, Oattes says.

One of Kobai’s customers wants to optimize their supply chain operations not around cost, but around risk, such as environmental, social, and governance (ESG) variables.

“What if I want to bring in not just geopolitical risk, but ESG stuff?” Oattes says. “If I want to be able to make decisions based on … the human rights record of some of these suppliers? A lot of traditional BI shops are kind of unprepared for the flexibility required to bring that stuff in.”

In addition to bringing scale and workload isolation to Kobai’s knowledge graph customers, Saturn also allows them to extend their knowledge graph to new portions of their business that haven’t yet benefited from this technology. That has the potential to bring more data into the RDF scheme of things, yielding greater benefits as the graph grows, Oattes says.

“Instead of using an instance of Kobai for a team or a project…this lets me embrace the network effect of the data fabric, where every use case extends this logical ontology across my business,” Oattes says. “And now I get additional value from having that broader context available for me, training models, dashboards–whatever it is.”

Saturn supports Snowflake and Databricks at launch, but the company plans to expand to other lakehouses in the future. For more info, see the company’s website at www.kobai.io.
