Databricks Unleashes New Tools for Gen AI in the Lakehouse
Fresh off Monday’s announcement that it is acquiring MosaicML, Databricks today unleashed a torrent of new AI capabilities at its Data + AI Summit designed to help customers build generative AI applications. They include a collection of large language models (LLMs) and new vector search capabilities in Lakehouse AI, as well as a natural language interface for data analytics called LakehouseIQ.
Databricks created Lakehouse AI as a way to automate and unify the various steps that developers and operations personnel go through with AI apps, everything from data collection and preparation to model development and LLMOps, as well as serving and monitoring.
For starters, Lakehouse AI will feature a handful of curated open source LLMs offered through the Databricks Marketplace. Among them will be MPT-7B, the 7 billion parameter LLM developed by MosaicML, which Databricks announced on Monday it plans to buy for $1.3 billion (the deal is currently under regulatory review).
Other curated models in Lakehouse AI include Falcon-7B for instruction-following and text summarization, as well as Stable Diffusion for image generation, the company says.
Lakehouse AI also brings vector search, which has emerged as a key capability for LLMs and generative AI models. Databricks says vector search will help customers increase the accuracy of their LLMs by utilizing embeddings. Vector search will be integrated with Databricks’ Unity Catalog.
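To illustrate the general idea behind vector search (not Databricks’ implementation, which the company did not detail), here is a minimal Python sketch. It assumes documents have already been converted to embeddings; the three-dimensional vectors, the document names, and the `vector_search` helper are all hypothetical and hand-made for illustration. The sketch ranks stored documents by cosine similarity to a query embedding and returns the closest matches, which an application could then pass to an LLM as context.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction, so treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(query, index, k=2):
    """Brute-force search: score every stored vector, return the top-k document IDs."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy embeddings; a real system would get these from an embedding model.
TOY_INDEX = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.1],
    "api_rate_limits": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a user question about refunds.
query_embedding = [0.8, 0.2, 0.1]
top_docs = vector_search(query_embedding, TOY_INDEX, k=2)
print(top_docs)
```

Production vector search systems typically replace this brute-force scan with approximate nearest-neighbor indexes so that lookups stay fast across millions of embeddings.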
The company also announced that its Model Serving offering has been adapted to handle LLMs. On the ModelOps front, the company announced that MLflow 2.5 has been updated with LLM capabilities, including AI Gateway, which helps with credential management for protecting access to LLMs, as well as Prompt Tools, which provide visual methods for working with prompts to interact with LLMs. Lakehouse Monitoring, meanwhile, provides ways for customers to keep track of the data and models involved with Gen AI apps.
As part of its Gen AI push, Databricks modified its AutoML offering to provide customers with a low-code method for fine-tuning their own LLMs and training them on their own enterprise data. Model ownership is a critical factor in the current Gen AI and LLM revolution, said Ali Ghodsi, the co-founder and CEO of Databricks.
“Companies want to own their own model,” Ghodsi said during a press conference at Data + AI Summit yesterday. “Every conversation I’m having, the customers are saying I want to control the IP [intellectual property] and I want to lock down my data.”
Vector search and Lakehouse Monitoring are currently in preview.
In a separate announcement, Databricks unveiled LakehouseIQ, a new offering that utilizes a pre-built LLM designed to enable customers to explore and query data they have stored in their Delta Lakehouse.
According to Databricks, LakehouseIQ functions as a knowledge engine that picks up the specifics of a company by learning from its assets, including schemas, documents, queries, popularity, lineage, notebooks, and BI dashboards.
“The engine understands their unique business jargon and context to more accurately interpret the intent of the question, and can even generate additional insights that could spur new questions or lines of thinking,” the company says in a press release.
Databricks is focused on democratizing data and AI, and LakehouseIQ fits right into that plan. By enabling people to use natural language to explore and query their data, it reduces the need for staff with advanced analysis and SQL skills. LakehouseIQ plugs into Unity Catalog, providing integrated governance and access control.
“LakehouseIQ solves two of the biggest challenges that businesses face in using AI: getting employees the right data while staying compliant and keeping data private when it should be,” Ghodsi said in a press release. “It alleviates time-strapped engineers, eases the burden of data management, and empowers employees to take advantage of the AI revolution without jeopardizing the company’s proprietary information.”
LakehouseIQ is currently in preview.