Databricks: We’re a Data Intelligence Platform Now
Five years ago, Databricks debuted the world’s first data lakehouse, which combined the beneficial aspects of data lakes and data warehouses. Thanks to the rise of AI, the nature of data platforms is changing, which necessitates a new name: the Data Intelligence Platform.
“At Databricks, we believe that soon, AI will eat all software,” Databricks co-founders wrote in a November 15 blog post. “That is, the software built over the past decades will be intelligent, leveraging data, making it much smarter. The implications are vast and varied…”
One of the implications is that the nature of the systems used to manage data and build data-driven applications is changing. While the lakehouse features will undoubtedly remain, the company is laying out its view of how data platforms will evolve. It’s all about AI, the execs say.
“We argue that the impact of AI on data platforms will not be incremental, but fundamental: massively democratizing access to data, automating manual administration, and enabling turnkey creation of custom AI applications,” they wrote in the blog. “All this will be enabled by a new wave of unified platforms that deeply understand an organization’s data. We call this new generation of systems Data Intelligence Platforms.”
Just as the move to a lakehouse architecture was necessary to give organizations workable governance, security, reliability, and management capabilities, the switch to a data intelligence platform is driven by new and emerging requirements. According to Databricks, the barriers that today’s data platforms still face include:
- Technical skill barriers, such as the need for SQL and Python skills;
- Data accuracy and curation, which typically require extensive planning;
- Management complexity, including the need to rein in cloud costs;
- Governance and privacy, including lineage, security and privacy needs;
- Emerging AI applications, such as generative AI apps, which require new data management skills.
Data intelligence platforms are needed, Databricks argues, because they build on a semantic understanding of an organization’s data. In addition to offering data lakehouse functions, which supply the “good, better, best” progression for creating and storing trusted data, the data intelligence platform brings more semantics to the equation, including how the data is used and what it’s used for.
To that end, Databricks says the data intelligence platform is marked by the addition of these GenAI and LLM capabilities:
- Natural language access to a customer’s own data, including jargon and acronyms;
- Semantic cataloguing and discovery of the customer’s data;
- Automated management and optimization of the customer’s data, including its layout, partitioning, and indexing;
- Enhanced governance and privacy by automatically detecting, classifying, and preventing misuse of sensitive data, and simplifying management with natural language;
- And first-class support for AI workloads, by integrating AI applications with business data, eliminating the need to “hack intelligence together through brittle prompt engineering.”
The recent acquisition of MosaicML is a big part of this shift to a data intelligence platform. The company is using the “model factory” software developed by MosaicML to create a “data intelligence engine,” which it calls DatabricksIQ.
The founders note that DatabricksIQ is already used throughout the Databricks platform for things like automatically creating indexes; improving governance in Unity Catalog; improving the generation of Python and SQL; enhancing the query planning component of its Photon query engine; and bolstering the autoscaling capabilities of Delta Live Tables and serverless jobs.
Looking forward, DatabricksIQ will be paired with the company’s AI platform, which it calls Mosaic AI, to improve the integration of data into AI applications, the company says. To that end, DatabricksIQ will have a hand in things like developing Retrieval Augmented Generation (RAG) capabilities integrated with the Databricks Vector Database; training custom models from scratch or fine-tuning existing models; running inference on customers’ data, overseen by Unity Catalog; and end-to-end MLOps via MLflow.
“We believe that AI will transform all software, and data platforms are one of the areas most ripe to innovation through AI,” the Databricks founders write. “Historically, data platforms have been hard for end-users to access and for data teams to manage and govern. Data intelligence platforms are set to transform this landscape by directly tackling both these challenges – making data much easier to query, manage and govern.
“In addition, their deep understanding of data and its use will be a foundation for enterprise AI applications that operate on that data,” they continue. “As AI reshapes the software world, we believe that the leaders in every industry will be those who leverage data and AI deeply to power their organizations. DI Platforms will be a cornerstone for these organizations, enabling them to create the next generation of data and AI applications with quality, speed and agility.”