February 24, 2020

Databricks, Partners Open a Unified ‘Lakehouse’


Coalescing around an open source storage layer, Databricks is pitching a new data management framework billed as combining the best attributes of data lakes and warehouses into what the company dubs a “lakehouse.”

The new data domicile is promoted as a way of applying business intelligence and machine learning tools across all enterprise data. The company and its lakehouse partners have also assembled a “data ingestion network” that allows users to load siloed data into Delta Lake, a storage layer Databricks released to the open source community last year.

Among the applications that can be integrated into the lakehouse are Google Analytics, Salesforce and SAP, along with Cassandra, Kafka, Oracle, MySQL and MongoDB databases. Those sources, along with mainframe and file data, would be available in one location for BI and machine learning use cases.

As it aims to develop an enterprise AI platform, high-flying Databricks and its network partners are attempting to fuse traditional structured data with unstructured volumes while combining BI and machine learning use cases. Siloed data lakes and warehouses result in “slow processing and partial results that are too delayed or too incomplete to be effectively utilized,” Ali Ghodsi, Databricks’ co-founder and CEO, said this week in introducing the lakehouse framework.

Lakehouse “aspires to combine the reliability of data warehouses with the scale of data lakes to support every kind of use case,” Ghodsi added. “In order for this architecture to work well, it needs to be easy for every type of data to be pulled in.”

Along with enterprise analytics applications and databases, data can also be pulled into Delta Lake from cloud file storage services such as Amazon Web Services S3, Google Cloud Storage or Microsoft Azure Data Lake Storage, as illustrated in the sketch below. Databricks said other integrations would be available soon from Informatica, Segment and Stitch.
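As an illustration, the following PySpark sketch shows one way raw files in cloud object storage could land in a Delta table. It is a minimal sketch under stated assumptions, not a description of Databricks’ ingestion network: it assumes Spark 3.x with the open source delta-spark package on the classpath and AWS credentials already configured, and the bucket names and paths are hypothetical.

    # Minimal sketch: copy raw Parquet files from object storage into a Delta table.
    # Assumes Spark 3.x with the delta-spark package available and AWS credentials
    # configured; bucket names and paths are hypothetical.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("s3-to-delta-sketch")
        .config("spark.sql.extensions",
                "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    # Read raw files from S3; any Spark-readable source works the same way.
    raw = spark.read.parquet("s3a://acme-raw/clickstream/")

    # Append into a Delta table, the storage layer at the center of the lakehouse.
    raw.write.format("delta").mode("append").save("s3a://acme-lake/clickstream")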

The lakehouse partner network includes Fivetran, Infoworks, Qlik, StreamSets and Syncsort. Qlik said Monday (Feb. 24) it is deploying its data integration platform with Delta Lake, allowing users to automate and stream data to the cloud from mainframes, data warehouses or databases, then apply cloud-based analytics tools.

The unified storage layer would allow users to run machine learning along with traditional business intelligence workloads on a single lakehouse, added George Fraser, CEO of network partner Fivetran.

In donating the Delta Lake code last year, Databricks noted the open source project targets shortcomings that emerge in data lakes as structured and big data are combined. Among them are poor data quality, unreliable reads and writes, and degraded performance as data lakes fill up.

The lakehouse framework is therefore promoted as combining the reliability of data warehouses with the scaling capability of data lakes to support emerging machine learning use cases.

To that end, Delta Lake includes ACID transactions along with schema management, data versioning and “time travel,” a reference to the ability to view older versions of a table or directory as new file versions are created.
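For a concrete picture of those features, here is a hedged sketch of versioning and time travel with the open source delta-spark package. It reuses the Delta-enabled SparkSession built in the earlier sketch, and the table path is hypothetical.

    # Sketch of Delta Lake versioning and "time travel", reusing the Delta-enabled
    # SparkSession ("spark") from the earlier sketch; the table path is hypothetical.
    path = "/tmp/events"

    # Version 0: the initial write commits as a single ACID transaction.
    spark.createDataFrame([(1, "click"), (2, "view")], ["id", "event"]) \
        .write.format("delta").save(path)

    # Version 1: an overwrite creates a new table version rather than
    # destroying the old one.
    spark.createDataFrame([(3, "purchase")], ["id", "event"]) \
        .write.format("delta").mode("overwrite").save(path)

    # Time travel: read the table as it existed at version 0.
    v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)
    v0.show()  # returns the original two rows, not the overwritten data

Schema management works in the same spirit: by default, a write whose schema conflicts with the table’s existing schema is rejected rather than silently merged.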

Recent items:

Will Databricks Build the First Enterprise AI Platform?

Databricks Donates Delta Code to Open Source

Databricks Snags $400M, Now Valued at $6.2B
