February 2, 2022

Onehouse Emerges from Stealth to Deliver Data Lakes in ‘Months, Not Years’

Jaime Hampton

Onehouse, a data lakehouse management company, emerged from stealth today with $8 million in seed funding from investment firms Greylock and Addition.

Onehouse’s cloud-native managed lakehouse service is based on Apache Hudi, a data platform that Onehouse Founder and CEO Vinoth Chandar created while working at Uber in 2016. Hudi (pronounced like “hoodie”) is a data lake technology that brings stream processing, data clustering, and other optimization techniques to Apache Hadoop-compatible cloud stores and distributed file systems.

Data lakes are popular with organizations whose data needs go beyond a traditional warehouse, an architecture that is easy to use but can become extremely costly as a company’s data and AI/ML workloads grow, especially in exabyte-scale operations.

Chandar encountered these challenges while building a data lake at Uber, where the rapid pace of the ride-sharing business called for a high-performance, large-scale solution capable of supporting AI and ML workloads in near real time. Hudi was created to bring core warehouse and database functionality to a data lake, a setup that gave rise to the term “lakehouse.”

Data lakes come with problems of their own, including constant tuning of data layouts, large-scale concurrency control, fast data ingestion, stale or unreliable data, and data deletion. Building a data lake can be risky because it takes a large investment of time and resources, and the highly skilled engineers needed to develop one are often in short supply. Lake projects can also stall or be abandoned for lack of standardized, high-quality data infrastructure centered around lakehouse technologies.

“Onehouse is a cloud-native, managed foundation for your lakehouse that automatically ingests, manages and optimizes your data for faster processing,” said Chandar in a blog post. “Onehouse is not another query engine, but a self-managing data layer that seamlessly interoperates with any of the popular query engines [e.g., Apache Spark, Trino, Presto] or data/table formats and vendors out there.”

Chandar notes that Onehouse is not an enterprise version of Hudi, but that it leverages Hudi’s capabilities to enable incremental data processing, which is much faster than traditional batch processing. The company claims it can help organizations build data lakes in “months, not years” while saving money and retaining rights to their own data without being locked in to individual vendors.

“The data lakehouse is the future of data lakes, providing customers the ease of use of a data warehouse with the cost and scale advantages of a data lake,” said Greylock Partner Jerry Chen. “Apache Hudi is already the de facto starting point for modern data lakes and today Onehouse makes data lakes easily accessible and usable by all customers.”
