
August 30, 2024

Dremio Unveils New Features to Enhance Apache Iceberg Data Lakehouse Performance

Dremio, a data lakehouse company based in Santa Clara, CA, has announced new features for data lake analytics. The company claims that the new features and platform improvements can dramatically accelerate query performance on Apache Iceberg tables while reducing the need for user intervention.

Enhancing query performance on Apache Iceberg tables addresses a significant challenge in data lakehouse environments: managing the complexity and resource demands of querying massive datasets. According to Dremio, the improvements also help organizations lower total cost of ownership (TCO) and shorten the time to gain business insights.

One of the new features introduced by Dremio is Live Reflections, which is designed to simplify data management and query acceleration. Whenever changes are made to the base Iceberg tables, the feature automatically triggers updates to the materialized views and aggregations used to accelerate queries.

Live Reflections allows users to speed up queries without the need for maintenance, while built-in ROI estimates help them select the Reflection recommendations that deliver the best value and optimal performance. Users won’t have to manually figure out the necessary aggregations, table sorting, or refresh frequency.
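The refresh-on-change idea behind Live Reflections can be illustrated with a toy model. The sketch below is not Dremio's implementation or API; the `BaseTable` and `SumReflection` classes, the listener mechanism, and the snapshot counter are all hypothetical names chosen for illustration. It only shows the general pattern of a precomputed aggregation that is rebuilt automatically whenever the base table commits a change.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BaseTable:
    """Hypothetical stand-in for a base table: rows plus a snapshot counter."""
    rows: list = field(default_factory=list)
    snapshot_id: int = 0
    listeners: list = field(default_factory=list)

    def append(self, new_rows: list) -> None:
        self.rows.extend(new_rows)
        self.snapshot_id += 1            # each commit produces a new snapshot
        for notify in self.listeners:    # the change triggers dependent refreshes
            notify(self)


@dataclass
class SumReflection:
    """Hypothetical precomputed SUM(value) GROUP BY key over a base table."""
    key: str
    value: str
    totals: dict = field(default_factory=dict)
    built_from_snapshot: int = -1

    def refresh(self, table: BaseTable) -> None:
        # Full rebuild for simplicity; a real system could refresh incrementally.
        totals = defaultdict(float)
        for row in table.rows:
            totals[row[self.key]] += row[self.value]
        self.totals = dict(totals)
        self.built_from_snapshot = table.snapshot_id


sales = BaseTable()
by_region = SumReflection(key="region", value="amount")
refresh_hook: Callable[[BaseTable], None] = by_region.refresh
sales.listeners.append(refresh_hook)     # no manual refresh schedule needed

sales.append([{"region": "EU", "amount": 10.0}, {"region": "US", "amount": 5.0}])
sales.append([{"region": "EU", "amount": 2.5}])
print(by_region.totals)                  # {'EU': 12.5, 'US': 5.0}
```

In this sketch a query for totals by region reads `by_region.totals` instead of scanning `sales.rows`, and the user never schedules a refresh.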

The new Result Set Caching feature accelerates responses by up to 28 times across all data sources, according to Dremio. It does this by storing the results of frequently run queries, rather than just the queries themselves. As users often query the same data, this feature allows for quick retrieval of pre-computed results.

Storing query results rather than only the queries requires more storage space, but since object storage is relatively inexpensive compared to compute resources, this approach is cost-effective.
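As a rough illustration of result caching in general (not Dremio's actual design), the sketch below keeps the rows returned by each query and serves repeats from memory. The `ResultSetCache` class, its whitespace-only normalization, and the `slow_engine` function are hypothetical. The sketch also omits invalidation when the underlying data changes, which any real cache would need.

```python
import hashlib
from typing import Callable


class ResultSetCache:
    """Hypothetical cache that stores query results keyed by query text."""

    def __init__(self, execute: Callable[[str], list]):
        self._execute = execute      # the expensive call to the query engine
        self._results: dict = {}     # cache key -> stored result rows
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(sql: str) -> str:
        # Naive normalization: collapse whitespace so reformatted copies of
        # the same query share one entry. It does not parse the SQL.
        normalized = " ".join(sql.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def query(self, sql: str) -> list:
        key = self._key(sql)
        if key in self._results:
            self.hits += 1
            return self._results[key]    # pre-computed result, no engine call
        self.misses += 1
        rows = self._execute(sql)
        self._results[key] = rows        # trades storage for saved compute
        return rows


engine_calls = []

def slow_engine(sql: str) -> list:
    """Stand-in for a real engine; records each time it is invoked."""
    engine_calls.append(sql)
    return [("EU", 12.5), ("US", 5.0)]


cache = ResultSetCache(slow_engine)
first = cache.query("SELECT region, SUM(amount) FROM sales GROUP BY region")
second = cache.query("SELECT region,  SUM(amount)\nFROM sales GROUP BY region")
print(cache.hits, cache.misses, len(engine_calls))   # 1 1 1
```

The second call differs only in whitespace, so it is answered from the stored rows and the engine runs once.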

Dremio has also added a merge-on-read feature that accelerates Iceberg table writes and ingestion by up to 85%. This speed enhancement is crucial for maintaining up-to-date data and improving overall system performance.

The new Auto Ingest Pipes feature automates the management of Iceberg data pipelines. It loads data from Amazon S3 into Iceberg tables and uses notifications to trigger the loads automatically, so that tables are continuously updated with fresh data.
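The notification-driven pattern described above can be sketched in a few lines. This is a toy model under stated assumptions, not Dremio's or Amazon's API: the `IngestPipe` class, its `on_object_created` handler, the CSV file format, and the key prefix are all hypothetical. Skipping keys that were already loaded is also an assumption, added because notification systems can deliver the same event more than once.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class IngestPipe:
    """Hypothetical pipe that loads newly arrived files into a table."""
    prefix: str                                    # only watch keys under this prefix
    table: list = field(default_factory=list)      # stand-in for the target table
    loaded_keys: set = field(default_factory=set)  # files already ingested

    def on_object_created(self, key: str, body: str) -> int:
        """Handle one 'object created' notification; return rows loaded."""
        if not key.startswith(self.prefix):
            return 0                               # not part of this pipeline
        if key in self.loaded_keys:
            return 0                               # duplicate notification, skip
        rows = list(csv.DictReader(io.StringIO(body)))
        self.table.extend(rows)
        self.loaded_keys.add(key)
        return len(rows)


pipe = IngestPipe(prefix="landing/sales/")
event = ("landing/sales/2024-08-30.csv", "region,amount\nEU,10\nUS,5\n")

loaded_first = pipe.on_object_created(*event)      # new file: rows are loaded
loaded_again = pipe.on_object_created(*event)      # same event redelivered
loaded_other = pipe.on_object_created("other/file.csv", "a,b\n1,2\n")
print(loaded_first, loaded_again, loaded_other, len(pipe.table))   # 2 0 0 2
```

The point of the pattern is that arrival of a file, rather than a schedule or a manual command, is what starts the load.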

“We continue to deliver market-leading performance and manageability for Iceberg lakehouses to our customers,” said Tomer Shiran, founder of Dremio. “With Live Reflections, Result Set Caching, and Merge-on-Read, Dremio pushes the boundaries of high-performance analytics in lakehouse environments. In addition, by utilizing our new Auto Ingest Pipelines as well as improved query federation capabilities, companies can now reduce the complexity of data movement and the setup and management of data pipelines.”

Dremio’s success stems from its innovative data lakehouse technology, particularly its integration with Apache Iceberg, which has become a popular choice for managing large-scale data due to its performance and versatility. Several key players in the industry have thrown their weight behind Apache Iceberg, including Databricks and Snowflake. 

Dremio’s new features, which are now generally available, are pushing the boundaries of analytics performance and redefining how organizations interact with and derive value from their data. The new features also highlight the increasing emphasis on automation and optimization. 


BigDATAwire