October 21, 2022

Dremio Supports Iceberg in Lakehouse Update

Dremio has announced the general availability of its support for DML operations (insert, update, and delete) on Apache Iceberg tables and for time travel for in-place querying of historical data.

Apache Iceberg is an open source table format for analytics on the data lakehouse and is a core component of Dremio’s open lakehouse architecture. Iceberg has just celebrated a milestone with its major 1.0 release. The release introduces substantial performance and usability improvements, according to Dremio, including long-term API stability, high-performance updates and deletes (merge-on-read), multi-dimensional sorting (Z-order), statistics, and numerous other capabilities.

“All of Apache Iceberg’s new functionality means there’s never been a better time to adopt it to build your data lakehouse,” said Mark Lyons, vice president of product management at Dremio. “With the broadest ecosystem of community contributions and deployment, Iceberg is the fastest growing table format and the industry standard for managing data in data lakes. It’s essential to the foundation of an open lakehouse, and Dremio has been in step with Iceberg from the start.”

Dremio’s GA support for DML operations and time travel enables use cases such as deletes for privacy and compliance, updates for customer information changes, and inserts for late-arriving supply chain records directly in the data lakehouse.
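To make the idea concrete, here is a minimal conceptual sketch of how snapshot-based tables can support both DML and time travel: every insert, update, or delete commits a new immutable snapshot, and a historical read simply picks the snapshot that was current at the requested time. This is a toy in-memory model, not Dremio's or Iceberg's API; the `ToyTable` class, its methods, the integer commit times, and the sample customer records are all hypothetical names invented for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a snapshot-based table."""
    # Each snapshot is (commit_time, rows); rows maps a key to a record.
    snapshots: list = field(default_factory=lambda: [(0, {})])

    def _commit(self, ts, rows):
        # Every write appends a new snapshot instead of editing in place.
        assert ts > self.snapshots[-1][0], "commit times must increase"
        self.snapshots.append((ts, rows))

    def insert(self, ts, key, record):
        rows = dict(self.snapshots[-1][1])
        rows[key] = record
        self._commit(ts, rows)

    def update(self, ts, key, **changes):
        rows = dict(self.snapshots[-1][1])
        # Build a new record so earlier snapshots are left untouched.
        rows[key] = {**rows[key], **changes}
        self._commit(ts, rows)

    def delete(self, ts, key):
        rows = dict(self.snapshots[-1][1])
        rows.pop(key, None)
        self._commit(ts, rows)

    def read(self, as_of=None):
        """Return rows from the latest snapshot committed at or before as_of."""
        if as_of is None:
            return self.snapshots[-1][1]
        times = [t for t, _ in self.snapshots]
        idx = bisect_right(times, as_of) - 1
        return self.snapshots[max(idx, 0)][1]


table = ToyTable()
table.insert(1, "c1", {"name": "Ada", "city": "London"})
table.insert(2, "c2", {"name": "Bob", "city": "Paris"})
table.update(3, "c1", city="Berlin")   # customer information change
table.delete(4, "c2")                  # privacy/compliance delete

current = table.read()             # state after all four commits
historical = table.read(as_of=2)   # state as it was at commit time 2
```

Here `current` holds only the updated `c1` record, while `historical` still shows both customers as they were before the update and delete. The toy keeps every snapshot forever and copies the whole row map on each write; a real table format shares unchanged data files between snapshots and expires old ones.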

“Until recently, using robust DML operations and accessing historical data within any defined period were only available in data warehouses and other databases,” explained Lyons. “Now it’s easier than ever to put aside expensive and proprietary cloud data warehouses and run workloads on an open lakehouse, with the full power of SQL at your fingertips and without the need to copy your data into a closed proprietary system. Data mutations and leveraging historical snapshots are possible directly on the data lake. The result is lower costs, more flexibility, significantly reduced time-to-insight and increased productivity and innovation for data engineers and business analysts—without vendor lock-in.”

In addition to the DML capabilities, Dremio also announced new features on its platform, including:

  • Native row and column role-based access policies;
  • SQL User Defined Functions (UDFs);
  • A new SQL IDE with autocomplete and multi-statement support;
  • New Azure data sources; and
  • BI integration updates including Tableau SSO and Power BI Azure Active Directory.

Dremio says its open data lakehouse architecture greatly decreases data movement and copying, and in turn, decreases complexity and cost, while still offering full and direct access to petabyte data sets.

Customers seem enthusiastic about the new release: “Fivetran is excited about Dremio’s recent release that enables customers to leverage the features of Apache Iceberg 1.0,” said Fraser Harris, vice president of product at Fivetran. “We are impressed by the broad ecosystem adoption and performance that Iceberg offers. For customers who desire the open architecture approach, Fivetran looks forward to providing automated and reliable pipelines to open data lakehouses built on Apache Iceberg tables as an alternative to data warehouses.”

Moonfare, a global private equity investing platform, adopted Dremio Cloud on AWS to enable interactive analytics and dashboards for all of its employees.

“We were drawn to Dremio Cloud for its performance at scale and for the ability of the semantic layer to provide easy, efficient access to our data in Amazon S3,” said Angelo Slawik, data engineer at Moonfare. “After our initial implementation is complete, we are eager to explore capabilities enabled by Dremio Arctic such as Git-like version control for our datasets.”

For more details about the Apache Iceberg 1.0 release and Dremio’s new features, see the blog post by Dremio’s Alex Merced.

Related Items:

Lakehouse Update a Warehouse Killer, Dremio Says

Dremio is Swimming Laps Around the Data Lake with $160M Series E, $2B Valuation

Apache Iceberg: The Hub of an Emerging Data Service Ecosystem?