Dremio Charts Open Course with Dart
Dremio last week unveiled the first product deliverable under Dart, its new initiative to bolster the performance of its SQL analytics engine for data lakes, reduce costs, and close the performance gap with dedicated data warehouses.
As a database engine, Dremio doesn’t include a data store. Instead, like the open source Presto offerings, Dremio assumes that the user is storing data in S3 or another cloud object store that serves as a data lake. This approach brings its share of advantages, such as a decreased need for extensive ETL/ELT processing. But it also brings disadvantages compared to dedicated data warehouses, which typically store data in an optimized format.
With its Dart initiative, Dremio is looking to strip away the remaining performance advantages of data warehouses, which primarily exist in use cases where large numbers of users demand fast results from their SQL queries.
For starters, Dremio is introducing query plan caching, which the company says eliminates planning overhead and latency for repeated queries. “This is particularly impactful for BI dashboarding use cases, where many users are simultaneously firing similar queries against the SQL engine as they navigate through dashboards,” the company says in its June 3 press release.
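To make the idea concrete, here is a minimal sketch of query plan caching in general. It is not Dremio's implementation, which the company has not detailed here; the `PlanCache` class, the `expensive_planner` stand-in, and the text-normalization rule are all hypothetical names and choices made for illustration.

```python
import re


class PlanCache:
    """Toy query plan cache keyed by normalized SQL text (illustrative only)."""

    def __init__(self, planner):
        self._planner = planner  # callable: sql -> plan (the expensive step)
        self._plans = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def normalize(sql):
        # Collapse whitespace and case so trivially different texts share a key.
        # Lowercasing would also alter string literals; acceptable for a toy.
        return re.sub(r"\s+", " ", sql.strip()).lower()

    def get_plan(self, sql):
        key = self.normalize(sql)
        if key in self._plans:
            self.hits += 1
        else:
            self.misses += 1
            self._plans[key] = self._planner(key)
        return self._plans[key]


def expensive_planner(sql):
    # Hypothetical stand-in for parsing, validation, and optimization.
    return ("PLAN", sql)


cache = PlanCache(expensive_planner)

# Three dashboard users fire the same query; a fourth sends a variant
# that differs only in case and spacing.
for _ in range(3):
    cache.get_plan("SELECT region, SUM(sales)  FROM orders GROUP BY region")
cache.get_plan("select region, sum(sales) from orders group by region")

print(cache.hits, cache.misses)
```

In this sketch the planner runs once for four requests, which is the effect the company describes for dashboards where many users issue similar queries.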
Dart also brings a new compiler that will let customers run “much larger and more complex SQL statements” with reduced resource requirements, the company says. Its coverage of the ANSI SQL standard has also improved, with additional functions and operators, including new window and aggregate functions. Dremio says that nearly all SQL operators, functions, and casts are now supported in Gandiva, the LLVM-based toolkit that is part of Apache Arrow, the in-memory columnar data format.
Users will also save money with Dart when it comes to S3 and Azure Data Lake Storage (ADLS) data access costs. According to Dremio, reads of S3 and ADLS data can constitute 30% to 60% of the total cost of a query execution workload. With a new scan filter pushdown capability, Dart is meant to cut into those cloud data lake read costs.
Other new features that will come to Dremio as part of the Dart initiative include support for unlimited table sizes with an unlimited number of partitions and files, and automated management of the query acceleration data structures in Dremio (its “Reflections” component).
Enhancing the core Dremio engine to support enterprise SQL workloads is a recurring theme for Dremio. Seven months ago, with its fall 2020 release, the company unveiled several enhancements geared toward bolstering performance, including support for caching data in the Apache Arrow format; the capability to scale out its query planner; and runtime filtering.
In addition to bolstering its core Dremio engine, the company is working at the data layer, including delivering support for the Apache Iceberg table format, which enables multiple engines to work together on the same data in a transactionally consistent manner; and Project Nessie, which brings Git-like semantics to the data lake.
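The following sketch shows one common way a scan filter pushdown can reduce reads: checking per-file minimum and maximum column statistics and skipping files that cannot contain matching rows. Whether Dart works this way is an assumption; the `DataFile` class, the `scan` function, and the file sizes are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    name: str
    size_bytes: int
    min_year: int  # per-file column statistics (hypothetical values)
    max_year: int
    rows: list     # (year, amount) tuples


def scan(files, year, pushdown):
    """Return (matching rows, bytes read) for the filter year == <year>."""
    bytes_read = 0
    out = []
    for f in files:
        # With pushdown, the filter is checked against file statistics
        # before any data is fetched, so non-matching files are skipped.
        if pushdown and not (f.min_year <= year <= f.max_year):
            continue
        bytes_read += f.size_bytes
        out.extend(r for r in f.rows if r[0] == year)
    return out, bytes_read


files = [
    DataFile("part-0", 100, 2018, 2018, [(2018, 5), (2018, 7)]),
    DataFile("part-1", 100, 2019, 2019, [(2019, 3)]),
    DataFile("part-2", 100, 2020, 2020, [(2020, 9), (2020, 1)]),
]

rows_full, bytes_full = scan(files, 2020, pushdown=False)
rows_push, bytes_push = scan(files, 2020, pushdown=True)

print(bytes_full, bytes_push)
```

Both scans return the same rows, but the pushdown version reads one file instead of three. On object stores where reads are billed, that difference is where the savings would come from.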
The Dart enhancements are all about giving customers the same level of performance in an open cloud data lake as they have come to expect from closed data warehouses, says Dremio founder and Chief Product Officer Tomer Shiran.
“We’ve gotten rid of the downsides of open,” Shiran told Datanami in a recent interview. “There were still some advantages to the data warehouse in terms of transactions and record-level inserts and updates. But all that’s being solved now. I think in the next year, I don’t see why people would continue to use data warehouses, other than they’re familiar.”