March 2, 2022

Lakehouse Update a Warehouse Killer, Dremio Says


Dremio co-founder and Chief Product Officer Tomer Shiran says the data lakehouse architecture updates it announced today at the start of its Subsurface Winter Live conference–including a 60% faster query engine, a new data management layer built on Apache Iceberg, as well as a “forever free” tier with Dremio Cloud–should finally mark an end to the data warehouse’s long reign.

“Basically, what we’re announcing tomorrow is to make the data warehouse irrelevant,” Shiran told Datanami yesterday. “Up until now, companies have had a tradeoff. You could have more SQL, like updates, inserts, and deletes, and the higher performance of the warehouse, or you could have more flexibility with the lake approach. You can use whatever engines you want, more future-proofing.

“Now, because of this innovation we’ve done…all of that makes it so that you don’t have this tradeoff anymore,” he continued. “Now it’s a win-win. We can do everything with the lake.”

It’s a familiar refrain for Shiran and his colleagues at Dremio, who have been hammering data warehouses as an archaic approach to analytics for years. With today’s enhancements to its offerings, Shiran believes Dremio has finally put the last nail in the data warehouse’s coffin.

It starts with Dremio Sonar, the new name Dremio has given to its query engine. The query engine, which is based in part on Apache Arrow, is the heart of the Dremio offering: it enables users to query data in a wide variety of data stores, including Amazon S3, Azure Data Lake Storage (ADLS), HDFS, and relational and NoSQL databases.

“It’s almost 70% faster than our previous generation in terms of TPC-DS benchmarks, and it will soon support DML [Data Manipulation Language] operations as well–so inserts, updates, deletes, basically everything you do with a data warehouse,” Shiran said. “So I get the rich SQL coverage of a warehouse and better performance than the warehouse, but there’s more flexibility in the lake. You get the best of both worlds with Dremio Sonar.”

Dremio Arctic, meanwhile, is a new data metastore that’s based on a Dremio-sponsored project called Nessie as well as Apache Iceberg, the data table format co-developed by Ryan Blue, who was recently named a Datanami 2022 Person to Watch.

Support for the Iceberg table format is a key element of the lakehouse that Dremio is building, Shiran said, as it delivers the manageability of data in a data lake that customers have come to expect from data warehouses.

The other major table format for lakehouses is the Delta Lake format from Databricks, he said. But Iceberg is seeing more adoption as an open format among big lakehouse and data warehouse players, such as Snowflake and Amazon Athena from AWS, whereas Databricks withholds some of the key details of Delta Lake that would be necessary for wider community adoption, Shiran said.

In addition to serving as the metastore, Arctic will serve two additional purposes for Dremio, Shiran said. “One is it automatically optimizes your tables,” he said. “So in the background it’s running various jobs to compact small files together, to repartition the data automatically, garbage collection, etc. So it does all the optimization, which in the past you could only get if you put all your data in a warehouse. [Now] you can have that kind of functionality in the lake.

“And the second thing it does is it provides a Git-like experience for data,” he continued. “So if you’re familiar with Git or GitHub, you can now do commits, branches, and tags on your data. And there’s a variety of different use cases that opens up, like multi-statement transactions, which again is something you could never do on a lake. And the ability to experiment [is also important], like a data engineer who wants to go in and ingest a bunch of data, transform it, test it before actually exposing it to everybody else in the company.”
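The first of those jobs, compacting many small files into fewer large ones, can be sketched as a simple greedy plan. This is illustrative only, with an assumed 128 MB target file size; it is not Dremio Arctic's actual implementation.

```python
# Illustrative sketch of small-file compaction planning (not Dremio Arctic's
# actual code): greedily pack small data files into batches no larger than a
# target size, so each batch can be rewritten as one larger file.

TARGET_BYTES = 128 * 1024 * 1024  # assumed 128 MB target file size

def plan_compaction(file_sizes, target=TARGET_BYTES):
    """Return batches of file indices whose combined size fits the target."""
    batches, current, current_size = [], [], 0
    for i, size in enumerate(file_sizes):
        # Start a new batch when adding this file would overflow the target.
        if current and current_size + size > target:
            batches.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        batches.append(current)
    return batches

# Ten 20 MB files compact into two rewrites under a 128 MB target.
sizes = [20 * 1024 * 1024] * 10
print(plan_compaction(sizes))  # [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9]]
```

A real compaction service also has to rewrite the data and atomically swap the new files into the table's metadata, which is where a table format like Iceberg comes in.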

The combination of Nessie and Iceberg will help to transform how customers manage data in the long run, Shiran said. “We think that in five or 10 years from now, all of the world’s data will be managed that way, just like all the world’s source code is managed that way,” Shiran said. “I think that’s what the future will look like.”
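The Git-like workflow Shiran describes can be sketched abstractly as well. The following is a toy model for illustration, not the Nessie API: each branch points at an immutable snapshot of catalog state, commits stage changes on a branch, and a merge publishes multi-table changes to main in one atomic swap.

```python
# Toy model of Git-like catalog branching (illustrative only; not Nessie's
# actual API). A branch points at a snapshot of table state; merging publishes
# an engineer's staged changes to "main" in a single atomic pointer swap.

class Catalog:
    def __init__(self):
        self.branches = {"main": {}}  # branch name -> {table name: rows}

    def create_branch(self, name, source="main"):
        # A new branch starts from the source branch's current snapshot.
        self.branches[name] = dict(self.branches[source])

    def commit(self, branch, table, rows):
        # Stage new table contents on the given branch only.
        snapshot = dict(self.branches[branch])
        snapshot[table] = rows
        self.branches[branch] = snapshot

    def merge(self, branch, into="main"):
        # Publish the branch's snapshot: multi-table change, one swap.
        self.branches[into] = dict(self.branches[branch])

cat = Catalog()
cat.create_branch("etl")                        # experiment off to the side
cat.commit("etl", "sales", [("2022-03-01", 100)])
cat.commit("etl", "sales_summary", [("2022-03", 100)])
print("sales" in cat.branches["main"])          # False: main is untouched
cat.merge("etl")                                # both tables land at once
print("sales" in cat.branches["main"])          # True
```

This is what makes multi-statement transactions and safe experimentation possible on a lake: nothing is visible on main until the merge.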

Lastly, Dremio announced the general availability of Dremio Cloud. It’s being made available first on AWS, to be followed by support for other clouds, Shiran said.

While Dremio has been in the cloud for a while–indeed, 80% of its customers are running in AWS, Microsoft Azure, or Google Cloud Platform, Shiran said–the launch of Dremio Cloud is important because it marks the first fully managed service from the company, he said.

Before today, running Dremio in the cloud meant customers had to spin up their own Kubernetes cluster in their own virtual private cloud (VPC) environment. It was up to the customer to ensure there was enough horsepower under the Dremio query engine so that all the analytics and other downstream applications were being served sufficiently. And it was up to customers to scale that environment back down so they weren’t charged for excess EC2 capacity (on AWS, for example).

“All of that is abstracted away now,” Shiran said. “Now, it’s like an application. There’s no version numbers in Dremio Cloud. It’s like Gmail. I use it [and] it gets better over time. But I’m not dealing with upgrades. From a scaling standpoint, I don’t need to deal with any of that. You just bring your data, bring your queries, and everything is kind of taken care of for you.”

In addition to powering SQL queries, Dremio helps with data engineering and ETL (Image source: Dremio)

There’s another piece of news that may interest prospective users: Dremio Cloud is free. Of course, that doesn’t mean that the entire Dremio-powered analytics environment is free. Customers still must pay to keep their data in S3, and they must pay for the EC2 processing capacity that actually executes the Dremio Sonar queries. But the Dremio environment itself is free to use, at any scale. Customers, of course, can pay Dremio for enterprise support, which also comes with more advanced security and authentication capabilities.

Dremio has segregated management of the cloud offering into two pieces: a user-facing control plane, through which customers grant Dremio (the company) the permissions needed to access their environment, and a separate execution plane that the company uses internally to manage customer environments.

“And so, based on the workload, we’re deciding, okay, maybe there’s no compute resources running right now in their account, or maybe right now everybody is hitting some dashboard that they have and we need to get more compute capacity, so we’ll just spin up EC2 instances and spin them down,” Shiran said. “This allows them to take advantage of their capacity discounts from Amazon, their EDP [Enterprise Discount Program] commits–all that kind of stuff.”
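The elastic behavior Shiran describes reduces to a simple scaling rule. The sketch below uses made-up capacity numbers and is not Dremio's actual logic: size the EC2 fleet to the query load, capped at a maximum, and scale to zero when nothing is running.

```python
# Illustrative autoscaling rule (made-up thresholds; not Dremio's actual
# logic): size the engine to the active query load, down to zero when idle.

QUERIES_PER_INSTANCE = 10  # assumed capacity of a single instance

def desired_instances(active_queries, max_instances=8):
    """Zero instances when idle; otherwise enough to cover the load, capped."""
    if active_queries == 0:
        return 0  # no compute resources running in the customer's account
    needed = -(-active_queries // QUERIES_PER_INSTANCE)  # ceiling division
    return min(needed, max_instances)

print(desired_instances(0))   # 0 -- idle: everything spun down
print(desired_instances(25))  # 3 -- dashboard surge: instances spun up
```

Because the instances run in the customer's own account, usage still counts toward their Amazon capacity discounts and EDP commits, as Shiran notes.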

Dremio, which raised a $160 million Series E round in January at a $2 billion valuation, is growing quickly at the moment. The company essentially doubled in size (revenues, employees, customers) over the past year, according to Shiran, and it’s increasingly enthusiastic about carrying the baton for the burgeoning open data ecosystem, which brings it up against much bigger players.

“Anytime a customer seriously evaluates Dremio versus a competitor, nine times out of 10, they choose Dremio,” Shiran said. “There are other companies out there that are larger, have more mindshare or whatever. And this [Dremio Cloud] makes it a no-brainer for customers.”

Dremio wants customers to use Dremio Sonar to query their data, but the company recognizes that customers will likely use a wide variety of engines, including Spark, Presto, Flink, and even Hadoop to query their data. That data-loving freedom is what separates Dremio’s lakehouse from other offerings, Shiran said.

“It’s not like a data warehouse, like Snowflake, where you’re ingesting into the data warehouse and getting locked in and, you know, extorted,” Shiran said. “That’s what happens with these data warehouses. It’s why everybody’s pissed off at Teradata. Because you get locked in and then you can’t get out and then the vendor realizes it and then what does a vendor do? Well, they raise their prices.”

Subsurface Winter Live continues through tomorrow. You can sign up and find more information here.

Related Items:

Snowflake, AWS Warm Up to Apache Iceberg

Dremio is Swimming Laps Around the Data Lake with $160M Series E, $2B Valuation

Did Dremio Just Make Data Warehouses Obsolete?