January 6, 2021

Dremio Officially a ‘Unicorn’ As it Reaches $1B Valuation

The explosion of data lake analytics in the cloud has been good for Dremio, which today announced the completion of a $135-million Series D round. The company’s valuation has more than doubled in the past year to $1 billion, putting it in the rarefied “unicorn” club.

Dremio emerged from stealth back in 2017 with an ambitious plan for a data platform that would radically flatten and simplify the analytics stack of the day, which was dominated by MPP data warehouses. On top of Apache Arrow, the fast in-memory middleware layer developed by co-founder and CTO Jacques Nadeau, Dremio added a range of supporting technologies, including native query push-downs to support multiple data repositories, caches to optimize data access, and a query planner.

In 2018, it fleshed out its platform with more capabilities, including support for Kubernetes to deliver elastic scalability and a data catalog to give users faster access to data. As data started piling up in object stores in 2019, it developed ways to accelerate data lake analytics, including techniques such as predictive pipelining and intelligent caching.

This past October, the company added several new capabilities, such as caching data in the Arrow format, a scale-out query planner, and runtime filtering, which prompted co-founder and chief product officer Tomer Shiran to wonder whether Dremio had just made data warehouses obsolete.

Dremio clusters can be used to query data residing in a variety of databases and file systems. But today, about 80% of its customers are querying data stored in AWS S3 and Microsoft ADLS. Cloud data analytics is what’s driving Dremio’s growth, says CEO Billy Bosworth.

Dremio architecture

“There’s a significant acceleration of analytics being done in the cloud, and Dremio’s solution fundamentally simplifies the data workflow for teams,” Bosworth tells Datanami. “By allowing people to query the data directly in cloud storage, where it’s accumulating so rapidly, it gives our customers significantly more flexibility and faster time to analytics. So as cloud data lake storage emerges, so does the need for a new architecture to query that data directly, and that’s all been accelerated through COVID.”

Bosworth took the helm of Dremio last March, just two weeks before the COVID-19 lockdown rendered the company’s Santa Clara, California, headquarters essentially an empty building. Bosworth’s arrival coincided with a $70 million Series C round that was meant to start scaling the company. As the lockdown eased in the summer and fall and organizations continued pouring data into the cloud, business picked up speed. Sales and employee count doubled, and Dremio found several suitors knocking on its virtual door.

“We had aggressive in-bound interest from several high-quality investors,” Bosworth explains. “Ultimately Sapphire Ventures was the right choice for right now… [It was] very difficult to say no to that. So this was a preemptive action that came along very opportunistically that just made sense as we looked at building the dry powder that we need to aggressively grow the company in the next several years. This allows us to stay heads down and focused on that task without worrying about additional fundraising.”

The $135 million infusion also came with a significant increase in valuation. During the Series C round last March, the company was valued at $370 million. The new valuation of an even $1 billion means Dremio’s valuation has grown 170% in just 11 months.

Dremio CEO Billy Bosworth

That doesn’t put Dremio in Snowflake territory yet, as the cloud data warehouse vendor is enjoying an $80 billion market capitalization following its successful IPO last September. But as the folks at Dremio see it, the battle for control of cloud analytics will be fought in the data lakes, not in proprietary data warehouses.

“I think that they definitely are benefiting from that cloud trend,” Bosworth says of Snowflake. “The trend that’s happening now is, as that data accumulates in the cloud storage, there’s a story before the story when it comes to data warehouses, and that’s how do you get all the data in there, before you start realizing the benefits of it?

“That’s becoming a bigger problem for people,” he continues. “It creates a lot of workflow issues. It delays the ability to react to the data consumer as quickly as they need to. It creates data governance concerns around moving and copying so much data.”

Instead, Dremio advocates leaving the data where it is in the S3 or ADLS object store and using Dremio to query it directly, usually from the comfort of a BI tool like Power BI, Tableau, or Looker. Outside of a few edge use cases that require very high-performance analytics coupled with transactional support, querying data directly in the data lake will satisfy the majority of organizations’ analytic demands, Shiran says.

“It’s taking away a complicated step in the ETL process,” he says. “After the data gets into cloud storage, which is becoming the default bit bucket, we don’t require you to do anything else with it from that point. You can start querying it directly.”

To ensure customers have a good experience, Dremio is working with Microsoft and Tableau to make sure their BI tools work well with the Dremio platform. It’s also working with a range of organizations on Apache Iceberg, an open source table format developed by Netflix for large, slow-moving tabular data sets that accounts for schemas that drift over time. Iceberg was originally developed for Presto, Hive, and Spark SQL, but Dremio is also tapping into it.

“The format, by and large, is primarily Parquet, with Iceberg on top of it,” Shiran says. “The idea there is to bring functionality to the data lake around inserts, updates, deletes, transactions, and time travel, things that in the past you would have had to go into this proprietary, expensive system to get. Now you can do that directly in a data lake. And most importantly, the data stays in an open format so you can access it with Databricks and Athena and Spark and Flink and Kafka and all these different technologies that all these companies are using.”

With data flowing into cloud data lakes at unprecedented rates, the iron is hot for Dremio to strike. Bosworth says the plan is to double the size of the company over the next year to address the market opportunity, including ramping up its engineering hub in India. That’s a bit of a challenge with COVID-19 lockdowns in place across multiple jurisdictions, but the company already has 10 months of experience at it, Bosworth points out.

At the end of the day, the company is banking on its ability to deliver the kind of data analytics experience that customers are used to with cloud data warehouses, but do it atop data lakes instead.

“As enterprises increasingly use cloud storage platforms, such as Amazon Web Services’ S3 and Azure’s ADLS, they’re looking for ways to use that data where it is, and by as many groups as possible in order to make insightful business decisions,” Anders Ranum, managing director at Sapphire Ventures, said in a press release. “We believe that Dremio is well on its way to becoming a category-defining company, and we couldn’t be more excited to partner with Billy and the Dremio team on their mission to reimagine the cloud data lake and eliminate the need for data warehouses.”

Related Items:

Did Dremio Just Make Data Warehouses Obsolete?

Dremio Preps for Growth with $70M in the Bank

Dremio Emerges from Stealth with Multi-Threat Middleware

Datanami