Apache Druid Gets Multi-Stage Reporting Engine, Cloud Service from Imply
Imply, the Northern California company behind the open source analytics database called Apache Druid, made a pair of announcements today during a virtual event, including the beta of a new multi-stage query engine as well as Polaris, a new cloud offering based on it.
Apache Druid is a column-oriented, in-memory data store that was originally developed in 2011 at the ad tech analytics firm Metamarkets. The original goal was to surface insights on high-cardinality data, something that other databases of the time (both relational and NoSQL) struggled to do. In addition to querying data in batch, the distributed framework could “shapeshift” and process streaming data, too.
Several Druid creators founded Imply in 2015 to build a commercial entity behind the open source technology, which was being used by large companies, including Yahoo. The company has grown over the years, and last June, it completed a Series C round of funding worth $70 million, bringing its total outside investment to more than $115 million. The company says it was valued at $350 million following its $30 million Series B in late 2019.
Today Imply announced the availability of a private preview for the new multi-stage query engine, which is also critical to its new cloud offering.
Imply describes the new multi-stage query engine as an “evolution” of the core Druid data store. While the core storage engine could handle data at terabyte and even petabyte scale, developers needed to bring other components to the data party, including facilities for managing data exports, reporting, and advanced alerting.
These are the core requirements that Imply is targeting with the multi-stage query engine. The new Druid for Reporting component improves Druid’s capability to handle long-running, heavyweight queries and enables customers to use a single database for powering applications that require both interactivity and complex reports or data exports.
Druid for Alerting, meanwhile, builds on “Druid’s longstanding capability to combine streaming and historical data” and enables users to build alerting across a large number of entities with complex conditions at scale.
Imply has also enhanced its data ingestion component to deliver high-concurrency ingestion from HDFS, Amazon S3, Azure Blob, and Google Cloud Storage. Customers can use the same SQL language they currently use for queries to control data transformations directly from Druid.
“The multi-stage query engine represents the most significant evolution of Druid, an expansion of the architecture that makes it unparalleled in the industry,” Gian Merlino, co-founder and CTO of Imply and Apache Druid PMC chair, stated in a press release. “It brings both flexibility as well as ease to the developer experience. I’m excited that the entire open source community will be able to take full advantage of it.”
That brings us to the second part of Imply’s announcement today: the delivery of a Druid-based software-as-a-service (SaaS) offering.
Having a SaaS offering was a top goal for the company following the recent Series C round, and with Polaris, the first delivery in its 12-month Project Shapeshift, the company is now delivering one, Imply CEO and co-founder Fangjin Yang said.
“Today, we are at an inflection point with the adoption of Apache Druid as every organization now needs to build modern analytics applications,” Yang stated in a press release. “This is why it’s now time to take Druid to the next level. Project Shapeshift is all about making things easier for developers, so they can drive the analytics evolution inside their companies.”
Polaris is a fully managed service based on Druid that eliminates the need for users to worry about the underlying infrastructure, Imply says. The service comes with preset configurations, built-in monitoring, and automated performance tuning. The company says the development interface is simple to use for a range of analytics applications, and a built-in connection to Confluent Cloud makes it easier to connect to data sources.
Imply Polaris is generally available now. Users can access it at signup.imply.io.