April 20, 2023

Real-Time Analytics Databases Emerge to Take On Big, Fast-Moving Data

Alex Woodie

(Blue Planet Studio/Shutterstock)

A new product category is emerging in the analytics field to deliver timely queries atop very big and very fast-moving data. The name hasn’t been nailed down yet, but one of the leading providers in the space calls its product a real-time analytics database.

Once you’ve reached the limits of what a traditional data warehouse like Snowflake, BigQuery, or Redshift can do, you may step up into a more exotic line of distributed systems. The leaders in this space–Apache Druid, ClickHouse, and Apache Pinot–aren’t exactly new, but they are seeing a surge of interest as data volume and velocity continue to build, and the window of opportunity to act on the data continues to shrink.

These databases are united not so much in the technology they use, but in what capabilities they can deliver. They all excel at executing complex OLAP-style SQL queries against very large amounts of fast-moving data, for a large number of users, and returning the results in a short amount of time (usually sub-second).

One of the people watching this space is David Wang, the vice president of product and technical marketing at Imply, the company behind Apache Druid. Wang says it’s been fun to see how Druid, ClickHouse, and Apache Pinot have competed in the emerging market for real-time analytics databases.

“I think that’s really exciting because everybody has always thought of analytics as BI and the classical executive style reporting and Tableau dashboards,” Wang told Datanami in a recent interview.

“But this whole new world of developers are building applications and they’re building analytics applications,” he said. “If you look at this category that we represent, it’s encompassing of Apache Druid, ClickHouse, Apache Pinot. There’s kind of a new wave of really fast, real-time analytic databases that are serving this new use case.”

The term “real-time” is vague and can have multiple meanings, Wang acknowledged. For example, it can refer to the pace at which new data is being generated, where it’s sometimes a synonym for streaming data. On the other hand, real-time can refer to the latency of the queries and the speed at which the user gets results. But it doesn’t really matter in the end, because Druid can check both of those boxes, Wang said.

“There is this intersection point on the Venn diagram when you’re trying to do real analytics, but do it at the speed, the concurrency, and the operational nature of events–then you’ve got to have something that’s purpose-built for that intersection, and I think that’s where this category has emerged,” he said.

A better way to think about real-time analytics databases like Druid is in terms of the niche they fill. According to Wang, this new class of analytics database is serving an emerging need to analyze the massive amounts of fast-moving data being generated by online applications.

Druid customers like Netflix, Target, and Cisco’s ThousandEyes have these types of fast-moving analytic problems. So does Sovrn, the ad-tech firm that adopted a hosted version of Apache Pinot from StarTree, and which we recently profiled. So does Yandex, the Russian search giant that developed ClickHouse and then spun it out into its own company in September 2021.

“Druid was built for the intersection of analytics and applications,” Wang said. “Analytics always represented large-scale aggregations and group-bys and big filtered queries, but applications always represented a workload that means high concurrency, operational data. It has to be really, really fast and interactive.”
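To make the query shape Wang describes concrete, here is a minimal sketch of a filtered aggregation with a group-by. It runs on SQLite purely as a stand-in so the example is self-contained; it is not Druid, ClickHouse, or Pinot, and says nothing about their performance or their specific SQL dialects. The `clicks` table, its columns, and the helper names are hypothetical, invented for illustration.

```python
import sqlite3

# Hypothetical click events: (timestamp_seconds, country, device, latency_ms)
EVENTS = [
    (1000, "US", "mobile", 120),
    (1001, "US", "desktop", 80),
    (1002, "DE", "mobile", 200),
    (1003, "US", "mobile", 100),
    (1004, "DE", "desktop", 90),
    (1005, "FR", "mobile", 150),
]


def build_demo_db():
    """Create an in-memory SQLite database holding the sample events."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clicks (ts INTEGER, country TEXT, device TEXT, latency_ms INTEGER)"
    )
    conn.executemany("INSERT INTO clicks VALUES (?, ?, ?, ?)", EVENTS)
    return conn


def top_countries(conn, since_ts, device):
    """Filter by time and device, then count events and average latency per country."""
    query = """
        SELECT country, COUNT(*) AS events, AVG(latency_ms) AS avg_latency
        FROM clicks
        WHERE ts >= ? AND device = ?
        GROUP BY country
        ORDER BY events DESC, country ASC
    """
    return conn.execute(query, (since_ts, device)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in top_countries(conn, since_ts=1000, device="mobile"):
        print(row)
```

In an analytics application of the kind the article describes, a query like this would typically be issued many times per second by many users against continuously arriving data, which is the combination of workload traits that the real-time analytics databases are built to handle.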

ClickHouse, StarTree, and Imply may not have the same mindshare as Snowflake or Databricks. But among technologists who need established products to solve difficult analytics problems, they’ve already proven their worth. Expect to see more development in this emerging product category in the coming months and years.

Apache Pinot Uncorks Real-Time Data for Ad-Tech Firm

Speedy Column-Store ClickHouse Spins Out from Yandex, Raises $50M

Applications: Data Mining

Technologies: Frameworks

Vendors: ClickHouse, Imply, StarTree

Tags: Apache Druid, Apache Pinot, Clickhouse, real-time analytics database
