Did Rockset Just Solve Real-Time Analytics?
Companies have been pushing the envelope of real-time analytics and what technology is capable of for many years. In that vein, Rockset today claimed it has smashed through the barriers preventing customers from running the full gamut of SQL-based queries atop streaming data.
The big new feature unveiled by Rockset today is a new rollup capability that enables customers to continuously run SQL-based aggregations atop data that’s streaming into the database via Apache Kafka, AWS Kinesis, or from connectors to operational databases.
The new rollups and aggregations feature is a game changer, according to Venkat Venkataramani, CEO and co-founder of Rockset, which delivers its analytics database as a service in the AWS cloud.
“Simply using SQL, you can ask Rockset to maintain a bunch of metrics that would be up-to-date to the second, and Rockset can do that very, very efficiently without having to load all the raw data and then run analytics in a batch mode,” Venkataramani said. “We can do it in real time, almost like dipping into the stream and keeping track of all the metrics you care about.”
With the capability to do roll-ups and aggregates on real-time data, Rockset becomes almost a real-time version of the Snowflake data warehouse running in the cloud, said Venkataramani, who helped develop the core RocksDB key-value store while managing Facebook’s online infrastructure from 2007 to 2015.
“People are working with huge torrents of data coming in, terabytes of data that they’re sending via data streaming systems like Kafka and Confluent Cloud or what have you, and then they want real time analytics on top of the data,” he told Datanami. “The traditional way is to bring it into a data lake and then do an hourly job or a nightly job.”
However, that doesn’t cut it for acting upon the latest data. Alternatively, companies could use compute engines like Spark Streaming or Flink to analyze the data, he says. But those systems lack a serving layer, such as a PostgreSQL or MySQL database, to respond to queries. Similarly, Confluent’s ksqlDB offering (which is built with RocksDB) enables SQL analytics to be run on streaming data, as Rockset does, but it also lacks a serving layer or database to serve queries.
Some companies have cobbled together their own real-time analytics systems, but they typically lack the scalability and performance of a system that was architected from the ground up to provide this in a cloud-native fashion, Venkataramani said.
“The system would have looked like duct taping multiple disparate components,” he says. “It will be just like a Rube Goldberg machine of sorts. The freshness will not be there, and it just won’t be as fast as you want.”
The one exception to this is log data. Companies have many options for analyzing log data in real time, Venkataramani said. But, alas, those options do not support SQL, which, despite being in its sixth decade, remains the lingua franca of the business analytics world.
Rockset solved the problem by building a single system from the ground up that combines a full database along with the capability to run SQL queries atop fast-moving event data. After telling Rockset what dimensions and queries to keep track of in real time, the system will maintain that roll-up, and enable users to query it just like any other database table, Venkataramani said.
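To make the idea of a continuously maintained rollup concrete, here is a minimal Python sketch. It is not Rockset's implementation or API: the `Rollup` class and the `dimension` and `measure` names are hypothetical, chosen only for illustration. It shows the general technique of folding each arriving event into running aggregates, so a query reads the current totals instead of re-scanning raw data in a batch job.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict


@dataclass
class Metrics:
    """Running aggregates kept for one dimension value."""
    count: int = 0
    total: float = 0.0


class Rollup:
    """Hypothetical in-memory rollup (illustration only, not a Rockset API)."""

    def __init__(self, dimension: str, measure: str) -> None:
        self.dimension = dimension  # field to group by
        self.measure = measure      # numeric field to aggregate
        self._groups: Dict[str, Metrics] = defaultdict(Metrics)

    def ingest(self, event: dict) -> None:
        """Fold one streaming event into the aggregates; the raw event is not stored."""
        metrics = self._groups[event[self.dimension]]
        metrics.count += 1
        metrics.total += event[self.measure]

    def query(self) -> Dict[str, Dict[str, float]]:
        """Return the current aggregates per dimension value, sorted by key."""
        return {
            key: {"count": m.count, "sum": m.total, "avg": m.total / m.count}
            for key, m in sorted(self._groups.items())
        }


if __name__ == "__main__":
    rollup = Rollup(dimension="country", measure="amount")
    for event in (
        {"country": "US", "amount": 10.0},
        {"country": "US", "amount": 30.0},
        {"country": "DE", "amount": 5.0},
    ):
        rollup.ingest(event)
    print(rollup.query())
```

Each event costs a constant amount of work at ingest time, and the stored state grows with the number of distinct dimension values rather than with the number of events. That is the property that lets a rollup stay current without loading all the raw data first.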
The company brings the full weight of ANSI SQL to the stream of data, including joins, windowing functions, sorts, aggregations, and order-bys, he said. It even includes support for non-ANSI SQL elements, such as the capability to work on nested arrays and objects described in JSON. The JSON capability is nice, but the capability to run SQL queries on the data is really what’s moving the needle for most customers, Venkataramani said.
“Almost all of our customers can’t really build their solution without SQL,” he said. “If you have another table from another stream or another source, like DynamoDB or MongoDB, now you can join them just using SQL. So now you can do all of your analytics right on top of that as though it’s another fully typed, fully indexed SQL table that is just built and maintained for you in the cloud.”
The RocksDB database is an important element in Rockset’s analytic system, but it is neither sufficient on its own nor a necessary component, Venkataramani said. Other elements, such as support for strong dynamic typing, schemaless data ingestion, converged indexing, time-series optimization, query planning, and its serverless, cloud-native architecture, all played a role in delivering the whole, according to an August 2020 whitepaper on Rockset’s design.
In addition to the rollup feature, Rockset introduced two other new features: the capability to use SQL to continuously transform data as it’s being ingested, and the capability to set time-based partitioning and retention policies.
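As a rough illustration of how time-based partitioning and retention typically fit together, the Python sketch below buckets events into fixed-width time partitions and drops whole partitions once they age past a retention window. It is a generic sketch, not Rockset's mechanism: the `TimePartitionedStore` class, the `ts` field (assumed to be an integer Unix timestamp in seconds), and both parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class TimePartitionedStore:
    """Hypothetical store that partitions events by time (illustration only)."""

    def __init__(self, partition_seconds: int, retention_seconds: int) -> None:
        self.partition_seconds = partition_seconds  # width of each partition
        self.retention_seconds = retention_seconds  # how long data is kept
        self._partitions: Dict[int, List[dict]] = defaultdict(list)

    def ingest(self, event: dict) -> None:
        """Place the event in the partition that covers its timestamp."""
        bucket = event["ts"] // self.partition_seconds
        self._partitions[bucket].append(event)

    def expire(self, now: int) -> int:
        """Drop partitions that lie entirely before (now - retention).

        Returns the number of partitions removed.
        """
        cutoff = (now - self.retention_seconds) // self.partition_seconds
        expired = [bucket for bucket in self._partitions if bucket < cutoff]
        for bucket in expired:
            del self._partitions[bucket]
        return len(expired)

    def __len__(self) -> int:
        """Total number of events currently retained."""
        return sum(len(events) for events in self._partitions.values())


if __name__ == "__main__":
    # Hourly partitions, two-hour retention.
    store = TimePartitionedStore(partition_seconds=3600, retention_seconds=7200)
    for ts in (0, 3600, 7200, 10000):
        store.ingest({"ts": ts})
    removed = store.expire(now=10800)
    print(removed, len(store))
```

Partitioning by time means expiry removes an entire partition at once rather than scanning for individual old records, which is the usual reason the two settings are configured together.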
Rockset expects developers to utilize the new roll-up feature to build real-time, interactive dashboards that reflect the current state of multiple data streams, including customer data, device data, and product data across e-commerce, logistics, delivery tracking, gaming leaderboards, fraud detection systems, health and fitness trackers, and social media use cases.
“We are accelerating people’s movement from batch to real time,” Venkataramani said. “And this is a very, very important piece, almost like a cornerstone for the industry to be able to do this real time rollup. That, and SQL. With that combination, now suddenly anybody can come to us and say, here’s the batch job I have. I want to make it real time. And now we can actually do that for them, end to end.”
Rockset’s cloud analytics offering separates compute and storage, which enables customers to scale both independently. The new roll-up feature is in public beta, and available to all customers.