March 24, 2022

StarTree Keeps Real-Time Analytics Fresh with New Options for Pinot


Since Apache Pinot–the distributed storage and analytics engine for real-time analytics–was created at LinkedIn in 2015, interest in it has steadily grown. But over the past year, the number of downloads and active participants in the open source project jumped significantly. Now StarTree, the commercial entity behind Pinot, is ramping up its options for customers to run Pinot in the cloud.

In many ways, Apache Pinot is the technological successor to Apache Kafka, the distributed pub/sub messaging system created by three LinkedIn developers, Jay Kreps, Neha Narkhede and Jun Rao, and released as open source in 2011. The fact that Narkhede has invested in StarTree, and that Confluent’s former director Tim Berglund recently joined StarTree as vice president of developer relations, helps to bolster that point.

“Apache Pinot is what comes after Apache Kafka,” Rohit Agarwalla, head of product for StarTree, tells Datanami. “It’s the natural progression as customers move to putting more and more data into real time.”

That’s not to say that Pinot is a replacement for Kafka. Many Pinot users rely on Kafka as the real-time messaging bus that delivers events, although Pinot works with others, such as Apache Pulsar and Amazon Kinesis. Pinot does not directly compete with Kafka; instead, it complements the popular platform by seeking to be a super-fast index (200,000 queries per second) for the book of real-time interactions that is constantly being logged by Kafka.

Agarwalla describes Pinot as an OLAP system for real-time data. Customers often struggle to run the analytics that give them insight in time for it to matter. With its natively distributed architecture and optimized StarTree index for columnar data, Pinot was designed to deliver SQL analytics against huge amounts of fast-moving data at very low latencies.
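To make "SQL analytics" concrete, here is a minimal Python sketch of how a client might prepare a query for a Pinot broker over HTTP. It assumes a broker reachable at `http://localhost:8099` and a hypothetical `orders` table with a `store_id` column; neither comes from StarTree or this story, and the helper name `build_pinot_query` is made up for illustration. The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_pinot_query(broker_url: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Pinot broker's SQL endpoint."""
    # The broker accepts a JSON body whose "sql" field holds the query text.
    payload = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{broker_url.rstrip('/')}/query/sql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical table and columns, chosen only for illustration.
SQL = (
    "SELECT store_id, COUNT(*) AS order_count "
    "FROM orders "
    "GROUP BY store_id "
    "ORDER BY order_count DESC "
    "LIMIT 10"
)

# Assumed broker address; adjust for a real deployment.
request = build_pinot_query("http://localhost:8099", SQL)
```

Against a live cluster, passing `request` to `urllib.request.urlopen` would return the broker's JSON response; that step is left out here because it depends on a running deployment.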

“As you’re generating all this real time data, you need a system that can also serve analytics on this real time data, actually in real time,” Agarwalla says. “And that’s why we continuously see that Confluent folks, and of course Tim being one that popularized stuff at Kafka, coming here to help us out with popularizing Pinot.”

Interest in Pinot grew steadily for years, before growing much more quickly in the past year. According to Agarwalla, there have been more than 1 million downloads of Pinot Docker images over the “past year or so.” The open source project’s Slack list now sits at about 2,000 individuals. Those two metrics have grown 16x and 13x over the past year, Agarwalla says. “So just phenomenal growth in terms of the open source project itself,” he adds.

Pinot in the Cloud

Like most commercial open source ventures, the Mountain View, California, company launched a cloud offering, in which StarTree manages Pinot in customers’ own private cloud environments. Up to this point, that offering has been on a trial basis. On Tuesday, the company announced that this bring your own cloud (BYOC) offering is now generally available on Amazon Web Services and Google Cloud, with support for Microsoft Azure coming soon.

Many Pinot customers prefer the BYOC approach because they remain in control of their data, Agarwalla says. “They didn’t want to move the data out of that environment because of cost and compliance and regulatory reasons,” he says. “They still wanted to run Pinot, but not by themselves. And so with BYOC, we give them the Pinot environment all deployed and operated remotely from the StarTree cloud environment. So the data sits in their own cloud, but our control plane effectively manages everything in their environment.”

Simultaneously, StarTree this week announced the beginning of trials for its software as a service (SaaS) offering. This offering–which is also available on AWS and GCP, with Azure support coming soon–is ideal for customers who are comfortable putting their data into StarTree’s hands.

StarTree provides an OLAP storage and query engine for fast-moving data (Image courtesy StarTree)

“Given the high adoption of SaaS in general in our industry, customers are getting more comfortable with sending the data out to SaaS and that’s why we do have this SaaS edition,” Agarwalla says.

Pinot, like Kafka and other modern distributed applications, has a lot of knobs and switches, which can be great for technologists who have the time to understand how it all works, but can be overwhelming for customers who just want to do analytics on real-time data and don’t necessarily want to become experts in the underlying technology. When you add various flavors of Kubernetes into the mix, the technological complexity goes up even more. Both the BYOC and SaaS offerings help to simplify life for customers.

“In terms of the flavor of Kubernetes, whatever the cloud provider’s Kubernetes version is, we make sure that we soup-to-nuts deploy that as part of our deployment,” Agarwalla says. “And so the customers don’t have the overhead to manage different versions of Kubernetes or different versions of Pinot when this gets deployed.”

A Growing Base

StarTree has raised $24 million to date, and is now looking to ramp up sales and marketing efforts to get customers to try its brand of real-time analytics for real-time data. Some analysts suggest that 30% of the global datasphere will consist of real-time data by 2025, so the time to act is now.

LinkedIn and Uber are documented Pinot users, but they run their own environments. StarTree is showcasing the stories of two of its cloud customers: Guitar Center, which runs 300 retail locations and an active e-commerce site, and Just Eat, the parent company of brands like Just Eat, SkipTheDishes, Grubhub, and Menulog.

Saritha Ivaturi, the vice president of data platform and engineering at Guitar Center, says Pinot helps her company find insights and uncover blindspots in the data.

“During the last Black Friday event, we were able to implement user-facing analytics on real-time data residing in Apache Pinot and discover opportunities to serve our customers better in real time,” Ivaturi says. “The BYOC approach in StarTree Cloud gives us a managed service running in our own cloud environment where the data resides, balancing our internal security and compliance requirements with ease of use.”

In addition to delivering the BYOC edition of StarTree Cloud to the wider market, StarTree is also onboarding customers to its SaaS edition. With the SaaS edition, users need only make their data available in StarTree Cloud, which accelerates time to value with managed Apache Pinot clusters running in StarTree’s cloud environment.

StarTree Cloud customer Just Eat (parent company of brands including Just Eat, SkipTheDishes, Grubhub, and Menulog) is a leading global online food delivery marketplace that focuses on connecting consumers and restaurants through its platforms.

The SaaS edition of StarTree Cloud has made it very easy for Just Eat to get started with Apache Pinot and real-time applications, says Soyinka Majumder, the company’s head of marketing analytics. “We were able to ingest batch data and use real-time applications that helped significantly reduce Mean Time To Detect (MTTD) and Mean Time To Respond (MTTR) for key business metrics issues,” he says in a press release.

Editor’s note: This story was updated to reflect the relationship between Pinot and Kafka.