September 10, 2024

Confluent Nabs Competitor WarpStream to Bolster Streaming in the Cloud

Well, that was quick. Barely a year after Kafka-compatible streaming startup WarpStream Labs opened its virtual doors, it’s been acquired by Confluent, the commercial outfit behind Apache Kafka. The big get for Confluent is WarpStream’s S3-based data streaming offering, which WarpStream claims eliminates the expensive inter-zone networking fees that plague Kafka in the cloud. Confluent also takes out a potential competitor.

WarpStream Labs was founded in July 2023 by two Datadog engineers, Richard Artoul and Ryan Worl, with the goal of delivering a fast cloud-native data streaming platform that was fully compatible with Apache Kafka, which continues to dominate the streaming data landscape. Artoul, who is the CEO, described the Chicago-based company’s unique streaming architecture in his introductory blog post, appropriately titled “Kafka is dead, long live Kafka.”

“WarpStream is an Apache Kafka protocol compatible data streaming platform built directly on top of S3,” Artoul writes. “It’s delivered as a single, stateless Go binary so there are no local disks to manage, no brokers to rebalance, and no ZooKeeper to operate. WarpStream is 5-10x cheaper than Kafka in the cloud because data streams directly to and from S3 instead of using inter-zone networking, which can be over 80% of the infrastructure cost of a Kafka deployment at scale.”

WarpStream CEO and co-founder Richard Artoul

The movement of Kafka data between Availability Zones is a real problem for the Kafka architecture, Artoul says, and ultimately contributes to data storage costs that are about 10-20x more per GiB than S3 storage.

“Kafka was designed to run in LinkedIn’s data centers, where the network engineers didn’t charge their application developers for moving data around,” Artoul wrote in his introductory blog. “But today, most Kafka users are running it on a public cloud, an environment with completely different constraints and cost models. Unfortunately, unless your organization can commit to 10s or 100s of millions of dollars per year in cloud spend, there is no escaping the physics of this problem.”
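The cost argument can be made concrete with a back-of-envelope sketch. The model below is an illustration only, not WarpStream's or Confluent's own calculation: it assumes a classic Kafka cluster with one replica per zone, clients spread evenly across zones, and a placeholder inter-zone price. The function names and every default value are assumptions chosen for the example.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600


def inter_zone_gib_per_gib_produced(zones=3, replication_factor=3, consumer_groups=1):
    """Expected GiB crossing a zone boundary for each GiB produced.

    Assumes one replica per zone (replication_factor <= zones) and
    producers/consumers placed uniformly at random across zones.
    """
    if replication_factor > zones:
        raise ValueError("model assumes at most one replica per zone")
    # Chance that a client sits in a different zone than the partition leader.
    cross_zone = (zones - 1) / zones
    produce = cross_zone                    # producer -> leader
    replicate = replication_factor - 1      # leader -> each follower in another zone
    consume = consumer_groups * cross_zone  # leader -> each consumer group
    return produce + replicate + consume


def monthly_inter_zone_cost(write_mib_per_sec, price_per_gib=0.02, **topology):
    """Monthly inter-zone transfer bill; price_per_gib is a placeholder, not a quote."""
    produced_gib = write_mib_per_sec * SECONDS_PER_MONTH / 1024
    return produced_gib * inter_zone_gib_per_gib_produced(**topology) * price_per_gib


if __name__ == "__main__":
    ratio = inter_zone_gib_per_gib_produced()
    cost = monthly_inter_zone_cost(100)
    print(f"{ratio:.2f} GiB cross-zone per GiB produced")
    print(f"${cost:,.0f} per month at 100 MiB/s under these assumptions")
```

Under these assumptions, each byte written crosses a zone boundary more than three times on average. A design that reads and writes through object storage removes those terms from the bill and trades them for request and storage charges, which this sketch does not model.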

Instead of building custom tools to help automate the management of Kafka data, Artoul and Worl decided to take a radically simplified approach. They were informed by their work at Datadog, where they built a columnar database for observability data running directly on S3. “When we were done, we had a (mostly) stateless and auto scaling data lake that was extremely cost effective, never ran out of disk space, and was trivial to operate,” he writes. “Almost overnight our Kafka clusters suddenly looked ancient by comparison.”

WarpStream Cloud architecture (Image courtesy WarpStream Labs)

By developing WarpStream around S3, Artoul and Worl felt they were following in the footsteps of Databricks and Snowflake, which “lean into cloud economics by designing their systems from the ground up around commodity object storage.”

WarpStream Labs was growing its BYOC offering and had a full complement of features it was looking to add. It had raised $20 million in venture capital and had more than a dozen employees, in addition to customers like Grafana Labs, Zomato, PostHog, and others. Then Jay Kreps, the CEO and co-founder of Confluent and one of the co-creators of Kafka, came calling on the two founders.

Kreps liked the BYOC approach that WarpStream had taken, particularly as it pertains to enabling customers to maintain control of their data while also delivering a fully managed experience in customers’ own cloud accounts. That was something that Confluent had been working on, too.

“When we looked at products that worked this way they were often the worst of both worlds: self-managed data systems that had been forklifted into the cloud with semi-managed models that left responsibility for security and uptime pretty vague,” Kreps wrote in a blog post yesterday.

It was WarpStream’s “unique architectural approach” that caught Kreps’ attention and ultimately led to a meeting in New York a few months back. That meeting eventually culminated in a deal and yesterday’s acquisition announcement, the terms of which were not disclosed.

WarpStream costs vs hosted and self-managed Kafka (Image courtesy WarpStream)

Confluent’s plan calls for the WarpStream product to continue to be developed and supported. It will sit smack dab in the middle of the Confluent lineup, right between Confluent Platform, which delivers lots of control but is difficult to manage, and Confluent Cloud, which is easy to manage but doesn’t offer a lot of control.

“I’ve been deeply impressed with WarpStream–it’s BYOC done right,” Kreps says in a press release. “With this acquisition, we have a data streaming offering for everyone.”

In particular, Confluent sees WarpStream being adopted by organizations running “large scale workloads with relaxed latency requirements in their own cloud environment.” That would include things like processing huge observability streams and loading data lakes.

“Today the WarpStream product is still a young startup product, so it won’t immediately be a fit for all customers,” Kreps says in the blog, “but we plan to invest in security and hardening over time to bring it up to the same enterprise-grade standards as Confluent Platform and Confluent Cloud, as well as integrate it into our systems for ease of signup, billing, and account management.”

There will also be some sharing of components between WarpStream and the Confluent-developed products, including things like data connectors, stream processing, and governance solutions, Kreps says.

(Image courtesy Confluent)

Meanwhile, the WarpStream team has a full list of features they’re working on, including support for Kafka transactions, cluster quotas, a BYOC schema registry, a mirroring product called Orbit, active-active multi-region clusters, and customer control planes for Google Cloud and Azure (it currently only supports AWS).

“Many announcements like this proceed to lament that the product is shutting down or radically changing, but we’re doing quite the opposite,” Artoul wrote in his blog post today, appropriately titled “WarpStream Is Dead, Long Live WarpStream.” “WarpStream is about to get better–a lot better–with the resources and backing of the leader in streaming.”

BigDATAwire