May 3, 2018

Want Kafka on Kubernetes? Confluent Has It Made

If the idea of streaming massive amounts of data on a virtual server platform that spans public cloud and on-premise clusters piques your interest, you’re not alone. Many organizations have attempted to wire Apache Kafka to run on the Kubernetes cluster manager, but it’s a challenging technical feat that few have succeeded at. Today Confluent announced that it’s solved the challenge and will start selling the kit later this year.

The combination of Apache Kafka and Kubernetes seems like a match made in big data heaven. Kafka has emerged as the next-generation messaging bus for streaming data, amassing millions of downloads of a free and open source product that’s both easy to use and very powerful. Similarly, Kubernetes has emerged as the de facto standard for cloud containerization systems thanks to its capability to cleanly insert a run-time abstraction on top of physical infrastructure, thereby enabling users to scale clusters up and down and move containerized workloads from cloud to on-premise, basically at will.

But like most things in IT, the devil is in the details. “It’s actually not that easy,” says Neha Narkhede, the CTO and co-founder of Confluent, the commercial venture behind open source Apache Kafka. “Kubernetes is amazing, but it was designed for stateless applications.”

Like all stateful applications, Kafka makes certain assumptions about the infrastructure that it’s running on. “If machines come and go, you have to maintain the logical context of what a node is,” Narkhede tells Datanami. “As the underlying hardware changes, you need to make sure that that node concept stays the same. In addition to that, there’s a bunch of networking-layer details that need to be right.”
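The “logical node” idea Narkhede describes can be illustrated with a short sketch. It assumes a Kubernetes StatefulSet, which gives each pod a stable name ending in an ordinal (for example, kafka-2), and derives a fixed Kafka broker ID from that name so the ID survives a move to different hardware. The pod names and the helper function are hypothetical, and the article does not say how Confluent actually implements this.

```python
import re


def broker_id_from_pod_name(pod_name: str) -> int:
    """Derive a stable Kafka broker ID from a StatefulSet-style pod name.

    Assumes names of the form "<statefulset-name>-<ordinal>", e.g. "kafka-2".
    The ordinal stays the same when the pod is rescheduled, so the broker
    keeps its logical identity even if the underlying machine changes.
    """
    match = re.fullmatch(r".+-(\d+)", pod_name)
    if match is None:
        raise ValueError(f"pod name has no ordinal suffix: {pod_name!r}")
    return int(match.group(1))


if __name__ == "__main__":
    # Hypothetical pod names, for illustration only.
    for name in ("kafka-0", "kafka-1", "kafka-2"):
        print(name, "->", broker_id_from_pod_name(name))
```

The point of the design is that the identity comes from the name, not from the machine: a replacement pod with the same name picks up the same broker ID.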

But the tougher challenges were centered around Kafka nuances. “Just managing configuration correctly so that your Kafka cluster can be deployed across Availability Zones when you’re deploying it in a public cloud, or across racks when you’re deploying it on premise,” Narkhede says, “and being able to do that even as these pods or nodes … come and go. Managing that on the fly is one of the difficulties.”
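To make the zone and rack concern concrete, here is a minimal sketch of rack-aware replica placement. Kafka brokers can be labeled with a rack or zone through the broker.rack setting; everything else here (the function names, the broker-to-zone mapping, the round-robin strategy) is an assumption for illustration, not Confluent’s or Kafka’s actual assignment algorithm.

```python
from collections import defaultdict
from itertools import zip_longest


def interleave_by_rack(broker_racks: dict[int, str]) -> list[int]:
    """Order brokers so neighbours in the list sit in different racks where possible."""
    by_rack = defaultdict(list)
    for broker_id in sorted(broker_racks):
        by_rack[broker_racks[broker_id]].append(broker_id)
    ordered = []
    # Take one broker from each rack in turn: a0, b0, c0, a1, b1, c1, ...
    for group in zip_longest(*(by_rack[rack] for rack in sorted(by_rack))):
        ordered.extend(b for b in group if b is not None)
    return ordered


def assign_replicas(
    broker_racks: dict[int, str], num_partitions: int, replication_factor: int
) -> dict[int, list[int]]:
    """Assign each partition's replicas to consecutive brokers in the interleaved order."""
    ordered = interleave_by_rack(broker_racks)
    if replication_factor > len(ordered):
        raise ValueError("replication factor exceeds broker count")
    return {
        p: [ordered[(p + i) % len(ordered)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


if __name__ == "__main__":
    # Hypothetical cluster: six brokers spread over three zones.
    racks = {0: "zone-a", 1: "zone-a", 2: "zone-b", 3: "zone-b", 4: "zone-c", 5: "zone-c"}
    for partition, replicas in assign_replicas(racks, 6, 3).items():
        print(partition, replicas, [racks[b] for b in replicas])
```

With evenly sized zones, each partition’s replicas land in distinct zones, so losing one zone leaves copies elsewhere. When brokers come and go, the mapping has to be recomputed, which is the “on the fly” difficulty the quote refers to.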

Performing rolling restarts and upgrades of Kafka is another area that demanded attention. “You need to make sure that you’re never really taking a partition or part of your data offline,” she says. “In particular, you need to make sure that data is evenly re-balanced, even as you add more nodes … And when you’re removing them, that data is balanced elsewhere.” Narkhede addressed some of the challenges in greater detail in a blog post published today.
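The rolling-restart constraint can be sketched as a simple pre-check: before taking a broker down, confirm that every partition it hosts would still have enough in-sync replicas without it. This is a simplified illustration under stated assumptions, not the logic of Confluent Operator. The threshold mirrors Kafka’s min.insync.replicas setting, while the function name and the shape of the partition data are hypothetical.

```python
def safe_to_restart(broker_id: int, partitions: dict[str, set[int]], min_isr: int) -> bool:
    """Return True if restarting broker_id keeps every partition at or above min_isr.

    `partitions` maps a partition name to the set of broker IDs currently
    in its in-sync replica set (a hypothetical snapshot of cluster state).
    """
    for in_sync in partitions.values():
        if broker_id not in in_sync:
            continue  # this partition is unaffected by the restart
        remaining = in_sync - {broker_id}
        if len(remaining) < min_isr:
            return False  # restart would leave this partition under-replicated
    return True


if __name__ == "__main__":
    # Hypothetical in-sync replica sets for two partitions.
    state = {"orders-0": {1, 2, 3}, "orders-1": {2, 3}}
    for broker in (1, 2, 3):
        print(broker, safe_to_restart(broker, state, min_isr=2))
```

A rolling restart would run a check like this before each broker, wait for replicas to catch up after each restart, and only then move on to the next one.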

Confluent has addressed these Kafka-on-Kubernetes challenges in Confluent Cloud, its Kafka-as-a-service offering running on Amazon Web Services and Google Cloud Platform, where it runs Kafka in Docker containers managed by Kubernetes. Others in the growing Kafka community have tried to solve them too, with mixed success.

“These are pretty hard to do,” Narkhede says of the extensive tuning Confluent did to make Kafka run on Kubernetes. “There’s a whole bunch of gotchas when trying to do this yourself using Kubernetes. There are many mistakes that people end up making. Because ultimately, they’re not really the experts of Kafka. They may know Kubernetes from deploying other stateless applications as part of their microservices transition, but really something stateful like Kafka needs a lot of care and careful consideration.”

Confluent is taking what it learned from running Kafka on Kubernetes in the Confluent Cloud and making it part of the Confluent Platform, its full-stack streaming data solution that’s built on Apache Kafka. The Kubernetes knowledge is being packaged into Confluent Platform for Kubernetes. The key deliverable is Confluent Operator, an API that gives customers the ability to run Kafka on their own Kubernetes assets, including those running on premise and in the public cloud.

Besides tuning Kafka to run on Kubernetes, much of the work that Confluent did in preparing for today’s launch revolved around ensuring the Confluent Operator works well with popular Kubernetes distributions, including Pivotal Cloud Foundry, Heptio Kubernetes subscription, Mesosphere DC/OS, and OpenShift, which is backed by Red Hat. On the cloud, it’s supported on Amazon’s Elastic Container Service for Kubernetes (EKS), the Google Cloud Kubernetes Engine, and Microsoft Azure Container Service (AKS).

Today’s announcement drew praise from the Kubernetes community. “Kubernetes radically simplifies the deployment and operation of connected systems and is an ideal operating environment for enterprise workloads,” Craig McLuckie, Heptio’s CEO and co-founder, states in a press release. “Given the critical role that Kafka plays in modern application architectures, Confluent’s commitment to supporting Kubernetes is an important milestone in the cloud native computing journey.”

Tobi Knaup, CTO and co-founder at Mesosphere, also welcomes the news. “Many of our customers run Apache Kafka and Confluent Platform on DC/OS to enable IoT and machine learning use cases,” he states in a press release. “We’re looking forward to collaborating with Confluent delivering the best customer experience for Apache Kafka and Kubernetes.”

In addition to the Confluent Operator, Confluent is making several deliverables available to help customers get started on Kubernetes, including production-ready Confluent Platform Docker images, configurable deployment templates for Kubernetes, and a reference architecture with best practices for Kafka on Kubernetes.

Confluent has focused most of its Kubernetes-related attention on supporting Kafka on Kubernetes. But there is more to the Confluent Platform than just Kafka, including enterprise features like security and monitoring, not to mention streaming analytics and the recently added support for SQL. “We’ll start with Kafka,” Narkhede says. “The goal is to automate the entire Confluent Platform on Kubernetes.”

The rollout will begin this summer with an early access program for Confluent Platform for Kubernetes, followed by general availability later this year. Confluent does not want to rush into the Kubernetes world, and so will work with early adopters to ensure a smooth adoption, according to Narkhede.

The other thing that Confluent does not want to do is open source the Kubernetes work. “At this time our intention is not to open source this,” says Narkhede, who is a 2017 Datanami Person to Watch. However, there will be a trial version of Confluent Platform for Kubernetes that people can download and use for a limited period of time. If they decide that they want to move forward with Kafka on Kubernetes, they can purchase a subscription to Confluent’s platform.

As Kafka use rises, so does Confluent’s (privately held) stock. Every company must generate revenue, and it would appear that enabling Kafka to run on Kubernetes is part of Confluent’s plan to make money.

Enabling two important technologies to work together is a big deal. Narkhede rightly points out: “This is a big stepping stone to enabling the vision of enabling streaming data everywhere, no matter where you are, making that detail go away.” Confluent is betting that people will pay to make that detail go away. Time will tell if it’s correct.
