September 30, 2019

Higher Abstractions, Lower Complexity in Kafka’s Future

(Marina Sun/Shutterstock)

Apache Kafka has established itself as a leader in streaming data. If you want to handle billions (or even trillions) of events per day, Kafka is a pretty good bet. But for all its powerful abstractions for managing big event data, Kafka remains a complicated platform to set up and run. Confluent, the company behind Kafka, did something about that today at Kafka Summit.

Confluent kicked off its Kafka Summit event in San Francisco by announcing that customers can get started with its cloud version of Kafka, called Confluent Cloud, for free. Developers can put a credit card down and use up to $50 in services per month for three months, with no obligation to buy anything.

Of course, the company is gambling that you will want to continue processing data beyond that, but that’s your choice. It’s all about simplifying the Kafka experience, says Tim Berglund, senior director of developer experience at Confluent.

“Vanilla Kafka by itself, it’s fair to say, is difficult to administer. You don’t want to do that,” Berglund tells Datanami in an interview at the show. “At scale there are large operations, like WalMart, where they can make that investment in the operations personnel, the expertise and all that. But Kafka itself doesn’t have to live at that scale. There are all kinds of positions along the scale spectrum, where the economics of Kafka engage. And down lower, we don’t want to impose the overhead of Kafka operations on people. You’d rather have a service you can turn on.”

Confluent Cloud clearly looms large in the company’s long-term plan, especially as more enterprises look to expand their big data operations or establish new ones. Earlier this year, Gartner declared that the cloud is now the default deployment option for enterprises. Kafka clearly isn’t a database, in the classic sense. But it stores and processes data, just like a database does, and it has many of the same infrastructure requirements and characteristics of a database.

For Confluent, the long-term goal is to give customers higher levels of abstraction so they can do more with their event data, while lowering complexity to make the offering easier to use.

Berglund invoked Grady Booch, one of the creators of the Unified Modeling Language (UML), in explaining where Kafka and Confluent Cloud are headed.

“Booch said ‘The history of software is increasing layers of abstraction,’” Berglund says. “And right now, with Kafka as it stands, it’s kind of the abstraction level of a file system. It’s like the event file system. What Confluent Cloud currently gives you is topics. So the abstraction we give you is not cluster. It’s not broker. It’s not ‘Do you want a six-broker cluster or three? What’s your commitment going to be?’ It’s ‘Here are topics.’”

Companies use Kafka primarily in one of two ways, Berglund says. They either use it as the glue to connect microservices or for streaming analytics or streaming ETL. Kafka Streams and KSQL are the two big abstractions Confluent and the broader Kafka community have delivered for Kafka. But the future will likely see more abstractions, he says.

“The future of Kafka surely holds higher levels of abstractions,” Berglund says. “Those will happen in Confluent Cloud for sure when that stuff emerges. But we’re already going from, you need to be thinking about how to administer things, to no, no, what you want is a topic. Produce, consume, KSQL. Do your thing.”

Confluent Cloud makes Kafka available in a serverless manner on AWS, Google Cloud, and Microsoft Azure (the software can run on Kubernetes). According to a blog post published today by Priya Shivakumar, senior product director of Confluent Cloud, $50 can go a long way.

“If you were to stream 1 GB of data in, retained that GB and did nothing else, it would cost you exactly $0.11 for data in plus $0.10 for storage with 3x replication, for a total of $0.41 on your bill that month,” she writes. “As an example development use case, let’s say you streamed in 50 GB of data, stored all of it, and had two consumers, so you streamed out 100 GB. That translates to $31.50 for the month.”
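Shivakumar's figures can be checked with a few lines of arithmetic. The sketch below is not Confluent's billing logic, just a rough reconstruction of the two quoted examples. The data-in rate ($0.11 per GB) and the storage rate ($0.10 per GB per replica, times three replicas) come from the quote. The data-out rate is an assumption: the quote does not state it, but $0.11 per GB is the value that makes the second example add up to $31.50. The function and constant names are made up for illustration.

```python
from decimal import Decimal

# Rates taken from the quoted blog post.
DATA_IN_PER_GB = Decimal("0.11")
STORAGE_PER_GB_PER_REPLICA = Decimal("0.10")
REPLICATION_FACTOR = 3

# Assumption: not stated in the quote. Inferred so that the
# 50 GB in / 100 GB out example totals $31.50.
DATA_OUT_PER_GB = Decimal("0.11")


def monthly_cost(gb_in, gb_stored, gb_out):
    """Estimate one month's bill in dollars for the given volumes (in GB)."""
    data_in = DATA_IN_PER_GB * gb_in
    # Storage is billed for every replica of each retained GB.
    storage = STORAGE_PER_GB_PER_REPLICA * REPLICATION_FACTOR * gb_stored
    data_out = DATA_OUT_PER_GB * gb_out
    return data_in + storage + data_out


# First example: 1 GB streamed in and retained, nothing read back out.
print(monthly_cost(1, 1, 0))      # 0.41

# Second example: 50 GB in, all stored, two consumers reading 50 GB each.
print(monthly_cost(50, 50, 100))  # 31.50
```

Under these assumed rates, the development use case comes in well below the $50 monthly credit.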

Confluent co-founder Jun Rao delivered today’s keynote at Kafka Summit San Francisco. The show continues tomorrow, with keynote appearances by co-founders Jay Kreps and Neha Narkhede.

Related Items:

Cloud Now Default Platform for Databases, Gartner Says

Exactly Once: Why It’s Such a Big Deal for Apache Kafka

Kafka Gets Streaming SQL Engine, KSQL
