November 10, 2020

Confluent Moves to Boost Kafka Reliability

Apache Kafka has emerged as a leading platform for managing streaming transaction data. Users are therefore placing a premium on Kafka uptime, seeking to anticipate and resolve issues before clusters go down and data is lost.

Confluent Inc., the company behind Apache Kafka, continues to expand its offerings with a monitoring tool that seeks to reduce the risk of Kafka downtime and data loss by identifying and resolving issues before an instance is lost. It is also promoted as a way of preserving Kafka instances without large infrastructure investments.

The “proactive support” platform, dubbed Reliable, is part of Confluent’s Project Metamorphosis, launched earlier this year as a next-generation event streaming platform running in the Confluent Cloud. The initiative is designed to help users scale Kafka deployments, while Reliable aims to keep the event streamer up and running.

Confluent said Tuesday (Nov. 10) its Kafka support platform provides real-time analysis of cluster metadata, alerting operators to potential issues before a crash. The monitoring tool also provides performance metrics that allow support staff to trace the root cause of Kafka performance issues.

The company said the impetus for the Kafka platform was about 1,000 support tickets that helped it gain insights into detecting performance issues. “We proactively tell you about issues we find with [customers’ deployments] based on experience running literally thousands of clusters,” said Tim Berglund, Confluent’s senior director of developer advocacy.

Hence, the company developed a framework by which cluster metadata is fed through its collection of algorithms to gauge performance and anticipate potential problems. Real-time “health metrics” are promoted as helping support engineers identify root causes faster, cutting the time to resolve support issues by as much as 25 percent, the company claimed.

Network and storage bottlenecks are among the most common issues that limit Kafka’s real-time performance at scale and raise the risk of data loss. Overloaded networks or storage media often bring down Kafka clusters, taking with them business events and other critical data.

The managed service also is designed to help IT administrators reduce the amount of time and resources devoted to operating the relatively complex, distributed Kafka environment.

In launching Project Metamorphosis this past spring, Confluent said it planned a series of monthly product and feature upgrades through the end of the year as it fleshes out its real-time streaming platform. The managed support service is the latest in that series of offerings.

Customers running Confluent 6.0 can opt in to use the initial release of the managed Kafka service. Alerts can be sent via integration with Confluent Control Center. Alternatively, users can set up alerts via email or Slack.
