September 28, 2016

Commercial Kafka Distro Gets Global Smarts

Alex Woodie

Companies operating multiple Apache Kafka clusters in on-premise and cloud data centers will benefit from a handful of new enterprise-level features unveiled at the Strata + Hadoop World conference today by Confluent, the commercial open source company behind the popular big data message bus.

Confluent today announced that its enterprise-strength Kafka offering, called Confluent Enterprise, is getting three key new capabilities in the version 3.1 release that will ship next month: multi data center replication, automated cross-cluster data balancing, and a cloud-migration facility.

The new multi data center (MDC) replication capability and data balancing features, in particular, are expected to significantly simplify enterprise Kafka operations for multi-national organizations adopting Confluent Platform Enterprise as the core of their streaming data initiatives.

More than 35% of the Fortune 500 companies have deployed Apache Kafka.

“This is a really big deal for Kafka users because literally every single customer that Confluent has, has deployed Kafka across multiple data centers,” says Confluent co-founder and CTO Neha Narkhede. “The fact that data lives across different locations actually means you need to synchronize data.”

The MDC replication capability is superior to a previously available open source product called MirrorMaker. The proprietary MDC replication capability that Confluent will deliver in the subscription-based product is more advanced and production-ready than MirrorMaker, Narkhede says.

“We learned a ton from that experience [developing MirrorMaker] and baked it into the tool,” Narkhede tells Datanami. “This is asynchronous across data centers in our first version. What we will guarantee is that every single piece of data in one particular data center is [delivered] in order to the other data center. And there will be full monitoring end-to-end to make sure that users have visibility into how that’s happening.”

Confluent Platform Enterprise customers will be able to set up the replication to suit their specific needs. “You can set it up to mirror one-to-one copies of all your data across data centers,” Narkhede says, “or you set up more of a geo-replication mode, which is feeding multiple data centers into one data center that might be feeding your analytical systems in parallel.”
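The two setups Narkhede describes can be pictured with a toy in-memory model. The Python sketch below is only an illustration under stated assumptions: it does not use Kafka or any Confluent API, and the `replicate` helper, the cluster names, and the topic names are all hypothetical. It shows a one-to-one mirror that copies records in source order, and an aggregation setup in which two data centers feed a single analytics cluster.

```python
from collections import defaultdict


def new_cluster():
    # Hypothetical stand-in for a cluster: topic name -> ordered list of records.
    return defaultdict(list)


def replicate(source, target, offsets):
    """Copy not-yet-replicated records from source to target in source order.

    `offsets` remembers, per topic, how many source records were already
    copied, so repeated calls behave like an asynchronous catch-up.
    """
    for topic, records in source.items():
        start = offsets.get(topic, 0)
        target[topic].extend(records[start:])
        offsets[topic] = len(records)


# One-to-one mirroring: "west" ends up with an ordered copy of "east".
east, west = new_cluster(), new_cluster()
east["orders"] += ["o1", "o2", "o3"]
mirror_offsets = {}
replicate(east, west, mirror_offsets)
east["orders"].append("o4")            # arrives after the first pass
replicate(east, west, mirror_offsets)  # catch-up pass copies only "o4"

# Aggregation: several data centers feed one analytics cluster.
london, tokyo, analytics = new_cluster(), new_cluster(), new_cluster()
london["clicks"] += ["l1", "l2"]
tokyo["clicks"] += ["t1"]
replicate(london, analytics, {})
replicate(tokyo, analytics, {})

print(west["orders"])
print(analytics["clicks"])
```

In this toy model, order is preserved only within each source; records from different sources are interleaved in whatever order the replication passes happen to run.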

The new automatic data balancing feature will help to ensure that production Kafka clusters stay online as new nodes are added to handle demand.

Confluent Enterprise includes a mix of open source and proprietary components

“This capability in Confluent Enterprise offers a fully automated solution and a very efficient algorithm to balance data efficiently across a cluster of machines while respecting user-defined quotas,” Narkhede says. “This is the second proprietary capability in Confluent Platform, and pretty much a feature that the Kafka community has been waiting for for a really long time.”

The last major new feature, a facility to automatically move data from on-premise clusters to cloud-based Kafka clusters, is a logical extension of the MDC and data-balancing features. Confluent developed it because many Kafka customers with multiple data centers also have data and applications residing in the cloud.

“For a lot of companies, what that actually means is they’re trying to move data from on-premise to the cloud,” Narkhede says. “When you do that, you need a common replication layer that allows you to copy data from your premise data center to the cloud.”

2016 has been a big year for Confluent, which has emerged as one of the hottest tech startups thanks to the rise of Apache Kafka as the de facto standard transport for moving big data from its source to its destination. While Kafka itself serves as the data movement layer, Confluent is also looking to take Kafka “up stack” by adding basic stream processing capabilities to the project.

Kafka Streams, which is part of the free and open source Apache Kafka offering, provides simple stream processing capabilities, including data transformation for streaming data. Kafka Connect, meanwhile, is the Apache Kafka component that provides connectors for streaming data from a variety of sources to their destinations.

Confluent shared a few other facts about Kafka adoption. The company says that Kafka is being used by seven of the top 10 global banks, eight of the top 10 insurance companies, nine of the top 10 U.S. telecom companies and six of the top 10 travel companies.

The Real-Time Rise of Apache Kafka

Kafka Gets a Stream-Processing Makeover

Editor’s note: This article has been corrected. The new MDC replication feature is not based on MirrorMaker, as the story previously stated. Also, Kafka Streams and Kafka Connect are part of Apache Kafka, not part of Confluent’s platform. Datanami regrets the errors.

Applications: Complex Event Processing, Predictive Analytics

Technologies: Cloud, Frameworks, Middleware, Network

Sectors: Financial Services, Healthcare, Retail

Vendors: Confluent

Tags: Apache Kafka, Confluent, real-time analytics, streaming data
