March 19, 2024

Confluent Adds Flink, Iceberg to Hosted Kafka Service


Confluent today made a pair of big announcements at its Kafka Summit London event, including the general availability of Apache Flink and the addition of Apache Iceberg support in its Kafka-based cloud offering. The new features will bolster the analytics capability and the accuracy of the popular streaming data service.

For some time, Confluent has been working on adding support for Apache Flink, a distributed processing engine for streaming data that has emerged as one of the most powerful tools for real-time data. Paired with the Apache Kafka streaming messaging bus, the addition of Flink significantly bolsters what developers can do with real-time data.

By integrating Flink and Kafka, Confluent is eliminating painful integration work that stems from different data formats and inconsistent schemas, which can hinder the quality of streaming data for downstream systems and consumers, according to IDC analyst Stewart Bond.

“A fully managed, unified Kafka and Flink platform with integrated monitoring, security, and governance capabilities can provide organizations with a seamless and efficient way to ensure high-quality and consistent data streams to fuel real-time applications and use cases, while reducing operational burdens and costs,” Bond said in a press release.

Confluent also touted Flink’s capability to create data pipelines to help feed data into vector databases, which are important tools for supporting generative AI applications. Confluent says it supports vector databases from Elastic, Pinecone, Rockset, SingleStore, and Zilliz.

Confluent customer ACERTUS plans to use the serverless Flink service to bolster the accuracy and availability of vehicle-location data in its transportation management system. Similarly, Dutch energy company Essent is looking forward to using the combination of Flink and Confluent Cloud to extract data from various sources and route it to downstream customers for timely analyses.

In other news, Confluent announced that it’s now supporting Apache Iceberg, an open source table format originally developed at Netflix to resolve data consistency issues and introduce transactions to data lake environments.

The new Tableflow feature in Confluent Cloud is designed to make it easier to get streaming data into data lakes and data warehouses, thereby enabling organizations to query data with different engines without worrying that the results could be wrong.

Support for Iceberg within Confluent Cloud enables customers to convert Kafka topics, associated schemas, and metadata to Iceberg tables with a single click, which better supports analytic workloads in data lakes and data warehouses, Confluent says. Previously, converting real-time data flowing through Kafka into an Iceberg table was not easy, the company says.

“This can be a time-consuming and complex process that requires careful management of data formats and schemas,” Confluent says in a press release. “As a result, many companies must execute complex migrations which can be resource-intensive, resulting in stale and untrustworthy data and increased costs.”

Tableflow addresses those concerns by simplifying the Iceberg transformation process. It works by allowing users to “easily materialize Kafka topics and schemas as Iceberg tables in one click to feed any data warehouse, data lake, or analytics engine for real-time or batch processing use cases,” Confluent says.

In addition to materializing topics to Iceberg tables, the new Tableflow function materializes any associated schemas and ensures that the Iceberg tables are kept up to date. Confluent also made sure that Tableflow works with its Stream Governance capabilities and Apache Flink to “clean, process, or enrich data in-stream so that only high-quality data products land in your data lake,” the company says.

Tableflow is a feature of Confluent Cloud’s Kora Engine, and will be available soon.

The company also announced a new iteration of Connect, its collection of 80-plus connectors that enable Confluent Cloud customers to build connections to data sources and destinations. Confluent says it has brought new security, usability, and pricing enhancements, including support for private networks using DNS Forwarding and Egress Access Point on AWS and Microsoft Azure.

Confluent is also turning on its Stream Governance feature by default, thereby giving customers access to governance features like schema registry, data portal, and real-time stream lineage.
