March 24, 2023

Apache Flink 1.17 Update Drives Streaming Data Warehouses

The folks at the Apache Flink project have announced a 1.17.0 release of the popular open source distributed framework for streaming data use cases.

“Apache Flink is the leading stream processing standard, and the concept of unified stream and batch data processing is being successfully adopted in more and more companies. Thanks to our excellent community and contributors, Apache Flink continues to grow as a technology and remains one of the most active projects in the Apache Software Foundation,” the Apache Flink project management committee said in an announcement.

The Flink 1.17 release is geared towards optimizing streaming data warehouses, a modern data storage and processing approach for handling real-time or near-real-time data streams. Many traditional data warehouses are primarily batch-oriented, with data loaded only at scheduled times. Streaming warehouses, by contrast, continuously ingest, process, and analyze data as it is generated, allowing for analytics and decision-making based on the newest data available. Flink is a popular choice for implementing streaming warehouses because the framework was specifically designed for large-scale, low-latency data stream processing.

The 1.17 release includes several features and improvements for data stream processing. One is a set of changes to streaming SQL semantics, which address challenges with non-deterministic operations by fixing incorrect optimization plans and functional issues. An experimental feature has been introduced to inform SQL users of potential correctness risks and to offer optimization suggestions. Checkpointing has also been improved for speed, stability, and usability, and a new REST interface allows users to manually trigger checkpoints with custom types during job execution.
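As a rough illustration of the manual checkpoint trigger described above, the following Python sketch builds (but does not send) an HTTP request to a Flink REST endpoint. The endpoint path `/jobs/<job-id>/checkpoints`, the `checkpointType` field, and its values `CONFIGURED`, `FULL`, and `INCREMENTAL` reflect our reading of the Flink REST API and should be checked against the documentation for the version in use. The helper name `build_checkpoint_request`, the host, and the port are hypothetical.

```python
import json
import urllib.request

# Assumed set of checkpoint types; verify against the Flink REST API docs.
ALLOWED_TYPES = {"CONFIGURED", "FULL", "INCREMENTAL"}


def build_checkpoint_request(base_url: str, job_id: str,
                             checkpoint_type: str = "FULL") -> urllib.request.Request:
    """Build (but do not send) a POST request asking Flink to checkpoint a job."""
    if checkpoint_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported checkpoint type: {checkpoint_type}")
    # Assumed endpoint layout: <base>/jobs/<job-id>/checkpoints
    url = f"{base_url.rstrip('/')}/jobs/{job_id}/checkpoints"
    body = json.dumps({"checkpointType": checkpoint_type}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "<job-id>" is a placeholder; a real job ID comes from the running cluster.
    req = build_checkpoint_request("http://localhost:8081", "<job-id>")
    print(req.get_method(), req.full_url)
    # To send it against a running cluster: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the sketch safe to run without a cluster; passing the request to `urllib.request.urlopen` would submit it.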

Watermark alignment has also been improved to strengthen coordination and reduce excessive buffering by downstream operators. Additionally, the FRocksDB update brings improvements to RocksDBStateBackend, including shared memory between slots and support for the Apple M1 chip.
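To make the idea behind watermark alignment concrete, here is a toy Python simulation. It is not Flink code and does not use Flink's API; the `Source` class, `run`, `make_sources`, and the `max_drift` parameter are hypothetical names chosen for illustration. The model assumes two sources read in round-robin order, and a source pauses whenever its watermark is more than `max_drift` ahead of the slowest active source. The spread between watermarks stands in for the amount of data a downstream operator would have to buffer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    name: str
    events: List[int]   # event timestamps in read order
    watermark: int = 0
    position: int = 0

    def exhausted(self) -> bool:
        return self.position >= len(self.events)

    def read_one(self) -> None:
        # The watermark only moves forward.
        self.watermark = max(self.watermark, self.events[self.position])
        self.position += 1


def run(sources: List[Source], max_drift: float) -> int:
    """Read all sources round-robin; return the largest watermark spread seen
    while at least two sources were still active."""
    max_spread = 0
    while any(not s.exhausted() for s in sources):
        active = [s for s in sources if not s.exhausted()]
        low = min(s.watermark for s in active)
        for s in active:
            if s.watermark - low > max_drift:
                continue  # too far ahead of the slowest source: pause this round
            s.read_one()
        still_active = [s for s in sources if not s.exhausted()]
        if len(still_active) >= 2:
            marks = [s.watermark for s in still_active]
            max_spread = max(max_spread, max(marks) - min(marks))
    return max_spread


def make_sources() -> List[Source]:
    # A fast source advances event time 10 units per record, a slow one 1 unit.
    fast = Source("fast", [10 * i for i in range(1, 101)])
    slow = Source("slow", list(range(1, 101)))
    return [fast, slow]


if __name__ == "__main__":
    print("unaligned spread:", run(make_sources(), max_drift=float("inf")))
    print("aligned spread:  ", run(make_sources(), max_drift=20))
```

In this model the aligned run keeps the spread within `max_drift` plus one event-time step of the fast source, while the unaligned run lets the fast source race ahead. Real alignment in Flink is configured through its watermark strategy and involves coordination details that this sketch leaves out.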

Flink 1.17 also has updates to support batch processing. There is a new delete and update API in Flink SQL for batch mode, enabling row-level modifications in external storage systems. Enhancements to batch workload stability and performance have been made. Flink 1.17 introduces a “gateway mode” for SQL Client, enabling users to submit queries to a SQL Gateway for advanced functionality. Additionally, users can now manage job lifecycles through SQL statements.

Apache Flink continues to garner interest due to its unique ability to run stream processing with very large state or high throughput. In a recent article, Robert Metzger, a member of the Apache Flink PMC, notes that “In 2022 alone, a total of at least $55 million has been invested by venture capitalists into startups building companies around Apache Flink.” Examples of companies investing in Flink are Confluent and its recently acquired Immerok, and also AWS, which offers Flink as a hosted service.

“Flink is hot because the community of data scientists and infrastructure engineers have decided that the future is Flink. We have all the ingredients: well-funded startups, well-resourced enterprises loaded with engineering talent, a battle-tested and open-source technology, and a huge market that is rapidly emerging from an early state into one that is looking to modernize data stacks to become real-time,” wrote Metzger.

Related Items:

Five Drivers Behind the Rapid Rise of Apache Flink

Confluent to Develop Apache Flink Offering with Acquisition of Immerok

Preventing the Next 9/11 Goal of NORAD’s New Streaming Data Warehouse

Datanami