The latest release of the Apache Flink framework introduces the capability to execute exploratory SQL queries on data streams from a new SQL client, which will open the processing framework to a new class of users.
Apache Flink has contained SQL functionality since Flink version 1.1, which introduced a SQL API based on Apache Calcite and a table API, too. While the combined SQL and Table API today provides valuable ways for developers to apply well-understood relational data and SQL constructs to the world of stream data processing, its usefulness is somewhat limited.
For starters, only Scala and Java experts can avail themselves of API, according to the description of the new SQL client, which is codenamed FLIP-24. What’s more, any table program that was written with the SQL and Table API had to be packaged with Apache Maven, a Java-based project management tool, and submitted to the Flink cluster before running.
With the launch of the SQL CLI Client in Flink version 1.5, the Flink community is taking its support for SQL in a new direction. According to the FLIP-24 project page, providing an interactive shell will not only make Flink accessible to non-programmers, including data scientists, but it will also eliminate the need for a full IDE to program Flink apps. With millions of SQL-loving data analysts out there, the benefits could certainly be vast.
Apache Flink 1.5 users can execute SQL queries against streaming and batch data using the new SQL CLI Client
The Flink CLI Client’s command line interface (hence CLI) doesn’t do much for those who value beautiful GUIs, but for those who want to wield the power of SQL on streaming data, the SQL CLI Client could be just what the doctor ordered. In addition to running SQL queries on streaming or batches of data, the new SQL client will also provide a REST or JDBC interface for BI tools or dashboards.
The Flink framework has emerged as a solid contender for building modern and flexible stream processing applications. In addition to providing a true continuous streaming capability (which is something that Apache Spark has only recently delivered), Flink also provides the capability to run all sorts of operations on batch and streaming data sets, including tumbling, sliding, and windowing operations. Flink’s fault-tolerance, exactly once sematics, high throughput, and low latency has earned it the business of some very large organizations, including Disney, whose use of an AWS-based Flink system for a streaming analytics use case we profiled in a recent Datanami story.
According to a data Artisans blog post by Fabian Hueske and Till Rohrmann, Flink 1.5 is the sixth major release of the framework and contains fixes more than 780 issues. In addition to the new SQL client, Flink 1.5 brings:
- A redesigned and re-implemented process model (FLIP-6), which will enable a “more natural Kubernetes deployments,” support for a HTTP/REST protocol for all external communications, and better resource utilization on YARN and Mesos clusters;
- Support for streaming broadcast state (FLINK-4940) that will make it easier to replicate a control/configuration stream that runs parallel to and operates on the actual data stream, thereby eliminating a barrier to scale;
- Improvement to the network stack (FLINK-7315), which will reduce latency by minimizing data sent over the wire, and also reduce the amount of time required to complete a checkpoint in “backpressure” situations;
- A new local state recovery (FLINK-8360) capability that will “significantly reduce the time to recover an application with large state from a failure”;
- Support for “windowed outer equi-joins” with the SQL and Table API
More than 100 people contributed to the latest release of Flink, according to data Artisans. For more info, see flink.apache.org.
How Netflix Optimized Flink for Massive Scale on AWS
Flink Aims to Simplify Stream Processing
Flink Distro Now Available from data Artisans