October 10, 2019

Flink Gets Extension into Stateful Functions

Ververica, the company behind open source Apache Flink, this week unveiled Stateful Functions, a new framework designed to extend Flink into the world of distributed, stateful applications.

Stateful Functions is a collection of tools designed to let developers build stateful applications that run in a modern serverless style. It gives developers a set of stateful functions (hence the name) and runs atop Flink’s distributed data processing engine.

Apache Flink has emerged as a leading stream processing framework that gives developers an abundance of capabilities for building distributed, event-driven applications. Organizations that want to respond to events, like a human making a purchase or a sensor detecting a spike in log-in attempts, would do well to select Flink or a framework like it to build the application.

However, these Flink apps, for the most part, run in a stateless manner. While Flink does offer some functions for allowing developers to work with state, there are limitations to what the developer can achieve with them.

Stateful Functions extends Flink to merge stateful computing concepts with Flink’s distributed, serverless paradigm.

Stateful Functions was designed to overcome these limitations by “enabling users to define loosely-coupled, independent functions with a small footprint that can interact consistently and reliably in a shared pool of resources,” Ververica co-founder and CTO Stephan Ewen and Marta Paes, the company’s product marketing manager, write in a blog post.

The company says the API behind Stateful Functions is based on “small snippets of functionality that encapsulate business logic, somewhat similar to actors.” There is typically one function per entity (such as a user or a stock item). “Each function has persistent user-defined state in local variables and can arbitrarily message other functions (including itself!) with exactly-once guarantees,” Ewen and Paes write.
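To make the actor-like model concrete, here is a minimal single-process sketch in Python. It is not the Stateful Functions API: the names `ToyRuntime`, `Context`, `user_fn` and `stock_item_fn` are hypothetical, and the toy offers no distribution, persistence or exactly-once guarantees. It only illustrates the idea of one logical function instance per entity, each with its own state and the ability to message other functions.

```python
from collections import defaultdict, deque


class Context:
    """Handed to a function on each call: its own state plus a way to send messages."""

    def __init__(self, runtime, address):
        self.runtime = runtime
        self.address = address                  # (function_type, entity_id)
        self.state = runtime.state[address]     # state private to this entity

    def send(self, function_type, entity_id, message):
        self.runtime.send(function_type, entity_id, message)


class ToyRuntime:
    """Hypothetical in-memory runtime; a stand-in for illustration only."""

    def __init__(self):
        self.functions = {}                     # function_type -> callable
        self.state = defaultdict(dict)          # address -> per-entity state
        self.mailbox = deque()                  # pending (address, message) pairs

    def register(self, function_type, fn):
        self.functions[function_type] = fn

    def send(self, function_type, entity_id, message):
        self.mailbox.append(((function_type, entity_id), message))

    def run(self):
        # Deliver messages one at a time until none are left.
        while self.mailbox:
            address, message = self.mailbox.popleft()
            fn = self.functions[address[0]]
            fn(Context(self, address), message)


def user_fn(ctx, message):
    # One logical instance per user: count purchases, then notify the stock item.
    ctx.state["purchases"] = ctx.state.get("purchases", 0) + 1
    ctx.send("stock_item", message["item"], {"quantity": message["quantity"]})


def stock_item_fn(ctx, message):
    # One logical instance per stock item: track reserved units.
    ctx.state["reserved"] = ctx.state.get("reserved", 0) + message["quantity"]


runtime = ToyRuntime()
runtime.register("user", user_fn)
runtime.register("stock_item", stock_item_fn)
runtime.send("user", "alice", {"item": "sku-1", "quantity": 2})
runtime.send("user", "bob", {"item": "sku-1", "quantity": 1})
runtime.send("user", "alice", {"item": "sku-2", "quantity": 5})
runtime.run()
```

After the run, each entity holds only its own state: the function for user "alice" has recorded two purchases, while the function for stock item "sku-1" has accumulated reservations from both users.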

The new library is a good choice when a developer needs to maintain a stateful connection to entities, but does not want to give up the benefits of an event-driven architecture or break away from the function as a service (FaaS) style of development. The new Stateful Functions approach is not designed as a replacement for stateless computing, but instead offers a new path forward when both approaches are needed.

Stateful Functions is all about computing over state, not from state, Ververica says. “The major advantage of this model is that state and computation are co-located on the same side of the network, which means you don’t need the round-trip per record to fetch state from an external storage system (e.g. Cassandra, DynamoDB) nor a specific state management pattern for consistency (e.g. event sourcing, CQRS),” Ewen and Paes write.
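The round-trip argument can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: `RemoteStore` is a hypothetical stand-in for an external database that merely counts simulated network calls, and neither function reflects how Flink or Stateful Functions is implemented. The point is only that keeping state next to the computation removes the per-record fetch and write-back.

```python
class RemoteStore:
    """Hypothetical external store; counts simulated network round trips."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key, 0)

    def put(self, key, value):
        self.round_trips += 1
        self.data[key] = value


def count_with_remote_state(events, store):
    # Every record costs one read and one write against the external store.
    for key in events:
        store.put(key, store.get(key) + 1)
    return store.data


def count_with_local_state(events):
    # State lives alongside the computation, so no network calls are needed.
    local = {}
    for key in events:
        local[key] = local.get(key, 0) + 1
    return local


events = ["alice", "bob", "alice", "alice"]
store = RemoteStore()
remote_counts = count_with_remote_state(events, store)
local_counts = count_with_local_state(events)
```

Both versions produce the same counts, but the remote version pays two simulated round trips per event, which is the overhead the co-located model is meant to avoid.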

Another advantage is that developers no longer need to manage in-flight messages or maintain complex replication or repartitioning strategies, the company says. Persistence is maintained by keeping a connection to an object store for snapshots, it says. The new approach also provides high throughput for both real-time stream processing and offline batch processing, “allowing you to blur the boundaries between event-driven applications and generic data processing,” the company says.

Ververica (formerly data Artisans) made the announcement Tuesday at Flink Forward, which was held this week in Berlin, Germany. The company, which was acquired by Alibaba earlier this year, is releasing the software under an Apache 2.0 license.

“Orchestration for stateless compute has come a long way, driven by technologies like Kubernetes and FaaS — but most offerings still fall short for stateful distributed applications. Handling state consistently and interacting reliably between services poses significant challenges to the overall ease of development,” Ewen stated in a press release. “Stateful Functions is a big step towards addressing those shortcomings, bringing the seamless state management and consistency from modern stream processing to the space.”

Related Items:

Alibaba Acquires Apache Flink Backer data Artisans

Flink Delivers ACID Transactions on Streaming Data
