People to Watch 2019
Original Creator and PMC Apache Flink
Co-Founder and CEO dataArtisans
Kostas Tzoumas is one of the original creators of the Apache Flink framework, and a PMC member of Apache Flink. He is the co-author of Introduction to Apache Flink: Stream Processing for Real Time and Beyond (O’Reilly Media, Oct. 2016). He was co-founder and CEO of dataArtisans, which was acquired by Alibaba Group and renamed Ververica.
Datanami: Kostas, Flink has pushed the state of the art in what a stream processing engine can do. Where do you see Flink going in five years?
Kostas Tzoumas: We created Apache Flink as an open source framework for efficient and user-friendly data processing at massive scale. Our team has accomplished a lot in the past few years, and since we constantly strive for more innovation in the stream processing space, we can expect the scope of Flink to expand in a number of directions.
The stream processing market is set to grow at a CAGR of 20% in the coming years. With this healthy growth in the industry and more companies adopting stream processing, we see new use cases extending the boundaries of Apache Flink on a daily basis. With new use cases come additional requirements for stream processing and Flink: requirements on the framework’s scalability, interoperability and robustness, which are areas we have focused on and will continue to focus on in the future.
We expect that Flink will become the de facto data processing framework, not only for applications that we think of today as streaming but for a number of mission-critical applications in the enterprise. This is due to developments towards a unified data processing framework that is robust, with massive capabilities and great interoperability with other systems and technologies.
This is the direction in which we see Flink developing as it becomes the de facto data processing technology for the modern enterprise.
Datanami: We’ve seen batch and stream processing begin to merge. Do you think that batch will completely go away and become just a part of real-time streaming?
Kostas Tzoumas: Apache Flink has matured rapidly, and as you mention, it provides a unified platform for both stream and batch processing. From our viewpoint, there is no fundamental reason for this distinction. Many applications need both of these capabilities, so in the end, the conversation is about data processing that offers adequate scalability and latency for the respective applications.
For example, there are many applications that analyze data continuously by collecting data as it is produced (in a database or data lake) and periodically running batch jobs on the recorded data. For us, it’s not about batch processing going away but rather about developing a technology that is robust and scalable, and that handles data of any type, shape or form with the same unparalleled speed and fault tolerance.
As an example, Alibaba recently contributed Blink, its previously closed fork of Flink, to the Apache Flink community. Blink already has several improvements for running batch workloads, and we expect those to be part of the main Apache Flink branch soon.
Datanami: Outside of the professional sphere, what can you share about yourself that your colleagues might be surprised to learn – any unique hobbies or stories?
Kostas Tzoumas: I was never much of a hobby person – I remember usually being either all-in on a project or not involved at all. Maybe this is something I should change :-). As a Greek, I cannot imagine not going back home at least once a year to swim in the Aegean Sea. Thankfully, this is something that I have managed to do pretty much every year despite the work needed to start and scale a software company.