Real-time stream processing is one of the hottest topics this week at Strata + Hadoop World, and one of the new frameworks turning heads is Apache Flink. Developed by the German company data Artisans, Flink is unique in that it aims to simplify the big data analytics stack with a “streaming first” mentality, and the project’s momentum was rewarded today with a $6 million Series A round led by Intel Capital.
You may not have heard about Apache Flink, but chances are good you will hear more about it in the future. The in-memory stream processing framework is similar in some respects to Apache Spark, but differs in key areas, according to Kostas Tzoumas, data Artisans co-founder and CEO and a Flink committer.
“They both aim to cover a lot of analytics use cases,” Tzoumas tells Datanami. “The difference is that Flink starts with the premise that you see everything as a stream.”
While Spark and Hadoop have generated a lot of traction for all kinds of big data use cases, Tzoumas says, neither is ideal for building real-time streaming applications. Tzoumas and his colleagues working on the Stratosphere research project at the Technical University of Berlin created the Flink framework in 2009 as a way to run both historical batch and real-time streaming analytic workloads in a parallel manner.
“It’s really the next level,” Tzoumas says. “When people say ‘Let’s try to find a better Spark or a better Hadoop,’ what they mean is…my data is coming in continuously but all my infrastructure is based on the assumption that data is static. So let’s transition to a streaming architecture and let’s look at what the better options are. When you do that, then Flink is a very good option.”
Since becoming a top-level Apache project in early 2015, Flink has built up considerable momentum. It is now the fifth-most popular open source big data project at the Apache Software Foundation, Tzoumas says, and the latest release involved more than 150 contributors.
Among the companies that have adopted Flink are Capital One, Ericsson, Bouygues Telecom, Twitter, Amadeus, and ResearchGate. The gaming company King.com uses Flink to “continuously analyze a very, very large amount of data at a very high volume,” Tzoumas says. “A lot more companies are in the stage of trying it out and going to production, and when this happens it’s going to create another level of awareness.”
To hear Tzoumas tell it, Flink is at the cutting edge of a new “streaming first” approach to big data analytics, one that could supplant other platforms that needlessly introduce complexity to provide streaming capabilities on top of data that is stored statically. By consolidating on Flink, customers could get rid of four or five other frameworks, as well as the Lambda architecture that’s used to reconcile system-of-record batch analytics with less accurate streaming systems, he says.
“Right now, people are saying ‘We’re ingesting so many gigabytes and terabytes per day and we’re using Hadoop to analyze the data,’” Tzoumas says. “This by itself is kind of weird, because why are you using a tool that assumes the data is static to analyze your data which is being continuously produced? Our vision is to cover all of these applications with better performance and way more simplicity compared to the way they’re handled right now.”
Flink is typically deployed alongside Apache Kafka, which acts as the message routing system to feed data into Flink, where it is analyzed. The software, which exposes Java and Scala APIs, is YARN compatible and can run within Hadoop, but it’s just as happy running independently.
Much of what Flink does can be considered operational analytics, Tzoumas says–counting and aggregating clicks or events from a mobile app or an online game, for example. “A lot of the applications you see in streaming are not data science applications, but they are actually more applications that are in the core of the business,” he says. “So not so much about experimenting with data, but it’s actually more about operational use cases.”
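To make the “counting and aggregating” idea concrete, here is a minimal sketch of a streaming count over fixed time windows. It is written in plain Python and does not use Flink or its APIs; the event shape, the `tumbling_counts` function, and the sample data are all hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, event name).
Event = Tuple[int, str]


def tumbling_counts(
    events: Iterable[Event], window_seconds: int
) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, counts) each time a fixed-size window closes.

    Events are consumed one at a time, so a result is emitted as soon as
    the first event of the next window shows up. Assumes events arrive
    in timestamp order.
    """
    current_start = None
    counts: Counter = Counter()
    for timestamp, name in events:
        # Align the timestamp to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # A later window has begun: emit the finished one and reset.
            yield current_start, dict(counts)
            current_start = window_start
            counts = Counter()
        counts[name] += 1
    # Flush the last open window once the input ends.
    if current_start is not None:
        yield current_start, dict(counts)


# Made-up sample data standing in for clicks from a mobile app.
clicks = [
    (0, "click"), (3, "purchase"), (7, "click"),
    (12, "click"), (19, "click"), (21, "purchase"),
]
results = list(tumbling_counts(clicks, window_seconds=10))
for window_start, window_counts in results:
    print(window_start, window_counts)
```

The point of the sketch is that results are produced while data is still arriving rather than after a complete data set has been stored. A finite list, as used here, is simply a stream that happens to end.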
Berlin-based data Artisans aims to build a business atop Flink, including providing technical support, services, and software. The company also has a presence in San Francisco, and has connections to more than 25 local user groups in major cities around the world.
The $6 million round was led by Intel Capital with participation from Tengelmann Ventures as well as existing investor b-to-v Partners. It brings data Artisans’ total funding to $7.3 million, Tzoumas says. The technology caught the eye of Ron Kasabian, vice president in the Data Center Group and general manager of Big Data Solutions at Intel.
“Our customers view stream-based data processing as the next big trend in the data infrastructure market,” Kasabian says in a press release. “Apache Flink is one of the most advanced and transformative stream processing systems available in the open source community, and with its founding team of Flink inventors, data Artisans will help accelerate this growing opportunity.”