The LinkedIn team that built the Apache Kafka real-time messaging service has left to form a new company called Confluent. The startup said it would offer a “real-time data platform” built around Apache Kafka.
Along with serving as a robust real-time, scalable messaging system, Kafka has been applied to collecting user activity data, logs, application metrics, even device instrumentation. Confluent’s chief focus appears to be supplying “high-volume data” as a “real-time stream for consumption in systems with very different requirements,” according to the company’s web site. Those systems include everything from batch systems like Hadoop to low-latency real-time systems as well as “stream processing engines” that handle data streams as they are delivered.
The startup said its infrastructure represents a “central nervous system” for transmitting messages to different systems and applications within an enterprise.
LinkedIn scaled Kafka to deliver hundreds of millions of messages a day. The developers made Kafka an open source tool and claim it has been widely adopted. Confluent said it would aim to build a real-time data platform “to help other companies get easy access to data as real-time streams.”
The startup team said it has so far raised $6.9 million from investors including LinkedIn, Benchmark and Data Collective. The startup’s founders noted that Benchmark has a track record of working with open source companies like Red Hat and Hortonworks.
Confluent, based in Mountain View, Calif., is led by Jay Kreps, who previously served as LinkedIn’s lead architect for data infrastructure. Kreps is credited with the initial development work on Apache Kafka along with several other open source software projects.
Another co-founder, Neha Narkhede, will serve as Confluent’s head of engineering. Along with helping with Kafka development, she was responsible for LinkedIn’s petabyte-scale data stream infrastructure.
The third co-founder, Jun Rao, was a Kafka architect at LinkedIn, and previously worked for IBM’s Almaden Research Center in San Jose.
The startup’s executive team also includes Ewen Cheslack-Postava, whose doctoral work at Stanford University focused on distributed systems for scalable spatial query processing. The startup also said it is hiring.
Confluent added that it intends to use Kafka “as a hub to sync data between all types of systems that load data infrequently to real-time systems that require low-latency access.”
Kafka grew out of the shortcomings of earlier tools, which lacked the attributes of a modern distributed system into which data could be safely dumped at the scale needed by growing companies like LinkedIn. Or as Kreps put it in a blog post, the goal was “making data integration less Kafkaesque.”
Kreps said his team viewed Apache Kafka “as a messaging system, but it was built by people who had previously worked on distributed databases so it was designed very differently. So it came with the kind of durability, persistence and scalability characteristics of modern data infrastructure.”
Confluent, Kreps added, was formed to commercialize Kafka as an open source real-time data tool. The startup said it expects to develop some proprietary tools to complement Kafka, but Kafka itself will remain “100 percent open source.”