People to Watch 2016

Jay Kreps
Co-founder and CEO
Confluent


Jay Kreps is the Co-founder and CEO at Confluent, the commercial open source company behind Apache Kafka. He is the original author of Apache Kafka, the high-throughput message broker at the center of many real-time analytic and Internet of Things (IoT) applications. Kreps developed Kafka while working at LinkedIn, and he was also involved in the development of Apache Samza, Voldemort, and Azkaban. Jay received both his B.S. and M.S. in Computer Science from the University of California, Santa Cruz.

Datanami: Hi Jay. Congratulations on being selected as a Datanami 2016 Person to Watch. What are Confluent’s leading goals for 2016 with regard to big data?

Jay Kreps: Confluent’s mission is to enable companies to make real-time data a core part of their architecture and applications. This is really what the team has been working on since we first created Apache Kafka at LinkedIn a number of years ago; it’s just that both the team and the number of people using the technology have grown a lot.

The two big things we’ve done recently are both new functionality released as part of Apache Kafka to help advance this mission. The first addition to Kafka is a framework called Kafka Connect that makes it possible to capture and manage streams of data coming from external applications, databases, or other sources. The second addition is a state-of-the-art stream processing facility called Kafka Streams, which makes it dead simple to build applications that process data streams in real time.

This year we’ll be focusing on really refining this new functionality as well as building out the ecosystem of connectors to other data systems that work with Kafka Connect. Already dozens of these have appeared in open source in only a matter of weeks, so we’re really excited about the pace of adoption.

We’ll also be doing everything we can to help support the Kafka open source community. One big thing we’re putting together soon is the first-ever Kafka Summit, which is happening on April 26th in San Francisco.

Datanami: Where are we in the evolution of real-time streaming? What impact do you see it having on computing in the future and how we make business decisions going forward?

Real-time streaming is a fundamental advance in how companies use data. I think we will come to see it as being as big a deal as the move to distributed computing has been in helping companies unlock the power of their data.

Why is this such a big deal? We are coming out of an era where data mostly sat on the edge of a company and was used for reporting. But increasingly businesses have started to become something that is made as much out of software as out of people and human processes. And this is causing the uses of data-intensive software to change from purely reporting on the business to directly powering the business.

Since businesses are inherently real-time in much of what they do, I see the expansion from daily batch computing to real-time streams as a big enabling factor in making this possible.

We’re at an exciting time now because a lot of the technology that makes using streaming data possible is really just now getting to the level of usability, reliability, and completeness to make this type of usage practical. Already we’re starting to see an explosion of exciting applications and the spread of this technology beyond a leading edge of companies to the much larger mainstream.

Datanami: Generally speaking, on the subject of big data, what do you see as the most important trends for 2016 that will have an impact now and into the future?

I’d flag three big ones:

The transition from daily batch computing to real-time streams
The transition from private data centers to the cloud, which really changes the economics and practicality of scalable data processing
Growing focus on the importance of privacy, security, and data governance in doing all of this

Datanami: Outside of the professional sphere, what can you tell us about yourself – personal life, family, background, hobbies, etc.?

Outside work I enjoy spending time with my two little girls as well as running and occasionally rock climbing.

Datanami: Final question: Who is your idol in the big data industry and why?

Alan Turing—big data was a little smaller back in the 1940s but no less important.

