People to Watch 2016

Matei Zaharia
Co-founder and CTO
Databricks

Matei Zaharia is the Co-founder and Chief Technology Officer at Databricks, the commercial open source company behind Apache Spark. Matei is known for creating Apache Spark, the in-memory framework that is revolutionizing big data analytics, and for co-creating Apache Mesos during his Ph.D. at UC Berkeley. He also works at the Massachusetts Institute of Technology as an assistant professor of Computer Science and is credited with designing the core scheduling algorithms used in Apache Hadoop.

Datanami: Hi Matei. Congratulations on being selected as a Datanami 2016 Person to Watch. When did you first come up with the idea for Spark? What was the defining moment, or the “spark,” that led you to create it?

Matei Zaharia: We came up with the idea for Spark when I was a grad student at UC Berkeley in 2009. I had already been working on Apache Hadoop, and some machine learning students in our lab asked me for help running their computations at scale. We quickly realized that Hadoop MapReduce was very inefficient for their types of computations, so I built a new engine that could run them more efficiently. Once that came out, people saw the engine, tried it out, and gave feedback, and it’s all history from there.

Datanami: You’re also one of the six founders of Databricks. Can you tell us how the six of you found each other and how you came up with the idea for Databricks?

Matei Zaharia: All of us were students and researchers at UC Berkeley working on Spark. Therefore, it was very easy to get started, because we all knew each other already and had built a lot of things together. We decided to build a cloud service in particular because we looked at the industry and we thought that the biggest problem with big data is that it’s still too complex to use: too many moving parts, too much time to get started, too many pieces of software to manage and use. With Databricks we hide all that and let users start getting production applications and insights within days.

Datanami: Generally speaking, on the subject of big data, what do you see as the most important trends for 2016 that will have an impact now and into the future?

Matei Zaharia: There are two really interesting trends. The first is increasing access to big data for new types of users: whereas the first big data tools were designed for software engineers, there are now tools that data scientists, business analysts and BI users can use. We have done a lot in Spark to make the engine accessible to more users through Python, R and SQL interfaces, and indeed we see that the majority of access at Databricks is now through these. A second trend is the emergence of new hardware platforms such as GPUs and NVRAM, and we are updating Spark’s internals to be able to use those.

Datanami: I’m sure most of your students would love to follow in your footsteps. What advice can you give that would help them to achieve this goal?

Matei Zaharia: I always tell people to learn a new area by diving in hands-on and discovering what’s difficult. For example, when I started working with MapReduce, I didn’t know that interactive queries or machine learning would be a problem, but this quickly became clear as a common trend among all the users I talked with. This led us to think about completely different programming models and to ask what other applications (e.g., streaming) they might enable.

Datanami: Final question: Who is your idol in the big data industry and why?

Matei Zaharia: There are a lot of great people in the industry, but if I had to pick, I’d say a lot of the industry has been influenced by Google’s Jeff Dean, as well as Sanjay Ghemawat and the other folks who worked on large-scale infrastructure there.

P. Taylor Goetz
Apache Software
Jeff Hammerbacher
Cloudera
Jay Kreps
Confluent
Todd Lipcon
Cloudera
Jacques Nadeau
Dremio
Peter Norvig
Google
Alex Pentland
MIT
Jennifer Priestley
Kennesaw State University
Nate Silver
FiveThirtyEight
Daniel Sturman
Cloudera
Werner Vogels
Amazon
Matei Zaharia
Databricks