September 9, 2020

VC Ben Horowitz Dishes on Hadoop, AI, and Data Culture

Don’t mistake Ben Horowitz for a big fan of Hadoop. “The product was just never good,” the noted venture capitalist said today in a wide-ranging fireside chat with Sisu CEO Peter Bailis during the Future Data Conference.

There’s no denying that Horowitz has had an outsized influence on tech startups with Andreessen Horowitz, the Menlo Park, California investment firm that he co-founded with Marc Andreessen, the co-author of Mosaic and a co-founder of Netscape. The list of current investments and exits on the venture capital company’s website is simply ridiculous.

The storied Sand Hill Road firm is currently invested in Sisu, which shows promise as a next-gen analytics system that uses machine learning to help people ask better questions of the data. Andreessen Horowitz, which has $12 billion under management, has helped fund a variety of ecosystem tool players featured in these pages, like Alluxio, Anyscale, Cazena, Databricks, and Fivetran. And that’s just the first six letters of the alphabet (this may be an online publication, but we don’t have that much space).

So when Bailis, who’s currently an assistant professor of computer science at Stanford University and a co-principal of the Stanford DAWN project, asked Horowitz during their fireside chat what he made of the whole Hadoop thing, you figured that Horowitz would deliver the goods.

“It was just not good in any way, other than it was better than using a relational database if you have a big data problem,” Horowitz said of Hadoop. “But you could tell that thing was going to be vulnerable from day one.”

Apache Spark exposed Hadoop’s vulnerability, perhaps sooner than most industry watchers expected.

The Hadoop elephant has taken some hits (mw2st/Shutterstock)

“Of course, Spark came along and it was immediately 10 to 100x faster,” Horowitz said. “It was way easier to program. It was way easier to deploy. It could deploy into HDFS. And so it was just crazy. But anytime anything gets to be the standard in a really big category like that, it’s going to draw a lot of money because people assume that the product is good. But the product was just never good.”

If Spark had been only 2x or 3x better than Hadoop, then nobody would have moved. “And Cloudera and Hortonworks would have won,” Horowitz said. “But if there’s something that’s 20x or 30x better, then everybody is moving. Nobody is going through that kind of pain just because it was the standard, and it had an ecosystem around it.”

Being a standard is a very powerful thing, Horowitz said. “But it’s not so powerful that it can overcome a truly horrible product. And that’s what happened there,” he said.

If something comes along that’s 30 times better than Spark, then that will be a problem for Databricks, Horowitz said. “The difference is, I also think Databricks just is a much, much better company than Cloudera or Hortonworks ever was,” he said. “They’ve got a way better CEO. They’ve got probably the best large engineering team in enterprise software today. I’m not too worried about history repeating itself on that one.”

Horowitz on ML

Horowitz also gave his take on the state of machine learning and AI. While he influences how many millions of dollars are invested in ML and AI technology, Horowitz does not seem to believe that AI and ML will radically change the essential nature of business anytime soon.

Sisu CEO Peter Bailis (left) interviewed Ben Horowitz today during the fireside chat at the Future Data Conference

“Machine learning becomes interesting when statistics runs out of gas. For most companies, they’re still not at that point, for most of the things that they’re trying to do,” Horowitz said. “I do think [that] most of their data is highly dimensional enough and complex enough where using machine learning techniques makes sense. I just don’t think they’re going to be able to hire those people.”

Those people, of course, are the data scientists and data engineers who are experts at wrangling and teasing insights from huge amounts of data. They are needed to tackle the toughest problems, but there are still so few of them that most companies will never be able to afford them.

“Well over 90% of businesses won’t have data scientists,” Horowitz predicted. “They’ll have off-the-shelf, no-code tools that help them use their data to make really good decisions for their business. But that’s different than companies where their product is going to change dramatically if they use the data in the most cutting-edge way.”

Horowitz clearly is a big believer in infusing ML and AI tech into existing products, such as analytic tools. That’s what Sisu and other a16z investments are actively doing. But he doesn’t seem to believe that all companies must adopt data science to survive, at least as data science currently exists.

“You don’t just do machine learning,” he said. “You need to move your data. You need to transform your data. You need to do feature engineering. There’s a lot of work that nobody has had the skills to do, and then the toolchain has been immature on that. The toolchain is quickly getting much more mature, so I think we’ll see more AI projects start to succeed, now that we have tools like Databricks and Fivetran, dbt, and Tecton, and what not.

“So tools that can deliver with the right amount of AI under the hood, as opposed to just mega Excel, are going to be extremely valuable,” he continued. “But they’re going to have to be simplified to the point where a company can actually adopt them. There just have been very few companies that are good at doing AI themselves.”

Horowitz on Data Culture

Clearly, Horowitz suffers from a pragmatic streak, and this tendency showed up when Bailis asked him to elaborate on his views on the importance of data culture. From his vantage on Sand Hill Road, clearly we must be one chapter away from Data Utopia, right?

The ability to quickly get correct answers is the first step in building a data culture (pgraphis/Shutterstock)

“The prerequisite for solving the culture question is you have to have the capability,” Horowitz said. “If you’re going to have a data-driven culture, you have to be able to ask the data question and get the answer.”

What it comes down to, Horowitz said, is how fast you can get correct answers out of data.

“If you’re in a meeting and people go, ‘Okay we want to change the product this way or put more sales people in this region or we want to do whatever it is that we want to do,’ then somebody who’s trying to set the culture says, ‘Well, what does the data say?’ If the answer is ‘We’ll come back in a month and tell you,’ you’re never going to be a data-driven culture. There’s just no way. That can never happen.

“But if you have an amazing product like Sisu … and it can just tell you, then it’s really easy to get to data-driven culture, because all you have to do is literally ask the question ‘What’s the data say,’ when somebody comes up with the proposal. And if you ask that enough, it will go that way.”

In other words, you may not need to ‘become a data company’ to survive, as so many pundits say. And chances are you won’t be able to afford a top-notch data science or data engineering team, as Lyft and Facebook have. But according to Horowitz, you need to learn how to analyze data and get good answers from it, because that capability is quickly becoming a commodity. (Perhaps you might even be looking for a super analyst, even if you haven’t articulated that particular thought yet.)

“Anybody who’s making a decision in business is going to need to be a data analyst in order to be competitive. That seems pretty obvious that that’s gotta happen because, look, these decisions are really complicated,” Horowitz said. “There’s so much data out there on which you can base decisions that if you’re just doing everything by intuition in say 2025, it does seem that it’s going to be impossible for you to compete with any company that knows how to use the data and make sharper decisions.”

Related Items:

Finding and Managing Super Analysts for the Fourth Industrial Revolution

Exposing AI’s 1% Problem

Hadoop Has Failed Us, Tech Experts Say