June 17, 2015

Ex-Googler Now Helping Cloudera Build Hadoop

Cloudera scored a coup recently when it nabbed former Google executive Daniel Sturman to head up its engineering department. In an interview with Datanami, Sturman explains how he intends to use his experience designing distributed systems at the Internet giant to help evolve Hadoop.

Sturman was intimately involved in designing and running the software infrastructure that powers Google’s massive online business. As vice president of engineering at Google, he led the teams responsible for the Google Compute Engine and Google App Engine. These systems weren’t based on Hadoop; Google was a big user of MapReduce early on and has since moved to other distributed systems. But the experience translates well to Hadoop, he says.

“What I’m bringing from that Google experience is an idea of how systems operate at large scale,” says Sturman, who reports to Cloudera CEO Tom Reilly. “You know you’re on a distributed system when some component that you never heard of fails and it brings you down. So I have a lot of experience…in successfully building systems to work at scale in distributed environments.”

Google was a pioneer in horizontally scaling commodity servers and it invested hundreds of millions, if not billions, of dollars to assemble top-notch engineering and development teams who were capable of building these systems from scratch. Ten years ago, Google’s competitors were doing the same thing–you’ll recall that Yahoo’s Doug Cutting based what would become the Hadoop Distributed File System (HDFS) in part on an obscure paper about the Google File System.

Fast forward to 2015, and Hadoop is on the cusp of giving people the same kind of distributed processing power that Google, Yahoo, and others worked so hard to build, but without all the blood, sweat, and tears. As Cloudera’s new vice president of engineering, Sturman is happy to be working with Cutting, who’s the chief architect at Cloudera.

“I’m very excited about where Hadoop is right now,” Sturman says. “Having seen how this stuff works at Google about the sort of insights you can get from data when you have the right tools at your disposal–I know the power of that. Google put a lot of time and engineers in building up that expertise, and rightfully Cloudera’s customers are a little bit more impatient. They want to unlock that power much faster. They don’t want to quite have that level of investment.”

Cloudera Vice President of Engineering Daniel Sturman

Sturman is just one week into his new job, which is barely enough time to map the routes to the office coffee machines, let alone properly introduce himself to all the members of Cloudera’s development team. But he has already outlined what he believes his role at Cloudera will be, and where he can have the biggest impact.

“This is an incredibly talented team here at Cloudera. They do their jobs very well and they know the community very well,” says Sturman, who was the driving force behind the Kubernetes container project while at Google and has a Ph.D. and master’s degree in computer science from the University of Illinois.

Sturman says his focus will be identifying the barriers to Hadoop adoption and getting it past “the knee” in the adoption curve. “Cloudera has its competitors and we all compete for deals. But I think the real competitor is things that are blocking the people who aren’t using it from coming on board and using the technology. I’m not quite sure where that will be in the technology stack yet. I’m absolutely sure it will involve the Apache community in one way or another. But I’m really going to be looking at what are those inhibitors and how do we make them go away.”

Hadoop’s cup is either half full or half empty, depending on how you look at it. This duality was evident in a recent report from Gartner analyst Nick Heudecker, who authored an April study that found 54 percent of enterprises were not investing in Hadoop and had no plans to, which he dubbed “anemic adoption” that runs counter to the hype. The other way to look at that data is that 46 percent of enterprises have either already adopted it or are investing in Hadoop. That’s certainly how Rob Bearden, the head of Cloudera’s competitor Hortonworks, sees it. “The opportunity that sits in front of us is simply staggering to me,” Bearden said at last week’s Hadoop Summit.

Sturman recognizes that the potential of Hadoop is massive, but says it’s not quite where it needs to be. As a former director of development for IBM’s DB2 for Linux, Unix, and Windows database, Sturman can list “enterprise software development” on his résumé, too.

The Hadoop stack needs some filling out, Sturman says. “Enterprises tend to have a number of needs and expectations, which come from the way they managed databases and data under traditional systems,” he says. “And while we’re dealing with very different scales here, I think not all those tools are in place. I think Cloudera is doing a great job leading the market on that, but there still needs to be continued focus there in order to really enable people to do this and be comfortable about how to get there.”

Many enterprises are already getting value out of Hadoop, but we’ve just seen the tip of the iceberg in terms of the potential impact that Hadoop can have on business computing, according to Sturman. “It’s certainly not brand new, but I think we’re starting to see it become broadly adopted for a growing set of well-understood use cases,” he says. “Where I think we need to get to is where it becomes much more of a core, enterprise-ready asset.”

Hadoop is broadly adopted today in the financial services, retail, and telecommunications industries, and many of those engagements involved extensive technical services. As Hadoop adoption continues to spread in these industries, Sturman sees patterns emerging that will become the foundations for more standardized software offerings. “Within 18 months I think we’ll start seeing that with significant numbers, especially around core verticals,” he says. “We’re starting to better understand the problems and see the patterns so that stuff can really get productized in the not-too-distant future.”

It’s hard to separate the rise of Hadoop from the big data phenomenon. (At the recent Hadoop Summit, Hortonworks executives pondered whether Hadoop was driving the Internet of Things, or if the IoT was driving Hadoop.) Whatever the dynamics, the phenomena are intrinsically related and complementary.

Sturman says we’re “on the brink of something incredible” in how we get value from data. “When it all comes together–and I think it is coming together pretty soon–it gives you an exponential power effect, because they just build on top of each other,” he says.

Related Items:

From Spiders to Elephants: The History of Hadoop

Does Hadoop Need a Reality Check?

Congratulations Hadoop, You Made It–Now Disappear