Rudin: Big Data is More Than Hadoop
In a presentation at the Strata + Hadoop World conference this week, Facebook analytics chief Ken Rudin urged the audience to question several commonly held beliefs about big data.
First, Rudin rejected the idea that Hadoop is synonymous with big data. There’s a belief, said Rudin, “that if you want big data, you need to go out and buy Hadoop and then you’re pretty much set. People shouldn’t get ideas about turning off their relational systems and replacing them with Hadoop,” he said.
“At Facebook, we’re a young enough company that we started by using Hadoop as our core data technology rather than relational,” he explained, adding that the company is now introducing more relational systems. “As we start thinking about big data from the perspective of business needs, we’re realizing that Hadoop isn’t always the best tool for everything we need to do, and that using the wrong tool can sometimes be painful.”
Rudin explained his view that big data is not about technology, but rather about business needs – a concept that permeated his entire talk. “When you start thinking about big data in terms of business needs instead of technologies, it opens up the possibility of using a much broader range of technologies,” he explained. “In reality, big data should include Hadoop and it should include relational, and it should include any other technology that is suited for the task at hand.”
He explained that the type of analysis dictates which tool Facebook uses. “We do exploratory analysis within Hadoop to look through the data and figure out what are the metrics that really matter,” he said. “Once we know what those metrics are, and we want to do operational analysis on it – the slicing and dicing on these metrics by the various dimensions – it’s faster and simpler to do that in relational [systems].”
Facebook uses Hadoop for analysis that requires highly granular data, as well as for real-time monitoring, the analytics chief said. However, for analysis that examines trends over days, weeks, months, or years, the data variables are determined in Hadoop and then brought into a relational system, where they can be analyzed further. “The bottom line is, use the right technology for whatever it is you need,” he said. “Big data is expansive and inclusive.”
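To make the "slicing and dicing" step concrete, here is a minimal sketch of what operational analysis in a relational system can look like once the metrics have been chosen. It is an illustration only, not Facebook's implementation: the table name daily_metrics, its columns, the sample rows, and the helper slice_metric are all hypothetical, and an in-memory SQLite database stands in for whatever relational system is actually used.

```python
import sqlite3

# Hypothetical, pre-aggregated rows standing in for metrics that were
# identified during exploratory analysis elsewhere (for example, in Hadoop).
ROWS = [
    ("day-1", "US", "mobile", 120),
    ("day-1", "US", "web", 80),
    ("day-1", "BR", "mobile", 60),
    ("day-2", "US", "mobile", 130),
    ("day-2", "BR", "web", 40),
]

# Dimensions that the metric may be grouped by (assumed for this sketch).
DIMENSIONS = {"day", "country", "platform"}


def slice_metric(conn, dimension):
    """Sum the active_users metric grouped by a single dimension."""
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    # The column name is checked against a fixed whitelist above, so it is
    # safe to interpolate into the query text.
    query = (
        f"SELECT {dimension}, SUM(active_users) "
        f"FROM daily_metrics GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_metrics ("
    "day TEXT, country TEXT, platform TEXT, active_users INTEGER)"
)
conn.executemany("INSERT INTO daily_metrics VALUES (?, ?, ?, ?)", ROWS)

by_country = slice_metric(conn, "country")
by_platform = slice_metric(conn, "platform")
print(by_country)
print(by_platform)
```

Each new cut of the data is a single GROUP BY over an already small table, which is the kind of query relational engines are built to answer quickly.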
Other commonly held beliefs that Rudin challenged included the following:
Big data gives you better and deeper answers – It does, said Rudin. However, having complete answers to the wrong questions doesn’t add any value. Rudin said that Facebook addresses this problem by focusing on hiring the right people. “It is no longer sufficient to hire people who have a PhD in statistics,” said Rudin. “You also need to make sure that the people that you hire have business savvy.” He said that Facebook does this by giving job candidates case studies from its own business and asking them which metrics they think would be important to look at.
The focus of big data should be on actionable insights – “Everyone feels that the goal of big data is to give you actionable insights,” he said. “It’s not.” The goal of big data, said Rudin, is to do something about the insights. “You need to close the gap,” he said. “You need to go the last mile and evangelize your insights so that people actually act on them and there is impact.”
The bottom line of Rudin’s talk was that the data science analysts within an organization need to own the outcomes. “It doesn’t matter how brilliant our analyses are. If nothing changes we have made no impact,” he said, adding that, in that scenario, it would make no difference whether the analyst even worked at the company.
“The people who do this well are driving tremendous impact in their companies and changing the industries in which they work,” he said. “And as an analyst, that’s the biggest impact we can have.”