Big Data So Easy a Caveman Could Do It?
Let’s face it: big data isn’t easy. If you’re building a big data application today, you’re up to your eyeballs in things like R and Java, MapReduce and Pig, and Storm and Kafka. There’s a reason data scientists are so hard to find that they’re compared to unicorns. But in the future, the big data application assembly process may be dumbed down to the point where, as the insurance commercial says, even a caveman could do it.
That’s the approach that Microsoft is taking with its increasingly robust suite of big data tools and services based around its hosted Hadoop distribution, called HDInsight. At last week’s Strata + Hadoop World conference, the IT giant made a slew of announcements that show it’s taking seriously the challenge of boiling the complexity out of big data and making it accessible for everybody.
“What we do is say…let’s make the plumbing stupid simple for everybody,” T. K. “Ranga” Rengarajan, corporate vice president of the Data Platform business at Microsoft, told Datanami in an interview at the show last week. “You can get there in fewer clicks, fewer hours, fewer days to achieve your objectives than any other place. That’s how we make it easy.”
The announcements include support for Apache Storm on HDInsight, the availability of a Linux-based version of HDI on Azure, the general availability of Azure Machine Learning, and a connector between HDI and its hosted DocumentDB NoSQL database. When you factor in the recent acquisition of R parallelization experts Revolution Analytics, the challenge to Tableau, Qlik, and Spotfire that the January launch of PowerBI version 6 poses, and the forthcoming public launch of an internally developed MapReduce analog dubbed Cosmos, you get the sense that Microsoft is up to something big in the big data space.
“This is a linchpin technology for IOT scenarios,” Rengarajan says of Storm. “We have the world wide hyperscale cloud [Azure] and we have the end points around the world, called event hubs. Essentially if you have any sort of device, anything that produces data, you can simply point to an event hub and pump data into it. From the event hub, you can get a Storm Spout, if you’ve already written the connector, and then bring it into the Hadoop world. Then all the data is available for any other service in Azure.”
Storm is a critical piece of the emerging Hadoop stack, but it’s not easy to use. Rengarajan recounts a session at Strata where the speaker asked how many people in the audience were using Storm. Half the people raised their hands. Then the speaker asked how many thought it was easy. One person raised his hand.
“The technology is very compelling, but the damn thing is hard,” Rengarajan says. “Everybody would like to have it, but it’s not easy. Our big goal is to make it easy enough for everybody [to use]….We want to open up this amazing world for them, leveraging the assets they already have.”
One early adopter of Storm in Azure is Linkury, a company in the ad optimization space that has participated in Microsoft’s BizSpark program for startups. The Israeli company is using Storm to pump real-time data feeds into HDI so it can adjust and optimize its advertising spending in real time in response to what people are looking at on the Internet.
“They have only one developer who’s pinch-hitting as an IT person,” Rengarajan says. “Everything is running in Azure. It’s a fundamental capability for them and they don’t have to invest much in terms of people power to do that.”
Similarly, Microsoft will be looking to make machine learning as drop-dead simple as possible through its Azure Machine Learning offering.
“We have a long history of big data and machine learning at Microsoft, from Bing and Xbox, but we’ve been using them mostly internally,” Rengarajan says. “We have very mature algorithms that we’re packaging. So even if you don’t know all the technology behind these algorithms, you can still consume them, knowing what they do. That’s the beauty of our approach–make it very simple to pull it all together.”
As Rengarajan sees it, Microsoft will win in big data by being open, by adopting the latest open technologies, and by shielding users from the complexities of the technologies as much as humanly possible. The fact that Microsoft is supporting Java tooling, the Linux operating system, and the iPhone for mobile dashboards (even before it supports its own Windows mobile phone offering) shows that there just might be something to this “new Microsoft” that employees talk about.
“The important part of our approach is to allow people to bring in the tools, technologies, and frameworks of their choice and achieve their objective in the simplest manner possible, across on-premise and the cloud,” Rengarajan says. “You innovate only where you think you can make a dramatic IP difference. If you want to take the standard parts, it’s plumbed.”