Can Microsoft Become the McDonald’s of Hadoop?
Microsoft last month officially took the wraps off its Hadoop service, dubbed Windows Azure HDInsight Service. Now that the Hortonworks-based offering is GA, Microsoft is gearing up for its incredibly audacious plan to reach no fewer than one billion users with its big data platform. Has Microsoft lost it? Or is the plan just crazy enough to work?
“I am pleased to announce that Windows Azure HDInsight, our cloud-based distribution of Hadoop, is now generally available on Windows Azure,” Quentin Clark, corporate vice president of the Data Platform Group at Microsoft, wrote last week in a blog post. “The GA of HDInsight is an important milestone for Microsoft, as it’s part of our broader strategy to bring big data to a billion people.”
For the better part of a year, Microsoft has been beta testing its HDInsight offering, which is based on Hortonworks’ HDP for Windows Server and the Apache Hadoop 1.x codebase. Over that time, it has been poked and prodded by numerous users in the Windows Azure cloud environment, including several, such as Virginia Tech, Ascribe, and Christian Hansen, that are now on the record as official HDInsight customers.
Those early adopters are doing some interesting things. Christian Hansen has increased the number of drug trials it can analyze by a factor of 100 using its new HDInsight service and existing SQL Server databases. Meanwhile, medical researchers at Virginia Tech apparently have quicker access to DNA sequencing tools and other resources now that their HDInsight service is online.
But to get to a billion users, which is roughly one out of every seven humans on earth, Microsoft is going to have to do something drastic. There are numerous Hadoop distributions out there to choose from, and new NoSQL and NewSQL database vendors pop up nightly. Many of these vendors, especially those in the Hadoop camp, pledge their allegiance to open source software and Java, two things to which the Redmond, Washington, company is diametrically opposed, or at least has been historically.
But the big data-era Microsoft is different. It’s not only doing the work to make Hadoop run on Windows, Clark says, but it’s also collaborating with the broader Hadoop ecosystem by contributing to Apache projects like Tez and Hive, as well as to the Stinger initiative to speed up Hive. That in itself is notable, especially considering the billions of dollars and thousands of person-years it spent fighting open source software in the last decade.
“Microsoft recognizes Hadoop as a standard and is investing to ensure that it’s an integral part of our enterprise offerings,” Clark writes. “We have put in thousands of engineering hours and tens of thousands of lines of code. We have been doing this in partnership with Hortonworks, who will make HDP (Hortonworks Data Platform) 2.0 for Windows Server generally available next month, giving the world access to a supported Apache-pure Hadoop v2 distribution for Windows Server. Working with Hortonworks, we will support Hadoop v2 in a future update to HDInsight.”
Clark presented a session at Strata + Hadoop World last week called “Can Big Data Reach One Billion People?” Clark says that big data is creating a “major transformation” that will impact everybody in a business. “The impact of this is beyond just making businesses smarter and more efficient. It’s about changing how business works through both people and data-driven insights,” he says.
Those are big words, but how do they translate into actions? Microsoft has a couple of things working to its advantage here. For starters, Microsoft is bringing its .NET development tools to bear on Hadoop apps. It has built an API for Hadoop that allows developers to write MapReduce jobs using .NET, and developers can also access Hive using .NET tooling. This is critical, as the requirement to have Java programming skills to develop on Hadoop is arguably one of the biggest shortcomings of that big data platform.
The other big thing that Microsoft has going for it is Excel and its other data manipulation tools, including Power BI. Over the years, hundreds of millions of people have been exposed to Excel, which remains a standard in the corporate world. (It’s much loved by some, much hated by others, but a standard nonetheless.) Excel still doesn’t get you to a billion people, but it shows you how Microsoft is trying to leverage its existing strengths to approach this problem.
The Power BI offering looks to be a major part of Microsoft’s HDInsight strategy. The software allows users to refine, visualize, and query multiple data sources, including HDInsight instances, external Hadoop clusters, or any sort of SQL Server database. Power BI has the potential to play a role similar to the one Tableau and QlikTech play in many organizations’ big data strategies. (Or maybe Microsoft should just buy Tableau or QlikTech.)
It’s really no wonder that Microsoft wants in on the big data game, which has opened a spigot of cash from venture capital firms looking to invest in big data startups. We’re at the beginning of a spending upswing by actual companies and other organizations, one that will enrich those who are ready to serve organizations that want to “do” big data.
The problem is, Microsoft has never been much of an early innovator. Indeed, it owes much of its success to building off the success of early innovators (anybody remember WordPerfect, Lotus 1-2-3, or Netscape Navigator?). It was rarely, if ever, the first to market with a ground-breaking product. Instead, its business model was to build comparable offerings (some would say “copy” them), then use its marketing muscle or monopoly position to dominate the market.
But the Microsoft of today is far different than the Microsoft of the late 1990s or early 2000s. With Steve Ballmer at the helm, the company has been a kinder, gentler Microsoft (to borrow a George H.W. Bush phrase) than under Bill Gates’ run. But Ballmer is on his way out, and the company, once again, is in need of a makeover. While its consumer business is shrinking, it still has a very profitable business in the corporate server racket with its Windows Server OS, middleware offerings like SQL Server, and ERP applications with Dynamics.
The question is, will Microsoft do what it takes to be successful in the big data world? With Visual Studio and .NET, it would seem to be in a great position to offer an alternative to the Java-focused development paradigm that so far has dominated Hadoop.
It has made a start with the APIs it has written for Hadoop and the integration with Excel. But if it really wants to become the McDonald’s of big data, Microsoft needs to do much more to leverage its huge Visual Studio installed base and truly make it easy for those developers to write big data applications within a Microsoft environment.