You know the saying… “If you can’t beat ’em, join ’em.”
Apparently Redmond is taking this strategy to heart as they revamp their approach to addressing big data in the era of Hadoop.
With Dryad out to dry and a renewed push to find a unique message for enterprises enamored with the elephant, the company had to rethink its strategy—a process that is pushing it closer to the world of open source than it’s had to go in the past.
Microsoft is hoping to bring Hadoop to the Windows platform, both on-premises and in its Azure cloud, via a partnership with Hortonworks. Dr. Michael Rys, Principal Program Manager for the SQL Server Engine division, says the partnership was formed because of Hortonworks’ close ties to Hadoop’s original open source roots at Yahoo. Progress with Hortonworks has been rather slow: the two have been developing a common solution to suit the Windows masses for well over a year. According to the Microsoft folks we spoke with this week at the annual Supercomputing Conference (SC12), that is because they want to focus on reliability, high availability and full integration with the tools their customers expect.
The Hadoop implementation for Windows Server and Azure marks a change of pace for Microsoft, which had been developing Dryad, its own answer to Hadoop designed for its infrastructure and platforms, since 2006. As interest and adoption possibilities waned, the company was forced to drop the project and look to where the real momentum behind Hadoop was coming from: the community and ecosystem that created it. That made Hortonworks an ideal partner, since they’re the gatekeepers of the Apache project, and it gave Microsoft a way to show its commitment to giving back to the open source community that pulled the rug out from under it.
HDInsight, the Windows-primed version of Hadoop for which Microsoft says some key use cases are close at hand, aims to bring in big data business from Windows customers who cling tightly to the tools and integration the platform has delivered for years, most notably System Center, Visual Studio, and ready capabilities in SQL, among other elements.
In a post that can be seen as something of a concession speech, the company stated last November, in the wake of the Dryad ditch, “Hadoop has emerged as a great platform for analyzing unstructured data or large volumes of data at low cost, which aligns well with Microsoft’s vision for its Information Platform. It also has a vibrant community of users and developers eager to innovate on this platform. Microsoft is keen to not only contribute to this vibrant community, but also help its adoption in the enterprise.”
Dryad, which was later rebranded as LINQ to HPC (a name that painted the program into the high-end corner of computing and left it without the direct commercial appeal Hadoop had), was Microsoft’s vision for a parallel programming toolset that could work on-site or in its Azure cloud. This was a grand ambition. So grand, in fact, that the HPC/technical computing team was moved into the cloud division, effectively eliminating the company’s direct links to HPC (even if the technical computing team was still developing the same projects). As part of the company’s “all in” cloud strategy, a Hadoop-like framework might have fit nicely. The problem was that it never saw the market adoption required to get off the ground.
One can see why Microsoft would be so interested in helping its customers jump the Java ship with Hadoop and MapReduce approaches, and it seems there was some surprise that it didn’t all pan out the way the company thought it might. However, now that Microsoft is working with Hortonworks, it can give the code back to the Apache community, freeing other distros from handling all the complex tooling needed to make Hadoop Windows-ready. This means Microsoft is touching the open source community in a way that might breathe new life into its big data strategy, since more of the tentative Windows folks, not eager to leave behind SQL Server and their other tools in exchange for unfamiliar Java territory, can climb on board with greater ease.
Granted, while Microsoft has the most manpower for a job like this, it is not the first to move Hadoop closer to the SQL heart. Cloudera, with its Impala release last month, helps make this easier, a development that Rys says his company is watching carefully.
Rys agreed that the lack of an active community (i.e. Apache) was part of the problem with the Dryad approach, but says that what they developed was technically sound. Although his team was not directly involved with Dryad, he said that it had all the makings of a solid Hadoop alternative for the gads of customers who wanted the familiarity and integration of the tried-and-true Windows platform tools—it just didn’t get the kind of market adoption one might have thought given the emergence of Hadoop as a much-hyped solution.
Rys feels that the Hadoop for Windows move is solid given the open source approach they’ve taken. “As long as the other distros are basing their stuff off the core Apache trunk, they can benefit from this. It’s the whole point,” he said. “We have HDInsight, which is a subset of the Hortonworks platform, and it’s designed from the ground up to work on Windows, but that’s just the first step to get the ball rolling.” He told us they have contributions that go back to the Apache project directly to appease the “many customers who love the Microsoft tools and platforms, who want to program in Visual Studio, use System Center, SQL. They don’t want to go to a completely different world to program, so we’re giving them the ability to use the ecosystem that the Hadoop world provides with all of its capabilities and libraries, but with the manageability, integration and general experience they get with Windows platforms.”
The company sees big things ahead when it comes to big data. In addition to the use cases that Rys assured us were forthcoming, it is building out some interesting capabilities, including a query processor that lets users run queries against Hadoop and HDFS datasets as well as within Microsoft’s own SQL Server Parallel Data Warehouse appliance. He said that this makes a compelling case for integrating all of this into one’s existing Microsoft ecosystem instead of just running a bare-bones Hadoop distro without the integration and tooling.