November 14, 2012

Microsoft Taps on Open Source Window


You know the saying… “If you can’t beat ’em, join ’em.”

Apparently Redmond is taking this strategy to heart as they revamp their approach to addressing big data in the era of Hadoop.

With Dryad out to dry and a renewed push to find a unique message for enterprises enamored with the elephant, the company had to rethink its strategy, a process that is pushing it closer to the world of open source than it has had to go in the past.

Microsoft is hoping to bring Hadoop to the Windows platform, both on-premises and in the Azure cloud, via a partnership with Hortonworks. Dr. Michael Rys, Principal Program Manager for the SQL Server Engine division, claims the partnership was formed because of Hortonworks’ close ties to Hadoop’s original open source roots at Yahoo. Progress with Hortonworks has been rather slow: the two have been developing a common solution to suit the Windows masses for well over a year. According to the Microsoft folks we spoke with this week at the annual Supercomputing Conference (SC12), this is because they want to focus on reliability, high availability and full integration with the tools their customers expect.

The Hadoop implementation for Windows Server and Azure marks a change of pace for Microsoft, which had been developing Dryad, its own answer to Hadoop designed for its infrastructure and platforms, since 2006. As interest and adoption prospects waned, the company was forced to drop the project and look to where the real momentum behind Hadoop was coming from: the community and ecosystem that created it. That made Hortonworks an ideal partner, since they’re the gatekeepers of the Apache project, and it gave Microsoft a way to show its commitment to giving back to the open source community that pulled the rug out from under it.

HDInsight, the Windows-primed version of Hadoop for which Microsoft is nearing some key use cases, aims to bring in big data business from Windows customers who cling tightly to the tools and integration the platform has delivered for years, most notably System Center, Visual Studio, and ready capabilities in SQL.

In what can be seen as something of a concession speech, the company stated in a post last November, in the wake of the Dryad ditch, "Hadoop has emerged as a great platform for analyzing unstructured data or large volumes of data at low cost, which aligns well with Microsoft’s vision for its Information Platform. It also has a vibrant community of users and developers eager to innovate on this platform. Microsoft is keen to not only contribute to this vibrant community, but also help its adoption in the enterprise."

Dryad, later rebranded as LINQ to HPC (a name that painted the program into the high-end corner of computing, leaving it without the direct commercial appeal Hadoop had), was Microsoft’s vision for a parallel programming toolset that could work on-site or in its Azure cloud. This was a grand ambition; so grand, in fact, that the HPC/technical computing team was moved into the cloud division, effectively eliminating the company’s direct links to HPC (even if the technical computing team was still developing the same projects). As part of the company’s “all in” cloud strategy, a Hadoop-like framework might have fit nicely. The problem was that it didn’t see the market adoption required to get off the ground.

One can see why Microsoft would be so interested in helping its customers jump the Java ship with Hadoop and MapReduce approaches, and it seems there was some surprise that it didn’t all pan out the way the company thought it might. However, now that it is working with Hortonworks, it can give the code back to the Apache community, freeing other distros from handling all the complex tooling that will make Hadoop Windows-ready. This means Microsoft is touching the open source community in a way that might breathe new life into its big data strategy, since more of the tentative Windows folks, not eager to leave behind their SQL Server and other tools in exchange for unfamiliar Java territory, can climb on board with greater ease.

Granted, while Microsoft has the most manpower for a job like this, it is not the first to move Hadoop closer to the SQL heart. Cloudera’s Impala release last month helps make this easier, a development that Rys says his company is watching carefully.

Rys agreed that the lack of an active community (i.e., Apache) was part of the problem with the Dryad approach, but says that what they developed was technically sound. Although his team was not directly involved with Dryad, he said that it had all the makings of a solid Hadoop alternative for the scads of customers who wanted the familiarity and integration of the tried-and-true Windows platform tools. It just didn’t get the kind of market adoption one might have expected given the emergence of Hadoop as a much-hyped solution.

Rys feels that the Hadoop for Windows move is solid given the open source approach they’ve taken. “As long as the other distros are basing their stuff off the core Apache trunk, they can benefit from this. It’s the whole point,” he said. “We have HDInsight, which is a subset of the Hortonworks platform, and it’s designed from the ground up to work on Windows, but that’s just the first step to get the ball rolling.” He told us they have contributions that go back to the Apache project directly to appease the “many customers who love the Microsoft tools and platforms, who want to program in Visual Studio, use System Center, SQL. They don’t want to go to a completely different world to program, so we’re giving them the ability to use the ecosystem that the Hadoop world provides with all of its capabilities and libraries, but with the manageability, integration and general experience they get with Windows platforms.”

The company sees big things ahead when it comes to big data. In addition to the use cases that Rys assured us were forthcoming, Microsoft is building out some interesting capabilities, including a query processor that lets users run queries against Hadoop and HDFS datasets as well as within its own Parallel Data Warehouse SQL Server appliance. He said that this offers a compelling way to integrate all of this into one’s existing Microsoft ecosystem instead of just running a bare-bones Hadoop distro without the integration and tooling.
