August 17, 2012

Marching Hadoop to Windows

Bringing Hadoop to Windows and the two-year development of Hadoop 2.0 were two of the more exciting developments raised by Hortonworks cofounder and CTO Eric Baldeschwieler in a panel talk at the Cloud 2012 Conference in Honolulu.

The panel, which also included Baldeschwieler’s Cloudera counterpart Amr Awadallah, focused on insights into the big data world, a subject Baldeschwieler approached almost entirely through Hadoop. The eighteen-minute discussion also featured a brief history of Hadoop’s rise to prominence, improvements still to be made to Hadoop, and a few tips for enterprising researchers wishing to contribute to it.

“Bringing Hadoop to Windows,” says Baldeschwieler, “turns out to be a very exciting initiative because there are a huge number of users in Windows operating system.” In particular, Excel is popular among business analysts, who would like to see the spreadsheet program integrated with data stored in Hadoop. That will not be possible until, as Baldeschwieler notes, Hadoop is brought to Windows later this year, a move that will also considerably expand Hadoop’s reach.

However, that announcement pales in comparison to the possibilities provided by the impending Hadoop 2.0. “Hadoop 2.0 is a pretty major re-write of Hadoop that’s been in the works for two years. It’s now in usable alpha form… The real focus in Hadoop 2.0 is scale and opening it up for more innovation.” Baldeschwieler notes that Hadoop’s rise has been the result of what he calls “a happy accident”: his Yahoo team was developing it for a specific use case, namely classifying, sorting, and indexing each of the URLs that were under Yahoo’s scope.

What ended up happening was that other Yahoo teams requested use of the Hadoop nodes and found success with them, leading to a much more significant investment from Yahoo. “Yahoo took this (Hadoop) prototype and then built an internal service that now runs on 42,000 computers with roughly 200 petabytes of raw storage involved and it took about 300 person-years of investment and open source software to make this thing work.” From there, people like Baldeschwieler and Awadallah went on to form companies such as Hortonworks and Cloudera to build further on Hadoop.

While Hadoop’s rise makes for a fun success story, its status as something of a happy accident has led to some inefficiencies and limitations, to the point that an entirely new version was necessary for it to keep growing. “The existing Hadoop 1.0 base runs on about 4,000 computers whereas the target design is about 10,000 and that takes Moore’s law forward a few years. Our current target computer has about 12 TB of disk, the new one would have 36.”
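As a rough back-of-the-envelope check on the figures quoted above, the sketch below multiplies node count by per-node disk to compare raw cluster capacity for the two designs. It assumes decimal units (1 PB = 1,000 TB) and that every node carries the quoted amount of disk; the function and variable names are illustrative and do not come from the talk.

```python
# Assumption: decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1_000


def raw_capacity_pb(nodes: int, tb_per_node: float) -> float:
    """Raw (unreplicated) disk capacity of a cluster, in petabytes."""
    return nodes * tb_per_node / TB_PER_PB


# Figures quoted in the talk: node counts and disk per target computer.
hadoop_1_pb = raw_capacity_pb(nodes=4_000, tb_per_node=12)
hadoop_2_pb = raw_capacity_pb(nodes=10_000, tb_per_node=36)
growth = hadoop_2_pb / hadoop_1_pb

print(f"Hadoop 1.0 cluster: {hadoop_1_pb:.0f} PB raw")
print(f"Hadoop 2.0 target:  {hadoop_2_pb:.0f} PB raw")
print(f"Growth factor:      {growth:.1f}x")
```

Under these assumptions, the 2.0 target works out to 360 PB of raw disk per cluster against 48 PB for the 1.0 design, a 7.5-fold increase. Usable capacity would be lower once replication is taken into account.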

Hadoop 2.0 is about more than improving scale, however. Baldeschwieler would like to see programmers and data scientists able to work with more than MapReduce, in essence making Hadoop more ‘pluggable.’ He would also like to see new varieties of files introduced to Hadoop through version 2.0.

Making 2.0 more pluggable may also solve another problem businesses are having with Hadoop. Baldeschwieler mentioned that every Fortune 500 company has Hadoop running in some form, but many businesses are slow to make full use of it. Making Hadoop more pluggable will not help the businesses that hear of Hadoop, want to get into big data, and end up buying several nodes to that end without much thought.

It will, however, assist those with competent technology departments that have analytics tools but are unable to integrate them with Hadoop for whatever reason. “We need to make sure that there’s the right APIs for everyone who’s building data products to plug into Hadoop in various ways.”

Finally, someone has to do the research that carries Hadoop into its second version. Baldeschwieler notes that while the Hadoop community welcomes good ideas and contributions, newcomers should first build a reputation in the community by doing interesting research with Hadoop before trying to add to it.
