October 17, 2013

Apache Unveils Hadoop 2

Oct. 17 -- The Apache Software Foundation, which oversees the 150 or so open source projects under the famous Apache umbrella, this week announced Hadoop 2, the latest version of the popular software framework for distributed computing.

Apache Hadoop enables data-intensive distributed applications to work with thousands of nodes and exabytes of data, providing the foundation for many of the world’s big data analytics applications. The framework connects thousands of servers to process and analyze data at supercomputing speed.

The project’s latest release has been more than four years in the making, and has now achieved the level of stability and enterprise-readiness to earn the General Availability designation, according to a foundation statement.

Apache Hadoop VP Chris Douglas said, “With the release of stable Hadoop 2, the community celebrates not only an iteration of the software, but an inflection point in the project’s development. We believe this platform is capable of supporting new applications and research in large-scale, commodity computing.”

Arun Murthy, release manager for Hadoop 2 and co-founder of Hortonworks, the company that provides one of the most popular Hadoop distributions, said, “It has been an honor and pleasure to work with the community and a personal thrill to see our four years of work on YARN finally coming to fruition in the GA of Hadoop 2.

“Hadoop is truly becoming a cornerstone of the modern data architecture by enabling organizations to leverage the value of all their data, including capturing net-new data types, to drive innovative new services and applications.” 

Hadoop is widely deployed at enterprise organizations around the globe, including some of the digital economy’s biggest names, such as Amazon Web Services, AOL, Apple, eBay, Facebook, foursquare, HP, LinkedIn, Netflix, Rackspace, and Twitter.

Many companies, such as Microsoft, IBM, Teradata and SAP, have integrated Hadoop into their services. Yahoo!, an early pioneer, hosts the world’s largest known Hadoop production environment to date, spanning more than 35,000 nodes. 

New in Hadoop 2 is YARN, which sits on top of HDFS and serves as a large-scale, distributed operating system for big data applications. It enables multiple applications to run simultaneously, for more efficient support of data throughout its entire lifecycle.
 
Features include support for:
 
 - Apache Hadoop YARN for running both data-processing applications (e.g. Apache Hadoop MapReduce, Apache Storm etc.) and services (e.g. Apache HBase)
 - High availability for Apache Hadoop HDFS
 - Federation for Apache Hadoop HDFS for significant scale compared to Apache Hadoop 1.x
 - Binary compatibility for existing Apache Hadoop MapReduce applications built for Apache Hadoop 1.x
 - Support for Microsoft Windows
 - Snapshots for data in Apache Hadoop HDFS
 - NFSv3 access for Apache Hadoop HDFS