May 28, 2013

YARN to Spin Hadoop into Big Data Operating System

Isaac Lopez

Hadoop is about to see a fundamental reset in its base functionality, according to Arun Murthy, an architect with Hortonworks and the Apache Software Foundation, who says that SQL in Hadoop via YARN is part of the core of this metamorphosis.

While Hadoop has been garnering plenty of attention for its potential in the enterprise, one of its chief weaknesses has been that it was originally designed as a single-application system: namely, the batch-oriented MapReduce. Because the system was developed and grown specifically for web-scale data by the likes of Yahoo! and Facebook, this made sense at one point; however, new trends and enterprise demands are emerging that are changing the paradigm.

One of the fundamental trends changing the picture is that enterprises now view "big data" as "all their data," not just specific, narrow aspects of it. Firms are looking for ways to break down the data silos in their organizations and bring the data together in one central place where it can be accessed. Centrally storing large amounts of data is, of course, something Hadoop does well. Once the data is there, however, bottlenecks can crop up as business analysts compete with each other for cluster resources.

Tools and other capabilities have been designed and implemented to address these potential limitations of Hadoop, including vendor tools such as Platfora, as well as well-known projects such as Hive, Pig, and HBase. However, says Murthy, the YARN project is about opening up the entire framework for use cases that were previously not possible.

"When we set out to build Hadoop 2.0, we wanted to fundamentally re-architect Hadoop to be able to run multiple applications against relevant data sets," writes Murthy. "And do so in a way where multiple types of applications can operate efficiently and predictably within the same cluster. This is really the reason behind Apache YARN, which is foundational to Hadoop 2.0. By managing the resource requests across a cluster, YARN turns Hadoop from a single-application system to a multi-application operating system."

Earlier this year, Hortonworks CTO Eric Baldeschwieler echoed this sentiment, telling an audience that extensibility is a chief focus of the Hadoop 2.0 initiative and citing YARN as a key foundation of the reworked framework.

According to Cloudera, YARN, which it says is an acronym for "Yet Another Resource Negotiator," is a framework that facilitates the writing of arbitrary distributed processing frameworks and applications.

"YARN provides the daemons and APIs necessary to develop generic distributed applications of any kind, handles and schedules resource requests (such as memory and CPU) from such applications, and supervises their execution," says Harsh Chouraria of Cloudera, who adds that this means YARN can run applications that do not follow the MapReduce model.
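To make the idea of a resource negotiator more concrete, the short Python sketch below models the bookkeeping Chouraria describes: several applications ask for slices of memory and CPU, and a scheduler grants each request to a node that can hold it. This is a toy illustration, not YARN's actual API or scheduling policy; the Node and Request classes, the schedule function, the application names and the simple first-fit rule are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical cluster node with free memory (MB) and CPU cores
    name: str
    free_mb: int
    free_vcores: int


@dataclass
class Request:
    # Hypothetical resource request from one application
    app: str
    memory_mb: int
    vcores: int


def schedule(nodes, requests):
    """Toy first-fit scheduler (an assumption, far simpler than YARN's):
    grant each request to the first node with enough free memory and
    cores, and keep requests that fit nowhere as pending."""
    grants, pending = [], []
    for req in requests:
        for node in nodes:
            if node.free_mb >= req.memory_mb and node.free_vcores >= req.vcores:
                # Reserve the resources on this node for the application
                node.free_mb -= req.memory_mb
                node.free_vcores -= req.vcores
                grants.append((req.app, node.name))
                break
        else:
            # No node had room; the request waits for resources to free up
            pending.append(req)
    return grants, pending


nodes = [Node("node1", 8192, 4), Node("node2", 4096, 2)]
requests = [
    Request("mapreduce-job", 4096, 2),
    Request("streaming-app", 4096, 2),
    Request("ml-training", 4096, 2),
    Request("sql-query", 2048, 1),
]
grants, pending = schedule(nodes, requests)
print(grants)
print([r.app for r in pending])
```

Here a batch job, a streaming application and a machine learning job share the same two nodes, while the SQL query waits until capacity frees up. The point is the one Murthy makes: once resources are negotiated centrally, MapReduce becomes just one of many applications competing for the same cluster.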

This opens up Hadoop to a whole new paradigm of usage. Where before it could be considered a central storage place for running batch analytics, YARN essentially turns the framework into a big data operating system of sorts, where multiple applications can run simultaneously. That opens the door to everything from machine learning to real-time event processing, data modeling and more.

So while Hadoop has been virtually synonymous with MapReduce, it’s about to see what promises to be a fundamentally game-changing shift. These new capabilities are due for release this summer, says Murthy, as part of the Hadoop 2.0 roll-out.
