August 28, 2012

Facebook Sees Hadoop Through Prism

Datanami Staff

A prism is used to separate a ray of light into its contingent frequencies, spreading them out into a rainbow. Maybe Facebook’s Prism will not make rainbows, but it will have the ability to split Hadoop clusters across the world.

This development, along with another Facebook project Corona which solves the single point of failure in MapReduce, was recently outlined with an emphasis on the role of the project for the larger Hadoop ecosystem.

Splitting a Hadoop cluster is difficult, explains Facebook’s VP of Engineering Jay Parikh. “The system is very tightly coupled,” said Facebook’s VP of Engineering Jay Parikh as he was explaining why this splitting of Hadoop clusters was not possible, “and if you introduce tens of milliseconds of delay between these servers, the whole thing comes crashing to a halt.”

So if a Hadoop cluster works fine when located in one data center and it is so difficult to change that, why go to the trouble? For a simple reason, according to Parikh. “It lets us move data around, wherever we want. Prineville, Oregon. Forest City, North Carolina. Sweden.”

This mobility mirrors that of Google Spanner, which moved data around its different centers to account for power usage and data center failure. Perhaps, with Prism, Facebook is preparing for data shortages/failures on what they call the world’s largest Hadoop cluster. That would be smart. Their 900 million users have, to this point, generated 100 petabytes of data that Facebook has to keep track of. According to Metz, they analyze about 105 terabytes every thirty minutes. That data is not shrinking, either.

Indeed, Parikh has an eye toward power usage and data failures. “It allows us to physically separate this massive warehouse of data but still maintain a single logical view of all of it. We can move the warehouses around, depending on cost or performance or technology…. We’re not bound by the maximum amount of power we wire up to a single data center.” Essentially, like Google Spanner, Prism will automatically copy and move data to other data centers when problems are detected or if needed elsewhere.

Facebook also improved Hadoop’s MapReduce system by introducing a workable solution that utilizes multiple job trackers. Facebook had already recently engineered a solution that takes care of the Hadoop File System which resulted in Hadoop’s HA NameNode, and Corona supplements that with a MapReduce fix. “In the past,” Parikh notes, “if we had a problem with the job tracker, everything kinda died, and you had to restart everything. The entire business was affected if this one thing fell over. Now, there’s lots of mini-job-trackers out there, and they’re all responsible for their own tasks.”

While both Corona and Prism seem like terrific additions to Hadoop, neither of these products have yet been released and Parikh gives no timetable for when that will happen. Further, according to Metz, MapR claims it has a solution similar to that of Facebook’s but nothing has hit the open source Hadoop.

Either way, there exist few companies who have as much at stake as Facebook when it comes to big data analytics. Improvements to Hadoop mean improvements to how Facebook handles their ridiculous amount of data. Hopefully, through Prism and Corona, they have found ways to split clusters among multiple data centers along with introducing backup trackers.

Related Stories

Six Super-Scale Hadoop Deployments

How 8 Small Companies are Retooling Big Data

Cloudera CTO Reflects on Hadoop Underpinnings

MapR Floating Google Cloud

Vendors: MapR

Tags: big data, clusters, Corona, facebook, Hadoop, hadoop clustres, prism

Only registered users may comment. Register using the form below.

Check off newsletters you would like to receive*
- HPCwire
- EnterpriseTech
- Datanami
- Technology Conferences & Events
- Advanced Computing Job Bank
- Technology Product Showcase
Email*
Name*
First Last
Organization*
Job Function*
Industry*
Country*
City*
State*
Province*
- Please check here to receive valuable email offers from Datanami on behalf of our select partners.

Facebook Sees Hadoop Through Prism

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

April 24, 2024

April 23, 2024

April 22, 2024

April 19, 2024

Sponsored Partner Content

Get your Data AI Ready – Celebrate One Year of Deep Dish Data Virtual Series!

Supercharge Your Data Lake with Spark 3.3

Learn How to Build a Custom Chatbot Using a RAG Workflow in Minutes [Hands-on Demo]

Overcome ETL Bottlenecks with Metadata-driven Integration for the AI Era [Free Guide]

Gartner® Hype Cycle™ for Analytics and Business Intelligence 2023

The Art of Mastering Data Quality for AI and Analytics

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Top 6 Strategies for Reducing Data Warehouse Costs

Building an Operational Data Warehouse for Real-time Analytics

Sponsored Multimedia

The Power of DataOps: Bring Automation to Life
No Comments

Tactical Steps for Cloud Migration
No Comments

Immuta Data Access Platform
No Comments

Data Mesh: Fact or Fiction?
No Comments

Contributors

Featured Events

AI & Big Data Expo North America 2024

AI Hardware & Edge AI Summit Europe

AI Hardware & Edge AI Summit 2024

CDAO Government 2024

Facebook Sees Hadoop Through Prism

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

April 24, 2024

April 23, 2024

April 22, 2024

April 19, 2024

Most Read Features

Most Read News In Brief

Most Read This Just In

Sponsored Partner Content

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Sponsored Multimedia

Contributors

Featured Events

Share

Copy short link