March 7, 2017

Dr. Elephant Steps Up to Cure Hadoop Cluster Pains

Alex Woodie

Getting jobs to run on Hadoop is one thing, but getting them to run well is something else entirely. With a nod to the pain that parallelism and big data diversity brings, LinkedIn unveiled a new release of Dr. Elephant that aims to simplify the process of writing tight code for Hadoop. Pepperdata also introduced new software that takes Dr. Elephant the next step into DevOps.

The Hadoop infrastructure at LinkedIn is as big and complex as you likely imagine it to be. Distributed clusters run backend metrics, power experiments, and drive production data products that are used by millions of people. Thousands of internal users interact with Hadoop via dozens of stacks each day, while hundreds of thousands of data flows move data to where it needs to be.

The social media company appreciates a tidy cluster, just like everybody else, but getting things in an orderly manner on Hadoop was beginning to resemble an impossible task, according to Carl Steinbach, a senior staff software engineer at LinkedIn.

“We found that sub-optimized jobs were wasting the time of our users, using our hardware in an inefficient manner, and making it difficult for us to scale the efforts of the core Hadoop team,” Steinbach writes in a blog post yesterday.

While LinkedIn does have a team of technical Hadoop experts at the ready, it would be a waste of time to have them tune each user’s job manually. “At the same time, it would be equally inefficient to try and train the thousands of Hadoop users at the company on the intricacies of the tuning process,” Steinbach writes.

LinkedIn’s Dr. Elephant tool presents users with a Web-based dashboard that shows them how their Hadoop jobs are running

Instead, the company created Dr. Elephant. First released a year ago, Dr. Elephant was updated by LinkedIn yesterday with enhancements that should make it easier to get users and developers on the straight and narrow when it comes to writing well-crafted Hadoop jobs.

The idea behind Dr. Elephant is to identify poorly executing jobs, diagnose the root cause, and then provide developers and other users with recommendations on how to improve the performance of a job without impacting overall cluster efficiency. It’s all about helping users navigate through the bewildering array of choices they have for configuration settings and how to accomplish specific tasks on different frameworks, like MapReduce and Spark.

Just as real doctors don’t impose treatments on patients, the Dr. Elephant software doesn’t automatically make changes and doesn’t force users to accept its recommendations. Instead, it offers suggestions that users are free to adopt or to ignore. (You can bring an elephant to water, as the saying goes, but you can’t make her drink.)

This approach has worked well for LinkedIn, which says Dr. Elephant has been very well received by users. The company says the ease of checking the Dr. Elephant dashboard to see how well (or badly) Hadoop jobs are running creates a bit of positive peer pressure.

“No one wants to have a poorly tuned Hadoop job on the Dr. Elephant dashboard, just like no one wants to submit poorly-written code to an open source project,” Steinbach says.

LinkedIn’s success with Dr. Elephant has spilled over into open source, where the community has driven additional development work in areas like Oozie, Airflow, Azkaban, and Spark integration. It’s also been adopted by Foursquare, Pinterest, Hulu, and other companies, as well as commercial vendors.

One of those commercial vendors is Pepperdata, which today announced that it’s releasing a new product called Application Profiler that is based on Dr. Elephant.

Combined with other Pepperdata products, including the Cluster Analyzer—which collects more than 300 metrics per task in real time—the Application Profiler will provide Hadoop shops with a boost for their DevOps cycles, says Chad Carson, co-founder of Pepperdata.

“Application Profiler and Dr. Elephant are really bringing the Hadoop knowledge, the understanding of the different phases of the overall job, to the developer,” Carson tells Datanami. “A given job might be touching thousands of nodes, and App Profiler and Dr. Elephant understand that and can help the developer to understand how much parallelism to use.”

In some cases, Application Profiler may encourage a user may to totally rewrite a given application, says Pepperdata CEO Ash Munshi.

“They may find the algorithm they’re using has too much communication, too much network traffic, that it relies on storage too much, or that it doesn’t have the right type of locality,” says Munshi, who joined Pepperdata last year. “All of those are things that become evident in this case, and then they have changes they can make, which is how much parallelism you want to do, or make structural changes in the code, which is the algorithm I’ve got is not really a match for what I want to do.”

There are aspects of developing and operating Hadoop and big data clusters that are very different than traditional systems, and customers are struggling with that learning curve at the moment. Much of it boils down to how much parallelism a developer should strive to include in the code. But with hundreds of configuration switches to set, nailing the right pattern can be difficult for those lacking extensive experience, Carson says.

“You have a much more complex parallelism that than you do in traditional code,” he says. With so much complexity in modern distributed systems, it can be difficult to get jobs to return results in an acceptable period of time. This is a problem that many Hadoop users are struggling with, particularly as they seek to bring jobs from the lab into production.

“We like to say that performance can mean the difference between business critical and business useless,” Carson says. “Because with big data, you have not just a bunch of nodes doing things independently, but you’ve got a very complex computer system working at scale. You have lots of complicated interactions among the different jobs, among the different nodes.”

Enforcing Hadoop SLAs in a Big YARN World

Shining a Light on Hadoop’s ‘Black Box’ Runtime

Technologies: Frameworks, Middleware

Sectors: Financial Services, Other

Vendors: LinkedIn, Pepperdata

Tags: distributed, Dr. Elephant, Hadoop, LinkedIn, mapreduce, parallel, pepperdata, Spark

Only registered users may comment. Register using the form below.

Check off newsletters you would like to receive*
- HPCwire
- EnterpriseTech
- Datanami
- Technology Conferences & Events
- Advanced Computing Job Bank
- Technology Product Showcase
Email*
Name*
First Last
Organization*
Job Function*
Industry*
Country*
City*
State*
Province*
- Please check here to receive valuable email offers from Datanami on behalf of our select partners.

Dr. Elephant Steps Up to Cure Hadoop Cluster Pains

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

April 19, 2024

April 18, 2024

Sponsored Partner Content

Get your Data AI Ready – Celebrate One Year of Deep Dish Data Virtual Series!

Supercharge Your Data Lake with Spark 3.3

Learn How to Build a Custom Chatbot Using a RAG Workflow in Minutes [Hands-on Demo]

Overcome ETL Bottlenecks with Metadata-driven Integration for the AI Era [Free Guide]

Gartner® Hype Cycle™ for Analytics and Business Intelligence 2023

The Art of Mastering Data Quality for AI and Analytics

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Building an Operational Data Warehouse for Real-time Analytics

Can You Use Kafka as a Database?

Sponsored Multimedia

The Power of DataOps: Bring Automation to Life
No Comments

Tactical Steps for Cloud Migration
No Comments

Immuta Data Access Platform
No Comments

Data Mesh: Fact or Fiction?
No Comments

Contributors

Featured Events

Call & Contact Center Expo

AI & Big Data Expo North America 2024

AI Hardware & Edge AI Summit 2024

CDAO Government 2024

Dr. Elephant Steps Up to Cure Hadoop Cluster Pains

Join the discussion Cancel reply

Only registered users may comment. Register using the form below.

April 19, 2024

April 18, 2024

Most Read Features

Most Read News In Brief

Most Read This Just In

Sponsored Partner Content

Leading Solution Providers

Tabor Network

Sponsored Whitepapers

Sponsored Multimedia

Contributors

Featured Events

Share

Copy short link