February 4, 2014

Shining a Light on Hadoop’s ‘Black Box’ Runtime

Alex Woodie

Let’s face it: Writing MapReduce processes is not very fun. That’s the main reason that the Cascading framework is gaining such a big following–because it abstracts away the difficult part of MapReduce with an easy-to-use Java API and library. With today’s launch of a new product called Driven, the company behind Cascading is enabling users to instrument the data analytic apps developed with Cascading, in pursuit of faster troubleshooting and higher performance.

There is some serious momentum building up behind Cascading. According to Concurrent–the commercial open source company founded by Cascading creator Chris Wensel to sell support for Cascading–the open source framework is being downloaded 130,000 times per month. What’s more, 6,000+ companies have deployed Cascading-built applications on production Hadoop clusters, including big names like Twitter, Kohl’s, and Nokia.

Cascading lets mortal Java developers with average skills build MapReduce-based applications that would normally require a super Java coder to construct, and that has made it a staple component of many Hadoop projects. “Being a Java API, the average Java developer can use it,” Wensel tells Datanami. “They can write tests and use their IDE. But also more importantly, they can think about the problem at hand and they don’t have to think in terms of MapReduce, MapReduce, MapReduce.”

While Cascading has helped many organizations build data analytic apps that run on Hadoop, the framework doesn’t address the overall lack of visibility into the inner workings of Hadoop apps once they’re placed into production.

“One of the big problems in Hadoop today is it’s just a black box,” says Concurrent CEO Gary Nakamura. “Most people today deploy their applications and pray. What we’re doing [with Driven] is providing the visibility so you can actually see what’s going on, and if there’s a failure, we’ll take you to the exact spot that failure happened, so a developer can try and figure out what to do.”

From its GUI, Driven will show users exceptions and stack traces in their Hadoop app, and track all the filters, joins, and other functions that are taking place within the software. “You’ll be able to see all of the details in your data application, the units of work and how it all ties together,” Nakamura says. “You’ll be able to see them in real-time, running on Hadoop, and see how your application is progressing.”

Nakamura says the software will help users, operators, and developers collaborate on improving their Hadoop applications–not only with broken apps, but with the working apps that could use a little optimization.

“We expect Driven to provide the capability to build more reliable applications,” he says. “Developers and operators will be able to look at those things and say, ‘Hmmm, we should have somebody take a look at this because everything else takes 5 minutes and this takes 25 minutes. We ought to be able to optimize that down to something more reasonable.'”

As Cascading grows in use, so will Driven, Nakamura says. “The next version [3.0] of Cascading will support other fabrics like Spark, Storm, and Tez,” he says. “So that means applications that have been built using Cascading will be portable across the supported frameworks, across the supported fabrics.” As these Cascading-developed applications start moving to different fabrics, Driven will follow and provide the same type of troubleshooting and optimization capabilities.

The first release of Driven will focus on helping developers monitor, debug, and set alerts on their Hadoop apps. Later, Concurrent will add more operational capabilities for Hadoop jobs, including setting service level agreements (SLAs), enforcing quality of service (QoS), and ensuring the integrity of data lineage.

The first beta release of Driven is available now as a cloud service. The service is free for development use. A paid enterprise version is in the works that will support production use and be installable on-premise; it’s expected in the second quarter.

Driven supports Cascading version 2.08 (it’s currently at version 2.5) and includes popular domain-specific languages like Lingual (ANSI SQL), Pattern (PMML), Scalding (Scala), and Cascalog (Clojure).
