April 11, 2013

Hadoop Data Management Set to Fly with Falcon

Isaac Lopez

Hadoop data management tools for the enterprise are on their way, says a team of open source developers at Hortonworks and InMobi, who recently saw their project, dubbed Falcon, accepted as an Apache Software Foundation incubator project.

“There are two sets of problems [that Falcon addresses],” explained Hortonworks CTO Eric Baldeschwieler in a recent keynote at the Hadoop Summit in Amsterdam. “One is data life cycle and data movement. How do you get data into the cluster – how do you move it between clusters and make sure that you keep the data in the right place for the right amount of time. The other is how do you automate ETL flows in a much simpler, more declarative fashion.”

According to the Apache page outlining the Falcon proposal, enterprises will be able to set up Falcon relatively easily, using declarative mechanisms to define infrastructure endpoints, data sets, and processing rules. With the dependencies between the configured entities explicitly defined, Falcon will then orchestrate data management functions automatically.
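The article does not show what a Falcon definition looks like, so the following is only a minimal Python sketch of the general idea: entities are declared as data, their dependencies are stated explicitly, and a valid processing order is derived from those declarations. The `Entity` class, the `orchestration_order` function, and the entity names are all hypothetical illustrations, not Falcon's actual format or API.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Entity:
    """A hypothetical declared entity; not Falcon's real entity schema."""
    name: str
    kind: str  # e.g. "cluster", "dataset", or "process"
    depends_on: tuple = ()


def orchestration_order(entities):
    """Return entity names ordered so each one follows its dependencies."""
    known = {e.name for e in entities}
    graph = {}
    for e in entities:
        # Reject declarations that reference an entity nobody defined.
        missing = [d for d in e.depends_on if d not in known]
        if missing:
            raise ValueError(f"{e.name} depends on undefined entities: {missing}")
        graph[e.name] = set(e.depends_on)
    # static_order() raises graphlib.CycleError on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


# Illustrative declarations: an endpoint, a data set on it, and a rule over both.
ENTITIES = [
    Entity("hourly-report", "process", ("raw-events", "primary-cluster")),
    Entity("raw-events", "dataset", ("primary-cluster",)),
    Entity("primary-cluster", "cluster"),
]

if __name__ == "__main__":
    print(orchestration_order(ENTITIES))
```

Because the dependencies are part of the declarations rather than buried in hand-written scripts, the ordering can be computed and validated mechanically, which is the kind of automation the proposal describes.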

“If you look at where Hadoop is in its adoption lifecycle, these needs are starting to really emerge this year,” explains Shaun Connolly, VP of Corporate Strategy with Hortonworks, pointing to a growing need for this type of data management. “Increasingly over the last 6 to 9 months, we’ve seen an increase in more mainstream enterprises that have been embracing Hadoop for various needs. Now that they have a Hadoop cluster or two running, they’re going to double back and basically say ‘now how do I operationalize this.’”

Early adopters of Hadoop currently handle these processes in disparate ways, with IT teams coding them manually, explained Connolly, an approach that can be tedious and error-prone. He says that once they recognized this gap, they moved to plug it with the open source Falcon solution, which Connolly says automates the processing of data lifecycle management scenarios in predictable and reliable ways.

“What Falcon does is provide a framework for addressing [data lifecycle management] needs within the context of Apache Hadoop, but it also provides a set of open APIs that enable those workflows to be orchestrated more broadly, so if you want to orchestrate data lifecycle workflows within Hadoop as well as with your Teradata system (as an example) concurrently, then enterprises would use the Falcon API from those other tools and be able to drive those workflows indirectly.”

A recent post on the Hortonworks website by Falcon contributor Venkatesh Seetharam illustrates Falcon’s role as a data management tool for Hadoop.

While the Falcon project was only recently added as an Apache Software Foundation incubator project, the code itself is beginning its second year of maturity, having been developed by mobile ad network company InMobi.

InMobi built the Falcon framework to scratch its own internal data management itch. According to the mobile-oriented ad platform developer, its network receives in excess of 10 billion events (ad-serving and related) every day through multiple sources and streams originating from over ten geographically distributed data centers, requiring the processing of tens of terabytes of data a day.

“As we explored cheaper and more effective ways of processing this huge amount of data, we came up with a simple in-house scheduler to manage job flows in our environment then,” explains Mohit Saxena at InMobi. “We realized that to be able to process data in a decentralized fashion, we needed to have the complexity pushed into a platform and allow the engineers to focus on the processing / business logic.”

After developing the framework, InMobi engineers approached Hortonworks engineers about working together to bring it to Apache for incubation and acceleration. According to Saxena, Falcon has been widely used over the last year for various processing pipelines and data management functions, including SLA-critical feedback pipelines, correctness-critical revenue pipelines, and other applications, getting a heavy workout prior to its Apache incubation.

While Connolly was reluctant to give us an expected launch time frame, he did hint that they did not expect a prolonged wait.

“The net out of it is that it’s been deployed in production at InMobi for about a year, so there’s some significant technology there, and the goal is to accelerate adoption.”

Applications: Complex Event Processing, Data Mining, Enterprise Analytics

Technologies: Frameworks, Network, Systems

Sectors: Financial Services, Government, Healthcare, Other, Retail

Vendors: Hortonworks

Tags: Apache Software Foundation, data management, enterprise, Eric Baldeschwieler, Falcon, Hadoop, Hadoop Summit, Hortonworks, incubator project, InMobi, Mohit Saxena, Shaun Connolly, Venkatesh Seetharam
