January 10, 2017

Google Lauds Outside Influence on Apache Beam

Alex Woodie

Apache Beam started life as a decidedly Google-ish technology designed to mask the complexity inherent in building sophisticated analytic pipelines that run across distributed systems. But in the course of becoming a top-level project at the Apache Software Foundation over the past year, the unified big data programming framework has gained a more varied cast of supporting characters.

The input from people outside of the Googleplex has made Beam a better project, says Tyler Akidau, a member of the Apache Beam Project Management Committee (PMC) and a staff software engineer at Google, which donated the Beam source code to the ASF one year ago.

“Though there were many motivations behind the creation of Apache Beam, the one at the heart of everything was a desire to build an open and thriving community and ecosystem around this powerful model for data processing that so many of us at Google spent years refining,” Akidau wrote in a blog post today. “But taking a project with over a decade of engineering momentum behind it from within a single company and opening it to the world is no small feat. That’s why I feel today’s announcement is so meaningful.”

Akidau shared some statistics about the influence on Beam that came from outside the company. For starters, at least 10 of the 22 large modules in Beam were developed from scratch by the community, “with little to no contribution from Google,” he says.

“Since September,” he added, “no single organization has had more than [about] 50% of the unique contributors per month.” And the majority of new committers added during incubation came from outside Google.

Some of the big names contributing to Beam include Hadoop developer Cloudera, data integration and ETL software developer Talend, and data Artisans, the company behind Apache Flink. The e-payment transaction company PayPal is also working with Beam.

Part of the Apache Beam support matrix (Source: Google)

PayPal’s Director of Big Data Platform Assaf Pinhasi says Apache Beam helps the company by making stream processing accessible to a broad audience of data engineers through a single API that’s decoupled from the underlying execution engine. “Our data engineers can now focus on what they do best – i.e. express their processing pipelines easily, and not have to worry about how these get translated to the complex underlying engine they run on,” he says in a press release.

When Beam started incubating at the ASF, it featured three so-called “runners,” or underlying execution engines that the Beam API could use: one for Google Cloud Dataflow itself, plus runners for Apache Spark and Apache Flink, which were then in development. Akidau says Apache Beam now supports five runners, including one for Apache Apex, the real-time processing engine originally developed by DataTorrent, which was founded by former Yahoo engineers.
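To make the runner idea concrete, here is a toy sketch in plain Python of a pipeline that is described once and then handed to interchangeable execution engines. It is only an illustration of the decoupling described above: the `Pipeline`, `LocalRunner`, and `ThreadedRunner` classes are hypothetical names invented for this example and are not the Apache Beam API or any of its actual runners.

```python
# Toy illustration only: every class here is hypothetical, NOT the Apache Beam API.
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Tuple


class Pipeline:
    """An engine-agnostic description of processing steps (no execution logic)."""

    def __init__(self) -> None:
        self.steps: List[Tuple[str, Callable[[Any], Any]]] = []

    def map(self, fn: Callable[[Any], Any]) -> "Pipeline":
        self.steps.append(("map", fn))
        return self

    def filter(self, predicate: Callable[[Any], bool]) -> "Pipeline":
        self.steps.append(("filter", predicate))
        return self


class LocalRunner:
    """Executes each step sequentially in the calling thread."""

    def run(self, pipeline: Pipeline, data: Iterable[Any]) -> List[Any]:
        items = list(data)
        for kind, fn in pipeline.steps:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items


class ThreadedRunner:
    """Executes each step on a thread pool; element order is preserved."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def run(self, pipeline: Pipeline, data: Iterable[Any]) -> List[Any]:
        items = list(data)
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            for kind, fn in pipeline.steps:
                results = list(pool.map(fn, items))
                if kind == "map":
                    items = results
                else:
                    items = [x for x, keep in zip(items, results) if keep]
        return items


def build_pipeline() -> Pipeline:
    # The pipeline is written once, with no reference to any engine.
    return Pipeline().map(lambda x: x * x).filter(lambda x: x % 2 == 0)


def run_on_all_runners(data: Iterable[Any]) -> dict:
    # The same pipeline object is handed to two different engines.
    pipeline = build_pipeline()
    items = list(data)
    return {
        "local": LocalRunner().run(pipeline, items),
        "threaded": ThreadedRunner().run(pipeline, items),
    }
```

Calling `run_on_all_runners(range(6))` should give the same result from both toy runners, which is the kind of portability the runner model is meant to provide: the pipeline author chooses the engine at run time rather than rewriting the pipeline for it.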

While it’s early days still for Apache Beam, the progress gives Akidau reason to be optimistic. “Naturally, graduation is only one milestone in the lifetime of the project, and we have many more ahead of us, but becoming [a] top-level project is an indication that Apache Beam now has a development community that is ready for prime time,” he writes.

Moving forward, the Beam community will be “pushing forward the state of the art in stream and batch processing,” Akidau writes. “We’re ready to bring the promise of portability to programmatic data processing, much in the way SQL has done so for declarative data analysis. We’re ready to build the things that never would have gotten built had this project stayed confined within the walls of Google.”

