January 18, 2022

ETL Tool Apache Hop Graduates Incubator

Apache Hop, a metadata-driven data orchestration tool used to design and build pipelines, today emerged from incubator status and was named a Top-Level Project at the Apache Software Foundation, clearing the way for more intensive production use.

Apache Hop, which stands for Hop Orchestration Platform, is a Java-based product designed to help data professionals manage a variety of data and metadata orchestration and integration needs. The software sports a visual design environment that allows users to create ETL pipelines, as well as an execution engine that can run by itself or be embedded in Spark, Flink, Google Dataflow, or AWS EMR via Apache Beam.

“Hop is entirely metadata driven,” the Apache Hop website states. “Every object type in Hop describes how data is read, manipulated or written, or how workflows and pipelines need to be orchestrated. Metadata is what drives Hop internally as well. Hop uses a kernel architecture with a robust engine. Plugins add functionality to the engine through their own metadata.”

The product, which started life as Kettle (Kettle Extraction Transformation Transport Load Environment), was acquired by Pentaho (now Hitachi Vantara) and brought to market as the Pentaho Data Integration (PDI) offering. The software was refactored over several years, and a fork of it entered the Apache Incubator as an open source project in September 2020.

Hop sports more than 250 pre-canned plug-ins, which should enable users to connect to a variety of applications, databases, and file systems running on Windows, Linux, OSX, and other environments, both on-prem and in the cloud. The software also includes built-in lifecycle management, which will help users operate pipelines in a DevOps environment.

Bart Maertens, the vice president of Apache Hop, says the software enables people of all skill levels to build powerful and scalable data solutions without writing any code.

“We are pleased to successfully adopt ‘the Apache Way’ and graduate from the Apache Incubator,” Maertens said in a press release. “As an Apache Top-Level Project, Hop is developed and used by people across the globe. Hop’s full project life cycle support helps these data teams to successfully build, test and run their projects in ways that would otherwise be hard or impossible to do.”

One Hop adopter is Sergio Ramazzina, CEO and chief architect at Serasoft S.r.l., an Italian software company that adopted Hop in early 2021. Ramazzina, who is a member of the Apache Hop Project Management Committee, highlighted Hop’s “flexibility, scalability, and ease of use, in various scenarios ranging from classical DWH [data warehouse] ETL processes to highly critical, real-time processes.”

The Apache Software Foundation is home to more than 350 software projects. To achieve Top-Level Status, a project must adhere to a number of strict requirements surrounding the technical, governance, legal, community, branding, and funding aspects of the software itself and the project as a whole.