April 16, 2024

Apache Software Foundation Announces New Top-Level Project Apache Paimon

WILMINGTON, Del., April 16, 2024 — The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 320 active open source projects and initiatives, today announced that Apache Paimon has graduated from incubation and is now a Top-Level Project (TLP).

Paimon is a data lake format that enables real-time lakehouse architectures built with Apache Flink and Apache Spark for streaming and batch operations. Paimon combines a lake format with a log-structured merge-tree (LSM) structure to bring real-time streaming updates into the data lake.

“I am really excited to see Paimon graduate and become a top-level ASF project. Paimon has begun enabling Alibaba to do real-time updates and analytics on lakehouse architecture, and we will also leverage Paimon to serve AI business in the future,” said Feng Wang, head of Open Data Platform at Alibaba Cloud.

As a streaming data lake platform, Paimon allows users to process data in both batch and streaming modes. Feature highlights and benefits include:

  • High-speed Data Processing: Paimon’s append table (no primary key) provides large-scale batch and streaming processing capability;
  • Flexible Updates: Paimon gives users a choice of update strategies, including deduplication (keeping the last row), partial updates, aggregation, and first-row updates;
  • Fast Real-time Analytics: By leveraging Flink Streaming, Paimon’s primary key table supports real-time streaming updates of large amounts of data, and updated data can be queried within one minute;
  • Simplified Changelog Production: Paimon simplifies users’ streaming analytics by producing accurate and complete changelog updates for merge engines; and
  • Low-latency Data Queries: Paimon supports data compaction with z-order sorting to optimize file layout. By using indexes such as minmax, Paimon also enables fast queries based on data skipping.
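
The update strategies listed under Flexible Updates can be pictured with a small sketch. The Python below is illustrative only and is not Paimon's API: the `merge` function, the `Row` type, and the sample columns are hypothetical, the rows are assumed to share one primary key and to arrive oldest first, and aggregation is assumed to mean a simple sum.

```python
from typing import Optional

# Hypothetical row type: value columns only (the shared primary key is omitted).
Row = dict[str, Optional[int]]

ENGINES = {"deduplicate", "first-row", "partial-update", "aggregation"}


def merge(rows: list[Row], engine: str) -> Row:
    """Fold rows that share one primary key (oldest first) into a single row."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine}")
    if not rows:
        raise ValueError("need at least one row")
    if engine == "deduplicate":
        return dict(rows[-1])  # keep only the last row written
    if engine == "first-row":
        return dict(rows[0])  # keep only the first row written
    result = dict(rows[0])
    for row in rows[1:]:
        for col, val in row.items():
            if val is None:
                continue  # a null never overwrites or adds to a value
            if engine == "partial-update":
                result[col] = val  # latest non-null value wins per column
            else:
                # aggregation, assumed here to be a per-column sum
                result[col] = (result.get(col) or 0) + val
    return result


rows = [
    {"clicks": 1, "views": None},
    {"clicks": None, "views": 5},
    {"clicks": 3, "views": None},
]
for engine in sorted(ENGINES):
    print(engine, merge(rows, engine))
```

Running the loop shows how the same three writes collapse to a different final row under each strategy, which is the choice the release describes.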

“Apache Paimon is a high-performance, low-latency real-time data lake that significantly reduces data computation and storage costs and markedly enhances data development efficiency in various scenarios, such as Ant Group’s risk control and the Wufu application,” said Zhigang Li, head of Real-time Computing at Ant Group.

“I was fortunate enough to participate in the entire lifecycle of Paimon to-date, from Flink Table Store to independent incubation and successful graduation, experiencing firsthand the practicality and excellence of community developers,” said Guanghui Zhang, head of Streaming Computing at ByteDance.

Formerly known as Flink Table Store, Paimon was first developed by the Flink community. Paimon is used in production environments worldwide by companies including Alibaba, Ant Group, ByteDance, China Unicom, and Tongcheng.

About the Apache Incubator

The Apache Incubator is the primary entry path for projects and codebases wishing to become part of the efforts at The Apache Software Foundation. All code donations from external organizations and existing external projects enter the ASF through the Incubator to: 1) ensure all donations are in accordance with the ASF legal standards; and 2) develop new communities that adhere to our guiding principles. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision-making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

About The Apache Software Foundation (ASF)

Founded in 1999, the Apache Software Foundation exists to provide software for the public good with support from more than 75 sponsors. ASF’s open source software is used ubiquitously around the world with more than 8,400 committers contributing to 320+ active projects including Apache Superset, Apache Camel, Apache Flink, Apache HTTP Server, Apache Kafka, and Apache Airflow. The Foundation’s open source projects and community practices are considered industry standards, including the widely adopted Apache License 2.0, the podling incubation process, and a consensus-driven decision model that enables projects to build strong communities and thrive.


Source: ASF
