April 19, 2024

Apache Software Foundation Announces New Top-Level Project Apache Paimon

The Apache Software Foundation (ASF) has announced Apache Paimon as a new Top-Level Project (TLP), giving users a way to process data in both batch and streaming modes. Paimon spent a year in incubation before graduating to TLP status.

Apache Paimon is a data lake format for building real-time lakehouse architectures with Apache Spark and Apache Flink, covering both streaming and batch operations. It provides a streaming storage layer and allows Flink to perform stream processing directly on the data lake.

The result is a flexible and reliable storage layer for streaming data. Paimon combines a lake format with a log-structured merge-tree (LSM) structure to bring real-time streaming updates into the data lake.

“I am really excited to see Paimon graduate and become a top-level ASF project. Paimon has begun enabling Alibaba to do real-time updates and analytics on lake house architecture, and we will also leverage Paimon to serve AI business in the future,” said Feng Wang, head of Open Data Platform at Alibaba Cloud.


All newly accepted projects pass through the ASF Incubator, which ensures they meet the standards the ASF expects. Projects that build a healthy community and sustain active development graduate to TLP status.

Paimon was developed by the Flink community and was formerly known as the Flink Table Store. It is now used by Bytedance, Alibaba, Tongcheng, China Unicom, and several other organizations around the globe. In 2023, Confluent announced its acquisition of a Flink startup for a rumored $100 million.

A key feature of Paimon is high-speed data processing at large scale in both batch and streaming modes. It also supports fast real-time analytics using Flink streaming. Paimon can perform real-time queries within a minute, using indexes such as minmax that speed up queries through data skipping.
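To illustrate the idea behind min/max data skipping, here is a minimal Python sketch. It is not Paimon's implementation or API: the `DataFile` class and the `files_to_scan` and `query_range` helpers are hypothetical names invented for this example, and a real table format would read the min/max statistics from file metadata rather than compute them on the fly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataFile:
    """Hypothetical stand-in for a data file plus its min/max statistics."""
    name: str
    values: List[int]

    # Assumption: computed here for brevity; a real format would store
    # these statistics in metadata so the file need not be opened.
    @property
    def min_value(self) -> int:
        return min(self.values)

    @property
    def max_value(self) -> int:
        return max(self.values)


def files_to_scan(files: List[DataFile], low: int, high: int) -> List[DataFile]:
    """Keep only files whose [min, max] range overlaps the query range."""
    return [f for f in files if f.max_value >= low and f.min_value <= high]


def query_range(files: List[DataFile], low: int, high: int) -> List[int]:
    """Return matching values, reading only files that survive pruning."""
    hits: List[int] = []
    for f in files_to_scan(files, low, high):
        hits.extend(v for v in f.values if low <= v <= high)
    return sorted(hits)


files = [
    DataFile("file-a", [1, 5, 9]),
    DataFile("file-b", [20, 25, 30]),
    DataFile("file-c", [28, 40, 55]),
]

# A query for values between 26 and 35 never touches file-a.
print([f.name for f in files_to_scan(files, 26, 35)])
print(query_range(files, 26, 35))
```

The fewer files whose ranges overlap the query, the less data has to be read, which is where the speedup comes from.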

Additionally, Paimon supports versatile ways to read and write data and to run Online Analytical Processing (OLAP) queries. It works with Apache Flink, Apache Hive, Trino, Apache Spark, and other computation engines. With Flink streaming, users can stream large volumes of data. Users also have more flexibility in how records are updated. For example, they can choose to keep the first row written for a key, or use deduplication to keep the last row.
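The difference between keeping the first row and keeping the last row for a key can be sketched in a few lines of Python. This is only an illustration of the two behaviors, not Paimon code: the `Row` type and the `merge_first_row` and `merge_deduplicate` functions are hypothetical names, and the sketch assumes rows arrive in write order.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str]  # (primary key, payload); hypothetical row shape


def merge_first_row(rows: Iterable[Row]) -> List[Row]:
    """Keep the first row seen for each key; later rows are ignored."""
    kept: Dict[int, str] = {}
    for key, payload in rows:
        kept.setdefault(key, payload)
    return sorted(kept.items())


def merge_deduplicate(rows: Iterable[Row]) -> List[Row]:
    """Keep the last row seen for each key; later rows overwrite earlier ones."""
    kept: Dict[int, str] = {}
    for key, payload in rows:
        kept[key] = payload
    return sorted(kept.items())


# Rows in write order: key 1 is written three times, key 2 once.
rows = [(1, "created"), (2, "created"), (1, "updated"), (1, "closed")]

print(merge_first_row(rows))    # [(1, 'created'), (2, 'created')]
print(merge_deduplicate(rows))  # [(1, 'closed'), (2, 'created')]
```

Keeping the first row suits cases where only the initial record for a key matters, while keeping the last row reflects the latest state of each record.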

The Apache Software Foundation is a decentralized open-source community of developers working on a wide range of enterprise-grade projects. The ASF was founded in 1999 to provide support for the Apache HTTP Server project. With its free and open nature, Apache HTTP Server saw wide adoption and became one of the most widely used web servers.

The ASF has since grown to more than 8,400 committers and over 320 active projects, including Apache Airflow, Apache Camel, Apache Kafka, and more. With the increasing popularity of open-source platforms and the addition of Apache Paimon, we can expect the ASF to continue growing.

Datanami