Apache Pulsar 2.0 Released
Apache Pulsar, the distributed “publish and subscribe” messaging platform that Yahoo released to the open source community nearly two years ago, is gaining new streaming capabilities aimed at enterprise data-driven applications.
The latest push comes from Streamlio, developer of a real-time streaming analytics platform that also runs on top of Apache Heron, the real-time analytics system developed by Twitter (NYSE: TWTR). The Palo Alto, Calif.-based startup is targeting enterprise-class streaming data processing.
Streamlio previously claimed its approach delivered a 150-percent performance boost over Kafka for stream processing in recent benchmark tests.
The Apache Software Foundation’s 2.0 release of Pulsar adds new functionality designed to move data users “beyond batch” processing. Among these is a “stream-native” processing capability called Pulsar Functions, designed to apply analytics to data as it flows through the Pulsar platform. Processing functions can be written in either Java or Python, the company said.
Pulsar Functions debuted earlier this year as a preview feature; Streamlio announced their general availability this week.
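To illustrate the idea, here is a minimal sketch of a Python processing function in the plain "language-native" style, where a function is simply a callable that receives one message payload and returns a value to publish to an output topic. The `sensor_id,reading` message format, the `ALERT_THRESHOLD` value and the tagging logic are hypothetical, and deploying the function to a Pulsar cluster is not shown; the `__main__` block only simulates a few messages locally.

```python
# Hypothetical language-native Pulsar Function: a plain Python callable that
# takes one message payload and returns the value to publish downstream.
# The message format and threshold below are illustrative assumptions.
ALERT_THRESHOLD = 100.0


def process(input):
    """Tag a 'sensor_id,reading' message as ALERT or OK."""
    sensor_id, reading = input.split(",")
    status = "ALERT" if float(reading) > ALERT_THRESHOLD else "OK"
    return "{}:{}".format(sensor_id, status)


if __name__ == "__main__":
    # Local simulation only: apply the function to a few sample payloads.
    for msg in ["s1,42.5", "s2,130.0"]:
        print(process(msg))
```

Because the function holds no connection or client state, the same code can be exercised locally, as above, before it is handed to the platform to run against live topics.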
Another is an enhancement developed in conjunction with Apache BookKeeper, a scalable storage system. Streamlio said the new feature, called Topic Compaction, provides streaming data storage designed to improve the performance of applications consuming data from Pulsar. With it, the Pulsar broker builds a snapshot of the latest value for each key in a topic, the startup said.
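The core idea behind compaction, keeping only the most recent value for each key, can be sketched in a few lines. This is a simplified in-memory illustration, not Pulsar's implementation: the `compact` function and the sample keys are hypothetical, messages are modeled as `(key, value)` pairs, and details such as deletion markers and persistent storage are left out.

```python
from typing import Dict, Iterable, List, Tuple


def compact(messages: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return one (key, value) per key: the most recent value seen.

    Output keeps the order in which each key's latest message appeared.
    """
    latest: Dict[str, Tuple[int, str]] = {}
    for position, (key, value) in enumerate(messages):
        latest[key] = (position, value)  # later messages overwrite earlier ones
    # Sort surviving messages by their original position in the topic.
    ordered = sorted(latest.items(), key=lambda item: item[1][0])
    return [(key, value) for key, (_, value) in ordered]


if __name__ == "__main__":
    topic = [("sensor-a", "1"), ("sensor-b", "7"), ("sensor-a", "3")]
    print(compact(topic))  # [('sensor-b', '7'), ('sensor-a', '3')]
```

A consumer that only needs current state can read the compacted result instead of replaying every message, which is where the performance benefit for consuming applications comes from.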
It also includes a schema registry designed to ease development of data applications by letting developers define and validate the structure of their data. The approach is billed as a way to track data streams at scale.
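The basic idea of defining a structure and then checking data against it can be illustrated with a small sketch. It uses no Pulsar API. Pulsar's registry is built into the platform, whereas here the `ORDER_SCHEMA` mapping, its field names and the `validate` helper are all hypothetical, with a schema modeled as a simple field-to-type dictionary.

```python
from typing import Any, Dict

# Hypothetical schema: each field name maps to the Python type it must hold.
ORDER_SCHEMA: Dict[str, type] = {"order_id": str, "quantity": int, "price": float}


def validate(record: Dict[str, Any], schema: Dict[str, type]) -> bool:
    """Return True if the record has exactly the schema's fields with matching types."""
    if set(record) != set(schema):
        return False  # missing or unexpected fields
    return all(isinstance(record[name], expected) for name, expected in schema.items())


if __name__ == "__main__":
    print(validate({"order_id": "A1", "quantity": 2, "price": 9.99}, ORDER_SCHEMA))
    print(validate({"order_id": "A1", "quantity": "two", "price": 9.99}, ORDER_SCHEMA))
```

Rejecting malformed records at the point where data enters a stream, rather than in each downstream consumer, is what makes a central schema useful when many applications share the same topics.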
Streamlio said the upgrades, which take advantage of scaling optimizations in recent releases of the Apache BookKeeper stream storage system, would boost performance, citing what it said is seven times the throughput of Apache Kafka.
Matteo Merli, co-founder of Streamlio, previously served as architect and lead developer of Pulsar at Yahoo. The newest version of the platform is designed “to move beyond the limits of traditional batch-centric approaches to the data-driven future where [companies] can immediately process and act on fast-moving data as quickly as it arrives,” Merli said.
–Editor’s note: This article has been revised to reflect the open-source origins of Apache Pulsar.