

(metamorworks/Shutterstock)
Organizations today must process ever-larger volumes of data in ever-smaller windows of time. Those two pressures are driving a shift away from processing architectures built around traditional databases and toward real-time processing of streams of data. Message buses form the foundation of stream processing systems, making them a critical component for any organization that wants to build stream processing applications.
In an ideal world, organizations would be able to land a piece of data in a database before they need it to make a decision. But increasingly, that database-centric approach is viewed as a luxury. Whether the use case is fraud detection, risk management, or network monitoring, organizations cannot afford to wait for data to land in a traditional architecture before getting answers. They need the answers now.
This time crunch is driving the industry to make sizable investments in real-time processing systems that can both move and process data much faster than traditional database-oriented systems.
There are two main components to real-time systems: the underlying message bus and a stream processing system that sits atop it. Let’s handle the underlying message buses first. The list starts with the big dog in the space: Apache Kafka.
Apache Kafka
Apache Kafka is a distributed, open source message bus written in Java and Scala. The software implements a publish-and-subscribe messaging system that’s capable of moving large amounts of event data from sources to sinks in a high-throughput manner, with minimal latency and strong consistency guarantees. The software relies on Apache ZooKeeper for management of the underlying cluster.
Kafka is based on the concept of producers and consumers. Event data originating from producers is stored in timestamped partitions that are housed within Kafka topics. Meanwhile, consumer processes read the data stored in those partitions. Kafka automatically replicates partitions across multiple brokers (the nodes in the cluster), which allows Kafka to scale its message streaming service in a fault-tolerant manner.
Kafka uses a pull-based model, in which consumers pull data out of Kafka partitions at their own pace. Kafka stores a complete history of event data for a configurable retention period, which allows consumers to “rewind” and re-read the history from the beginning. This provides Kafka’s basis for streaming applications.
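To make the pull-based model concrete, here is a toy in-memory sketch in plain Python (not the real Kafka client API) of an append-only partition with consumers that track their own offsets; the `Partition` and `Consumer` names and the `seek` method are illustrative, standing in for Kafka's richer protocol:

```python
from dataclasses import dataclass, field

@dataclass
class Partition:
    """An append-only log, standing in for one Kafka partition."""
    records: list = field(default_factory=list)

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record

class Consumer:
    """Consumers pull at their own pace and own their position (offset)."""
    def __init__(self, partition):
        self.partition = partition
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.partition.records[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        self.offset = offset  # "rewinding" is just resetting an offset

p = Partition()
for event in ["login", "click", "purchase"]:
    p.append(event)

c = Consumer(p)
print(c.poll())   # reads the full retained history
c.seek(0)         # rewind and replay from the beginning
print(c.poll(2))  # re-reads the first two records
```

Because the broker keeps the log and each consumer merely holds an offset, many independent consumers can read the same data without coordinating with each other.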
Four main APIs are included with the open source project. Two of those, producer and consumer, deliver the core functionality described above. Meanwhile, the Streams API allows an application to read streams of data from topics in the Kafka cluster (in an exactly once fashion, delivering support for transactions), while the Connect API allows a developer to build connectors that continually pull or push data into or out of Kafka.
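As an illustration of the Connect API, a source connector is typically described by a small JSON document submitted to the Kafka Connect REST interface. This sketch uses the FileStreamSource example connector that ships with Kafka to tail a file into a topic; the connector name, file path, and topic are hypothetical:

```json
{
  "name": "local-file-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/var/log/app/events.log",
    "topic": "events"
  }
}
```

The same mechanism, with different connector classes, is how data is continually pulled from (or pushed to) databases, object stores, and other external systems.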
Kafka was originally developed at LinkedIn to handle the high volume of event data, and was subsequently donated to the Apache Software Foundation in 2011. In 2014, Kafka creators Jay Kreps, Neha Narkhede, and Jun Rao founded Confluent, which offers a commercial version of Kafka that includes enterprise functions and cloud hosting.
In recent years, Kafka has become a very popular open source project, with thousands of companies building Kafka clusters on-premise or in the cloud. While organizations can build their own real-time streaming applications atop Kafka using the Streams API, many choose to couple their Kafka clusters with dedicated stream processing frameworks, such as Apache Flink, Apache Storm, or Apache Spark Streaming.
Apache Pulsar
Apache Pulsar is an open-source distributed pub-sub messaging system originally created at Yahoo that could pose a challenge to Kafka’s hegemony in the message bus layer.
Like Kafka, Pulsar uses the concept of topics and subscriptions to create order from large amounts of streaming data in a scalable and low-latency manner. In addition to publish and subscribe, Pulsar can support point-to-point message queuing from a single API. Like Kafka, the project relies on ZooKeeper for coordination and metadata management, and it uses Apache BookKeeper for durable log storage and ordering guarantees.
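The dual pub-sub/queuing behavior can be sketched with a toy topic in plain Python (not the Pulsar client API): every subscription receives every message, while consumers that share a subscription split the messages between them. All names here are illustrative:

```python
from collections import defaultdict
from itertools import cycle

class Topic:
    """Toy topic: fan-out across subscriptions, round-robin within one."""
    def __init__(self):
        self._consumers = defaultdict(list)  # subscription -> consumer inboxes
        self._dispatch = {}                  # subscription -> round-robin iterator

    def subscribe(self, subscription):
        inbox = []
        self._consumers[subscription].append(inbox)
        # rebuild the round-robin iterator over the current consumers
        self._dispatch[subscription] = cycle(self._consumers[subscription])
        return inbox

    def publish(self, message):
        # every subscription sees every message (pub-sub) ...
        for sub in self._consumers:
            # ... but within one subscription, consumers take turns (queuing)
            next(self._dispatch[sub]).append(message)

topic = Topic()
a = topic.subscribe("analytics")   # lone consumer: gets the full stream
w1 = topic.subscribe("workers")    # two consumers sharing one subscription:
w2 = topic.subscribe("workers")    # the stream is split between them
for i in range(4):
    topic.publish(i)

print(a)        # full pub-sub view of the stream
print(w1, w2)   # work-queue view, divided between the two workers
```

In real Pulsar, this choice corresponds to the subscription type a consumer requests, so the same topic and API serve both messaging styles.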
The creators of Pulsar say they developed it to address several shortcomings of existing open source messaging systems. It has been running in production at Yahoo since 2014 and was open sourced in 2016. Pulsar is backed by a commercial open source outfit called Streamlio, which employs some of Pulsar’s original creators and sells a commercial product that combines Pulsar with Apache Heron, a stream processing engine platform developed at Twitter.
Pulsar’s strengths, according to Streamlio’s founders, include multi-tenancy, geo-replication, strong durability guarantees, and high message throughput, as well as a single API for both queuing and publish-subscribe messaging. Scaling a Pulsar cluster is as easy as adding nodes, which Streamlio says gives it an advantage over other message buses.
RabbitMQ
RabbitMQ is a distributed, open source message bus that can be used to implement various data brokering schemes, including point-to-point, request/reply, and pub-sub communications. The software was written in Erlang, but today it features client libraries in a variety of languages, making it a more open alternative for message distribution and integration than the Java Message Service (JMS).
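As a sketch of the request/reply scheme mentioned above, here is a toy version built on plain Python queues and threads (not RabbitMQ's AMQP client): the requester attaches a reply-to queue and a correlation id to each request, and a worker answers on that queue. The names and the doubling "work" are illustrative:

```python
import queue
import threading

request_q = queue.Queue()  # stands in for a broker-hosted request queue

def worker():
    """Consume requests and send each answer to its reply-to queue."""
    while True:
        corr_id, payload, reply_to = request_q.get()
        if payload is None:  # shutdown sentinel
            break
        reply_to.put((corr_id, payload * 2))  # do the "work" and reply

threading.Thread(target=worker, daemon=True).start()

# Client side: send a request with a private reply queue and an id,
# then block until the matching reply arrives.
reply_q = queue.Queue()
request_q.put(("req-1", 21, reply_q))
corr_id, result = reply_q.get(timeout=5)
print(corr_id, result)
```

The correlation id is what lets a client that has several requests in flight match each reply to the request that produced it, which is the same role it plays in AMQP-based request/reply.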
Distributed under the Mozilla Public License, RabbitMQ originally implemented the Advanced Message Queuing Protocol (AMQP) but has since been extended with a plug-in architecture, and it now supports a variety of protocols including Streaming Text Oriented Messaging Protocol (STOMP), Message Queuing Telemetry Transport (MQTT), and others.
RabbitMQ can be deployed on clusters and is often used to offload work from busy Web servers and to balance workloads. Many consider its core strength to be reliable message delivery to large numbers of recipients. With more than 35,000 real-world deployments, it’s been battle-tested in the enterprise. RabbitMQ also benefits from a large number of plug-ins and libraries that extend the messaging software, including support for complex messaging schemes.
The software was originally developed by Rabbit Technologies Ltd., which was acquired by a division of VMware in 2010. RabbitMQ became part of Pivotal Software in 2013, and today the company offers a hosted version of RabbitMQ on its Pivotal Cloud Foundry.
Apache ActiveMQ
Apache ActiveMQ is a distributed, open source messaging bus that’s written in Java and fully supports JMS. The software was originally developed at LogicBlaze as an open alternative to proprietary messaging buses, such as WebSphere MQ and TIBCO Messaging, and has been backed by the Apache Software Foundation since 2007.
In addition to being an open implementation of JMS, ActiveMQ also supports other protocols, including STOMP, MQTT, AMQP, REST, and WebSockets. The software scales horizontally, and it supports several modes for high availability, including the use of ZooKeeper.
ActiveMQ is distributed under the Apache 2.0 License. It forms the basis for Amazon Web Services‘ message queue service, Amazon MQ.
TIBCO Messaging
TIBCO is one of the original purveyors of high-speed message buses for enterprise customers. In fact, it’s right there in the name: The Information Bus COmpany (TIBCO).
Thousands of customers continue to use TIBCO Messaging, which provides a scalable platform for distributing high volumes of messages among a variety of sources and sinks in a low-latency manner. The company’s core Enterprise Message Service is built around the JMS 1.1 and 2.0 standards.
TIBCO today extends its flagship Messaging platform with several other versions, including one based on Apache Kafka. It also offers the Eclipse Mosquitto Distribution of Messaging, which supports the MQTT protocol.
There are many other message buses out there, but these arguably are the most popular. In a future post, we’ll investigate stream processing frameworks that can sit atop these message buses.