May 8, 2023

Open Source Provides Path to Real-Time Stream Processing

Avtar Raikmo


Consumers expect immediate, personalized gratification. Real-time distributed stream processing enables companies to meet those expectations. However, many see the technology as being out of reach for all but the biggest organizations, with the most skilled staff in the most time-sensitive of industries. That’s not the case anymore, and, with the availability of free, open source options — not to mention hosted models — organizations can see for themselves a new horizon of possibilities courtesy of real-time stream processing technologies.

Real-Time Dilemma

Real-time stream processing combines what’s known to be “normal” – thanks to troves of historical data – with what’s happening in the moment – data from events and transactions, also known as data-in-motion. Organizations can use the resulting insight to react instantaneously in that moment, rather than waiting until the data has been written to a store and analyzed. Financial institutions were early adopters, using the technology to improve fraud detection, offer tailored loans and deliver many other services. Now, companies of all sizes and industries are starting to see the possibilities of the next generation of streaming.
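To make the idea concrete, here is a minimal, illustrative sketch of that pattern: a historical baseline defines "normal," and each in-flight event is scored against it before being folded back into history. The class and field names are hypothetical, not from any particular streaming platform.

```python
from collections import deque
from statistics import mean, stdev

class SpendMonitor:
    """Illustrative sketch: flag a transaction as anomalous when it
    deviates sharply from a customer's historical 'normal'."""

    def __init__(self, history, window=50, threshold=3.0):
        # Historical amounts establish what "normal" looks like.
        self.recent = deque(history, maxlen=window)
        self.threshold = threshold

    def observe(self, amount):
        """Score an in-flight event against history, then fold it in."""
        mu, sigma = mean(self.recent), stdev(self.recent)
        is_anomaly = sigma > 0 and abs(amount - mu) / sigma > self.threshold
        self.recent.append(amount)  # the event itself becomes history
        return is_anomaly

monitor = SpendMonitor(history=[20, 25, 22, 30, 18, 24])
print(monitor.observe(23))    # -> False: typical amount for this customer
print(monitor.observe(5000))  # -> True: far outside the customer's norm
```

A production pipeline would do the same thing continuously and in a distributed fashion, but the core move is identical: combine stored context with the event while the event is still in motion.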

Experimenting with the technology is really the best way to determine whether and how it can work for an organization, especially since the idea of streams and building data pipelines can be difficult to conceptualize. That’s where free, open source offerings come into play. There are a number of open source stream processing platforms that companies can use to test out their own use cases. Hazelcast, for example, can be freely downloaded, and it comes with several data connectors that enable users to get stream processing up and running relatively quickly.


Where Hazelcast differs from other streaming solutions is in its integration of a proven, resilient fast data store with the stream processing engine. This combination lets organizations bring business-critical data from multiple systems (e.g., data lakes, databases) into a very fast data store that sits alongside a powerful stream processing engine, in one platform and one process, drawing on multiple sources for both historical and streaming data. There is no need to link stream processing in one platform with data stored in another, as you would with other popular offerings such as Flink. The performance benefits of this model are substantial because everything is optimized to work together. Case in point: Hazelcast scales beyond a billion transactions per second with extremely low latency. That kind of performance would be incredibly challenging to attain consistently with two separate systems, even best-of-breed systems, because they behave differently and have to be optimized, developed for and debugged separately.
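The architectural point above is the stream-enrichment pattern: events are joined with reference data held in the same process, with no hop to an external system. In Hazelcast the store would be a distributed map and the pipeline would run on its streaming engine; in this deliberately simplified sketch a plain dict and a generator stand in for both.

```python
# Stand-in for the co-located fast data store (illustrative only).
customer_store = {
    "c-1": {"name": "Ada", "tier": "gold"},
    "c-2": {"name": "Lin", "tier": "silver"},
}

def enrich(events):
    """Join each streaming event with stored customer data in-process,
    avoiding a network round trip to a separate storage system."""
    for event in events:
        profile = customer_store.get(event["customer_id"], {})
        yield {**event, "tier": profile.get("tier", "unknown")}

events = [
    {"customer_id": "c-1", "amount": 42},
    {"customer_id": "c-2", "amount": 7},
]
enriched = list(enrich(events))
print(enriched[0]["tier"])  # -> gold
```

The performance argument in the paragraph above comes down to this lookup: when store and engine share a platform, the join is a local read rather than a call to a second, separately tuned system.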

Power of Community & Getting Started

In the end, it doesn’t really matter which open source platform companies get their feet wet with; we just want them to get their feet wet: to start exploring real-time stream processing, try new things and let unexpected uses emerge.

This is happening in the Hazelcast community. We’re seeing architects and developers from a variety of industries modernize existing applications to take advantage of stream processing, and now they’re deploying innovative new services that improve customer experiences. The best part is that our community is just getting started, and I’m truly excited to see where they apply the technology next.


For example, community members have shared that they are using Hazelcast to automatically generate data lineage audit trails, leveraging the fast data store to accelerate their development cycles. They wanted fast builds, processing only the code that had changed. You can imagine the same principle applied in any number of ways, including tracing users’ workflows through an application to determine what could be further optimized or even automated. This isn’t what the Hazelcast Platform was designed for, but community members are open to trying different things, which is exactly the kind of behavior you want to see in a community.

The Horizon is Closer Than You Think

We’re actively seeing AI and ML use cases surface within the community. For example, users are taking streaming events, calculating and aggregating data over periods of time, and using the results as input for machine learning training. Defined ML features, such as the number of transactions executed in a specific time period, the aggregated total value, or even frequent location information, are being used to determine whether a given pattern of behavior is normal for a particular customer. This level of insight, which can lead to a deeper understanding of a customer, can be effectively calculated only in near real time using stream processing. After the fact, stream processing audit trails can be used to explain why an AI/ML system did what it did.
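The windowed features described above can be sketched in a few lines: per-customer transaction count, total value, and most-frequent location, aggregated over fixed one-hour windows. The field names and window size are illustrative assumptions, not a Hazelcast API; a streaming engine would compute the same aggregates incrementally as events arrive.

```python
from collections import defaultdict
from datetime import datetime

def hourly_features(transactions):
    """Aggregate transactions into per-(customer, hour) feature rows."""
    windows = defaultdict(
        lambda: {"count": 0, "total": 0.0, "locations": defaultdict(int)}
    )
    for tx in transactions:
        # Truncate the timestamp to the hour to assign a window.
        hour = tx["ts"].replace(minute=0, second=0, microsecond=0)
        w = windows[(tx["customer"], hour)]
        w["count"] += 1
        w["total"] += tx["amount"]
        w["locations"][tx["location"]] += 1
    # Flatten into feature rows a model could train on.
    return {
        key: {
            "count": w["count"],
            "total": w["total"],
            "top_location": max(w["locations"], key=w["locations"].get),
        }
        for key, w in windows.items()
    }

txs = [
    {"customer": "c-1", "ts": datetime(2023, 5, 8, 9, 5), "amount": 20.0, "location": "London"},
    {"customer": "c-1", "ts": datetime(2023, 5, 8, 9, 40), "amount": 35.0, "location": "London"},
    {"customer": "c-1", "ts": datetime(2023, 5, 8, 10, 2), "amount": 900.0, "location": "Lagos"},
]
features = hourly_features(txs)
```

Here the 10 a.m. window (one large transaction from a new location) stands out against the 9 a.m. baseline, which is exactly the kind of signal fed into model training or a normality check.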

The point is that the potential of real-time stream processing can be realized only if companies have an opportunity to use it. And, with instantaneous now the new normal, companies that don’t explore the technology — using platforms that demonstrate the true power of the technology — are in danger of being left behind.

Think of it like distributed computing, which was new and intimidating 10 years ago. Today, it’s table stakes. Real-time stream processing is on that same trajectory, especially given that the underlying infrastructure is only going to get faster, more capable and more intelligent. An open source platform optimized for performance, scale and resiliency enables organizations to test the potential of real-time stream processing, while the backing of a strong community (and the availability of enterprise support over time) helps them imagine how the technology can be effectively applied — now and in the future.

About the author: Avtar Raikmo started his career as a developer of Java, Python, C#, and C++ solutions before becoming a senior leader at Goldman Sachs, Morningstar and subsequently Meta (Facebook). Today, he is just as passionate about technology as ever, motivated by large-scale data challenges as well as by raising two children. Based in the UK, he is Head of Engineering for the Hazelcast Platform and is active on LinkedIn.

Related Items:

Five Drivers Behind the Rapid Rise of Apache Flink

Is Real-Time Streaming Finally Taking Off?

Real-Time Data Streaming, Kafka, and Analytics Part One: Data Streaming 101
