March 27, 2016

Streaming Architecture–Why Flow Instead of State?

Ted Dunning

(Toria/Shutterstock)

The way that computing is done is changing dramatically.  Instead of a program with a finite input, we now have programs with infinite streams as inputs. Why does this matter, and why is the change happening now?

This matters because life doesn’t happen in neatly defined batches. Neither should your code. By adopting a streaming data architecture, we get a better fit between applications and real life. The advantages of this type of design are substantial: systems become simpler, more flexible and more robust. Multiple consumers can use the same streaming data for a variety of different purposes without interfering with each other. This independent multi-tenancy approach makes systems more productive and opens the way to data exploration that need not jeopardize existing processes. And where real time insights are needed, low latency analytics make it possible to react to life (and business) as it happens.

Why now? The answer lies in part in the emergence of new technologies that make this approach feasible with high throughput and low latency, even at very large scale. It is not only tools such as Apache Spark Streaming or Apache Flink for low latency analytics of streaming data that have made this stream-based architecture possible. A fundamental requirement is a message delivery system with a combination of features crucial to the success of a streaming design.

The messaging system needs to be highly scalable, handle high throughput, and deliver the data provided by producers whether or not any particular consumer is running at the moment. This de-coupling of inputs and outputs has enormous impact: it supports a micro-services style of computing. Choosing an appropriate messaging technology, such as Apache Kafka or MapR Streams that can persist messages at speed and scale is key. The stream becomes a re-playable stream. Data is ready to be used immediately or used later.

Durability of streaming messages may seem like a surprising requirement, but it is the key to flow versus state. The stream serves as an immutable log of sequenced event data that can be referenced by multiple users instead of relying on shared databases that can result in unfortunate inter-dependencies. At modern scale and speeds, maintaining the fiction of consistent global state has become impossibly expensive. This means that we have to rethink how we design and build large systems: we must rely instead on a stream of business events and update private databases independently for specific projects.

In my Strata presentation, “Streaming architecture: Why flow instead of state?” I’ll explain how these technologies and revolutionize real world architectures. My session takes place on Wednesday from 5:10 to 5:50 p.m. in room 210 D/H. For more info, click here.

 

Ted Dunning

About the author: Ted Dunning is the chief application architect for MapR Technologies. Ted is also a PMC member for the Apache Zookeeper and Mahout projects. He bought the refreshments at the first Apache Hadoop meetup.

 

Share This