March 27, 2016

Streaming Architecture–Why Flow Instead of State?

Ted Dunning

The way that computing is done is changing dramatically. Instead of a program with a finite input, we now have programs with infinite streams as inputs. Why does this matter, and why is the change happening now?

This matters because life doesn’t happen in neatly defined batches. Neither should your code. By adopting a streaming data architecture, we get a better fit between applications and real life. The advantages of this type of design are substantial: systems become simpler, more flexible and more robust. Multiple consumers can use the same streaming data for a variety of different purposes without interfering with each other. This independent multi-tenancy approach makes systems more productive and opens the way to data exploration that need not jeopardize existing processes. And where real-time insights are needed, low-latency analytics make it possible to react to life (and business) as it happens.

Why now? The answer lies in part in the emergence of new technologies that make this approach feasible with high throughput and low latency, even at very large scale. Tools such as Apache Spark Streaming or Apache Flink, which provide low-latency analytics of streaming data, are only part of what has made this stream-based architecture possible. A fundamental requirement is a message delivery system with a combination of features crucial to the success of a streaming design.

The messaging system needs to be highly scalable, handle high throughput, and deliver the data provided by producers whether or not any particular consumer is running at the moment. This decoupling of inputs and outputs has enormous impact: it supports a microservices style of computing. Choosing an appropriate messaging technology, such as Apache Kafka or MapR Streams, that can persist messages at speed and scale is key. Because messages are persisted, the stream becomes replayable: data is ready to be used immediately or later.
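
To make the decoupling concrete, here is a minimal in-memory sketch in Python. It is a toy under stated assumptions, not the API of Apache Kafka or MapR Streams: the names `EventLog`, `Consumer`, `append`, `read` and `poll` are hypothetical, and a real system would persist messages to disk and partition them across machines. The point is only that producers append without knowing who is listening, and each consumer keeps its own read position.

```python
from typing import Any, List


class EventLog:
    """Toy append-only log standing in for a persistent message stream."""

    def __init__(self) -> None:
        self._events: List[Any] = []

    def append(self, event: Any) -> int:
        """Store an event and return its offset; no consumer has to be running."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, offset: int, max_events: int = 100) -> List[Any]:
        """Return events starting at offset. Reading never removes anything."""
        return self._events[offset:offset + max_events]


class Consumer:
    """Each consumer tracks its own offset, so consumers cannot interfere."""

    def __init__(self, log: EventLog) -> None:
        self.log = log
        self.offset = 0

    def poll(self) -> List[Any]:
        """Fetch everything not yet seen by this consumer and advance its offset."""
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch


log = EventLog()
for n in range(3):
    log.append({"order_id": n})   # produced before any consumer exists

early = Consumer(log)
first_batch = early.poll()        # the three events written so far

log.append({"order_id": 3})

late = Consumer(log)              # starts later, replays from the beginning
late_batch = late.poll()          # all four events
second_batch = early.poll()       # only the event added since the last poll
```

Because reads never consume messages, adding the `late` consumer changes nothing for `early`, which is the property that lets new uses of the data be explored without disturbing existing ones.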

Durability of streaming messages may seem like a surprising requirement, but it is the key to flow versus state. The stream serves as an immutable log of sequenced event data that can be referenced by multiple users instead of relying on shared databases that can result in unfortunate inter-dependencies. At modern scale and speeds, maintaining the fiction of consistent global state has become impossibly expensive. This means that we have to rethink how we design and build large systems: we must rely instead on a stream of business events and update private databases independently for specific projects.
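
As an illustration of updating private databases from a shared stream, the following Python sketch derives two independent views from one immutable sequence of events. The event shape (sequence number, account, amount in cents), the sample data, and the function names are all assumptions made for this example; in practice the events would come from the messaging system rather than a tuple in memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical business event: (sequence number, account, amount in cents).
Event = Tuple[int, str, int]

# A tuple of tuples cannot be modified in place, mimicking an immutable log.
EVENTS: Tuple[Event, ...] = (
    (0, "alice", 500),
    (1, "bob", 200),
    (2, "alice", -150),
    (3, "bob", 300),
)


def build_balances(events: Iterable[Event]) -> Dict[str, int]:
    """Private view for one project: current balance per account."""
    balances: Dict[str, int] = defaultdict(int)
    for _seq, account, amount in events:
        balances[account] += amount
    return dict(balances)


def build_activity_counts(events: Iterable[Event]) -> Dict[str, int]:
    """Private view for another project: number of events per account."""
    counts: Dict[str, int] = defaultdict(int)
    for _seq, account, _amount in events:
        counts[account] += 1
    return dict(counts)


balances = build_balances(EVENTS)
activity = build_activity_counts(EVENTS)

# Replaying the same log rebuilds the same view, so a private database
# can be dropped and reconstructed without coordinating with anyone else.
rebuilt = build_balances(EVENTS)
```

Neither view reads or writes the other's state. The only thing the two projects share is the event sequence itself, which is what removes the inter-dependencies that a shared database would introduce.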

In my Strata presentation, “Streaming architecture: Why flow instead of state?” I’ll explain how these technologies revolutionize real-world architectures. My session takes place on Wednesday from 5:10 to 5:50 p.m. in room 210 D/H.

About the author: Ted Dunning is the chief application architect for MapR Technologies. Ted is also a PMC member for the Apache ZooKeeper and Mahout projects. He bought the refreshments at the first Apache Hadoop meetup.

