January 26, 2016

Real-Time Streaming Gone ‘Bananas’

Akuda Labs is touting the throughput and power of a real-time stream data processing system it released last week. The company says the software, dubbed Bananas, runs 10 times faster than Apache Spark Streaming, with considerably less latency.

The Spark comparison comes from an internal benchmark test that Akuda Labs performed on a 16-core virtual cluster equipped with 16GB of RAM and a 10Gbps connection. The goal of the test was to see how quickly Spark Streaming and Bananas could find specific patterns of text within a large amount of text: in this case, the complete works of William Shakespeare (how’s that for crazy?).

The company claims that Bananas was able to process 1 million documents per second at a latency of 900 microseconds. By comparison, the Spark Streaming cluster that Akuda put together was able to process 105,000 documents per second at a latency of more than 20 seconds. That gives Bananas a 24,000x latency advantage over Spark Streaming, the company claims.
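The 24,000x figure appears to describe latency rather than throughput. As a rough check using only the latency numbers quoted above (the variable names are illustrative, and "more than 20 seconds" is treated as a lower bound), the arithmetic can be sketched as:

```python
# Figures quoted in the article; variable names are illustrative.
bananas_latency_s = 900e-6      # 900 microseconds
spark_latency_floor_s = 20.0    # "more than 20 seconds", taken as a lower bound
claimed_advantage = 24_000      # the company's stated multiple

# Latency ratio if Spark Streaming's latency were exactly 20 seconds.
ratio_at_floor = spark_latency_floor_s / bananas_latency_s

# Spark Streaming latency that would make the 24,000x claim exact.
implied_spark_latency_s = claimed_advantage * bananas_latency_s

print(f"Ratio at a 20-second latency: about {ratio_at_floor:,.0f}x")
print(f"Latency implied by 24,000x: about {implied_spark_latency_s:.1f} seconds")
```

At exactly 20 seconds the ratio works out to roughly 22,000x, so the 24,000x claim would correspond to a Spark Streaming latency of about 21.6 seconds, which is consistent with "more than 20 seconds."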

“Spark Streaming does not provide as reassuring an example of a modern high-throughput streaming system, as it exhibits astoundingly high latencies for relatively low throughputs and is unable to harness the complete processing power of multicore systems,” the company says in its SlideShare report on the tests.

While Spark Streaming has risen to the forefront as a favored mechanism for enabling real-time data processing, it wasn’t really designed for low-latency workloads. That’s because Spark Streaming uses a “micro-batch” architecture that essentially speeds up existing batch-style workloads and aggregates them into distinct windows of time using its in-memory processing capability.
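To see why a micro-batch design puts a floor under latency, consider a toy model. This is only a sketch, not Spark Streaming's API: the 500-millisecond batch interval, the 1-millisecond per-event cost, and the function names are all assumptions chosen for illustration, and the model ignores scheduling and batch processing time.

```python
def micro_batch_latencies(arrivals_ms, batch_interval_ms):
    """Each event waits until the end of the batch window it arrived in."""
    latencies = []
    for t in arrivals_ms:
        window_end = (t // batch_interval_ms + 1) * batch_interval_ms
        latencies.append(window_end - t)
    return latencies


def per_event_latencies(arrivals_ms, service_ms):
    """A single worker handles each event on arrival, queueing only if busy."""
    latencies = []
    worker_free_at = 0
    for t in arrivals_ms:
        start = max(t, worker_free_at)
        worker_free_at = start + service_ms
        latencies.append(worker_free_at - t)
    return latencies


# Hypothetical workload: one event every 10 ms for one second.
arrivals = list(range(0, 1000, 10))

batch = micro_batch_latencies(arrivals, batch_interval_ms=500)
stream = per_event_latencies(arrivals, service_ms=1)

print("micro-batch mean latency (ms):", sum(batch) / len(batch))
print("per-event mean latency (ms):", sum(stream) / len(stream))
```

In this model an event that arrives just after a window opens waits almost the full interval before anything happens to it, so average latency tracks the batch interval rather than the cost of the work itself.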

Bananas, by comparison, was designed from the beginning to deliver the low latency demanded by some real-time applications, the company says. Its implementation of a “lockless shared memory queue management protocol” allows for massively parallel processing pipelines, in which each packet is processed upon its arrival.
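Akuda has not published the details of its protocol, so the following is not the company's design. It is a minimal sketch of one well-known lock-free pattern, a single-producer, single-consumer ring buffer, in which the producer is the only writer of the tail index and the consumer is the only writer of the head index. The class and function names are hypothetical. Python's runtime does its own internal locking, so this illustrates the index discipline rather than real lock-free performance.

```python
import threading
import time


class SpscRingBuffer:
    """Single-producer, single-consumer queue that uses no explicit locks."""

    def __init__(self, capacity):
        # One slot stays empty so that "full" and "empty" are distinguishable.
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def try_push(self, item):
        next_tail = (self._tail + 1) % len(self._slots)
        if next_tail == self._head:
            return False  # queue is full
        self._slots[self._tail] = item
        self._tail = next_tail  # publish only after the slot is written
        return True

    def try_pop(self):
        if self._head == self._tail:
            return False, None  # queue is empty
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        return True, item


def run_demo(item_count=1000, capacity=64):
    """Pass integers from a producer thread to a consumer thread."""
    queue = SpscRingBuffer(capacity)
    received = []

    def producer():
        for value in range(item_count):
            while not queue.try_push(value):
                time.sleep(0)  # queue full: yield and retry

    def consumer():
        while len(received) < item_count:
            ok, value = queue.try_pop()
            if ok:
                received.append(value)  # handle each item as it arrives
            else:
                time.sleep(0)  # queue empty: yield and retry

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return received


if __name__ == "__main__":
    items = run_demo()
    print("received", len(items), "items in order:", items == sorted(items))
```

Because each index has exactly one writer, neither side ever has to wait on a lock held by the other. Scaling the idea to many cores typically means one such queue per producer-consumer pair, though whether Bananas works that way is not stated.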

Akuda says its Bananas real-time streaming platform scales nearly linearly

“The commercial availability of Bananas comes at a time when the need for extremely high-performing, real-time stream processing systems is becoming urgent,” Akuda Labs co-founder and CEO Vince Schiavone says in a press release. “Systems that process data with variable latencies, or have increasing latency as more information needs to be processed, are simply not appropriate in an increasing number of scenarios across multiple industries.”

Akuda Labs, which is based in San Jose, California, unveiled its first real-time streaming application, dubbed Pulsar, more than two years ago. Akuda co-founder and CTO Luis Stevens developed Pulsar based on the research he conducted while at Stanford University on the DASH cache-coherent multiprocessor, which Akuda says he commercialized at Silicon Graphics for organizations like the NSA, NASA, and Los Alamos National Laboratory.

Bananas is separate from Pulsar, which Akuda markets as a real-time streaming classification system. The company is positioning Bananas, which is now available, as a general-purpose data stream processing system. The company also markets an Unstructured Big Data Discovery Platform that captures, classifies, and indexes billions of documents and images daily from the Internet.

“These test results underscore the superiority of distributed system infrastructures that target shared-memory multiprocessors and exploit all their capabilities, as we’ve done with Bananas,” Stevens says in a press release. “Spark Streaming is essentially an abstraction over the Spark batch processing system and is unsuitable for practical streaming systems that require high throughput while performing computationally intensive tasks at sub-second latencies.”

In the future, the company plans to run additional tests pitting Bananas against other real-time streaming platforms, including Apache Storm, Apache Flink, Apache Tez, Apache Samza, Apache Apex, and Google Cloud Dataflow.
