June 3, 2014

DataTorrent RTS Clocks In at 1.5B Events per Second

Alex Woodie

What would you do with a system that could process 1.5 billion events per second? That’s the mind-boggling rate at which DataTorrent’s Real-Time Streaming (RTS) offering for Hadoop was recently benchmarked. Now that RTS is generally available (DataTorrent announced its general availability today at Hadoop Summit in San Jose), we may soon find out.

That 1.5-billion-events-per-second figure was recorded on DataTorrent’s internal Hadoop cluster, which sports 34 nodes. Each node is able to process tens of thousands of incoming events (call data records, machine data, and clickstream data are common targets) per second, and in turn generates hundreds of thousands of secondary events that are then processed again using one of the 400 operators that DataTorrent makes available as part of its in-memory, big-data kit.

When all the levels of operators and events are added up, RTS handled a total of 1.5 billion events per second. To put that into perspective, DataTorrent says that processing 1 billion data events per second is the equivalent of processing 46 cumulative hours of streaming Twitter data in one second. That is a lot of data.
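DataTorrent’s Twitter comparison can be checked with simple arithmetic. Assuming each tweet counts as one event, 46 hours of Twitter data squeezed into one second implies an average Twitter rate of roughly 6,000 tweets per second, and the full 1.5-billion figure scales to about 69 hours:

```latex
\[
46\ \text{h} \times 3600\ \tfrac{\text{s}}{\text{h}} = 165{,}600\ \text{s},
\qquad
\frac{1 \times 10^{9}\ \text{events}}{165{,}600\ \text{s}} \approx 6{,}000\ \text{events/s},
\qquad
1.5 \times 46\ \text{h} = 69\ \text{h}
\]
```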

Not that anybody is asking for that much event-processing capacity at the moment. “We feel we’re at a point with 1.5 billion events per second where we have plenty of headroom,” says John Fanelli, vice president of marketing at DataTorrent. “None of the prospects we work with have gotten anywhere near the headroom we provide with 1.5 billion events per second.”

According to DataTorrent’s own figures, RTS runs 1,000 times faster than Apache Storm and about 100 times faster than Spark Streaming. If the future of Hadoop is real-time data processing, then the folks at DataTorrent have to feel pretty good about where they are.

“I believe we can go further,” DataTorrent co-founder and CEO Phu Hoang says regarding the throughput figure. How much further do we need to go? How much data do we really need to process? “These things are coming faster than we think.”

Hoang sees a bright future for DataTorrent processing data coming off the Internet of Things (IoT), as well as “classic” big data use cases, such as online advertising analysis. “When you talk to some of these customers, it’s not about the incoming rate of streaming throughput. It’s what they’re thinking about or struggling with. In other words, what kind of processing do they want to do when they bring the event in?”

In the online advertising scenario, batch-oriented MapReduce applications running on Hadoop enabled advertisers to optimize the delivery of ads to people based on a number of factors. Companies crunch the various attributes of clickstream data, including the publisher, the advertiser, the ad size, the format, and the location, to come up with an optimal blend. There can be more than two dozen attributes available, but there’s rarely the time to go that deep.

With DataTorrent, companies can perform the same type of processing, but do so in real time, and potentially go much deeper into the data. “Now when we give them this kind of infrastructure, it really allows us to open up and look at what other attributes we want to put in there,” Hoang tells Datanami. “So there’s an exponential explosion in terms of the number of events you have to look at. While nobody is doing it today, it’s only because they’ve been taught that no such thing exists. It can really open up the exploration of the data that they have.”
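The “exponential explosion” Hoang describes can be made concrete with a small sketch. The Java class below is a hypothetical illustration, not DataTorrent’s operator API: it assumes an ad click is represented as a map of attribute names to values (the names `publisher`, `advertiser`, and so on are made up for the example) and that an application wants one aggregate for every non-empty combination of those attributes. Under that assumption, each incoming click fans out into 2^k - 1 secondary events for k attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AttributeFanOut {
    // Number of non-empty attribute combinations for k attributes: 2^k - 1.
    static long combinations(int k) {
        return (1L << k) - 1;
    }

    // Hypothetical fan-out step: expand one click (attribute -> value) into one
    // secondary event per non-empty subset of its attributes. Each returned map
    // holds the attribute values that a downstream aggregate would be keyed on.
    static List<Map<String, String>> expand(Map<String, String> click) {
        List<String> keys = new ArrayList<>(click.keySet());
        int n = keys.size();
        if (n > 30) {
            throw new IllegalArgumentException("too many attributes to enumerate: " + n);
        }
        List<Map<String, String>> out = new ArrayList<>();
        // Each bitmask from 1 to 2^n - 1 selects one subset of the attributes.
        for (int mask = 1; mask < (1 << n); mask++) {
            Map<String, String> combo = new TreeMap<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    combo.put(keys.get(i), click.get(keys.get(i)));
                }
            }
            out.add(combo);
        }
        return out;
    }

    public static void main(String[] args) {
        // A made-up click with the five attributes named in the article.
        Map<String, String> click = new TreeMap<>();
        click.put("publisher", "news-site");
        click.put("advertiser", "acme");
        click.put("adSize", "300x250");
        click.put("format", "banner");
        click.put("location", "US");
        System.out.println(expand(click).size()); // prints 31

        // Growth in secondary events per click as attributes are added.
        for (int k = 5; k <= 25; k += 5) {
            System.out.println(k + " attributes -> " + combinations(k) + " secondary events per click");
        }
    }
}
```

With five attributes each click yields 31 secondary events; at 25 attributes (roughly the “two dozen” mentioned above) the count passes 33 million per click, which is why going that deep was impractical in batch.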

DataTorrent is one of a promising new class of YARN-enabled applications for the Hadoop 2 paradigm. The company took care to shield developers from the underlying plumbing, with the goal of letting them focus on the business logic. The idea is to let developers dip into the library of 400 operators to string together event-processing systems, using the Java they’re already familiar with.

Relying on YARN for resource management enables RTS to run on the same Hadoop cluster as batch-oriented MapReduce applications. While real-time streaming is sometimes seen as a replacement for batch applications, the two can coexist peacefully.

“We find we’re quite complementary to batch processing,” Hoang says. “When there’s a lot of I/O going on, we’re actually using the cluster for the memory and processing. You can have these real-time streaming applications that run concurrently with other batch applications doing a lot of I/O within a single cluster.”

DataTorrent owes a lot to the MapReduce coders who came before it, as they popularized the concepts around parallel programming. “That in a way is something that Hadoop already included, the notion of distributed computing,” Hoang says. “So I think we got to stand on the shoulders of Hadoop…We think it’s a very smooth transition for someone with familiarity with a distributed computing concept like MapReduce to us.”
