What’s Driving the Rise of Real-Time Analytics
Hadoop gave us a taste of the types of insights big data analytics can deliver, and provided a competitive advantage to early adopters. But as data volumes grow, as the time-to-insight window shrinks, and as technology continues to improve, companies that wish to remain competitive will have no choice but to adopt real-time analytics.
We’re on the cusp of a period of rapid growth in real-time analytics, according to industry experts. While real-time analytic systems and complex event processing (CEP) systems are nothing new – big telcos and retailers have been using them for years to detect credit card fraud, for example – the rapid maturation of big data technologies is quickly making real-time analytics a reality for the rest of us.
Big Data 2.0
If Hadoop represented big data analytics 1.0, then real-time analytics could usher in the era of big data analytics 2.0, according to Anand Venugopal, the head of product development for real-time analytics software vendor Impetus Technologies.
“The Hadoop revolution is making the whole pathway for real time streaming, especially on open source technologies, much faster,” Venugopal tells Datanami. “The technology-aware people inside of enterprises are saying streaming analytics may be the next big thing.”
Impetus Technologies is one of a handful of companies building real-time analytics atop open source technologies–in this case, Apache Storm, Apache Kafka, and RabbitMQ. While many of its customers in banking, healthcare, and telecommunications are using Hadoop or Cassandra as big data storage repositories, they’re increasingly looking to get value out of the data before it gets there.
That’s where its product, called StreamAnalytix, comes in. “The overall motivation is to [accelerate] the process of sensing an event, analyzing an event, and acting on an event,” Venugopal says. “Today the cycle can be days and weeks. They want to shrink that time window… down to seconds or milliseconds.”
Real-Time Use Cases
Impetus Technologies shipped its StreamAnalytix product in February and today has a handful of clients in production, most of them bigger enterprises. Last month the company announced a free version of StreamAnalytix, which it hopes will plant a seed for wider adoption in the future.
Venugopal sees two main use cases for real-time streaming analytics: customer enhancement and general operational modeling. “We’re seeing fraud analytics on point of sale [POS] transactions. We’re seeing clickstream or Web analytics. We’re seeing call center analytics,” he says. “What was available to very niche companies only with a large spend…now that technology is being democratized and put into the hands of pretty much everybody.”
The traditional Hadoop paradigm can’t meet the emerging big data needs of enterprises, especially when it comes to the Internet of Things (IoT), according to Steve Wilkes, the founder and CTO of WebAction, which this week released its new real-time streaming analytics platform, dubbed Striim (pronounced “stream”).
“The proliferation of devices is creating a tsunami of data,” Wilkes says, evoking the name of this very publication. “But there’s a big difference between data and information. Being able to extract information from data as it’s being generated, and storing the information rather than the raw data, can be very useful in a lot of cases.”
While many organizations are building data lakes atop Hadoop so they can store the mass of unstructured and semi-structured data that they previously discarded, the Hadoop promise is not panning out in some cases, Wilkes says.
“The CIOs we’ve been talking to say ‘We put data into a Hadoop lake and it’s really hard, if not impossible, to get information out of it later,'” Wilkes says. “Part of that problem is you’re just storing the raw data in Hadoop…But if you can pre-process it, filter it, aggregate it and enrich it [as it lands in the enterprise], then it makes it easier to run queries later.”
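The filter-aggregate-enrich idea Wilkes describes can be sketched in a few lines. This is a hypothetical illustration, not Striim’s actual API: the `REGION_BY_SENSOR` lookup table and the event shape are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical lookup table used to enrich raw events (assumed for illustration)
REGION_BY_SENSOR = {"s1": "east", "s2": "west"}

def preprocess(events):
    """Filter, enrich, and aggregate raw events before landing them in storage."""
    counts = defaultdict(int)
    for ev in events:
        if ev["value"] is None:          # filter: drop malformed readings
            continue
        region = REGION_BY_SENSOR.get(ev["sensor"], "unknown")  # enrich
        counts[region] += 1              # aggregate per region
    return dict(counts)

raw = [
    {"sensor": "s1", "value": 10},
    {"sensor": "s1", "value": None},     # discarded by the filter step
    {"sensor": "s2", "value": 7},
]
print(preprocess(raw))  # {'east': 1, 'west': 1}
```

Storing the small aggregate rather than every raw reading is what makes later queries against the lake cheap.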
Just as companies realized decades ago that trying to do reporting on normalized data as it sits in a production OLTP database wasn’t a good idea, early Hadoop adopters are running into problems with how data is landed and curated in the lake.
For example, Wilkes says, say a company wants to see which customers are the most active, and the source data is sorted by customer ID or the serial number of a smartphone. “You’re going to have a horrible join there,” he says. “But if you can do that preprocessing and enrichment on the fly, and denormalize the data somewhat before you land it, that query becomes much simpler.”
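Wilkes’ “most active customers” example can be sketched as follows; the `CUSTOMERS` lookup, the serial numbers, and the event shape are all hypothetical names invented for this illustration, not anything from WebAction’s product.

```python
from collections import Counter

# Hypothetical serial-number -> customer lookup (assumed for illustration)
CUSTOMERS = {"SN-001": "alice", "SN-002": "bob"}

def enrich(event):
    """Attach the customer name to a raw device event before landing it."""
    return {**event, "customer": CUSTOMERS.get(event["serial"], "unknown")}

raw_events = [
    {"serial": "SN-001", "action": "open"},
    {"serial": "SN-001", "action": "click"},
    {"serial": "SN-002", "action": "open"},
]

# Denormalize on the fly: each landed record already carries the customer
landed = [enrich(e) for e in raw_events]

# The query is now a simple count over landed data, not a join against
# a separate customer table
most_active = Counter(e["customer"] for e in landed).most_common(1)
print(most_active)  # [('alice', 2)]
```

Because the enrichment happened as the events streamed in, the analytical query never touches the customer lookup at all.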
Finding Business Value
Phu Hoang, who built Hadoop-based systems at Yahoo before co-founding DataTorrent several years ago, sees real-time streaming analytics building atop Hadoop and existing within the Hadoop ecosystem. While Hadoop provided a step up from BI approaches that came before it, people are now looking beyond what basic Hadoop can provide, and in many cases that means real-time analytics.
“Hadoop did come and disrupt the traditional analytics,” Hoang says. “But once they were there, they want to get those insights sooner and sooner, and that’s where the MapReduce paradigm and batch paradigm started to break down.”
DataTorrent recently open sourced its core engine, which is incubating at the Apache Software Foundation under the name Apache Apex. For Hoang, the question isn’t whether the technology is ready (he insists Apex, which can process 1.5 billion events per second, is more enterprise-ready than Apache Spark at the moment). Rather the question is whether businesses can successfully harness the information it gives them.
“People who are seeing demonstrations of it, proofs of concept, are starting to really face that incredible opportunity of ‘Oh my goodness, the data I’ve been computing, which I was getting once a day or every eight hours, I literally can have in minutes or seconds,'” Hoang says. “There’s a spectrum of enterprises, in terms of maturity and knowing what to do with that.”
It’s hard enough to quantify the value of “traditional” big data analytics. But when we’re given the power to generate insights in real time, it opens up a whole new world of possibilities. Hoang says it will take time for businesses to fully internalize, understand, and quantify the value of faster insight.
“There are lots of companies that are saying ‘That’s certainly good enough for us,'” he says. “But it’s my prediction that all companies will move to faster insight and a majority of them will get to real-time streaming within the next three to five years.”
It’s tough to conceptualize all the possible use cases that open up when so much information is available at our fingertips. WebAction’s Wilkes credits the consumerization of technology – and the inverse of that, the “technologization” of consumers – with changing all that.
“The question is, if I can get an instant message from Facebook and I can analyze what’s happening on my Twitter feed and I can do all these things on my smart device, then why can’t I do that with enterprise data?” he asks. “Why do I have to wait until the end of day for a report from my data warehouse? Or why does it take weeks to run my queries in Hadoop, as opposed to getting that data right away?”
As the pace of data generation and the value of analytics accelerate, it’s clear that each organization with a big data analytics strategy will be asking itself those very questions.