April 15, 2014

How Fast Data is Driving Analytics on the IoT Superhighway

Alex Woodie

The promise of big data is morphing into the fast data opportunity. Unless you have the capability to respond to the Internet of Things and the trillions of data points generated by smartphones, sensors, and social media, the business opportunities of fast data can pass you by.

For many commercial analytic applications, fast data is the inevitable endpoint of any big data project. Once your data scientists reach that “aha” moment of insight by carefully sifting through their big (but static) data sets, your business pros will say “Great, so how do we make money off this?” That’s where fast dynamic data comes into play.

TIBCO made its name in the IT business with its information bus, which provides high-speed and low-latency connectivity among disparate enterprise systems, such as stock markets and trading applications. Now the company that popularized the phrase "two-second advantage" is bringing that concept to the Internet of Things (IoT) and fast data.

Last week, the company announced that BusinessWorks, its flagship data integration platform, has been bolstered with improved REST support that will enable customers to pull data from the APIs of smartphones, sensors, and other data-generating devices that make up the IoT.

“The first requirement of fast data is getting access to this data,” says TIBCO senior director of marketing Thomas Been. “Now we allow them to capture everything outside of their firewall. It can be social networks or anything that has an API.”

For example, a retailer could use BusinessWorks to capture geographic data from consumers' smartphones and use that as the basis for a real-time offer generation system. "By looking at your profile, looking at the patterns of the insight you pull from big data, I will send you an offer on your preferred brand of jeans to get you into my shop," Been says. "And then, I know, based on the information I have, that you will be spending money."
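A geo-triggered offer rule of the kind Been describes could be sketched as follows. This is a hypothetical illustration, not TIBCO's actual BusinessWorks logic; the store, profile fields, and 1 km radius are all invented for the example.

```python
# Hypothetical geo-triggered offer rule (illustrative only): send an offer
# when a shopper with a known brand preference comes within 1 km of a store.
import math

def distance_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance via the haversine formula."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def make_offer(profile, lat, lon, store):
    """Return an offer string if the shopper is within 1 km of the store
    and has a preferred jeans brand on file; otherwise None."""
    if distance_km(lat, lon, store["lat"], store["lon"]) > 1.0:
        return None
    brand = profile.get("preferred_jeans_brand")
    return f"20% off {brand} jeans today at {store['name']}" if brand else None

store = {"name": "Main St Store", "lat": 40.7128, "lon": -74.0060}
profile = {"preferred_jeans_brand": "Acme"}
print(make_offer(profile, 40.7130, -74.0055, store))
```

A production system would layer the big-data side on top of this: the "patterns of insight" Been mentions would replace the hard-coded brand preference with a model-derived one.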

When it comes to mining social media for analytical insights, speed is definitely of the essence. Yesterday Datanami covered a company called Blab and how it’s pulling signals from social media to help ad buyers and PR companies predict what topics will go viral, and which ones will go dead.

Another company that's plying the IoT waters is Ugam, a developer of analytic applications. The Frisco, Texas-based company is seeing particular traction in leveraging free consumer data from social media networks to help retailers decide what to sell and where to place it on the shelves. But beware of which social media networks you choose to monitor.

“Basically, Twitter is a bit ‘noisy’ when it comes to getting customer feedback for pricing and assortment decisions,” says Ugam chief innovation officer Mihir Kittur. “It’s too cluttered with complaints and generally non-relevant information. Instead, Ugam has found that the combination of product reviews, Google+s, Facebook likes, and Pinterest pins provide much better social signals for pricing and assortment intelligence.”
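The combination Kittur describes could be boiled down to a weighted signal score. The sketch below is hypothetical, not Ugam's method; the channel weights are invented for illustration, and noisy Twitter mentions are simply left out of the score.

```python
# Hypothetical "social signal" score for assortment decisions: product
# reviews, Google +1s, Facebook likes, and Pinterest pins each contribute;
# Twitter is excluded as too noisy. Weights are illustrative assumptions.
WEIGHTS = {"reviews": 3.0, "plus_ones": 1.0, "likes": 1.0, "pins": 2.0}

def social_signal(counts):
    """Combine per-channel counts into a single score; unknown channels
    (e.g. raw Twitter mentions) are ignored."""
    return sum(WEIGHTS[ch] * counts.get(ch, 0) for ch in WEIGHTS)

skinny = {"reviews": 40, "plus_ones": 120, "likes": 300, "pins": 90}
bootcut = {"reviews": 10, "plus_ones": 20, "likes": 80, "pins": 15}
print(social_signal(skinny))   # 720.0 — the stronger candidate for shelf space
```

In practice the weights themselves would be fitted against sales outcomes rather than fixed by hand.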

Retail’s rapid pace makes it a good place to test fast data theories to see if they’re profitable. But when it comes to actually helping people, nothing beats the nation’s biggest industry: healthcare. The folks at TIBCO are aiming to build fast data applications in hospital settings that find patterns in vast amounts of data pulled in from digital medical devices.

“We have customers who are looking at integrating medical devices in real time so we can identify diseases earlier and can propose the right cure to the patient earlier,” TIBCO’s Been says. “They do the big data thing to understand the patterns and how the diseases are spreading, and then using real time data to look for the symptoms.”
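The two-phase pattern Been describes (learn patterns offline, then watch the real-time stream for symptoms) can be sketched with a toy monitor. Everything here is an invented illustration, not a TIBCO product or a clinical rule: the thresholds and window size stand in for whatever the offline big-data analysis produced.

```python
# Illustrative two-phase pattern: thresholds learned offline (here hard-coded
# stand-ins) applied to a real-time stream of vitals. Not medical advice.
from collections import deque

def make_monitor(window=3, temp_c=38.0, hr=100):
    """Return a callable that flags True once `window` consecutive readings
    show both fever and elevated heart rate."""
    readings = deque(maxlen=window)
    def observe(temp, heart_rate):
        readings.append((temp, heart_rate))
        return (len(readings) == window and
                all(t >= temp_c and h >= hr for t, h in readings))
    return observe

observe = make_monitor()
stream = [(37.2, 80), (38.4, 104), (38.6, 110), (38.9, 118)]
alerts = [observe(t, h) for t, h in stream]
print(alerts)  # the alert fires only once the sliding window is all-abnormal
```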

While Hadoop has become synonymous with big data, it's not perceived favorably when it comes to fast data. TIBCO, for one, isn't a huge fan of Hadoop. You will recall how the company's CTO Matt Quinn pleaded with people to stop chasing yellow elephants at the company's annual user conference last year.

Hadoop has come under fire for its perceived lack of interactivity and real-time capabilities. But there are several initiatives to add real-time capabilities to Hadoop, if not remake it into a fast data platform. Two of the most prominent are Apache Spark and Apache Storm.

Spark is gaining a tremendous amount of momentum as a replacement for MapReduce, which up to this point has been the analytical brains behind the Hadoop data platform. Spark is easier to code (supporting Python and Scala in addition to Java); it's also faster, and comes with pre-built hooks for SQL (Shark), real-time streaming (Spark Streaming), machine learning (MLlib), and graph processing (GraphX).
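The "easier to code" claim is about Spark's chained, functional API. The sketch below mimics that style in plain Python (pyspark itself is not assumed to be installed) with the classic word count, and shows in a comment roughly how the same pipeline reads in PySpark.

```python
# Plain-Python mimic of Spark's chained map/reduce style: word count.
from collections import Counter
from itertools import chain

lines = ["fast data", "big data", "fast fast data"]

# Flatten lines into words, then count occurrences of each word.
counts = Counter(chain.from_iterable(line.split() for line in lines))

# The equivalent PySpark pipeline reads roughly (assumed API shape):
#   sc.parallelize(lines).flatMap(str.split) \
#     .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
print(dict(counts))
```

The point of comparison is brevity: the same job in classic Java MapReduce requires separate mapper and reducer classes plus job configuration boilerplate.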

One Hadoop software vendor that’s adapting to the realities of big fast data is MapR Technologies, which recently announced that it’s partnered with Databricks to bring the in-memory Apache Spark technology to its Hadoop product fold. MapR’s competitor Cloudera is also distributing Spark; Hortonworks supports it as a technology preview, with full support expected later this year.

Storm is also gaining followers as the real-time needs of big data morph into fast data. Like Spark, Storm gives the user the option of programming in a variety of languages, including Ruby, Python, JavaScript, Perl, and PHP.

One company that's using Storm in production is LivePerson, the provider of Web-based communications software. In a recent video, Ido Shilon, a team lead in the platform engineering group at LivePerson, explains how the company rebuilt its back-end infrastructure to make its offerings more resilient.

The core elements of LivePerson's real-time system are Storm, Apache Kafka, and the Couchbase NoSQL database. As part of the initiative, the company collects information about every session, such as what websites users come from, what browser they're using, and what pages they've accessed. This information is streamed via Kafka to Storm for analysis, and then stored in documents in the Couchbase database. Eventually, these three products will form the hub of its "wisdom repository," where it will be able to analyze this information, Shilon says.
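The per-session enrichment step in such a pipeline can be sketched as below. The field names and document key scheme are hypothetical, not LivePerson's actual schema: a raw JSON event of the kind streamed in via Kafka is reduced to a document shaped for a document store like Couchbase.

```python
# Hypothetical session-enrichment step: raw Kafka event -> session document.
import json

def to_session_doc(event):
    """Build a document keyed by session id from a raw session event
    (field names are illustrative assumptions)."""
    return {
        "id": f"session::{event['session_id']}",
        "referrer": event.get("referrer"),
        "browser": event.get("browser"),
        "pages": event.get("pages", []),
    }

raw = json.dumps({"session_id": "s1", "referrer": "example.com",
                  "browser": "Firefox", "pages": ["/home", "/pricing"]})
doc = to_session_doc(json.loads(raw))
print(doc["id"])
```

In the real system this transformation would run inside a Storm bolt, with the resulting documents upserted into Couchbase rather than printed.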

The pieces of the fast data puzzle are still coming into view. The Internet of Things promises to flood us with more machine-generated data than we could ever dream of. Making something useful out of all that information will be neither easy nor intuitive. But its very existence will demand action and fuel data-driven competition among companies for years to come.

Related Items:

A Prediction Machine Powered by Big Social Data

Rudin: Big Data is More Than Hadoop

Please Stop Chasing Yellow Elephants, TIBCO CTO Pleads