September 4, 2012

TIBCO Shines Light on Spotfire Concepts

Ian Armas Foster

As organizations grow accustomed to taking in big data, the demand for processing and analyzing it quickly rises. Unfortunately, traditional methods of hosting big data, such as disks and tapes, do not offer much in the way of speed, since the time costs of retrieving data from spinning disks are significant.

However, hosting data in RAM, otherwise known as in-memory, reportedly decreases processing time and increases enterprises’ satisfaction with their data. Nathaniel Rowe of the Aberdeen Group and Spotfire’s Senior Director of Analytics, Dr. Michael O’Connell, discussed these in-memory developments in a webinar hosted by TIBCO Spotfire Analytics.

Rowe defined in-memory simply as “moving the data as close to the processor as possible, usually serving it directly on the RAM in the server.” Placing the data directly in RAM eliminates the tedious activating and transmitting of data required when retrieving it from a warehouse. Of course, RAM is volatile, so data cannot stay there indefinitely, making in-memory unfeasible for long-term storage, but that is beside the point.

As Rowe noted, in a survey of 196 companies with a big data initiative (defined as having at least five terabytes of active business data), 53% feel they do not receive their information fast enough. Some of that dissatisfaction is a result of ridiculous expectations. According to Rowe, some users wanted to process hundreds of terabytes to petabytes in only minutes or even in real time. Rowe noted that rate was slightly unrealistic at this time. Nothing short of a reality check, which in-memory technology does not offer, can cure unrealistic expectations.

What in-memory does offer is a quicker query response time. According to the survey, those with in-memory had their queries answered in 42 seconds (on average) while those without had to wait 75 minutes. That means those with in-memory get their responses 107 times faster on average. Also, in-memory businesses were able to analyze over a petabyte in just one hour while other businesses could only manage about 3.2 terabytes per hour, a factor of 375 less.
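The two ratios quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures reported in the article; the variable and function names are illustrative. Note that the article says only "over a petabyte" per hour, so the in-memory throughput here is back-calculated from the stated factor of 375 rather than taken from the survey directly.

```python
# Figures as quoted in the article from Rowe's survey.
IN_MEMORY_QUERY_SECONDS = 42   # average query response with in-memory
DISK_QUERY_MINUTES = 75        # average query response without in-memory
DISK_TB_PER_HOUR = 3.2         # throughput without in-memory
THROUGHPUT_FACTOR = 375        # stated ratio between the two throughputs


def speedup(slow_seconds, fast_seconds):
    """Return how many times faster the fast response is."""
    return slow_seconds / fast_seconds


# Convert 75 minutes to seconds before comparing with 42 seconds.
query_speedup = speedup(DISK_QUERY_MINUTES * 60, IN_MEMORY_QUERY_SECONDS)

# Assumption: the in-memory throughput is inferred from the stated factor,
# since the article gives it only as "over a petabyte" per hour.
implied_in_memory_tb_per_hour = DISK_TB_PER_HOUR * THROUGHPUT_FACTOR

print(f"Query speedup: about {query_speedup:.0f}x")
print(f"Implied in-memory throughput: about {implied_in_memory_tb_per_hour:.0f} TB/hour")
```

The first figure reproduces the quoted 107 times. The second suggests that "over a petabyte" means roughly 1.2 petabytes per hour, if the factor of 375 is taken at face value.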

Further, of the 196 businesses, 47% felt their data was underutilized, speaking primarily to unstructured data. “Twitter has a lot of value,” Rowe said. Indeed, Twitter can be a remarkable resource not only for guiding marketing campaigns but also for heading off PR disasters. As an example, Rowe pointed to Progressive Insurance, which recently got caught up in what he called a fiasco that stemmed from a single blog post. That post got picked up by bigger blogs and pretty soon went viral. At one point, 53% of all tweets related to Progressive were negative.

Ideally, an analytics platform could monitor the Twitterverse and alert Progressive when a problem is stewing. As it stands now, it takes some random PR person at Progressive to look through Twitter and say “Oh crap, everyone is badmouthing Progressive!” for the company to take action.

According to the survey, 59% of companies with in-memory technology are satisfied with the quality and relevance of the data analyzed (here, relevant pertains to the Twitter information Progressive could have used) versus only 42% of companies without in-memory. Sure, 59% is nowhere near perfect, but it’s a step up at least.

Automation is important when one wants to analyze that Twitter information. Otherwise, one relies on people constantly checking the social media sites and hopes someone catches something. In-memory has a huge automation advantage: 67% of businesses with in-memory report satisfaction with their data systems’ automation in indexing versus only 18% of those without.

While it should be noted that there may be a small sample size at work here (only 33 of 196 respondents use in-memory), the differences are large enough to be considered statistically significant.
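One way to get a feel for the sample-size caveat is a pooled two-proportion z-test on the quoted percentages. The sketch below is not the survey's own analysis. It assumes that the 33 in-memory respondents and the remaining 163 all answered each question, and that the reported percentages apply to those full groups. The function name is illustrative.

```python
from math import erfc, sqrt


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Assumption: group sizes derived from "33 of 196 respondents use in-memory".
N_IN_MEMORY = 33
N_OTHER = 196 - N_IN_MEMORY

# Satisfaction with automation in indexing: 67% versus 18%.
z_auto, p_auto = two_proportion_z(0.67, N_IN_MEMORY, 0.18, N_OTHER)

# Satisfaction with quality and relevance of data: 59% versus 42%.
z_sat, p_sat = two_proportion_z(0.59, N_IN_MEMORY, 0.42, N_OTHER)

print(f"automation:   z = {z_auto:.2f}, p = {p_auto:.2g}")
print(f"satisfaction: z = {z_sat:.2f}, p = {p_sat:.2g}")
```

Under these assumptions the automation gap gives a z-score of roughly 5.8, far beyond conventional thresholds, while the quality-and-relevance gap gives roughly 1.8 (two-sided p near 0.07), which is borderline at the usual 0.05 level. With only 33 in-memory respondents, the larger differences hold up better than the smaller ones.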

Statistics are fun and can tell a lot of stories, but they are useless without a sense of how they are generated. This is where O’Connell came in.

O’Connell showcased his company’s in-memory enhanced analytics platform in impressive fashion, actually demonstrating test cases in various industries.

In one instance, O’Connell showed and explained the analytics behind a credit card company targeting a certain demographic for a promotion. The first step was to gather all of the information regarding what O’Connell called the “explanatory variables,” which here included ATM transactions, age, profession and monthly balance, among others. Putting together all those variables seems simple enough, except that transactions and monthly balance are relational while profession is not.

O’Connell next simply compiled the information, and out came an array of graphs, from which O’Connell showed that a certain population (he did not reveal its specifics) would jump at the credit card opportunity at a rate of 75% versus the normal rate of 5-10%. The credit card company could obviously take that information and market accordingly to that demographic, whatever it was.

The most important part was that it all happened instantly. O’Connell clicked the button and the software immediately produced his graphs. Granted, these were test cases with perhaps a limited amount of data run on his desktop. But it is the mark of a good computer system to make something incredibly difficult look simple.

Again, in-memory is not yet perfect. It is not the answer to every big data problem that exists. But judging by the responses to Rowe’s survey and O’Connell’s demonstration, it represents a significant step forward in analytics.