May 3, 2013

Operationalizing Big Data and Making it Small

Isaac Lopez

With data growing exponentially, most organizations aren’t getting full value from their big data implementations, said Radhika Subramanian at the IDC HPC User Forum this week in Tucson, Arizona, where she explained that automation is the path toward operationalizing big data.

Radhika Subramanian, CEO of Emcien

Radhika Subramanian, CEO of Emcien, addressed the audience in Tucson about the challenges of big data. Data keeps growing, and the more it grows, the less operational it becomes under most big data paradigms; Subramanian claims that less than 1 percent of it is actually being analyzed (source: IDC).

Enter Emcien, whose pattern recognition technology dives into big data and builds a graph data model that reveals connections and patterns, which can then be analyzed further to surface insights about those connections. The process works like an automated machine, says Subramanian: it takes in the dirt (the data) and automatically pulls the gold out of it.
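
Emcien has not published implementation details, but the general move of turning raw records into a graph of co-occurring items can be sketched in a few lines of Python. This is purely illustrative; the sample records and item names are invented, and the real platform presumably does far more.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical raw records: each is the set of items (people, terms,
# products) that appeared together in one event or transaction.
records = [
    {"alice", "bob", "term_x"},
    {"bob", "carol", "term_x"},
    {"dave", "erin"},
]

# Build an undirected co-occurrence graph: an edge's weight counts how
# often two items showed up in the same record.
graph = defaultdict(lambda: defaultdict(int))
for record in records:
    for a, b in combinations(sorted(record), 2):
        graph[a][b] += 1
        graph[b][a] += 1

# Heavily weighted edges are the "connections" a graph model surfaces.
for node, neighbors in graph.items():
    print(node, dict(neighbors))
```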

Emcien’s pattern detection platform works with both structured and unstructured data, says Subramanian, who noted in her discussion that the real leaders in data are those able to deliver results across all data types.

In discussing a use case, she noted that the platform is being used by the Gangs and Guns Squad of the Atlanta Police Department, where 12 months ago two gangs in the area merged. Using the various data streams available to the police department, detectives were able to assemble a graph of the network that gave context into how each point of contact related to the others.

As conversation patterns emerged in the data, a word surfaced that the detectives had not seen before: “famerica.” After digging further into the data, they noted that the connections using this new word came from users on the graph associated with rival gangs. Subramanian says that using these techniques, the detectives were able to uncover the gang merger within 48 hours, an event that changed the operational reality in the field.
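
The article doesn’t describe the mechanics behind this, but the basic step, flagging a term never seen before and then looking at which nodes on the graph exchange it, can be sketched roughly as follows. All of the names, messages, and vocabulary here are hypothetical.

```python
# Hypothetical message stream: (sender, receiver, set of terms used).
messages = [
    ("node_a", "node_b", {"meet", "famerica"}),
    ("node_c", "node_d", {"famerica"}),
    ("node_e", "node_f", {"meet"}),
]

# Assumed historical vocabulary seen in earlier data.
known_terms = {"meet", "street", "tonight"}

# Flag terms absent from the historical vocabulary, then collect the
# pairs of nodes whose messages contain them.
new_terms = {}
for sender, receiver, terms in messages:
    for term in terms - known_terms:
        new_terms.setdefault(term, set()).add((sender, receiver))

for term, pairs in new_terms.items():
    print(f"new term {term!r} used by: {sorted(pairs)}")
```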

You don’t get this kind of intelligence from big data approaches that rely on search engines for operational, day-to-day intelligence, says Subramanian. “Any process that puts the burden on a human being is flawed – it’s not going to work. It has to be done through automation. You’re better off getting a B-minus automation solution than getting one really good, what they call at this point, data scientist, because that guy doesn’t exist.”

Subramanian explained that their solution puts the data stack, both structured and unstructured, at the bottom. The data then goes through the infrastructure stack for storage, management, and processing; above that sits their algorithmic layer, which builds the graphs and teases out the patterns in the data.

“Once the graph is built, this is where the magnet really comes in: the algorithms.” She explained that as the graphs are built, sections emerge that look as if they could be broken out as individual constellations. “Those could actually be market segments, and there’s not a search query in a database that will give that to you.”
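
Pulling those “constellations” out of a graph is, in its simplest form, a connected-components problem (real community detection would go further). A minimal sketch using breadth-first search over hypothetical adjacency lists, not Emcien’s actual algorithm:

```python
from collections import deque

# Hypothetical adjacency lists for a small graph.
adjacency = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},   # one constellation
    "x": {"y"}, "y": {"x"},                    # another constellation
}

def connected_components(adj):
    """Return groups of nodes that are reachable from one another."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        queue, group = deque([start]), set()
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            queue.extend(adj.get(node, ()))
        components.append(group)
    return components

# Each component could correspond to a market segment.
print(connected_components(adjacency))
```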

The graphs can reveal several things, including how certain data points are connected; they can also detect substitute nodes. “It’s like doppelgangers,” she explained. “Two people who are exactly the same pretending to be something different. In banking, that’s ID theft, fraud, etc. In language, it’s synonyms. In marketing, it’s substitutes. So you can see it’s the same algorithms, but as you look across verticals, you solve a different set of problems.”
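
One common way to spot such “doppelganger” nodes is to compare their neighborhoods: two nodes that connect to nearly the same set of neighbors while presenting as different entities are candidate substitutes. The sketch below uses Jaccard similarity of neighbor sets over invented data; it is a generic technique, not necessarily the measure Emcien uses.

```python
from itertools import combinations

# Hypothetical adjacency lists (node -> set of neighbors).
adjacency = {
    "acct_1": {"m1", "m2", "m3"},
    "acct_2": {"m1", "m2", "m3"},   # same neighborhood as acct_1: suspicious
    "acct_3": {"m7"},
}

def jaccard(a, b):
    """Overlap of two neighbor sets: 1.0 means identical neighborhoods."""
    return len(a & b) / len(a | b)

# Pairs of distinct nodes with heavily overlapping neighborhoods are
# candidate substitutes (ID theft, synonyms, product substitutes).
for u, v in combinations(adjacency, 2):
    score = jaccard(adjacency[u], adjacency[v])
    if score > 0.8:
        print(f"possible doppelgangers: {u} ~ {v} (similarity {score:.2f})")
```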

The technology, she says, can be used for a number of applications, including machine data for things like network security, where the algorithms are useful for finding the so-called “needle in a haystack” threat in the system that doesn’t want to get noticed. “The pattern is four, five, or six things that come together and happen over and over again, and like any good virus, it’s trying to stay inside and not make you sick. That’s what’s really so hard with detecting this.” With pattern detection, says Subramanian, they’re able to identify these threats as patterns early on.
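
A recurring set of “four, five, or six things that come together and happen over and over again” is, in effect, a frequent-itemset pattern over event logs. A naive illustrative count is shown below with made-up log signals; at scale, a real system would use something like Apriori or FP-Growth rather than brute-force enumeration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical log windows: each is the set of signals seen in one interval.
windows = [
    {"login_fail", "port_scan", "dns_query", "new_proc"},
    {"login_fail", "port_scan", "dns_query", "new_proc"},
    {"login_fail", "backup_job"},
    {"port_scan", "dns_query", "new_proc", "login_fail"},
]

# Count every combination of 4 signals within a window; combinations that
# recur across many windows are candidate "quiet" threat patterns.
counts = Counter()
for window in windows:
    for combo in combinations(sorted(window), 4):
        counts[combo] += 1

for pattern, freq in counts.most_common(3):
    if freq > 1:
        print(f"recurring pattern {pattern} seen {freq} times")
```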

“When your data comes in, if there are ten things that you need to know about – maybe I’ll give you twenty, but I’ll give you those ten things.”

In that way, she says, organizations are able to take big data and make it small.

Related Items:

MapR Revs HBase with M7; Plots Search Integration 

Cloudera Releases Impala Into the Wild 

Stanford Receives DARPA Grant to Study Big Data 
