August 20, 2013

Data Hoarders In Need of Quality Treatment

Alex Woodie

The term “big data” has rocketed to popularity in the last 12 months, and for good reason: organizations are struggling like never before to deal with, and benefit from, the massive influx and availability of data. And while quantity is certainly one aspect of the data explosion, an unhealthy fixation on size may overlook the single most crucial aspect: quality.

In fact, focusing on the quantity of data at hand may actually lead an organization to degrade the quality of its information and its decision-making, argues Matt Asay, vice president of business development at 10gen, the company behind the NoSQL database MongoDB.

“The point is not to see who can dump data into Hadoop, treating it like some digital landfill,” Asay writes in a Wired blog. “If anything, hoarding data simply increases the noise to signal ratio in an organization, making it even harder to determine the best course of action.”

As proof of his argument, Asay points to large corporations, which have been dealing with big data sets for so many decades that size is no longer the major concern. Instead, the biggest data-related headache for these multinationals is finding efficient ways to integrate all of their disparate data sources into a cohesive whole.

“Just because we can store vast quantities of data doesn’t mean that we’ll derive any benefit from it,” Asay writes, citing a NewVantage survey of CIOs that found the size of data is the primary driver of big data projects at 28 percent of enterprises. By comparison, 64 percent of enterprises say their big data projects are driven by a desire to ingest disparate data sources and make sense of them in real time.

The life insurance company MetLife faced a similar problem, as Asay explains. Like many companies, it wanted a “360-degree view” of its customers. But with more than 70 data sources to feed into this customer view, the technological limitations of the relational database management system (RDBMS) model began to show.

MetLife’s solution, it turns out, was MongoDB. According to Asay, it took MetLife just two weeks to create a common schema across all of the disparate data sources using this NoSQL database, and just three more months to take it into production.

“So let’s get real about Big Data,” Asay concludes. “What enterprises really care about is putting data to use, and that requires the ability to ingest diverse sets of structured, semi-structured and unstructured data and then put it to use in real time. The right tools for these jobs are Hadoop and NoSQL databases like MongoDB, two of the hottest job skills in the industry, and less RDBMS and proprietary data warehousing technology.”

Related items:

Big Data Garbage In, Even Bigger Garbage Out

The Three T’s of Hadoop: An Enterprise Big Data Pattern

Facebook Advances Giraph With Major Code Injection