July 29, 2013

Big Data Garbage In, Even Bigger Garbage Out

Alex Woodie

People are doing some truly amazing things with big data sets and analytic tools. Tools like Hadoop have given us astounding capabilities to derive insights from huge expanses of loosely structured data. And while the big data breakthroughs are expected to continue, don’t expect any progress to be made against that oldest of computer adages: “garbage in, garbage out.”

In fact, big data may even exacerbate the GIGO problem, according to Andrew Anderson, CEO of Celaton, a UK company that makes software designed to prevent bad data from being introduced into customers’ accounting systems.

“The ideal payoff for accumulating data is rapidly compounding returns,” Anderson writes in an essay on Economia, a publication of a UK accounting association. “By gaining more data on your own business, your clients, and your prospects, the idea is that you can make more informed decisions about your business and theirs based on clear insight. Too often however, these insights are based on invalid data, which can lead to a negative version of this payoff, to the power of ten.”

The problem may compound to the power of 100 if bad data is left to fester. Anderson calls this the “1-10-100 rule.” If a clerk makes a mistake entering data, it costs $1 to fix it immediately. After an hour, when the data has begun propagating across the system, the cost to fix it increases to $10.

Several months later, after the erroneous data has become part of the company’s data reality, the cost of that single error balloons to $100: mailings have gone out to the wrong people, invoices have gone unpaid, and new clients have not been contacted about new services.
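The arithmetic behind the rule is straightforward. As a back-of-the-envelope illustration only (the error counts below are hypothetical, not Anderson’s), a short sketch shows how the same pool of errors gets dramatically more expensive the later it is caught:

```python
# Back-of-the-envelope model of Anderson's "1-10-100 rule":
# an error costs $1 to fix at entry, $10 once it has propagated,
# and $100 once it is embedded in downstream processes.

COST_AT_ENTRY = 1      # clerk fixes it immediately
COST_PROPAGATED = 10   # fixed after roughly an hour, once copied across systems
COST_EMBEDDED = 100    # fixed months later, after mailings and invoices go wrong

def cleanup_cost(at_entry: int, propagated: int, embedded: int) -> int:
    """Total cost of fixing errors caught at each of the three stages."""
    return (at_entry * COST_AT_ENTRY
            + propagated * COST_PROPAGATED
            + embedded * COST_EMBEDDED)

# Hypothetical example: 1,000 data-entry errors in a year.
print(cleanup_cost(1000, 0, 0))    # catch everything at entry: $1,000
print(cleanup_cost(890, 100, 10))  # let 11% slip through:      $2,890
```

Even in this toy scenario, letting a small fraction of errors linger nearly triples the cleanup bill, which is the asymmetry Anderson is warning about.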

“Wait longer and the sky’s the limit,” Anderson writes. “Once trust in your data is undermined it becomes a difficult position to recover from.”

To avoid this fate, Anderson recommends that organizations implement a coherent data integration strategy. This involves taking a close look at the sources of data to ensure they’re trustworthy, and implementing a data quality program to ensure that data is cleansed.

“Once you’ve done this,” Anderson writes, “get the full ROI you deserve by partnering with a service that can detect and protect your organisation from the risks and costs of inaccurate and irrelevant data entering your line of business systems.”
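Anderson doesn’t spell out what such detection looks like in practice. As a generic, hypothetical illustration (the field names and rules below are invented for this sketch, not Celaton’s), entry-point validation of the kind a data quality program relies on can be as simple as:

```python
import re

# Hypothetical entry-point validation for an accounting system:
# flag bad records before they reach line-of-business systems.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record looks clean."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    if not EMAIL_RE.match(record.get("email", "")):
        problems.append("invalid email: %r" % record.get("email"))
    try:
        if float(record.get("amount", "")) <= 0:
            problems.append("amount must be positive")
    except ValueError:
        problems.append("non-numeric amount: %r" % record.get("amount"))
    return problems

# A record with a malformed email address is caught at the door.
print(validate_record({"customer_id": "C-1024",
                       "email": "jane@example",
                       "amount": "120.00"}))
# ["invalid email: 'jane@example'"]
```

Records that fail checks like these would be quarantined for review rather than allowed to propagate, which is exactly the stage at which the 1-10-100 rule says fixes are cheapest.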

Related Items:

Attention to Small Details a Key to Big Data Success 

Rainstor Offers Mastery of Time and Schema 

The Three Spokes of Wisdom in the Wheel of (Big Data) Life 
