August 20, 2013

Data Hoarders In Need of Quality Treatment

Alex Woodie

The term “big data” has rocketed to popularity in the last 12 months, and for good reason: organizations are struggling like never before to deal with, and benefit from, the massive influx and availability of data. And while quantity is certainly one aspect of the data explosion, fixating on size can obscure the single most crucial aspect: quality.

In fact, focusing on the quantity of data at hand may actually lead an organization to degrade the quality of its information and its decision-making, argues Matt Asay, vice president of business development at 10Gen, the company behind the NoSQL database MongoDB.

“The point is not to see who can dump data into Hadoop, treating it like some digital landfill,” Asay writes in a Wired blog. “If anything, hoarding data simply increases the noise to signal ratio in an organization, making it even harder to determine the best course of action.”

As proof of his argument, Asay points to large corporations, which have been dealing with big data sets for so many decades that size is no longer the major concern. Instead, the biggest data-related headache for these multinationals is finding efficient ways to integrate all the disparate data sources into a cohesive whole.

“Just because we can store vast quantities of data doesn’t mean that we’ll derive any benefit from it,” Asay writes, citing a NewVantage survey of CIOs that found the size of data is the primary driver of big data projects at 28 percent of enterprises. By comparison, 64 percent of enterprises say their big data projects are driven by a desire to ingest disparate data sources and make sense of them in real time.

The life insurance company MetLife faced a similar problem, as Asay explains. The company wanted a “360-degree view” of its customers, as many companies do. But with more than 70 data sources to feed into this customer view, the technological limitations of the relational database management system (RDBMS) model began to show.

MetLife’s solution, it turns out, was MongoDB. According to Asay, it took MetLife just two weeks to create a common schema across all of the disparate data sources using this NoSQL-based system, and just three more months to take it into production.

“So let’s get real about Big Data,” Asay concludes. “What enterprises really care about is putting data to use, and that requires the ability to ingest diverse sets of structured, semi-structured and unstructured data and then put it to use in real time. The right tools for these jobs are Hadoop and NoSQL databases like MongoDB, two of the hottest job skills in the industry, and less RDBMS and proprietary data warehousing technology.”


