September 11, 2015

Unstructured Data Analytics Shouldn’t Be Such a Mess

Kon Leong

The problem with searching for a needle in a haystack is that the process, by nature, is inefficient. So why has it become a popular analogy for analytics efforts within the enterprise? Because today’s analytics attempts – particularly for unstructured human data – are typically a mess.

Today’s analytics are often ad hoc and rely on incomplete or skewed sample sets. They commonly focus on only one narrowly defined data type. So for each glimmering “needle” of insight, it seems that heaps of data are scattered and cast aside, often at the cost of subsequent business efficiency.

When it comes to business content such as files, email, social media, IMs, calendar entries, images, and more, firms are failing to extract meaningful insight despite the potential wealth of information contained within.

There’s a reason for this: the data is poorly managed to begin with. Businesses are treating analytics as a separate business function from data governance, when it’s actually fundamentally dependent on it. Analysis occurs downstream, so by neglecting initial information governance infrastructure and practices, the enterprise is essentially sampling tiny random buckets of data from a whitewater river of information.

Many firms struggle even to understand what sort of unstructured content they have, let alone to manage it effectively. History is partially to blame, to be sure; most attempts at managing unstructured content were hastily prompted by waves of regulatory and legal reform that demanded immediate action. A reactive response was triggered, and many of those initial “band-aid” information management fixes remain in place today. Simply scratching beneath the surface often reveals a tangled mess of siloed data platforms such as enterprise content management systems (ECMs), legacy systems, duplicated copies, and even entire missing categories of data.

Don’t hate the hay -- hate having multiple stacks of hay

It’s no wonder that businesses are having trouble getting analytics value from this data.

For unstructured data, it makes sense; the technology required to process large volumes of diverse human-generated content is nascent compared to traditional BI and ERP systems that mine more mature and structured forms of data. Add to that the general state of mismanagement of most unstructured content, and you have an environment in which data never reaches its potential … or worse, results in erroneous business decisions.

We don’t necessarily need to get rid of the extra data “hay” that these stacks are composed of; that would entail getting rid of potentially valuable content. We just need to completely rethink how the data itself is managed. No more data “haystacks” means no more disparate data sources, no more ad hoc sampling attempts, and no more dirty or duplicated data. Furthermore, consolidation vastly reduces the compliance risk associated with data mismanagement; with consolidated control, policies for management and eventual disposal can be implemented centrally and securely.
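To make the idea of collapsing several “haystacks” into one concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a description of any particular product: the silo names, the sample documents, and the `Record`, `fingerprint`, and `consolidate` helpers are all hypothetical. It merges content from separate sources into a single store, treats copies that differ only in spacing or capitalization as duplicates, and remembers where each item was found.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Record:
    """One unique piece of content plus every silo it was found in."""
    text: str
    sources: list = field(default_factory=list)


def fingerprint(text: str) -> str:
    # Normalize whitespace and case so trivially different copies match.
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def consolidate(silos: dict) -> dict:
    """Merge {silo_name: [documents]} into one store keyed by fingerprint."""
    store = {}
    for silo_name, documents in silos.items():
        for text in documents:
            key = fingerprint(text)
            # Keep the first copy seen; later copies only add provenance.
            record = store.setdefault(key, Record(text=text))
            if silo_name not in record.sources:
                record.sources.append(silo_name)
    return store


# Hypothetical silos and documents, invented purely for illustration.
silos = {
    "email_archive": ["Q3 forecast attached.", "Lunch on Friday?"],
    "ecm": ["Q3  forecast attached.", "Retention policy v2"],
    "file_share": ["retention policy v2"],
}

store = consolidate(silos)
for record in store.values():
    print(record.sources, "->", record.text)
```

Because every unique item ends up in one place with its provenance attached, a retention or disposal policy could be applied once to the store rather than separately in each silo. Real deployments would need far more than exact-match hashing, such as near-duplicate detection and access controls, which this sketch deliberately leaves out.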

The lesson here is that data governance is the necessary foundation of all successful data analysis. The old computing axiom of “garbage in, garbage out” is used ad nauseam for a reason: it’s accurate and timelessly relevant.

So at risk of abandoning our original metaphor, we’re trying to build a data lake, pooling all available resources into a single environment where they can be managed and analyzed in real time. Forward-leaning businesses are already making strides to achieve this, and they’re not doing it with flashy analytics tools – those can come later, once the foundation is built. In a cohesive governance environment, analytics can be brought TO the data, rather than data cumbersomely being sampled and brought TO the stand-alone tools.

If large organizations want better analytics, they need to start with better information governance practices. After all, they probably should have been better all along.

About the author: Kon Leong is CEO and Founder of ZL Technologies. For two decades, he has been immersed in large-scale information technologies to solve big data issues for enterprises. His focus for the last 14-plus years has been on massively scalable archiving technology to solve records management and eDiscovery challenges for the government and private sectors. He speaks frequently at records management and eDiscovery conferences on cutting-edge trends and solutions. A serial entrepreneur, Mr. Leong earned a BS degree from Loyola (Concordia U) and an MBA from Wharton (U of Penn).

Related Items:

Beware the Dangers of Dark Data

Pulling Insights from Unstructured Data – Nine Key Steps

