Avoid These Five Big Data Governance Mistakes
If you’re embarking upon a big data project, then you’re likely running into one or more data management challenges. The decisions you make regarding how you enforce data governance and how you control data flows can make or break your project.
Here are five data governance mistakes you should avoid:
1. You Have No Data Governance Strategy
If you said to yourself, “Huh, what’s data governance?” then you’re likely
making this mistake. Data governance refers to an overarching strategy that defines how organizations ensure the data they use is clean, accurate, usable, and secure.
As your organization embarks upon big data projects, you often solve one or more of these challenges in an ad-hoc manner. That approach may work for a while, but as you get big data successes under your belt and take on more complex projects, the lack of governance can come back to haunt you.
There are several components to a data governance strategy, including: processes that dictate how data is stored and protected; standards and procedures that define how authorized personnel can access and use data; and controls and audits to ensure the rules are being followed.
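The access-and-controls piece, for instance, can start as little more than a policy table plus an audit trail. A minimal sketch in Python; the roles, datasets, and actions here are hypothetical, purely for illustration:

```python
# Hypothetical role-based policy: role -> dataset -> allowed actions.
POLICY = {
    "analyst": {"sales_db": {"read"}},
    "engineer": {"sales_db": {"read", "write"}},
}

# Every access attempt is recorded, allowed or not, so the "controls and
# audits" component has something to review.
audit_log = []

def check_access(role, dataset, action):
    """Allow or deny an action per the policy, logging every attempt."""
    allowed = action in POLICY.get(role, {}).get(dataset, set())
    audit_log.append((role, dataset, action, allowed))
    return allowed
```

Even a toy gate like this makes the governance rules explicit and reviewable, which is the point: the policy lives in one place instead of in each engineer's head.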
Like most things in life and IT, data governance doesn’t work with a “set it and forget it” mentality. Start small with your data governance initiative and then grow it over time to meet the specific needs of your organization.
2. Relying Too Much on Unicorns
Many shops turn to their data scientists (i.e. unicorns) for all matters relating to big data. Much as the miller's daughter in the fairy tale was expected to spin straw into gold, corporate bosses expect their unicorns to magically turn raw data into actionable insight.
That approach likely won’t work for long. The truth is, if you’re lucky enough to have landed a unicorn, you’re paying them way too much to ask them to be “data janitors,” let alone be in charge of an entire data governance strategy.
Data governance is best led by a collection of data stakeholders from the IT department, line of business, and compliance. The Data Governance Institute also recommends hiring a Data Governance Officer (DGO).
3. Letting Schemas Run Wild
This mistake is often made in tandem with the implementation of a data lake. The forgiveness of HDFS enables you to throw just about any kind of data, with any kind of schema, into a Hadoop data lake and worry about sorting it out later.
This “schema on read” approach may work for some types of data, especially ones that change often and can’t be pigeonholed into preconceived schemas. But schema on read can only take you so far, and at some point, schemas must be enforced.
Hadoop brings a plethora of data processing engines like Spark, Pig, and good old MapReduce to help you give shape and form to data – that is, to make it usable. But an unchecked schema-on-read approach runs counter to core data governance principles, which require that you know what kind of data you’re storing and processing.
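At the point where schemas must be enforced, the fix can be as simple as validating every raw record against an explicit schema on ingest. The sketch below uses plain Python to stand in for what an engine like Spark would do at data-lake scale; the field names are hypothetical:

```python
# An explicit schema: field name -> required Python type.
# (Hypothetical fields; a real data lake would declare many more.)
SCHEMA = {
    "user_id": int,
    "event": str,
    "timestamp": str,
}

def enforce_schema(record):
    """Validate a raw record against the schema at read time.

    Raising here means bad records are caught at ingest, rather than
    surfacing as silent errors in an analysis months later.
    """
    missing = set(SCHEMA) - set(record)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for field, expected_type in SCHEMA.items():
        if not isinstance(record[field], expected_type):
            raise ValueError(
                f"field {field!r}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return record
```

The same idea scales up: Spark, for example, lets you pass an explicit schema when reading data instead of inferring one, which turns "sort it out later" into a contract checked up front.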
4. Storing Everything Forever
One of the important facets of a good data governance strategy is data
retirement. At some point, every piece of data must enter that great recycling box in the sky. But all too often, organizations decide they’re never going to throw away another piece of data again.
If your organization follows this “keep everything” mandate, good luck. You’ll likely need lots of extra cycles just to keep the rotting trash heaps in order. Consider this statistic from the Veritas Data Genomics Index 2016 survey, which found that 40 to 60 percent of the data the average organization stores these days is redundant, obsolete, or trivial (ROT).
Organizations spend millions of dollars a year storing data they’ll never use. This is not just a failure of good business sense—it’s a failure of data governance.
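A data retirement policy does not have to start big. It can begin as a script that merely flags data past its retention window for review. A minimal sketch, assuming a file-based store and a hypothetical one-year policy:

```python
import time
from pathlib import Path

# Hypothetical policy: anything untouched for a year is a retirement candidate.
RETENTION_DAYS = 365

def find_expired(root, retention_days=RETENTION_DAYS):
    """Return files whose last-modified time falls outside the retention window.

    A real pipeline would archive or securely delete these and log the
    action for compliance; this sketch only identifies candidates.
    """
    cutoff = time.time() - retention_days * 86400
    return [p for p in Path(root).rglob("*")
            if p.is_file() and p.stat().st_mtime < cutoff]
```

Separating "identify" from "delete" is deliberate: governance needs a review step (and an audit record) before anything enters that great recycling box in the sky.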
5. Not Using Power Tools
So there’s a lot that goes into having an effective data governance strategy. You need the right people in place to implement it, you need a good policy that lays out the priorities and general strategy, and you need good processes that help you implement data governance on a day-to-day basis.
But there’s also a case to be made for getting the right products in play. No one tool will solve every data governance challenge for you. But the big data ecosystem is delivering an increasingly compelling collection of tools that can help automate big chunks of it.
For example, Apache Atlas (incubating), the open source data governance framework that came out of Hortonworks’ Data Governance Initiative, is helping to enforce data controls in the Hadoop environment. Data quality tools are also helping to solve a particular aspect of the data governance challenge.
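The kind of check a data quality tool automates at scale can be sketched in a few lines of plain Python; the column values and metrics below are illustrative, not drawn from any particular product:

```python
def profile_column(values):
    """Compute simple quality metrics for one column: null rate, distinct
    count, and duplicate rate -- the sort of profiling a data quality tool
    runs across every column of every table in the lake."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    return {
        "null_rate": (1 - len(non_null) / total) if total else 0.0,
        "distinct": len(set(non_null)),
        "duplicate_rate": (1 - len(set(non_null)) / len(non_null))
                          if non_null else 0.0,
    }
```

Metrics like these give the ROT discussion above a number to act on: a column that is mostly null or mostly duplicated is a concrete candidate for cleanup or retirement.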
At the recent Leverage Big Data ’16 event, Asif Alam, the Global Business Director for the Technology Sector at Thomson Reuters, acknowledged that data governance was a big and growing challenge, but added that tools were making things better. “Problems we’re solving now were impossible to solve three years ago,” Alam said.