April 16, 2021

8 Key Considerations for Embarking on a Data Integrity Journey

Dave Langton

Modern enterprises rely on data, and as its volume increases, making that data useful becomes more important than ever. However, more data also raises the likelihood of incomplete, inconsistent data sets. Bad data, according to a 2019 estimate from Gartner, can cost businesses between $9.7 million and $14.2 million a year.

As data professionals work to prepare data for analysis, maintaining data integrity is critical and has increasingly become an area of focus for modern data teams. Data integrity, meaning the completeness, accuracy, consistency, and compliance of data within systems, is both a state data teams aim to achieve and the set of processes used to achieve it. It spans several areas: the physical integrity of the data (is it stored safely?), its logical integrity (is it accurate, complete, and correct?), and its compliance (does it meet necessary standards, such as GDPR?).

When data integrity is achieved, data teams ultimately deliver better performance, reliability, and access for their organization. As teams embark on data integrity initiatives, it's imperative to avoid four common pitfalls:

  • Missing, or just plain wrong, data. Taking on more data can make incomplete or inaccurate records harder to spot. Joining data from multiple disparate systems, captured at different points in time, can leave blank spots or inaccuracies that become buried deeper and deeper in the growing data pool. Integrity requires not only being correct today, but also being able to withstand the demands placed on your data down the line.

  • Overlapping and outdated systems. Are customer phone numbers all formatted the same way in the database? Are different groups within your organization working with the same datasets? Consistency is another tenet of data integrity, and it is most often compromised by overlapping and outdated systems. Inconsistent data undermines the broader effort of data quality by creating duplicate records, data that fails validation criteria, and data that may not be accessible when it is needed. (A minimal sketch of such a consistency check appears after this list.)
  • Losing the trail. Even more costly than errors in your data are the complications brought on by trying to track those mistakes down and resolve them weeks, months, or years down the road. Not having reliable audit trails for your data means uncertainty about who made changes and when. Some organizations establish audit trails but never review them, which renders them far less effective.
  • Who is accountable? A key to any organization's success, accountability is especially important in managing data. A lack of accountability means uncertainty about who is ultimately responsible for the integrity of your data, and without uniform standards for entering and working with data, inconsistencies can spread throughout the data system.
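
To make the first two pitfalls concrete, the following is a minimal sketch in Python of the kind of check that surfaces missing fields and inconsistent formats before they sink deeper into the data pool. The record layout, field names, and the canonical phone format are assumptions chosen purely for illustration, not a prescribed schema.

    import re

    # Hypothetical customer records merged from two source systems.
    # The field names and formats below are illustrative assumptions.
    records = [
        {"id": 1, "name": "Ada Lovelace", "phone": "+15550102000", "email": "ada@example.com"},
        {"id": 2, "name": "Grace Hopper", "phone": "(555) 010-3000", "email": None},
        {"id": 3, "name": "Alan Turing", "phone": None, "email": "alan@example.com"},
    ]

    # One canonical phone format (E.164-style), chosen arbitrarily for this sketch.
    CANONICAL_PHONE = re.compile(r"^\+\d{10,15}$")

    def integrity_report(rows):
        """Flag missing fields and phone numbers that stray from the canonical format."""
        issues = []
        for row in rows:
            for field in ("name", "phone", "email"):
                if not row.get(field):
                    issues.append((row["id"], f"missing {field}"))
            phone = row.get("phone")
            if phone and not CANONICAL_PHONE.match(phone):
                issues.append((row["id"], f"inconsistent phone format: {phone!r}"))
        return issues

    for record_id, problem in integrity_report(records):
        print(f"record {record_id}: {problem}")

Run against these sample records, the report flags the missing email, the missing phone number, and the phone number that strays from the canonical format; in practice the same idea scales up through a data preparation tool or a set of SQL checks.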

Once teams are aware of the areas to watch, the next step is a plan to achieve and maintain data integrity. Because data touches every aspect of the organization, and data teams are under pressure to manage and deliver it properly, establishing a comprehensive plan to keep data clean is essential. There are four key pillars to a data integrity plan that modern data teams should adopt:

  • Invest in integration. As with many long-term investments, the time and resources required to integrate data now can pale in comparison to the money and manpower it can save as datasets grow. Solutions such as data preparation and ETL applications can improve consistency by not only organizing data, but cleansing it in the process. The ETL process detects and removes data inconsistencies, a critical step as data volumes increase and data types vary more widely. (A minimal sketch of such a cleansing step appears after this list.)

  • Train and appoint a steward. Regular training sessions with employees can minimize errors at the point of entry. These are opportunities for an organization to establish a system of accountability and a clear rubric for managing data. Give employees a place to turn by appointing a ‘data steward’ to oversee a specific set of data – or the organization’s data system as a whole.
  • Audit and validate. These stewards can also be responsible for monitoring audit trails and taking corrective action as soon as issues surface. Audit trails reveal what changes have been made and by whom, tracking alterations down to the date they were made. This ensures that inaccurate or incomplete data is not only identified but traced back to its source. Through this process, stewards can also confidently validate the data being relied upon to guide the organization's future.
  • Test and test again. As we've learned, audit trails do little good when they aren't reviewed on a regular basis. Avoid guessing at data accuracy by creating a regular testing regimen that augments a strong validation process, including checks that data isn't being entered into conflicting field types. As with a visit to the doctor, finding the problem is often the only way to solve it.
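
To ground the cleansing, audit, and validation pillars above, here is a minimal sketch in Python, assuming a simple in-memory record layout and hypothetical audit fields. It is not a prescribed ETL implementation, just the shape of a step that normalizes formats, drops duplicates, and records who changed what and when.

    import re
    from datetime import datetime, timezone

    def normalize_phone(raw):
        """Strip punctuation and coerce the number to one canonical format (assumed here)."""
        digits = re.sub(r"\D", "", raw or "")
        return f"+{digits}" if digits else None

    def cleanse(rows, changed_by="etl_job"):
        """Deduplicate on id, normalize phone numbers, and emit a simple audit trail."""
        seen, cleaned, audit = set(), [], []
        now = datetime.now(timezone.utc).isoformat()
        for row in rows:
            if row["id"] in seen:
                audit.append({"id": row["id"], "action": "dropped duplicate", "by": changed_by, "at": now})
                continue
            seen.add(row["id"])
            normalized = normalize_phone(row.get("phone"))
            if normalized != row.get("phone"):
                audit.append({"id": row["id"], "action": "normalized phone", "by": changed_by, "at": now})
            cleaned.append({**row, "phone": normalized})
        return cleaned, audit

    rows = [
        {"id": 1, "phone": "(555) 010-2000"},
        {"id": 1, "phone": "(555) 010-2000"},  # duplicate from an overlapping system
        {"id": 2, "phone": "+15550103000"},
    ]
    cleaned, audit = cleanse(rows)
    for entry in audit:
        print(entry)

The audit entries answer the "who changed what, and when" question raised above, and a regular test run over both the cleaned output and the audit trail (for example, asserting that every remaining phone number matches the canonical format) turns validation into something repeatable rather than a guess.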

Building for the future often involves identifying and addressing issues before they become major problems. Though the volume and complexity of data will invariably grow, the value of an organization's data can't grow with it unless data integrity is ensured. If data represents a new way of doing business, then success in that new climate relies on equipping users with the tools needed to meet the challenges of a changing world and to give your organization a place in it.

About the author: Dave Langton is the vice president of product at Matillion, a leading data integration platform. He is a seasoned software professional with over 20 years of experience creating award-winning technology and products. Prior to his role at Matillion, he worked as a data warehouse manager and contractor in the financial industry.

Related Items:

Cloud Data Warehousing: Understanding Your Options

Can We Stop Doing ETL Yet?

The Real-Time Future of ETL
