December 5, 2014

Embedding Data Quality and Stewardship Into Your Information Management Processes

Jean-Michel Franco

Although established as a best practice in information management, data quality is a relatively young discipline. Unsurprisingly, its roots began in situations where the data needing to be managed was very heterogeneous.

Take the case of business intelligence and data warehousing: the goal is to derive new perspectives and insights by sourcing and integrating information from multiple systems. This is often a more painful exercise than initially expected because the business is attempting to use data for an entirely different purpose than initially intended. In this situation, data quality comes to the rescue by applying new rules and controls to the existing content.

Apart from business intelligence, other typical data quality projects include migrations to a new system, mergers and acquisitions, multi-channel CRM, or any initiative that mandates sharing standardized information with third parties, such as regulatory agencies or business partners.

Regardless of whether or not such data quality projects are fully successful, they should always be considered a step forward rather than the end of the journey. As long as data quality is managed on a project-by-project basis, it will focus mainly on fixing errors after they occur. The approach to data quality needs to be improved so that it is preventive as well as corrective, less resource-intensive, and a best practice, rather than a point solution applied only when the pain has become unmanageable.

Data profiling, or allowing users to discover the data before a project begins, is a small initial step forward. It won’t avoid the pain, but it makes it easier to detect where data issues originate, so that businesses can plan their efforts accordingly and measure progress as they remedy them.
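To make the idea concrete, here is a minimal profiling sketch in plain Python. The sample records and the two metrics chosen (fill rate and distinct-value count) are illustrative assumptions; real profiling tools compute far richer statistics.

```python
# Minimal data-profiling sketch: for each column, report the fill rate
# (share of non-empty values) and the distinct-value count -- the kind of
# summary used to size cleanup effort before a project begins.
from collections import Counter

def profile(records):
    """Return {column: {"fill_rate": float, "distinct": int}} for a list of dicts."""
    columns = {key for row in records for key in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in records]
        filled = [v for v in values if v not in (None, "")]
        report[col] = {
            "fill_rate": len(filled) / len(records),
            "distinct": len(Counter(filled)),
        }
    return report

# Hypothetical customer records with gaps, as they might arrive from a source system
customers = [
    {"name": "Ada Lovelace", "country": "UK", "email": "ada@example.com"},
    {"name": "Alan Turing", "country": "UK", "email": ""},
    {"name": "Grace Hopper", "country": "", "email": "grace@example.com"},
]
print(profile(customers))
```

A report like this makes the "origin of the pain" measurable: a column with a 67% fill rate is a cleanup work item with a quantifiable baseline.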

The next step is data stewardship. This is about onboarding people – ideally the lines of business that deal with the data at an operational level – into the data quality process and turning it into a collaborative effort. Suppose you are shipping an order to a customer, but you have the wrong address listed. In this instance, it is unlikely that your data quality tool or your order management system will help you to fix the issue. What will help are well-defined processes for data governance and organization of data stewardship at the operational level. In addition, data stewards would work more efficiently if backed by a master data management platform with embedded data quality capabilities such as standardization or record matching and deduplication.
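Record matching of the kind an MDM platform embeds can be sketched in a few lines: normalize each record, then flag pairs whose textual similarity exceeds a threshold as candidate duplicates for a steward to review. The threshold and sample contacts below are illustrative assumptions; production matching uses much richer, field-aware rules.

```python
# Hedged record-matching sketch: normalize strings, then compare pairs with
# a similarity ratio and surface likely duplicates for steward review.
from difflib import SequenceMatcher

def normalize(record):
    # Lowercase and collapse whitespace before comparing
    return " ".join(record.lower().split())

def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def find_duplicates(records, threshold=0.85):
    """Return index pairs of records that look like the same entity."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                pairs.append((i, j))
    return pairs

contacts = [
    "John Smith, 10 Main Street, Springfield",
    "john smith, 10 main st., Springfield",
    "Jane Doe, 42 Elm Avenue, Shelbyville",
]
print(find_duplicates(contacts))
```

The point is not the algorithm but the workflow: the tool proposes matches, and the steward – who knows the customer – confirms or rejects them.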

Through these dedicated organizations, processes, and tools, data becomes a sanctioned asset that can be used as a shared service across systems, activities, and information lifecycles. This represents a massive move along the maturity curve. Although sometimes challenging to achieve, this is also a major step forward in aligning IT and internal lines of business because it redefines the relationship between information producers and consumers.

Now that data quality is managed in a repeatable way, does this mean that we’ve finished the journey? Unlikely. The fact remains that the later data quality problems are detected, the greater the effort and cost it takes to correct them. This is where data quality practices may have to reinvent themselves, given that in many cases they have been designed as separate, standalone processes. Data quality controls have to become ubiquitous in order to move up the information supply chain to the point of entry. Organizations need to be able to manage data in motion, in real time, rather than collecting and storing data in one location before it can be vetted.

Once data quality gets these capabilities, it can be, for instance, embedded into a Web or mobile application that can deal with customer identity. Or, it can be included in the connector that brings a lead from the marketing team to the sales force management applications across the cloud. This then becomes a gatekeeper to ensure that controls are applied before the data even enters into the information systems.
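Such a gatekeeper can be as simple as a validation-and-standardization step applied before a record crosses into the downstream system. The sketch below is a hypothetical example: the field names, the email check, and the assumption of two-letter country codes are all illustrative, not a prescription.

```python
# Point-of-entry gatekeeper sketch: validate and standardize a lead before
# it is handed to the downstream application, so dirty records never enter.
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def gatekeep(lead):
    """Return a cleaned copy of the lead, or raise ValueError if it fails checks."""
    errors = []
    email = lead.get("email", "").strip().lower()
    if not EMAIL_RE.match(email):
        errors.append("invalid email")
    country = lead.get("country", "").strip().upper()
    if len(country) != 2:  # assume the downstream system expects 2-letter codes
        errors.append("country must be a 2-letter code")
    if errors:
        raise ValueError("; ".join(errors))
    return {**lead, "email": email, "country": country}

print(gatekeep({"email": "Ada@Example.COM ", "country": "gb"}))
```

Embedded in a web form or in the connector between a marketing tool and the sales application, the same check runs before the data exists anywhere else – which is exactly what makes it preventive rather than corrective.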

Lastly, there is big data. In this context, data quality not only has to deal with “extreme information management,” but also has to redraw the partition between sanctioned and unsanctioned data – because with big data, the line between the two can become blurred.

This creates a huge opportunity for data quality to turn from a gatekeeping and back-office discipline to a competitive advantage. Take the example of online applications that are applying real-time matching to automatically recognize each customer and recommend a personalized offer to them based on their timeline and purchasing patterns. This typically requires capabilities found in big data quality solutions such as profiling, parsing, standardization, and entity resolution. Data quality provides the building blocks to enrich data and connect it to business context. For example, an IP address from a log file can be turned into location analytics once enriched with geo-localization, or can be matched with a business partner for customer journey analysis.
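The IP-enrichment example can be sketched as a simple join against a geolocation table. The table below is a stand-in with documentation-range addresses; a real pipeline would call a geo-IP database or service instead.

```python
# Illustrative enrichment: join raw log events against a (stand-in)
# geolocation table so that IP addresses become location analytics.
GEO_TABLE = {
    "203.0.113.0": {"city": "Paris", "country": "FR"},
    "198.51.100.7": {"city": "Boston", "country": "US"},
}

def enrich(events):
    """Attach city/country to each event; unknown IPs get None placeholders."""
    out = []
    for event in events:
        geo = GEO_TABLE.get(event["ip"], {"city": None, "country": None})
        out.append({**event, **geo})
    return out

log = [
    {"ip": "203.0.113.0", "page": "/pricing"},
    {"ip": "192.0.2.99", "page": "/home"},
]
print(enrich(log))
```

Profiling, parsing, standardization, and entity resolution all follow this same pattern at scale: raw values are connected to reference data that carries the business context.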

Big data may even give birth to new roles: people who discover new sources of data and turn them into meaningful insights by connecting them to other sources and business context. Some call them data scientists, but when their role is to take care of the information and turn it into a shared asset, perhaps data curator would be a better title.

These individuals need tools to shape the data and turn it into something that can be safely distributed. We all know that analytics has been struggling for years with the ongoing challenge of satisfying business users who are complaining that they spend too much time searching data rather than analyzing it. A recent survey from Ventana Research indicates that the most time-consuming tasks in big data integration are linked to reviewing data for quality and consistency (52%) and preparing data for integration (46%). So what if we finally find a way to turn what is perceived as wasted time into collaborative work that would make data easier to share through curation?

Note that these evolutions sound similar to what has been experienced in other areas of quality management. Take General Electric as an example. In 1995, the company announced the Six Sigma initiative as a top priority, in order to get as close as possible to zero defects across business processes. GE then turned this into a cornerstone of company culture and organizations. GE is now moving one step beyond this process-centric approach with what the company refers to as the “industrial internet.” General Electric’s vision is to take advantage of big data and the Internet of Things to predict potential downtimes and fix problems before they even occur, saving hundreds of millions of dollars for GE and its customers.

What is the most important takeaway from such evolutions? When established as a shared service and fully embedded in your Information Management processes, data quality has the potential to move forward from its traditional “back office” capability into a much more proactive practice that can address data quality issues before they even occur. This has the potential to fuel data-driven processes by allowing organizations to set up the standardized “data plugs” that connect the dots between data points and turn this into meaningful insights.


About the author: Jean-Michel Franco is Director of Product Marketing for Talend’s Data Governance solutions. He has dedicated his career to developing and broadening the adoption of innovative technologies in companies. Prior to joining Talend, he started out at EDS (now HP) by creating and developing a business intelligence (BI) practice, joined SAP EMEA as Director of Marketing Solutions in France and North Africa, and then Business & Decision as Innovation Director. He authored four books and regularly publishes articles and presents at events and tradeshows.

Related Items:

How GE Drives Big Machine Optimization in the IoT

Forget the Algorithms and Start Cleaning Your Data

Data Hoarders In Need of Quality Treatment