Building a Successful Data Governance Strategy
One of the core elements of data analytics that organizations struggle with today is data governance. An organization could do everything right and still wonder why their analytics projects are failing if they haven’t taken the time to build and implement a governance strategy. Here are some (hopefully) helpful tips from experts for building a data governance framework that lasts.
There are many aspects to getting data governance right, including having the right people in place. Large enterprises are increasingly hiring chief data officers (CDOs) to oversee data governance, particularly when regulations like the European Union’s General Data Protection Regulation (GDPR) are involved. While the CDO title is rarer at smaller firms, those companies still have people with the same responsibilities as a CDO, and at the top of that list is making sure the data is governed.
Beyond the people, it’s important to consider the core data governance processes that are critical for success. According to Vimal Vel, who’s the vice president and global head of master data solutions at data solutions provider Dun & Bradstreet, one of the first steps in building a successful data governance practice is understanding the business drivers and business outcomes that matter to the organization.
“A lot of times we notice customers dive into data strategy or data governance straight away,” he says. “It’s important to take the time to understand what kinds of business outcomes you’re looking to drive, and what is the culture of your organization, both locally and globally.”
Much of the information that’s generated, used, and managed in a data analytics project is tied directly to those business outcome measurements, Vel says. “So recognizing what your business outcomes are, and then organizing your data strategy around that, is step one,” he says.
A big mistake organizations often make is trying to implement an enterprise-wide governance project that touches every aspect of the business right off the bat. Yes, an enterprise-wide governance strategy might be the ultimate goal, but big-bang, boil-the-ocean projects rarely succeed in any IT discipline.
To avoid this mistake, Vel recommends that organizations begin slowly by implementing a governance strategy for a single line of business or domain. Once they have found success implementing a data governance strategy in that area, they can expand from there.
“We often see customers biting off more than what they can chew, so our recommendation is to choose something that’s more bounded in scope,” Vel says. “It could be your billing and finance, or it could be your sales function. Start there and then expand and increase the scope as a way to get to that end state, which is enterprise data governance.”
Once a domain has been selected and the outcomes described, the next step in the governance process is to start establishing the master data standards that the organization will rely on to define their data going forward.
“One of the most common challenges I observe from customers is defining what a customer looks like, defining what a supplier looks like,” Vel says. “It’s simple things, like how do you define a customer, how do you define industry categorization, how do you define a company size, because sales channel assignments are often driven by size and industry standards.”
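To make Vel’s point concrete, here is a minimal Python sketch of what a master data standard for “customer” might encode: one agreed record shape, one agreed way to derive company size, and a sales channel rule that depends on both size and industry. The size thresholds, field names, and routing rule are hypothetical illustrations, not D&B’s standard; the industry codes follow the NAICS convention, in which sector 52 is Finance and Insurance.

```python
from dataclasses import dataclass

# Hypothetical size bands (employee count ranges); a real standard would define its own.
SIZE_BANDS = [(0, 49, "small"), (50, 999, "mid-market"), (1000, None, "enterprise")]


@dataclass(frozen=True)
class Customer:
    customer_id: str
    legal_name: str
    industry_code: str  # NAICS-style code, e.g. "52..." for Finance and Insurance
    employee_count: int


def size_band(employees: int) -> str:
    """Map an employee count to exactly one agreed size band."""
    for low, high, label in SIZE_BANDS:
        if employees >= low and (high is None or employees <= high):
            return label
    raise ValueError(f"employee count out of range: {employees}")


def sales_channel(customer: Customer) -> str:
    """Hypothetical routing rule driven by the shared size and industry definitions."""
    band = size_band(customer.employee_count)
    if band == "enterprise":
        return "field-sales"
    if band == "mid-market":
        # Assumed rule: finance accounts get field coverage even when mid-sized.
        return "field-sales" if customer.industry_code.startswith("52") else "inside-sales"
    return "self-serve"


print(sales_channel(Customer("C-001", "Acme Bank", "522110", 320)))  # field-sales
```

Because every team derives size and channel from the same functions, two departments cannot quietly disagree about whether a 320-person bank is “mid-market.”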
Organizations can invent their own data models, structures, and processes as they implement a governance program, or they can use standards established by third-party providers, including D&B. “We strongly recommend that customers use a reference standard to establish governance,” Vel says.
Increasingly, larger enterprises are using semantic and graph technology to establish and manage their data definitions and models, Vel says. “The level of definition you can put around a customer, a supplier, or a partner using a semantic model or graph-based semantic model is much more robust than what you can do just using traditional definitions, like listing out name, address, etc.,” he says. “The biggest difference there is how you capture relationships, which are critical to how you take action around information around that customer or partner.”
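The difference Vel describes can be sketched with a toy triple store in Python. A flat record holding a name and an address cannot answer a question like “which corporate family does this supplier belong to?”, whereas facts stored as (subject, predicate, object) relationships can be traversed. The class, entity names, and predicates below are hypothetical; production systems would typically use an RDF store or a graph database rather than this in-memory structure.

```python
from collections import defaultdict


class TripleStore:
    """A toy in-memory store of (subject, predicate, object) facts."""

    def __init__(self) -> None:
        self._edges = defaultdict(list)  # subject -> [(predicate, object), ...]

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._edges[subject].append((predicate, obj))

    def objects(self, subject: str, predicate: str) -> list:
        """All objects linked from `subject` by `predicate`."""
        return [o for p, o in self._edges.get(subject, []) if p == predicate]

    def ultimate_parent(self, entity: str) -> str:
        """Follow 'subsidiary_of' links until reaching an entity with no parent."""
        seen = {entity}
        current = entity
        while parents := self.objects(current, "subsidiary_of"):
            current = parents[0]  # assumes at most one direct parent per entity
            if current in seen:
                raise ValueError(f"cycle in corporate hierarchy at {current!r}")
            seen.add(current)
        return current


graph = TripleStore()
graph.add("acme_uk", "subsidiary_of", "acme_europe")
graph.add("acme_europe", "subsidiary_of", "acme_holdings")
graph.add("acme_uk", "supplies", "beta_retail")
print(graph.ultimate_parent("acme_uk"))  # acme_holdings
```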
Catalogs: A Proxy for Governance
One of the tools organizations are widely adopting to make sense of their big data sets is the data catalog. Data catalogs are hot these days because they bring order and ease of access to large collections of data, and vendors are responding by rolling out new data catalogs practically every month.
“It seems that nowadays, everybody and their dog has a catalog,” quips Stan Christiaens, the CTO and co-founder of data catalog vendor Collibra.
There’s a strong connection between data catalog sales and the need for governance, although enterprise software buyers may dispute the notion that they’re buying catalogs as part of their data governance strategy (if they even have one).
“When people are looking for a catalog, in a lot of cases they’re actually looking for governance-type features,” Christiaens says. “They just for the life of them could not be convinced to actually call it that, because the word ‘governance’ has a negative, policing-like connotation in organizations.”
When enterprise buyers go data catalog shopping, they’re looking for features like business glossaries, data dictionaries, and data lineage management, Christiaens says. “So a lot of the time, when people look for a catalog, they’re inevitably looking for governance-type features,” he says, “but they don’t want to call it that.”
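As a rough illustration of those features, the Python sketch below models a catalog entry that carries a glossary definition, a small data dictionary, and lineage links to its direct sources, plus a function that walks the lineage breadth-first to find every upstream dataset. The dataset names and the `CatalogEntry` structure are assumptions for illustration and do not reflect Collibra’s or any other vendor’s data model.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    glossary_term: str   # business-facing definition of what the dataset means
    columns: dict        # data dictionary: column name -> type
    upstream: list = field(default_factory=list)  # lineage: direct source datasets


def full_lineage(catalog: dict, name: str) -> list:
    """Return all transitive upstream datasets of `name`, nearest first (BFS)."""
    seen, order = set(), []
    queue = deque(catalog[name].upstream)
    while queue:
        src = queue.popleft()
        if src in seen:
            continue
        seen.add(src)
        order.append(src)
        if src in catalog:  # sources outside the catalog are listed but not expanded
            queue.extend(catalog[src].upstream)
    return order


catalog = {
    "raw_crm": CatalogEntry("raw_crm", "Customer records exported from the CRM", {"cust_id": "string"}),
    "raw_billing": CatalogEntry("raw_billing", "Invoices from the billing system", {"invoice_id": "string"}),
    "customers_clean": CatalogEntry("customers_clean", "An organization with at least one paid contract",
                                    {"customer_id": "string"}, ["raw_crm"]),
    "revenue_kpi": CatalogEntry("revenue_kpi", "Recognized revenue per customer per month",
                                {"customer_id": "string", "revenue": "decimal"},
                                ["customers_clean", "raw_billing"]),
}
print(full_lineage(catalog, "revenue_kpi"))  # ['customers_clean', 'raw_billing', 'raw_crm']
```

A business user reads the glossary term, an architect reads the dictionary and lineage, and both are looking at the same governed entry.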
Catalogs are popular because they work with the different business personas that are involved with creating, managing, accessing, analyzing and (yes) governing data. Christiaens says catalogs like Collibra’s are successful because they can cater to a variety of different needs.
“The personas that are involved in data are quite wide and varied,” he tells Datanami. “A data architect will be talking about enterprise data models and logical models, whereas a business person will say ‘All I care about are KPIs and reports and metrics.’”
Catalogs are also critical tools for controlling access to data, which is another aspect of governance that cannot be ignored. “So you have to cater to different personas on the one hand, and on the other you have to establish that trust,” Christiaens says. “They need to get governance controls in place, however minimal at first. But they do need to be established. Otherwise it’s not a sustainable initiative.”
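A minimal version of such controls might look like the Python sketch below: a role-to-dataset access policy plus masking of sensitive columns for roles that are not explicitly privileged. The roles, dataset names, and masking rule are hypothetical, and real catalogs typically push enforcement down to the underlying data platforms rather than filtering rows in application code.

```python
# Hypothetical policy: which roles may read which catalog datasets.
ACCESS_POLICY = {
    "analyst": {"customers_clean", "revenue_kpi"},
    "data_steward": {"raw_crm", "customers_clean", "revenue_kpi"},
}
# Hypothetical sensitive columns per dataset, masked for non-privileged roles.
SENSITIVE_COLUMNS = {"customers_clean": {"email", "phone"}}
UNMASKED_ROLES = {"data_steward"}


def read_row(role: str, dataset: str, row: dict) -> dict:
    """Return the row as `role` is allowed to see it, or raise PermissionError."""
    if dataset not in ACCESS_POLICY.get(role, set()):
        raise PermissionError(f"role {role!r} may not read {dataset!r}")
    if role in UNMASKED_ROLES:
        return dict(row)
    hidden = SENSITIVE_COLUMNS.get(dataset, set())
    return {col: ("***" if col in hidden else val) for col, val in row.items()}


row = {"customer_id": "C-001", "email": "ops@example.com", "segment": "mid-market"}
print(read_row("analyst", "customers_clean", row))
# {'customer_id': 'C-001', 'email': '***', 'segment': 'mid-market'}
```

Starting with a deny-by-default policy this small is one way to get controls “in place, however minimal at first,” and then tighten them as the catalog grows.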
One of the biggest governance challenges organizations face today is the push and pull between centralized control and decentralized action. Organizations need some degree of centralized control over their data, even though that data lives mostly in decentralized systems.
Data catalogs can help to some extent by giving a logical view of an organization’s data, whether it lives in relational databases, Hadoop data lakes, NoSQL databases, or S3 buckets in the cloud. Increasingly, though, companies need to implement a “catalog of catalogs” to keep on top of their growing data sets, particularly when it comes to cloud data stores.
According to D&B’s Vel, a failure to adhere to a centralized data model can lead to bad results down the line when multiple data sets are integrated and joined for analytics use cases. For example, if an organization plans to introduce intent data into a marketing campaign, it runs the risk of misidentifying customers if it hasn’t nailed down the data that identifies those customers.
“There has to be a mechanism and governance framework around integrating that intent data, or who’s most likely to buy, into their customer or account information,” Vel says. “If you do that without the right governance models, then the risk of you attaching the wrong propensity model or intent data to the wrong account is very high.”
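The risk Vel describes is easy to reproduce. In the hypothetical Python sketch below, two accounts share the name “Acme Corp,” so matching intent signals by name is ambiguous. Attaching signals only through a governed master identifier, and routing unresolved signals to review instead of guessing, avoids linking intent to the wrong account. The `master_id` field stands in for whatever identifier an organization’s master data assigns; it is an assumption, not a specific D&B API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Account:
    master_id: str  # governed identifier from master data (hypothetical field)
    name: str
    country: str


@dataclass(frozen=True)
class IntentSignal:
    master_id: Optional[str]  # None when the signal could not be resolved to an account
    company_name: str
    topic: str
    score: float


def attach_intent(accounts, signals):
    """Attach signals by master_id only; return (attached, unresolved)."""
    attached = {a.master_id: [] for a in accounts}
    unresolved = []
    for s in signals:
        if s.master_id in attached:
            attached[s.master_id].append(s)
        else:
            unresolved.append(s)  # send to data stewards rather than guess by name
    return attached, unresolved


def match_by_name(accounts, name):
    """Naive alternative: every account whose name matches, exposing the ambiguity."""
    key = name.strip().lower()
    return [a for a in accounts if a.name.strip().lower() == key]


accounts = [Account("A1", "Acme Corp", "US"), Account("A2", "Acme Corp", "DE"),
            Account("A3", "Beta Ltd", "UK")]
signals = [IntentSignal("A2", "Acme Corp", "erp-migration", 0.9),
           IntentSignal(None, "Acme Corp", "crm-software", 0.7)]
attached, unresolved = attach_intent(accounts, signals)
print(len(match_by_name(accounts, "acme corp")))           # 2: name alone is ambiguous
print([s.topic for s in attached["A2"]], len(unresolved))  # ['erp-migration'] 1
```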
D&B advocates that organizations have a global data model and then build localized data models for each line of business or domain. “A lot of times customers will do that attachment, that linkage, using our master data,” Vel says. “You cannot stop with governance frameworks just around centralized and global governance models. Depending on the use case, you might need to establish localized models and frameworks that are necessary for that business function or that application.”
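One way to picture a global model with localized extensions is a shared set of required fields that every domain must carry, merged with domain-specific fields at validation time. The field names and domains in this Python sketch are hypothetical, and unknown domains are rejected outright so that no team can bypass the global model by inventing a new domain name.

```python
# Global model: fields every account record must carry, in every domain (hypothetical).
GLOBAL_MODEL = {"master_id": str, "legal_name": str, "country": str}

# Localized models: extra fields a specific line of business requires (hypothetical).
LOCAL_MODELS = {
    "finance": {"payment_terms_days": int, "credit_limit": float},
    "sales": {"territory": str, "size_band": str},
}


def validate(record: dict, domain: str) -> list:
    """Check a record against the global model plus its domain's local model."""
    if domain not in LOCAL_MODELS:
        raise ValueError(f"no local model registered for domain {domain!r}")
    schema = {**GLOBAL_MODEL, **LOCAL_MODELS[domain]}
    errors = []
    for field_name, expected in schema.items():
        if field_name not in record:
            errors.append(f"missing {field_name}")
        elif not isinstance(record[field_name], expected):
            errors.append(f"{field_name} should be {expected.__name__}")
    return errors


finance_record = {"master_id": "A1", "legal_name": "Acme Corp", "country": "US",
                  "payment_terms_days": "30", "credit_limit": 50000.0}
print(validate(finance_record, "finance"))  # ['payment_terms_days should be int']
```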
Governance is a dirty word in many organizations. As Collibra’s Christiaens points out, many people associate governance with regulatory compliance and view it as a net drain on finite resources rather than something that contributes to the solution.
However, as organizations continue to struggle with their analytics processes, they’re beginning to realize that they need to step back and rethink their data strategies from the ground up. When they do so, and implement their data strategies with a good governance plan from the outset, their chances of success increase dramatically.