Getting to the Heart of Governance for Today’s Data-Driven Business
The phrase “big data” may be the understatement of the century. Data today isn’t just big; it’s overwhelming. By some estimates, 90% of the world’s data was created in the past two years alone. And if your data seems big today, wait until your organization starts drinking from the Internet of Things firehose. The flood of information keeps growing.
The challenges of keeping this vast amount of data timely, accurate and trustworthy are growing just as fast. Organizations can’t afford to lose control over, or access to, reliable data. This is particularly critical given the competitive pressure of meeting customer expectations and looming regulatory and legislative deadlines.
Still, most businesses approach data governance from the wrong angle. Too often the conversation revolves around technical, system-oriented challenges and procedures instead of the business case. Predictably, IT steps in to save the day by implementing data warehousing and data management tools that provide some metadata and technical data lineage capabilities. In reality, these tools are quick (and limited) fixes that address only the organization’s immediate needs. If the business wants to be data-driven, what it really needs is a business capability for making sense of data.
The current fragmented approach integrates systems and moves data based on analysis of sources and targets, rather than on rules, standards and policies for how business users in different departments will actually use the data. In addition, IT typically records its findings and designs in a flurry of static documents that detail how the data will be moved and how frequently (daily vs. hourly vs. real time), which quality thresholds must be respected, which rules must be checked, and more. After analysis and design, someone in IT builds the code, and the solution is tested before it goes into production. At each of these points the organization knows exactly where the data came from and how it moves between systems.
But what happens later, when new requests for specific information arrive, such as a bank or healthcare organization scrambling to meet a new regulatory deadline? The staff who worked on the original project may have moved on, and the design documentation may be misplaced or, worse, missing entirely. Going back into your business’s multiple databases to reconstruct where information came from and whose hands have touched it since it was created is a time-consuming, expensive and imperfect process. It’s as if your basement has flooded and you’re down there with a mop trying to clean up the mess. Throwing more IT staff and technology (essentially a bigger mop) at a data cleanup project isn’t going to help.
The better approach is to proactively stop the water from flooding the basement in the first place. This can be accomplished by putting an automated, systematic control process in place from the start, or by formalizing the process already in place so that the business case is integrated with IT’s work.
This much-needed, technology-enabled approach to data governance handles data in an enterprise-wide, sophisticated and systematic way, involving many user groups across the organization to ensure the availability, usability, integrity and security of the data. That is critical for businesses leveraging big data as an asset while staying in line with regulatory compliance demands. Most important, data governance creates an agreed-upon, collaborative and executable framework, or operating model, for determining enterprise-wide policies, business rules and assets for the data governance team (including the chief data officer, stewardship committee and working groups) and business users to follow.
Adopting an operating model of policies and standards for governing data goes beyond IT’s paper-based trail of data movement between systems and, at a much larger scale, helps determine data inventory, data ownership, critical data elements (CDE), data quality, information security, data lineage and data retention. It’s through this operating model, including data lineage, that critical insights can be gained: any user, at any moment, can see where each piece of data came from, which users have interacted with it, where it’s going, and what other databases it will feed into.
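The lineage piece of such an operating model can be thought of as a directed graph of data assets. As a minimal sketch (the asset names and edges below are invented for illustration, not taken from any real catalog), answering “where did this data come from?” becomes a simple graph traversal:

```python
from collections import defaultdict

# Hypothetical lineage records: (source, target, transformation) edges.
EDGES = [
    ("crm.customers", "staging.customers", "deduplicate"),
    ("erp.orders", "staging.orders", "currency-normalize"),
    ("staging.customers", "warehouse.dim_customer", "conform"),
    ("staging.orders", "warehouse.fact_orders", "aggregate daily"),
    ("warehouse.dim_customer", "mart.churn_report", "join"),
    ("warehouse.fact_orders", "mart.churn_report", "join"),
]

def upstream(asset, edges):
    """Return every asset that feeds, directly or indirectly, into `asset`."""
    parents = defaultdict(list)
    for src, dst, _ in edges:
        parents[dst].append(src)
    seen, stack = set(), [asset]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

# Trace a report all the way back to its operational sources.
print(sorted(upstream("mart.churn_report", EDGES)))
```

The same edge list, walked in the other direction, answers the impact-analysis question (“what downstream reports does this table feed?”) that auditors and policy managers ask.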
Here are five important points to keep in mind when putting your data governance policies in place to ensure significant impact from your initiative:
- Technical metadata (i.e., columns, tables, processes, repositories) alone is not enough to help a DBA or data steward understand and model the data in a way that allows efficient data management. A semantic layer needs to be built on top of the metadata to give the data meaning, enabling proper data modeling and better data performance. This is why it’s essential for data governance to be an integrated process, with the business side of the organization working hand in hand with IT.
- Lineage of data from source to target systems, along with the transformations applied and links to business metadata such as business term definitions and rules, is critical to data stewards and for technical purposes. However, traceability, a 360-degree view of data assets, is essential for business users looking to answer questions like “Where does my data come from? What policies were used? What standards are applied?” For instance, policy managers will want to see the impact of a security policy on the different data domains, ideally before they enforce it; analysts want a high-level overview of where the data comes from, which systems it passed through and which rules were applied; and an auditor might want to trace a data issue to the impacted systems and business processes. Traceability delivers the more insightful answers that straight lineage alone can’t provide.
- Use true enabling artifacts, such as mapping specifications and data sharing agreements, to proactively drive the process. By driving the movement of data from business needs, you create transparency and control. SLAs included in data sharing agreements establish clear ownership and accountability between data producers and consumers, a cornerstone of trust and agility.
- Create system sensors: control points that scan the source and target systems for changes and automatically notify data stewards when an issue is identified. Alerting the data team to any change made to a system gives data stewards and others in the organization time to react, whether by adjusting the rules and standards for governing the data or by preventing a larger data performance issue. The business can now deal with data exceptions, rather than having to deal with exceptions as the business.
- Implement a data governance platform that not only integrates smoothly with the surrounding landscape of tools and techniques, but is also scalable and adaptable enough to quickly meet the evolving needs of the business. This helps reduce operational costs and keeps a laser focus on data quality, while eliminating the need for IT to scramble for answers to data requests and exceptions.
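The data sharing agreements mentioned above lend themselves to a machine-checkable form: once refresh frequency and quality thresholds are written down as data rather than prose, every delivery can be verified against them. A minimal sketch, with invented field names and thresholds:

```python
from dataclasses import dataclass

@dataclass
class SharingAgreement:
    producer: str
    consumer: str
    dataset: str
    refresh_hours: int        # agreed maximum age of delivered data
    min_quality_score: float  # agreed quality threshold (0..1)

def check_delivery(agreement, age_hours, quality_score):
    """Return a list of SLA violations for one delivery (empty = compliant)."""
    violations = []
    if age_hours > agreement.refresh_hours:
        violations.append(f"stale: {age_hours}h > {agreement.refresh_hours}h")
    if quality_score < agreement.min_quality_score:
        violations.append(f"quality: {quality_score} < {agreement.min_quality_score}")
    return violations

sla = SharingAgreement("finance-IT", "risk-analytics", "daily_positions", 24, 0.98)
print(check_delivery(sla, age_hours=30, quality_score=0.95))
```

Because the agreement names both producer and consumer, a violation report points directly at the accountable parties instead of leaving stewards to reconstruct ownership after the fact.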
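The “system sensor” idea can likewise be sketched in code: snapshot a system’s column metadata, compare it against a baseline, and raise steward-facing alerts on any drift. All table and column names below are illustrative, assuming metadata is available as simple column-to-type mappings:

```python
def diff_schemas(baseline, current):
    """Compare two column->type snapshots; return steward-facing alerts."""
    alerts = []
    for col in baseline.keys() - current.keys():
        alerts.append(f"column dropped: {col}")
    for col in current.keys() - baseline.keys():
        alerts.append(f"column added: {col}")
    for col in baseline.keys() & current.keys():
        if baseline[col] != current[col]:
            alerts.append(f"type changed: {col} {baseline[col]} -> {current[col]}")
    return sorted(alerts)

# Yesterday's snapshot vs. what the sensor sees today.
baseline = {"customer_id": "INT", "email": "VARCHAR(255)", "created_at": "DATE"}
current  = {"customer_id": "BIGINT", "email": "VARCHAR(255)", "region": "VARCHAR(8)"}
print(diff_schemas(baseline, current))
```

Run on a schedule against each governed source and target, a check like this gives stewards the early warning the bullet describes, before a silently dropped column breaks a downstream report.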
Data governance isn’t only about risk management. It’s about getting to the heart of your data and making it easier for everyone in the organization to use and trust that data for business advantage. A good data governance system will not only proactively prevent problems, but will make it easier for users throughout your company to look at your data in a more intuitive, understandable way. Data governance is a framework for setting data-usage policies and implementing controls designed to ensure that information remains accurate, consistent and accessible in a timely manner, putting your company in the driver’s seat to capitalize on the many opportunities big data makes available.
About the author: Stan Christiaens is Co-founder and Chief Technology Officer of data governance software developer Collibra.
October 23, 2020