Finding the Data Access Governance Sweet Spot
In part one of this two-part series, I presented the three most common “tried and failed” approaches that large enterprises take when implementing data access controls to increase security and comply with evolving privacy regulations. All three failed approaches reveal ways in which complexity is the enemy of security. Creating secure copies of data, defining policies as “views,” and using Apache Ranger to enable fine-grained access controls all lead to fragmentation and mounting complexity, opening the door to a data management nightmare, potential security gaps, and compliance failures. Increasing complexity can also make it impossible to provide the right access to the right people at the right time, inhibiting business productivity and innovation.
In this follow-on article, I will discuss three additional lessons learned that many successful large enterprises have applied to reach that “sweet spot” where big data can be used responsibly, compliance can be automated, and data management can be made easier.
Lesson 1: Strive for a Single Source of Authoritative Data
The opposite of curating secure copies or views of data is the ability to implement dynamic data access policies on top of a single source of authoritative data. This is the foundation of a successful data access management program. A single source of truth eliminates the proliferation of redundant and ungovernable data silos – while making access management far simpler.
This doesn’t mean you have to consolidate all your data in one place. If you subscribe to the idea of a data lakehouse, for example, great! But if your organization wants or needs to operate disparate data platforms à la a data mesh, that’s fine too. The lesson here is that within each system, you shouldn’t curate multiple versions of the same data for security purposes. It gets ugly and you quickly lose control, which is the opposite of what you’re trying to achieve.
Instead, implement dynamic data access policies on your authoritative data sources. Modern, universal data authorization platforms allow you to apply fine-grained access controls – mask/hide/tokenize information at the file, column, row and cell level – in real time based on the user’s entitlements and query context. Dynamic data access policies ensure the twin goals of effective governance and user productivity.
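To make this concrete, here is a minimal Python sketch of dynamic, entitlement-driven masking applied at read time to a single authoritative data set. The names here (`User`, `apply_policy`, the `view_pii` entitlement) are illustrative only, not the API of any particular authorization platform:

```python
# Illustrative sketch: one authoritative data set, masked dynamically per
# user at query time. All names and entitlements here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    entitlements: set = field(default_factory=set)

# A single source of authoritative data -- no secure copies, no views.
RECORDS = [
    {"name": "Alice", "ssn": "123-45-6789", "balance": 1200},
    {"name": "Bob",   "ssn": "987-65-4321", "balance": 3400},
]

SENSITIVE_COLUMNS = {"ssn"}

def apply_policy(user: User, record: dict) -> dict:
    """Mask sensitive columns in real time unless the user is entitled."""
    out = {}
    for col, value in record.items():
        if col in SENSITIVE_COLUMNS and "view_pii" not in user.entitlements:
            out[col] = "XXX-XX-" + value[-4:]  # mask, keeping the last 4 digits
        else:
            out[col] = value
    return out

analyst = User("analyst")                               # no PII entitlement
auditor = User("auditor", entitlements={"view_pii"})    # full access
print(apply_policy(analyst, RECORDS[0])["ssn"])  # masked
print(apply_policy(auditor, RECORDS[0])["ssn"])  # clear
```

Because the policy is evaluated per request, the same physical data serves every audience, which is exactly what curated copies and views fail to achieve.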
A single source of truth also makes it easier to manage and standardize a continuous integration/continuous delivery (CI/CD) pipeline, enabling administrators to catalog and classify only a single data set. This in turn enables a change-once-implement-immediately approach. It also supports efficient auditing to allow the business to demonstrate compliance to regulators.
Lesson 2: Separate the Policy from the Platform
To fully understand an organization’s requirements for security and compliance, the data governance team must collaborate with all other data stakeholders. And instead of dwelling on the technical complexity of policies and policy enforcement, the collaborative discussion should focus on which data consumer roles get to use which classifications of data.
Collaboration with and input from the following teams will help create the optimal foundation for your data program.
- Compliance – Regulatory compliance requirements, such as the right to be forgotten and the personally identifiable information (PII) that must be redacted or obfuscated
- Security – Requirements for Zero Trust data access policies and how to optimize them to minimize risks
- IT – Requirements for a modern data platform, such as cloud-first, containerization and sufficient scalability to support massive data lakes and the required number of users, use cases and computing nodes, etc.
- Lines of business – Their needs for the data program, such as dashboards, machine learning (ML) models, customer 360 views, etc.
By working together within the context of a collaborative platform that recognizes all data stakeholders, the organization can define what consistent policy enforcement across the enterprise looks like – which then allows for automation of policy enforcement. This information is essential for shifting from a limited role-based access control (RBAC) strategy to a combined RBAC and attribute-based access control (ABAC) strategy.
Why RBAC + ABAC? Role-based access control (RBAC) is the standard in most organizations today. But it is insufficient in our post-big data era, when the three Vs of volume, velocity, and variety are real and present problems. For example, every data analyst in a financial firm – or group of analysts in a line of business (LOB) within the firm – may be assigned a “card analyst” role so that only they can access transaction databases. While this simple RBAC strategy works for simple use cases, the roles must be managed manually, and every new use case requires creating a new role and granting new permissions to the affected users. Further, RBAC is usually limited to coarse-grained access (e.g., an entire table or file), and each system handles role definition and permission management differently. So as the data platform grows in scale, the organization experiences “role explosion,” and complexity abounds.
Attribute-based access control (ABAC), by contrast, allows for far more flexible access policy definitions by leveraging attributes to make a context-aware decision regarding any individual request for access. For example, if data is classified “SSN,” only people with certain roles should be able to work with it. You no longer have to assign roles to individual resources by name. Combined with RBAC, ABAC scales very granular policy requirements to support more people and use cases without hard coding, manual configuration or role explosion. And since the definitions are abstracted out, administrators benefit from easy repeatability and policy reusability across multiple data sources.
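As an illustration, a combined RBAC + ABAC decision might look like the following Python sketch, where a role check (RBAC) and attribute checks on the data’s classification and the request context (ABAC) are both evaluated per request. The roles, classifications, and `purpose` attribute are hypothetical examples, not a standard schema:

```python
# Hypothetical sketch of a combined RBAC + ABAC authorization decision.
# The policy references the data's classification attribute, never a
# specific table or file by name.

ALLOWED_ROLES = {
    "SSN":    {"fraud_investigator"},              # only these roles may touch SSNs
    "PUBLIC": {"analyst", "fraud_investigator"},
}

def is_authorized(user_attrs: dict, resource_attrs: dict, context: dict) -> bool:
    classification = resource_attrs["classification"]
    # RBAC component: the user must hold a role approved for this classification.
    if not user_attrs["roles"] & ALLOWED_ROLES.get(classification, set()):
        return False
    # ABAC component: a context-aware check, with no per-resource role grants.
    if classification == "SSN" and context.get("purpose") != "fraud_review":
        return False
    return True
```

Because the rule keys off the classification attribute rather than named tables, every newly cataloged data set classified “SSN” inherits the policy automatically, with no new roles to create.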
The benefits of ABAC include ensuring reliable policy change management, avoiding policy drift across the enterprise, eliminating manual effort to stay in compliance as policies change over time, and increasing data usage intelligence thanks to full visibility.
Lesson 3: Choose Universal Policy Enforcement
Abstract policies need concrete enforcement. Choose a universal data authorization platform that dynamically applies policies consistently and reliably. For example, policies should apply equally to data scientists running Spark on AWS EMR and LOB analysts running Looker queries against Snowflake. Only a universal platform approach enables policies to be automatically and intelligently enforced everywhere without the need for user intervention.
As enterprises are finding in multiple disciplines, from network security to content marketing, relying on a technology platform that can seamlessly integrate partner technologies is the most efficient way to implement and manage a particular strategy. A data policy platform also centralizes auditing and can position an organization to implement distributed stewardship. When looking at the platform for data access governance, be sure the platform is technology and data platform agnostic. This is the only way to allow for a single policy that is understandable and usable for every data system and stakeholder, independent of the underlying solutions.
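The key property, one abstract policy enforced identically everywhere, can be sketched as follows. The engine names are just labels standing in for the native integrations a real universal authorization platform would provide:

```python
# Sketch of platform-agnostic enforcement: the policy, not the engine,
# decides. "spark-emr" and "snowflake" are labels, not real integrations.

POLICY = {
    "mask_classifications": {"PII"},        # which data classes get masked
    "exempt_roles": {"privacy_officer"},    # who sees cleartext anyway
}

def enforce(engine: str, user_roles: set, classification: str, value: str) -> str:
    """Apply the same masking decision regardless of the query engine."""
    needs_mask = classification in POLICY["mask_classifications"]
    exempt = bool(user_roles & POLICY["exempt_roles"])
    return "***" if needs_mask and not exempt else value

# Identical outcome whether the request comes from Spark or a BI tool:
print(enforce("spark-emr", {"analyst"}, "PII", "jane@example.com"))  # ***
print(enforce("snowflake", {"analyst"}, "PII", "jane@example.com"))  # ***
```

The `engine` argument deliberately does not influence the decision; in a real platform that consistency is what makes centralized auditing and a change-once approach possible.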
Make Simplicity the Ally of Security
Big data and evolving privacy regulations have introduced unprecedented information management complexity for enterprises, making security and compliance more difficult than ever. However, as some of the world’s most well-known brands have learned through trial and error, this complexity can be reduced and effectively managed – and security and compliance can be enhanced – when organizations:
- Strive for a single source of authoritative data and enforce fine-grained access controls using ABAC.
- Take a collaborative approach to implementing controls for who can access what sensitive data.
- Adopt a technology-agnostic universal data authorization platform.
About the author: Nong Li is the co-founder and CTO of Okera. Prior to co-founding Okera in 2016, he led performance engineering for Spark core and SparkSQL at Databricks. Before Databricks, he served as the tech lead for the Impala project at Cloudera. Nong is also one of the original authors of the Apache Parquet project. He has a bachelor’s in computer science from Brown University.
May 20, 2022