Navigating Modern Data’s Dual Mandates: Access and Governance
We’re in the midst of a tumultuous shift when it comes to how organizations use data. On the one hand, the drive to use data to make as many decisions as possible is palpable. On the other hand are the growing security concerns and privacy rights of individuals. How an organization navigates those dual mandates is proving to be one of the tougher challenges in data’s current state of evolution.
Thanks to high-profile exposés of data abuses by the tech giants and laws like GDPR and CCPA, organizations are coming to a new realization about what can be done with data, what is ethically acceptable, and what should not legally be permitted. It’s taken some time, but the Wild West phase of rampant abuse of personal data seems to be receding in the rear-view mirror. When it comes to individual rights in a democratic society, it’s hard to see the flowering of new privacy rights as a bad thing.
However, this realization is arriving just as thousands of organizations around the world are discovering exactly how powerful and profitable data can be. Following the lead of tech giants that have already developed their own big data apparatus, companies across industries are scrambling to build their own big data pipelines to feed rear-facing analytics and forward-looking machine learning systems. And with today’s powerful cloud tools, it’s never been easier.
The calling card for this build-up of data materiel, ironically, is data democratization. The goal: Strip away the silos slowing down the movement of data, and figure out how to put as much actionable data into the hands of as many decision makers as technically–and legally–possible.
Thus was born today’s data dual mandate: build the data out as fast as possible, but keep the data governed at the same time. “This dual mandate is real,” says Balaji Ganesan, the CEO and co-founder of Privacera, a company that develops data governance software. “They have to meet those dual mandates. It’s not ‘either or.’”
Scaling Up Data Access
Helping companies navigate this dual mandate is what Privacera and other companies like it are designed to do. The technical challenges are steep, however, considering the large variety in the types of data consumers (everyone from junior analysts to senior data scientists) as well as the locations of the data (on-prem, in the cloud, and everywhere in between).
“You can’t say, ‘I’ll not give you anything,’” Ganesan says. “But it also can’t be the Wild Wild West. So how do you meet that dual mandate? It is becoming a big challenge in the enterprise world.”
Existing approaches to data governance that may have worked when data was largely centralized in a data warehouse or a smaller number of source databases won’t work with today’s highly distributed data environments. There is simply too much data, and too many users, to funnel all data requests to a centralized IT-based team to handle. Instead, Ganesan and his Privacera colleagues are seeking to empower each department to have the tools necessary to provision data to their own users.
“The way we do that is architecturally not coming between the user and the data,” he tells Datanami. “That’s the fundamental principle we have taken. In the traditional world, security used to be provided as a layer, a virtualization layer on top; that’s how you can control [the data]. Our approach has been different, to say, you don’t need to be in the middle. In a cloud world and highly scalable distributed world, you can’t do that. You have to take an approach where the user experience is paramount,” but without hurting governance.
However, the leaders of the organization still need to know that data regulations are being followed, and that violations of rules such as GDPR are not tolerated within the organization. But instead of centralization of access control, the new data order calls for unification of policies. Letting each department set the specific rules that control access to its data, within the framework of unified data access policies, allows organizations to move quickly with data while still meeting the demand for governance.
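To make the idea of unified policies with department-level rules concrete, here is a minimal sketch in Python. It is an illustration only and does not show Privacera's or Apache Ranger's actual API; every name in it (`UnifiedPolicy`, `DepartmentRules`, `Dataset`, `can_access`, the `pii` tag) is hypothetical. The assumption is that a department may grant access only to data that also passes an organization-wide rule.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UnifiedPolicy:
    """Hypothetical org-wide policy: tags that require explicit clearance."""
    restricted_tags: frozenset


@dataclass(frozen=True)
class Dataset:
    """Hypothetical dataset descriptor with governance tags such as 'pii'."""
    name: str
    tags: frozenset = frozenset()


@dataclass
class DepartmentRules:
    """Hypothetical department-owned rules: role -> names of granted datasets."""
    name: str
    grants: dict = field(default_factory=dict)


def can_access(policy, dept, role, dataset, cleared_tags=frozenset()):
    """Allow access only if the department grants it AND the unified policy permits it."""
    granted = dataset.name in dept.grants.get(role, set())
    # Restricted tags on the dataset that the user has not been cleared for.
    blocked = (dataset.tags & policy.restricted_tags) - cleared_tags
    return granted and not blocked


# Example setup (all values are made up for illustration).
policy = UnifiedPolicy(restricted_tags=frozenset({"pii"}))
marketing = DepartmentRules("marketing", {"analyst": {"campaigns", "customers"}})
campaigns = Dataset("campaigns")
customers = Dataset("customers", frozenset({"pii"}))
```

In this sketch the marketing department decides which roles see which datasets, but its grant on `customers` takes effect only for users cleared for the `pii` tag, because the organization-wide policy is always checked alongside the department rule.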
Automation in the Cloud
For Privacera, which is built on its founders’ heritage developing Apache Ranger at XA Secure (acquired by Hortonworks in 2014), the ability to deliver data governance while not slowing down user access to data is the key.
Ganesan says the Privacera software, running in the cloud as a managed service, functions as a sidecar to the analytics tools, including BI tools and cloud data warehouses like Snowflake, Databricks, Redshift, BigQuery, and Azure Synapse Analytics. Users simply have access to the data they’re allowed to use, and have no visibility into the data sets they aren’t.
“[Users] will not even notice our tool is working,” he says. “In some cases, we can even mask data or push rules to mask data, so they’re only seeing the data they’re supposed to. But if they need additional access to the data, they can always go into our tool or any other tool and request access. By automating this, we have actually made this whole process efficient.”
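The behavior Ganesan describes (users see only permitted data sets, some columns are masked, and additional access can be requested) can be sketched roughly as follows. This is a simplified Python illustration under assumed names, not Privacera's implementation; `visible_datasets`, `mask_row`, and `request_access` are hypothetical helpers, and the `****` mask value is an arbitrary choice.

```python
def visible_datasets(catalog, allowed):
    """Return only the data sets the user is allowed to see, in catalog order."""
    return [name for name in catalog if name in allowed]


def mask_row(row, masked_columns, mask="****"):
    """Return a copy of the row with the listed columns replaced by a mask value."""
    return {col: (mask if col in masked_columns else value)
            for col, value in row.items()}


def request_access(queue, user, dataset):
    """Record an access request for later review (hypothetical workflow step)."""
    queue.append((user, dataset))


# Example data (made up for illustration).
catalog = ["sales", "customers", "payroll"]
allowed = {"sales", "customers"}
row = {"name": "Ada", "email": "ada@example.com", "region": "EU"}
pending = []
request_access(pending, "analyst1", "payroll")
```

Here `visible_datasets(catalog, allowed)` leaves `payroll` out entirely, `mask_row(row, {"email"})` hides the email value while leaving the other columns readable, and the request for `payroll` waits in `pending` for whoever owns that data to approve it.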
The open source Apache Ranger software plays a role in Privacera’s software. But software alone can’t solve this challenge. Ganesan understands that the right combination of people, process, and technology will be necessary if the dual mandates of data governance and data access are both to be met.
“It’s really an exciting space. It’s still the very early innings. There’s a lot of DIY and manual work that happens in the organizations. But privacy is real, compliance is real,” Ganesan says. “You can actually have both. You can actually have governance, and you can actually [have] data democratization. That’s kind of our mission, right? How do you make this whole movement responsible?”