March 7, 2019

Okera Bolsters Access Control for Unstructured Data

Alex Woodie


Some of the most interesting data that’s analyzed by organizations is unstructured data, including images, videos, and text files. But the lack of structure of that data poses regulatory challenges to organizations, which face potential legal jeopardy if consumer data rights are violated. Okera today released new data access control software that it claims can alleviate much of the regulatory burden hanging over big data analytic activities.

Okera emerged from stealth about a year ago with a fine-grained data access control system that was designed to solve a vexing big data problem: How do you give analysts, data scientists, and other stakeholders access to multiple disparate data sources, such as HDFS, S3, and relational databases, while at the same time complying with strict new data regulations like GDPR and CCPA?

The approach taken with the Okera Active Data Access Platform (ODAP) is to move the data access abstraction up a level. Instead of defining and enforcing access control in each individual file system, streaming data platform, or database, the company lets customers configure that access in ODAP, which then federates control down to the individual data stores.

The software did that by essentially viewing data as a series of tables, including data stored in file systems like HDFS and S3. By viewing data through that table construct, ODAP was able to enforce column-level access control, granting different user groups different levels of access to the same file.
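As a rough illustration of the idea, and not Okera's actual implementation, the Python sketch below treats a CSV file as a table and filters its columns by user group. The policy dictionary, group names, and sample data are all hypothetical.

```python
import csv
import io

# Hypothetical policy: the columns each group may read from the same file.
COLUMN_POLICY = {
    "analysts": {"order_id", "amount"},
    "support": {"order_id", "email"},
}

def read_as_table(csv_text, group, policy=COLUMN_POLICY):
    """Parse CSV text into rows, keeping only columns the group may see."""
    allowed = policy.get(group, set())  # unknown groups see no columns
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {col: val for col, val in row.items() if col in allowed}
        for row in reader
    ]

# Made-up sample file contents.
SAMPLE = "order_id,email,amount\n1,a@example.com,9.50\n2,b@example.com,12.00\n"

analyst_view = read_as_table(SAMPLE, "analysts")
support_view = read_as_table(SAMPLE, "support")
```

Both groups read the same underlying file, but each sees only the columns its policy allows.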

The first release of ODAP primarily targeted structured data, says Amandeep Khurana, Okera co-founder and CEO of the San Francisco, California, company. With today’s update, the company has added support for managing access to unstructured data, while also delivering more secure access to data stored on S3.

By providing file-level access control to unstructured data, ODAP can streamline access to data for a greater number of users and use cases while adhering to strict regulations, Khurana says.

“Let’s say you have a CSV in HDFS, but you don’t really know the structure of the CSV,” Khurana says. “You start with HDFS Access Control Lists, then you can move it into Sentry ACLs. So now actually you have to maintain two types of access control lists, or two kinds of policies, for the same data set, because you didn’t know the structure. You can start to see how complicated this can get at scale, very, very quickly.”

With the new software, Okera provides a single pane of glass to manage access to those CSV files and the various access paths that analysts and data scientists will use to get them. “So instead of managing two different kinds of policies, you manage only one kind of policy on one single system,” he says.
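The "one kind of policy" idea can be sketched as follows, under the assumption that every access path resolves to a single logical data set before the check is made. The path strings, data set names, and groups are invented for illustration and do not reflect ODAP's interfaces.

```python
# Hypothetical mapping from access paths to one logical data set.
# The path strings are illustrative only, not real connection URIs.
PATH_TO_DATASET = {
    "hdfs:///data/sales.csv": "sales",   # file-level access path
    "hive://default.sales": "sales",     # table-level access path
}

# One policy per data set, no matter how the data is reached.
DATASET_POLICY = {"sales": {"analysts"}}

def can_read(group, access_path):
    """Resolve the access path to its data set, then check the single policy."""
    dataset = PATH_TO_DATASET.get(access_path)
    if dataset is None:
        return False  # unknown paths are denied by default
    return group in DATASET_POLICY.get(dataset, set())
```

Changing who may read the sales data then means editing one entry, rather than keeping a file ACL and a table ACL in sync.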

Okera’s ODAP centralizes fine-grained access control to a variety of data repositories

Okera also bolstered its support for Amazon’s S3 file system, which is increasingly the data storage repository of choice for organizations building huge data lakes in AWS. Amazon lets users manage access to S3 buckets using Identity and Access Management (IAM) configuration files. However, managing those policies is “massively painful,” Khurana says.

“You actually have to go into S3 configuration bucket and write JSON, which is super complicated,” he says. “When you want to change those policies, you have to change that JSON. This becomes a data engineering nightmare for people. And you also lose all visibility. You don’t know what happened — who gave what access, who got what access.”

AWS also enforces a limit on how many IAM policies you can have for any given S3 bucket. “So if you have more data sets that you need to manage than your limit,” Khurana continues, “you’re just [shoot] out of luck.”
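To give a sense of the JSON involved, the Python sketch below assembles a bucket policy in the standard AWS policy document layout, with one statement per data set. The bucket name, account ID, role names, and prefixes are hypothetical. The helper that measures the serialized size is included only to show that the document grows with every data set added; the exact AWS limit is not stated in this article and should be checked against AWS documentation.

```python
import json

def build_bucket_policy(bucket, grants):
    """Build an S3 bucket policy granting read access per data set prefix.

    grants maps a key prefix (one data set) to the IAM role ARNs allowed
    to read it. Every name passed in below is hypothetical.
    """
    statements = []
    for i, (prefix, role_arns) in enumerate(sorted(grants.items())):
        statements.append({
            "Sid": f"Read{i}",
            "Effect": "Allow",
            "Principal": {"AWS": role_arns},
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
        })
    return {"Version": "2012-10-17", "Statement": statements}

def policy_size_bytes(policy):
    """Size of the serialized policy, which grows with each data set."""
    return len(json.dumps(policy).encode("utf-8"))

grants = {
    "sales/": ["arn:aws:iam::123456789012:role/analyst"],
    "marketing/": ["arn:aws:iam::123456789012:role/marketer"],
}
policy = build_bucket_policy("example-bucket", grants)
policy_json = json.dumps(policy, indent=2)
```

Each new data set or change in access means regenerating and re-applying this document, which is the maintenance burden Khurana describes.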

Organizations are really struggling to manage their big data sets in accordance with emerging regulations, Khurana says. For each data use case, GDPR requires organizations to collect consent from individual users, with hefty fines for each violation. That has forced organizations to get creative with their data management.

One way they do that is by breaking up large datasets into lots of smaller files, each with its own fine-grained access controls. But administrators tasked with managing all those individual data sets soon hit the ceiling of what they can manage. Other organizations are taking a similar approach by duplicating source datasets and applying different access controls to adhere to specific regulatory requirements. But when data sets start getting into the tens or hundreds of terabytes, that soon becomes prohibitively expensive.

Okera thinks that it has hit upon the right approach, by essentially empowering file and object systems with database-like qualities for fine-grained access control, but without giving up the richness, scalability, and diversity of data that file and object systems bring.

“Machine learning workloads don’t run on databases, but you still need the same kind of controls over them,” Khurana says. “And you don’t want a separate system for machine learning workload access management, BI access management, and data lake access management. So it’s the unification that becomes very, very important.”

In addition to access control, ODAP provides real-time tokenization, redaction, and row-level filtering. The software supports a range of tools, from Amazon EMR and SageMaker, to Hadoop tools like Hive, Presto, and Spark, as well as BI tools like Tableau, Birst, and Qlik.
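For readers unfamiliar with those three terms, here is a minimal Python sketch of what each could look like when applied as data is read. It is a generic illustration under assumed names, not ODAP's implementation: the secret key, column names, and region filter are hypothetical, and the tokenization shown is a keyed hash (HMAC-SHA256), one of several possible tokenization schemes.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key

def tokenize(value, key=SECRET_KEY):
    """Replace a value with a deterministic keyed-hash token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]

def redact_email(email):
    """Mask the local part of an email address, keeping the domain."""
    _, _, domain = email.partition("@")
    return "***@" + domain

def apply_policy(rows, region):
    """Keep only rows for one region, then transform sensitive columns."""
    return [
        {
            "user": tokenize(row["user"]),
            "email": redact_email(row["email"]),
            "region": row["region"],
        }
        for row in rows
        if row["region"] == region  # row-level filter
    ]

# Made-up sample rows.
rows = [
    {"user": "alice", "email": "alice@example.com", "region": "EU"},
    {"user": "bob", "email": "bob@example.com", "region": "US"},
]
eu_view = apply_policy(rows, "EU")
```

Because the token is deterministic, the same user maps to the same token across queries, so joins and counts still work without exposing the original value.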

