Follow Datanami:
March 7, 2018

StreamSets Balances Streaming Data Demands for Security, Access

It can be difficult to find the right balance between protecting data and utilizing it. With its new Data Protector offering unveiled yesterday at Strata Data Conference San Jose, StreamSets thinks it has found a happy medium, at least for data in motion.

StreamSets develops tools to help organizations create and manage data pipelines, and make that data available for processing, analysis, and other uses. Its software helps organizations prevent large data flows from turning chaotic as the various systems (Hadoop, etc.), frameworks (Kafka etc.), and databases (Impala, etc.) involved with the analysis change over time.

The twin pillars of security and privacy are big concerns at the moment, particularly for those companies that must comply with the General Data Protection Act (GDPR) when it goes into effect May 25. For these companies, making sure streaming data flows do not contain compromising data and comply with GDPR (and HIPAA and other regulations, for that matter) is of paramount concern.

However, without the capability to read fine-grained details contained in streaming data flows, companies would severely hamper their analytic programs. Therefore, a balance is required between the competing demands of privacy and access.

“There are conflicting forces within an organization,” says Rick Bilodeau vice president of marketing. “On the one hand you have the risk oriented people who want to make sure they lock down data and do the things they need to protect data. On the other side, there are analysts who see the obfuscation as a barrier to doing their job. They need to have a mechanism to intercept data at some point downstream basically to be able to meet the needs of both of those parties.”

StreamSets thinks it has found that balance with Data Protector, a new offering that allows companies to implement fine-grained security and privacy policies that meet those twin demands.

The new product provides several capabilities. The first step is identifying personally identifiable information (PII) as it flows through into the organization. When Data Protector encounters a piece of PII that falls under regulatory purview, it flags it for downstream processing.

The next step is protecting the PII or other sensitive data. Data Protector includes 13 different algorithms that can scramble, encrypt, or other obfuscate the data so that it no longer poses a regulatory hazard. Some of these algorithms are reversible, allowing the protection to be wound back downstream if greater fidelity is required for analytical purposes.

The higher up in the data flow that organizations deal with the PII, the better, says Bilodeau.

“Because we can pick this stuff upstream sooner, it allows us to fork the data stream,” he tells Datanami. “We’re going to create these things called security zones. If the data has these characteristics, it’s anti-money laundering, and it goes this way and we apply a certain algorithm for obfuscation. If it has these characteristics for GDPR, it goes this way. We will be able to apply different types of algorithms for different situations.”

The security zones allow security architects to design defense-in-depth strategies around data, the company says. The software also complements data governance solutions for data at rest, and integrates with data catalogs, such as Apache Atlas, Cloudera Navigator, and products from Alation, Collibra, and Waterline Data, StreamSets says.

Data protection is critical today, particularly with the potential for heavy fines and damage to brands, says Girish Pancha, CEO of StreamSets. “There is a gap in data protection today as current solutions only deal with data at rest and are blind to PII arriving from new or updated sources,” he says. “StreamSets Data Protector closes this coverage gap by extending policy-based control over sensitive data out to the point of ingestion, while gracefully handling unexpected data drift whenever it occurs.”

Related Items:

Fueled by Kafka, Stream Processing Poised for Growth

Keeping on Top of Data Drift