July 5, 2019

Data Privacy and Smart Streaming Discovery: Getting Past the Furor

Rohit Mahajan

There’s a lot of furor worldwide about data privacy regulations, whether they are already in place, about to take effect, or still being debated. But just because rules like the General Data Protection Regulation (GDPR) exist doesn’t mean everyone is following them, or is necessarily compliant.

So while there may be confusion about the rules and what they can and should accomplish, one point should be made very clear: the rules will ultimately be sorted out, and companies should begin working to comply with their spirit, not merely the letter of the law. Doing that will require advanced data discovery to determine exactly what information they have under their control and where it resides.

But it goes far beyond that. Data privacy is at the heart of all of these rules, whether it’s GDPR, California’s forthcoming Consumer Privacy Act (CCPA), the Washington Privacy Act or others. They all focus on Personally Identifiable Information (PII), which the U.S. government says “refers to information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other personal or identifying information that is linked or linkable to a specific individual.” Your name, address, Social Security number, phone number, bank account details: all of it is PII.
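The “alone or when combined” clause in that definition is the subtle part: fields that identify no one on their own can single out a person together. As an illustrative sketch only, the Python below counts how many records share each combination of field values; a combination held by exactly one record distinguishes that individual. The records, the field names and the helper `unique_combinations` are hypothetical examples, not drawn from any regulation or product.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def unique_combinations(records: List[Dict[str, str]],
                        fields: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return value combinations over `fields` that match exactly one record."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return [combo for combo, n in counts.items() if n == 1]


# Hypothetical records: every individual value is shared by three people.
records = [
    {"zip": "94105", "birth_year": "1980", "sex": "F"},
    {"zip": "94105", "birth_year": "1980", "sex": "F"},
    {"zip": "94105", "birth_year": "1975", "sex": "M"},
    {"zip": "94107", "birth_year": "1975", "sex": "F"},
    {"zip": "94107", "birth_year": "1980", "sex": "M"},
    {"zip": "94107", "birth_year": "1975", "sex": "M"},
]

# Adding fields shrinks the groups until individuals stand alone.
for fields in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "sex"]):
    singled_out = unique_combinations(records, fields)
    print(fields, "->", len(singled_out), "record(s) singled out")
```

Run as written, the three passes single out 0, 2 and 4 records respectively, even though no single field identifies anyone.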

In that context, companies must realize that PII flowing into their systems immediately becomes part of their privacy efforts, and that they are accountable for tracking it whether the data is at rest or in motion. The instant they hold PII, they are responsible for ensuring it is not accessed or used in a way that violates civil or criminal law.

The sheer amount of data, meanwhile, continues to escalate: IDC’s annual report on data growth estimates that worldwide data will grow to 175 zettabytes (ZB) by 2025. The data is coming from virtually every source imaginable, but data from Internet of Things (IoT) devices has the potential to dwarf everything else; the IDC study estimates that 90ZB, more than half of the total, will be created on IoT devices by 2025.
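As a quick check on the “more than half” figure, the arithmetic below divides IDC’s IoT estimate by its total. The two constants are simply the numbers quoted above; nothing else is assumed.

```python
# Figures quoted from the IDC estimate above, in zettabytes for 2025.
TOTAL_ZB = 175
IOT_ZB = 90

iot_share = IOT_ZB / TOTAL_ZB
print(f"IoT share of 2025 data: {iot_share:.1%}")  # IoT share of 2025 data: 51.4%
```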

IDC’s analysts go on to estimate that by 2025, “each connected person will have at least one data interaction every 18 seconds. Many of these interactions are because of the billions of IoT devices connected across the globe.” That data, in motion, will flow into corporate data lakes and warehouses every second of every day, presenting companies with both a challenge and an opportunity.
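To put the 18-second figure in perspective, here is the arithmetic as a minimal sketch: one interaction every 18 seconds works out to at least 86,400 / 18 = 4,800 interactions per connected person per day. The constant names are illustrative, and the yearly figure assumes a 365-day year.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
INTERVAL_S = 18                  # IDC: at least one interaction every 18 seconds

per_day = SECONDS_PER_DAY // INTERVAL_S  # lower bound per connected person
per_year = per_day * 365                 # assumes a 365-day year
print(per_day, per_year)  # 4800 1752000
```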

The challenge, as noted above, is ensuring that this data, much of it PII, is accounted for from the moment it enters the corporate ecosystem. Companies that can manage that data in all its forms, however, will have the opportunity to gain and keep the confidence of customers and prospects, who will come to see them as trustworthy guardians. That trust can pay off in reputation, lower costs, streamlined processes and more.

That is why many companies and their Chief Data Officers are already evaluating the potential of “Smart Streaming Discovery” in their organizations. They know that GDPR and other emerging laws and regulations will become increasingly consistent and specific about compliance requirements. Artificial (or augmented) intelligence will be required to power the machine learning needed to track data across petabytes of information stored at multiple sites. And the ability to detect data, whether it’s at rest or in motion, will become paramount. Put another way, you can’t search what you don’t know you have.

Using machine learning algorithms, Smart Streaming Discovery enables organizations to act in real time, analyzing data streams and extracting insights to discover data “in motion,” as opposed to stored data “at rest.” From the point of ingestion, organizations should be able to detect PII and other sensitive streaming data quickly and automatically, in structured, semi-structured and some unstructured formats. Using deep learning to automatically tag sensitive data and flag it before it lands in data stores lets organizations manage PII proactively, frees Subject Matter Experts (SMEs) to focus on remediation, and moves the organization toward automated data governance.
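To make the ingestion-time idea concrete, here is a minimal Python sketch of tagging semi-structured (JSON) records as they stream in and flagging those that contain PII before they reach a data store. It deliberately swaps in simple regular expressions for the deep learning models described above, so it is not how Smart Streaming Discovery or any vendor product works; the patterns, the helpers `iter_fields`, `tag_record` and `route_stream`, and the sample events are all hypothetical. Regexes only catch well-formed identifiers such as U.S. Social Security numbers, email addresses and U.S. phone numbers; they miss names and free-form PII, which is exactly the gap machine learning is meant to close.

```python
import json
import re
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Hypothetical rule set: regexes stand in for a trained PII classifier.
PII_PATTERNS: Dict[str, re.Pattern] = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # e.g. 123-45-6789
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),   # simple email shape
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),  # e.g. 555-123-4567
}


def iter_fields(obj: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_path, text) for every leaf value in nested JSON data."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from iter_fields(value, f"{prefix}{key}.")
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            yield from iter_fields(value, f"{prefix}{i}.")
    else:
        yield prefix.rstrip("."), str(obj)


def tag_record(record: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each field path to the PII types detected in its value."""
    tags: Dict[str, List[str]] = {}
    for path, text in iter_fields(record):
        found = [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]
        if found:
            tags[path] = found
    return tags


def route_stream(lines: Iterable[str]) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Tag each JSON line at ingestion and split records into clean and flagged."""
    # The two lists stand in for downstream sinks (e.g. a lake and a quarantine area).
    clean: List[Dict[str, Any]] = []
    flagged: List[Dict[str, Any]] = []
    for line in lines:  # any iterable works, including a live generator
        record = json.loads(line)
        tags = tag_record(record)
        if tags:
            flagged.append({"record": record, "pii_tags": tags})
        else:
            clean.append(record)
    return clean, flagged


events = [
    '{"user": {"name": "A. Smith", "email": "alice@example.com"}, "temp_c": 21.5}',
    '{"device_id": "sensor-42", "temp_c": 19.0}',
    '{"note": "call 555-123-4567", "ssn": "123-45-6789"}',
]
clean, flagged = route_stream(events)
print(len(clean), len(flagged))  # 1 2
for item in flagged:
    print(item["pii_tags"])
```

Note that the name “A. Smith” passes through undetected, illustrating the limits of rules. Because only `tag_record` knows how detection is done, a trained classifier could replace the regexes there without changing how `route_stream` separates flagged records.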

While the requirements of the privacy laws are not yet standardized, they will be. They must be, if consumers are to have a uniform level of trust. As such, the technology that makes compliance a reality must look ahead to what’s next. It is no longer good enough to think only about discovering, managing and understanding data at rest. Smart Streaming Discovery will be needed to help everyone address this issue, and to deliver solid business benefits in the process.

About the author: Rohit Mahajan is the CTO/CPO of Io-Tahoe, a provider of machine learning-powered data discovery solutions. Rohit is a former Wall Street executive turned entrepreneur who is passionate about developing disruptive technology for data discovery using machine learning. He is an experienced technologist with a proven track record of implementing global solutions at financial institutions for DevOps, testing, security and data center transformation. In his 20-year technology career, Rohit has held a number of senior roles at Dun and Bradstreet, Morgan Stanley, and Deutsche Bank.
