Synthetic Data Market Gets Real
A growing list of data privacy regulations, along with demand for better training data, is spawning new AI-based approaches to managing “personally identifiable” information, including “synthetic” data sets that remove personal information covered by current and pending privacy rules.
Diveplane, an AI startup based in Raleigh, NC, said its synthetic data platform dubbed Geminai generates a twin data set that “acts and feels realistic” for the purposes of data modeling while stripping out personal information. The synthetic data tool targets business and government users seeking to analyze and share data sets while complying with a growing list of data privacy rules.
For instance, the California Consumer Privacy Act (CCPA), scheduled to take effect on January 1, 2020, limits the dissemination of “personally identifiable information.” The EU’s General Data Protection Regulation (GDPR) contains similar restrictions.
Diveplane said its synthetic data tool can address the widening gap in training data brought about by privacy restrictions.
“Many businesses are forced to use inaccurate or incomplete data to train their AI due to privacy requirements, which can lead to the AI making poor or misleading decisions,” said Diveplane CEO Michael Capps. The Geminai tool creates a synthetic “twin” dataset that can be verified by users as they train AI models.
For example, proponents of the synthetic data approach note it can be used to test algorithms, allowing developers to build prototypes that can help justify risky AI initiatives. In another scenario, synthetic data can be used to develop large, labeled data sets customized for a specific project.
Diveplane claims its approach goes beyond simply masking slices of private information such as names and social security numbers. Instead, it addresses what the startup calls the balance between privacy and accurate data used for model training.
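The distinction between masking and synthesis can be made concrete with a toy sketch. The example below is purely illustrative and is not Diveplane’s proprietary method: masking blanks out direct identifiers but leaves real records intact, while a naive synthesizer replaces every record with draws from distributions fitted to the original columns, so aggregate statistics survive but no row corresponds to a real person.

```python
import random
import statistics

# Toy records with direct identifiers (name, SSN) and analytic fields.
records = [
    {"name": "Alice", "ssn": "123-45-6789", "age": 34, "income": 72000},
    {"name": "Bob",   "ssn": "987-65-4321", "age": 41, "income": 65000},
    {"name": "Carol", "ssn": "555-44-3333", "age": 29, "income": 58000},
]

def mask(records):
    """Simple masking: blank out identifiers, keep the real values.
    The remaining fields can still re-identify someone in small data sets."""
    return [{**r, "name": "***", "ssn": "***"} for r in records]

def synthesize(records, n, seed=0):
    """Naive synthesis: sample each numeric column from a normal
    distribution fitted to the originals. Aggregates are roughly
    preserved; no output row belongs to a real individual."""
    rng = random.Random(seed)
    ages = [r["age"] for r in records]
    incomes = [r["income"] for r in records]
    return [
        {
            "age": round(rng.gauss(statistics.mean(ages), statistics.stdev(ages))),
            "income": round(rng.gauss(statistics.mean(incomes), statistics.stdev(incomes))),
        }
        for _ in range(n)
    ]

masked = mask(records)
synthetic = synthesize(records, n=1000)
```

A real platform would model correlations between columns rather than sampling each independently; the sketch only shows why synthesis, unlike masking, severs the link between output rows and real people.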
Other startups offer data discovery approaches to assist with regulatory compliance, including tools that use machine learning algorithms to help track down and manage “personally identifiable information.”
Along with compliance with U.S. and EU privacy regulations, Diveplane is also targeting medical research, including the ability to anonymize patient records so investigators can use those data sets without violating the Health Insurance Portability and Accountability Act, or HIPAA.
Other applications include generating granular data for training neural networks, thereby improving AI functionality, as well as “de-identifying” data sets to allow more data sharing.
Ultimately, the startup’s goal is enabling “understandable AI” that is “trainable, interpretable and auditable.”
Other synthetic data proponents note the emerging approach can be used to produce large volumes of labeled data faster and cheaper than manual labeling. Another use case is generating unique training data that would otherwise be difficult to capture “in the wild.”
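The labeling advantage comes from the generator knowing the ground truth for every example it emits. A minimal sketch, using a hypothetical two-class geometry task rather than any vendor’s pipeline: each point is sampled at random, and its label falls out of the generating rule itself, with no human annotation step.

```python
import random

def generate_labeled(n, seed=0):
    """Generate n labeled 2-D points. The label ("inside"/"outside"
    the unit circle) is known exactly because we wrote the rule that
    produces it -- labels are free, unlike manual annotation."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        label = "inside" if x * x + y * y <= 1.0 else "outside"
        data.append({"x": x, "y": y, "label": label})
    return data

training_set = generate_labeled(1000)
```

The same idea scales to harder domains (rendered scenes, simulated sensor traces), where the simulator’s internal state supplies labels that would be slow or impossible to capture “in the wild.”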