February 9, 2017

Tamr’s Data Prep Platform Gains U.S. Patent

George Leopold


A new approach to integrating large numbers of data sources that combines machine learning techniques with human expertise has earned U.S. patent protection.

Data preparation specialist Tamr Inc. said Thursday (Feb. 9) the U.S. Patent and Trademark Office has awarded a patent (US9,542,412) covering its “data unification” platform. The company’s machine learning approach prepares data from multiple sources by “normalizing, cleaning, integrating, and de-duplicating” those sources.

“Our goal was to build an end-to-end system for enterprise-scale data curation that leveraged modern machine learning techniques to radically reduce the time and cost of producing clean, unified data sets,” explained Mike Stonebraker, Tamr’s co-founder and CTO.

The patent describes new features implemented in the company’s software. These include the techniques used to obtain training data for machine learning algorithms along with a methodology for linking attributes and database records. It also describes various methods for “pruning the large space of candidate matches for scalability and high data volume considerations,” the company said.
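As a rough illustration of what pruning a candidate-match space can look like, the sketch below uses a generic technique known as blocking: records are grouped by a cheap key, and only records that share a key are compared. This is not taken from the patent. The `blocking_key` function, the three-letter key, and the sample records are all hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    # Hypothetical key: first three letters of the lowercased name.
    return record["name"].strip().lower()[:3]


def candidate_pairs(records):
    # Group record indexes by blocking key.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[blocking_key(rec)].append(idx)
    # Only records in the same block become candidate pairs.
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(ids, 2))
    return pairs


# Made-up sample records for illustration only.
records = [
    {"name": "Acme Corp"},
    {"name": "ACME Corporation"},
    {"name": "Globex"},
    {"name": "Globex Inc."},
    {"name": "Initech"},
]

pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

With five records, comparing everything against everything means 10 pairs; blocking on the key leaves two. The trade-off is that true matches whose keys differ are never compared, so the choice of key matters.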

The data unification system features “data cleaning” of raw data that is both “dirty” and “noisy,” and it relies heavily on automated algorithms, with human intervention as needed, to scale the platform.
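One common way to pair automation with human intervention is to let software decide the clear cases and route only the uncertain ones to a person. The sketch below illustrates that general idea and is not Tamr's method: a simple string-similarity score from Python's standard `difflib` module stands in for a trained model, and the `triage` function and its `low` and `high` thresholds are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Stand-in for a learned match score: character-level similarity in [0, 1].
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def triage(a, b, low=0.4, high=0.9):
    # Hypothetical thresholds: confident scores are decided automatically,
    # everything in between is queued for a human expert.
    score = similarity(a, b)
    if score >= high:
        return "match"
    if score < low:
        return "non-match"
    return "review"


print(triage("Acme Corp", "acme corp"))
print(triage("Acme Corp", "Acme Corporation"))
print(triage("abc", "xyz"))
```

In a setup like this, the decisions collected from reviewers could also serve as labeled examples for retraining, although the thresholds and scoring function would need tuning against real data.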

Other features include incremental data integration and curation. “New data sources must be integrated incrementally as they are uncovered,” the company noted. “There is never a notion of the data integration task being finished.”

The startup, based in Cambridge, Mass., was spun out of the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory in 2014. It differentiates itself from a growing number of data prep specialists that apply rules to combine a limited number of data sources. By contrast, Tamr said its approach combines machine-learning techniques with human experts. That, the startup asserts, allows it to scour data for correlations and duplications in hundreds of source files.

The U.S. patent award comes as the data preparation market is booming. Market researcher Gartner predicted last year that the self-service data preparation software sector could reach $1 billion by 2019, and that the current adoption rate of 5 percent would grow to 10 percent by 2020.

Tamr’s machine learning approach seeks to exploit the dirty data problem in pursuit of software license and maintenance revenue. The startup, which was launched by Vertica founders Stonebraker and Andy Palmer, uses a combination of machine learning algorithms and crowd-sourced human oversight to automate much of the work that goes into combining and integrating siloed, semi-structured data so that it can be more effectively utilized in analytic systems.

Along with the patent award, Tamr has raised $41.2 million in two funding rounds, including a $25.2 million Series B round closed in June 2015. Among Tamr’s early investors are Google Ventures (NASDAQ: GOOGL) and New Enterprise Associates.

Recent items:

Why Self-Service Prep is a Killer App For Big Data

Why Big Data Prep is Booming

Applications: Artificial Intelligence, Enterprise Analytics

Technologies: Frameworks

Sectors: Academia, Financial Services, Other

Vendors: Google, Tamr

Tags: data cleaning, data preparations, dirty data, machine learning
