February 17, 2015

Connecting the Dots on Dark Data

Alex Woodie

It’s estimated that 90 percent of the data in the average enterprise is “dark data,” that is, data that isn’t readily available for analytics, leaving only about 10 percent of the data exposed for analysis. While the inclusion of pertinent metadata and a master data management (MDM) structure is the long-term answer to the dark data problem, enterprises need a way to get value out of that data now.

The dark data problem runs deep in many organizations, especially those that have implemented (or acquired) their way to data silo hell. For example, one company with 300 different ERP systems had the types of trouble you might expect.

“For the one customer with 300 ERP systems, even asking a question like ‘How many suppliers do we have?’ requires going into each of these 300 ERP systems, running a query against them, getting the data, and then try to make sense of it,” says Nidhi Aggarwal, a product manager at Tamr, which makes data wrangling and transformation tools.

It turns out that particular client had 1,200 suppliers, but that answer opens up all sorts of other questions they could be asking, such as “What are those suppliers charging me for parts?” Answering that type of question is hard when the data is not readily available.

Since it launched in 2013, Tamr has been helping its clients get a handle on big dark data. The Cambridge, Massachusetts-based company’s software uses a mixture of machine learning algorithms and the domain knowledge of humans to automate the process of unifying disparate data sources for the purposes of analytics.

Today the firm unveiled a new platform in advance of the Strata + Hadoop World conference that’s taking place this week in San Jose, California. The company with 300 ERP systems would be a candidate for one of the new industry-specific offerings called Tamr for Procurement, which is designed to provide a unified view of part, supplier, and transaction data that is housed across disparate silos.

According to Tamr, machine learning algorithms can automate upwards of 90 percent of the data matching tasks in a procurement setting, leaving customers to focus on the analytics and not the janitor work of munging, matching, and cleansing data.
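To make the division of labor concrete, here is a toy sketch of record matching with a human review queue. It is not Tamr's method, which the article does not describe in detail: the string-similarity score stands in for a trained model, and the function names, suffix list, and thresholds are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical list of company suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "co", "ltd", "llc", "corporation", "incorporated"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop company suffixes."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(tok for tok in cleaned.split() if tok not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Score two supplier names between 0.0 and 1.0 after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def triage(pairs, auto_match=0.9, auto_reject=0.5):
    """Split candidate pairs into automatic matches, automatic non-matches,
    and an uncertain middle band left for a human expert to review.
    The thresholds are illustrative assumptions, not tuned values."""
    matched, rejected, review = [], [], []
    for a, b in pairs:
        score = similarity(a, b)
        if score >= auto_match:
            matched.append((a, b))
        elif score < auto_reject:
            rejected.append((a, b))
        else:
            review.append((a, b))
    return matched, rejected, review


if __name__ == "__main__":
    candidates = [
        ("Acme Corp.", "ACME Corporation"),
        ("Acme Corp.", "Zenith Ltd"),
        ("Global Parts", "Global Part Supply"),
    ]
    matched, rejected, review = triage(candidates)
    print("auto-matched:", matched)
    print("auto-rejected:", rejected)
    print("needs human review:", review)
```

In this sketch, only the ambiguous pairs reach a person, which is the general idea behind automating most of the matching work while keeping domain experts in the loop.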

The venture-backed firm is also targeting the pharmaceutical industry with a new industry-specific package that takes the pain out of transforming clinical trial data from a SAS format into the CDISC format that the Food and Drug Administration requires.

“The biggest companies run anywhere from 100 to 200 clinical trials every year, so this is a very expensive process,” Aggarwal says. “We have a drag-and-drop interface [for all] the SAS files, and we just convert them using Tamr to the CDISC format. So we cut down the time that you require to convert to this format, and now you have an automated way to do this over and over again.”

The company, which was co-founded by serial entrepreneur Andy Palmer and legendary database developer Mike Stonebraker, is also looking to turn heads with its new Tamr Platform. The product brings a host of new features designed to help customers at the three phases of big data transformation: cataloging existing data sources, connecting to those sources, and then exposing the transformed data so that it’s easy for data scientists and analysts to actually consume the data.

“We’re closing the enterprise analytics gap that today separates the big questions you want to ask – and the big answers you’re looking for – from your huge investments in siloed data,” Palmer says in a press release. “The Tamr platform puts high-value data integration and enrichment into the hands of more people, so businesses can solve the big P&L-level problems that have been starved for data.”

The company also announced three new customers: Roche, Toyota, and General Electric.

Datanami