June 30, 2016

ClearStory Patent Covers Data Harmonization Tool


A U.S. patent awarded this month to ClearStory Data, the big data preparation tool specialist, covers its automated data harmonization tool designed to work across disparate data sources and a variety of data types.

Such data prep tools are gaining favor as the amount and types of unstructured data from sources like social media and sensors continue to skyrocket.

ClearStory, Menlo Park, Calif., said U.S. Patent 9,372,913, “Apparatus and method for harmonizing data along inferred hierarchical dimensions,” covers its in-memory, Spark-based data harmonization platform. The tool is designed to speed data prep and blending.

According to the patent award, ClearStory’s proprietary technique produces a “first inferred” data type that is used to augment “first received” data with values that aggregate initial data with a “first hierarchical dimension.” Those steps are repeated on a second data type to create a second hierarchical dimension.

The two hierarchies are then harmonized to a lowest common unit value, the patent abstract states, adding: “A first visualization of the first received data is provided based upon the lowest common unit value. A second visualization of the second received data is provided based upon the lowest common unit value.”
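As a rough illustration of what harmonizing two hierarchies to a lowest common unit can mean, consider one data set keyed by day and another keyed by month. The sketch below is not ClearStory's patented method. It assumes a simple day/month/year hierarchy and ISO-style string keys, and every function name in it is hypothetical.

```python
from collections import defaultdict

# Assumed time hierarchy, ordered from finest to coarsest.
LEVELS = ["day", "month", "year"]

def infer_level(key):
    """Guess granularity from an ISO-style key: 'YYYY-MM-DD', 'YYYY-MM' or 'YYYY'."""
    return LEVELS[2 - key.count("-")]

def roll_up(key, level):
    """Truncate a key to the requested level of the hierarchy."""
    parts = key.split("-")
    return "-".join(parts[: 3 - LEVELS.index(level)])

def lowest_common_level(*datasets):
    """Finest level that every data set can be expressed in."""
    finest = [min(LEVELS.index(infer_level(k)) for k in ds) for ds in datasets]
    return LEVELS[max(finest)]

def harmonize(dataset, level):
    """Aggregate (sum) values up to the given level."""
    out = defaultdict(float)
    for key, value in dataset.items():
        out[roll_up(key, level)] += value
    return dict(out)

# Made-up sample data: one daily series, one monthly series.
daily_sales = {"2016-06-29": 10.0, "2016-06-30": 5.0, "2016-07-01": 7.0}
monthly_budget = {"2016-06": 100.0, "2016-07": 120.0}

level = lowest_common_level(daily_sales, monthly_budget)
sales_by_level = harmonize(daily_sales, level)
budget_by_level = harmonize(monthly_budget, level)
```

Once both data sets share the same unit, here the month, each can be charted or compared on a common axis.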

The company said the patent protection covers its deep data inference capability along with semantic data recognition that it claims can converge and harmonize multiple data sets “on-the-fly.” In addition, the Apache Spark in-memory framework is designed to eliminate the need for lengthy data “pre-modeling.” The goal is a faster path from data access to data preparation across sources that differ in terms of structure, size and “velocity,” the company said.

While Apache Spark is the native in-memory data processing engine addressed in the ClearStory patent, the company said its data harmonization platform is not limited to Spark in-memory processing. “The patent is associated with one of the core elements of ClearStory’s solution and interconnects data inference, data harmonization and the associated granular logical and physical metadata,” the company said in a statement. The patent “includes the architectural approach for how complex data is distributed in-memory, and data linkages across the data pipeline from inferring results to seeing harmonized results.”

ClearStory and other data prep specialists are seeking to overcome roadblocks faced by traditional business intelligence and data science approaches in accessing and combining a growing variety of evolving data sources. Data prep practitioners are steadily adopting automated data harmonization approaches to reduce costly, time-consuming data wrangling and pre-modeling and to speed analysis of huge data sets.

ClearStory’s patented approach, based on deep data inference and hierarchical dimensions, automatically infers data types using machine-based pattern recognition. That capability is said to eliminate the need to pre-model data or specify definitions for various attributes.
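A very simplified picture of pattern-based type inference is to test a column's values against a set of known patterns and accept the first type that most values match. The sketch below makes that assumption. The pattern table, the 80 percent threshold and the function name are illustrative choices, not details drawn from the patent.

```python
import re

# Illustrative patterns, checked in order; more specific types come first.
PATTERNS = {
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "zip_code": re.compile(r"^\d{5}$"),
    "currency": re.compile(r"^\$\d+(\.\d{2})?$"),
    "number": re.compile(r"^-?\d+(\.\d+)?$"),
}

def infer_type(values, threshold=0.8):
    """Return the first type whose pattern matches at least `threshold` of the values."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return "unknown"
    for name, pattern in PATTERNS.items():
        hits = sum(1 for v in cleaned if pattern.match(v))
        if hits / len(cleaned) >= threshold:
            return name
    return "text"  # fallback when no pattern dominates
```

With this approach, a column such as `["94025", "10001"]` would be labeled a ZIP code without anyone declaring a schema in advance, which is the kind of pre-modeling step the inference is meant to remove.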

The data harmonization process then identifies and ranks data relationships based on inferred data types and semantics in the source data. The results are then blended, accounting for data sets that differ in terms of granularity and scale, the company said.
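One simple way to picture ranking relationships between two data sets is to score every pair of columns by how many values they share. The sketch below uses Jaccard overlap as a stand-in. It ignores the inferred types and semantics the company describes, and the function names and sample columns are hypothetical.

```python
def jaccard(a, b):
    """Share of distinct values that two columns have in common."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_relationships(left, right):
    """Score every (left column, right column) pair, best candidates first.

    `left` and `right` map column names to lists of values.
    """
    scored = [
        (jaccard(lvals, rvals), lcol, rcol)
        for lcol, lvals in left.items()
        for rcol, rvals in right.items()
    ]
    return sorted(scored, reverse=True)

# Made-up sample columns from two sources.
customers = {"zip": ["94025", "10001"], "city": ["Menlo Park", "New York"]}
stores = {"postal": ["94025", "10001", "60601"], "town": ["Chicago"]}

ranking = rank_relationships(customers, stores)
best_score, best_left, best_right = ranking[0]
```

Here the `zip` and `postal` columns rank first, which suggests them as the key for blending the two sources.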

The U.S. patent award for the inference-based data harmonization platform is among five patent applications filed by ClearStory covering data prep technologies based on scalable, machine-based approaches to self-service data analysis.
