January 22, 2020

Hitachi Vantara Buys Cataloger Waterline Data

Hitachi Vantara is acquiring data catalog startup Waterline Data as the U.S. subsidiary of the Japanese industrial giant seeks to meet growing demand for automation frameworks for data lake management via its AI-driven DataOps platform.

The Hitachi Ltd. (TSE: 6501) unit said Wednesday (Jan. 22) it would offer Waterline Data's catalog, which is based on proprietary "fingerprinting" technology, both as a standalone product and integrated with its flagship Lumada data services platform.

Financial terms of the deal were not disclosed. The acquisition is expected to close by the end of March 2020.

Waterline Data has ranked among the leaders in recent industry surveys of vendors offering data cataloging services designed to ingest raw data and deliver tracked data that can be combined with existing domain expertise. The startup’s machine learning cataloger automates metadata discovery to boost analytics and governance tasks from the network edge to the cloud.

The Santa Clara, Calif.-based Hitachi unit also stressed Waterline Data's hybrid deployment options, from on-premises datacenters to public clouds, along with its ability to handle large data volumes in either Hadoop or SQL databases.

The Hitachi unit said the acquisition would allow it to add automation features to its data management platform, including improved visibility, tighter quality control and improved compliance with data regulations.

"Data catalogs have emerged as a key building block of successful DataOps projects," Brad Surak, president of Hitachi Vantara's digital solutions unit, noted in a blog post announcing the Waterline Data acquisition.

“Without proper data cataloging, it is incredibly difficult to guarantee data quality and governance, and to deliver the outcomes that depend on it,” Surak added.

Waterline Data's customers include financial services, healthcare and pharmaceutical firms seeking cataloging tools to improve data governance. That capability allows users to pinpoint sensitive data covered by a growing list of privacy regulations.

The company’s proprietary fingerprinting technology uses AI and rules-based frameworks to automate the discovery, classification and analysis of distributed data. The system can be used to tag large data sets based on common traits.

That approach addresses a key impediment to leveraging data lakes: the need to "de-swamp" petabyte-scale lakes by ingesting and organizing large volumes of raw data.

Hitachi's Surak stressed that Waterline's approach does not require detailed manual labeling: a single properly identified data set is enough to propagate labels to similar data sets across a data lake.

Hitachi Vantara, which emerged from Hitachi Data Systems in 2017, has aggressively pushed its Lumada analytics platform to enable, for example, Internet of Things services from the network core to the edge. Waterline Data is the latest in a series of Hitachi acquisitions, including its 2015 deal for analytics specialist Pentaho.
