June 12, 2024

Anomalo Adds AI-Powered Monitoring of Unstructured Text to Its Data Quality Platform

SAN FRANCISCO and PALO ALTO, Calif., June 12, 2024 -- Today at the Data + AI Summit 2024, Anomalo, the complete data quality platform company, announced that it has expanded its platform that monitors the quality of structured data in data warehouses and data lakes to monitor unstructured text. Anomalo’s unstructured capability makes it possible for enterprises to discover, curate, leverage and ingest high volumes of text data without the risk of using low-quality data, which is especially critical for Generative AI applications. This new feature is currently in private beta.

Ninety percent of enterprise data is unstructured. Unstructured data does not comply with traditional standard formats, which makes it extremely challenging to organize, store, search, retrieve and analyze. Unstructured data itself is also problematic, as it often contains inconsistencies, errors and duplicated content. Even more problematic, unstructured data can contain sensitive or inappropriate content, including company intellectual property, personally identifiable information (PII) and abusive language. These combined challenges can lead to privacy, security and performance risks, especially as this data gets incorporated into Generative AI models and applications.

Organizations are implementing Generative AI and ingesting unstructured text for the purposes of model training, fine-tuning and Retrieval-Augmented Generation (RAG) at a volume and velocity previously unseen. As a result, organizations need to be able to identify and resolve quality issues with such data before it gets incorporated into Generative AI models and impacts their performance.

With Anomalo’s new unstructured capability, unstructured text documents can be curated and evaluated for data quality around various document and document collection characteristics, including document length, duplicates, topics, tone, language, abusive language, PII and sentiment. Users are able to quickly evaluate the quality of a document collection and identify issues in individual documents, dramatically reducing the time needed to curate, profile and leverage high-value unstructured text data.
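To make a few of those characteristics concrete, here is a minimal Python sketch of per-document checks for length, exact duplicates and one narrow kind of PII (email addresses). It is purely illustrative and is not Anomalo's implementation: the function name `profile_collection`, the `min_words` threshold and the email pattern are all assumptions made for this example, and checks such as topic, tone or sentiment would require models that are out of scope here.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical, deliberately simple email pattern; real PII detection is much broader.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def profile_collection(docs, min_words=3):
    """Return one quality record per document (illustrative checks only)."""
    seen = defaultdict(list)  # content hash -> indices of documents with that content
    report = []
    for i, text in enumerate(docs):
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        seen[digest].append(i)
        words = len(text.split())
        report.append({
            "index": i,
            "word_count": words,
            "too_short": words < min_words,
            "has_email": bool(EMAIL_RE.search(text)),
            # Index of the first document with identical normalized text, if any.
            "duplicate_of": seen[digest][0] if len(seen[digest]) > 1 else None,
        })
    return report


if __name__ == "__main__":
    sample = [
        "Refund policy: returns accepted within 30 days.",
        "refund  policy: returns accepted within 30 days.",
        "Contact jane.doe@example.com for access.",
        "ok",
    ]
    for row in profile_collection(sample):
        print(row)
```

A sketch like this only catches exact duplicates after normalization; near-duplicate detection, language identification and abusive-language screening would each need a separate technique.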

Elliot Shmukler, co-founder and CEO of Anomalo, said: “It’s been well known that higher quality data leads to better data products, including traditional dashboards and machine learning models. The same is true in the world of Generative AI, where the quality of the text used to fine-tune or prompt the model via RAG could be the difference between a high performing application and one that is at best underwhelming and at worst, a privacy and compliance risk. We’re supporting data teams in using high quality data for all of their critical initiatives and with our new unstructured text monitoring capability, to support their Generative AI efforts as well.”

Anomalo’s new unstructured text capability expands its robust platform that uses AI to automatically detect data issues and understand their root causes before anyone else, allowing teams to resolve any hiccups with their data before making decisions, running operations or powering models.

“Finding the data quality problem is just the first step, you’ve got to solve the issue. Anomalo helps our enterprise data teams find the hard to predict data quality issues and reduce time to resolution. Anomalo’s monitoring on unstructured data capability is just another step to help our teams resolve issues on data-critical projects,” said Sid Stephens, data governance business lead for a top three quick service restaurant company.

Anomalo will be giving a talk on “Data Quality: the Greatest Challenge for Enterprise GenAI Adoption” today at 5:10 p.m. at the Data + AI Summit 2024.


Source: Anomalo
