

(khunkornStudio/Shutterstock)
Informatica today announced CLAIRE GPT, the latest release of its AI-powered data management platform in the cloud, as well as CLAIRE Copilot. The company claims that, by using large language models (LLMs) and generative AI, CLAIRE GPT will enable customers to reduce the time spent on common data management tasks such as data mapping, data quality, and governance by up to 80%.
Informatica has been using AI and machine learning technology since it launched its new flagship platform CLAIRE back in 2017. The company recognized early on that data management, in and of itself, is a big data problem, and so it adopted AI and ML technologies to spot patterns across its platform and generate useful predictions.
While some legacy PowerCenter users remain stubbornly on prem, plenty of Informatica customers have followed CLAIRE into the cloud, where they not only benefit from advanced, AI-powered data management capabilities, but also help Informatica to generate them.
According to Informatica, every month CLAIRE processes 54 trillion data management transactions, representing a wide array of ETL/ELT, master data management (MDM) matching, data catalog entry, data quality rule-making, and other data governance tasks. All told, CLAIRE holds 23 petabytes of data and is home to 50,000 “metadata-aware connections,” representing every operating system, database, application, file system, and protocol imaginable.
Now the longtime ETL leader is taking its AI/ML game to the next level with CLAIRE GPT, its next-generation data management platform. According to Informatica Chief Product Officer Jitesh Ghai, Informatica is able to leverage all that data in CLAIRE to train LLMs that can handle some common data quality and MDM tasks on behalf of users.
“Historically, AI/ML has been focused on cataloging and governance,” Ghai tells Datanami. “Now, in the cloud, all of that metadata and all of the AI and ML algorithms are expanded to support data integration workloads and make it simpler to build data pipelines, to auto identify data quality issues at petabyte scale. This was not done before. This is new. We call it DQ Insights, a part of our data observability capabilities.” DQ Insights will leverage LLMs’ generative AI capabilities to generate fixes for the data quality problems it detects.
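Informatica has not published how DQ Insights implements that detect-then-suggest loop, but conceptually it pairs issue detection with an LLM-generated remediation proposal. The sketch below is a purely hypothetical illustration of that pattern; the regex check and the suggest_fix stand-in for an LLM call are invented for this example and are not Informatica APIs.

```python
# Hypothetical sketch only: detect a data quality issue with a simple rule,
# then ask an LLM (stubbed out here) to propose a fix for human review.
import re

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
values = ["2023-05-01", "05/02/2023", "2023-05-03", "May 4, 2023"]

def detect_date_format_issues(column):
    """Return the values that do not match the expected ISO-8601 format."""
    return [v for v in column if not ISO_DATE.match(v)]

def suggest_fix(bad_values):
    """Stand-in for an LLM call; a real system would prompt a model here."""
    return ("Proposed rule: parse non-ISO values with a tolerant date parser "
            f"and rewrite them as YYYY-MM-DD. Offending values: {bad_values}")

issues = detect_date_format_issues(values)
if issues:
    print(suggest_fix(issues))  # surfaced to a user, not applied automatically
```

In practice, any proposed fix would be surfaced for review rather than applied automatically, a point Ghai returns to below.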
The company can also automatically classify data at petabyte scale, another new capability, which helps it generate data governance artifacts and write business rules for MDM tasks. Some of these generative AI capabilities will be delivered via CLAIRE Copilot, which is part of CLAIRE GPT.
“What we’re doing now is enabling folks to look at, select the sources they want to master and point them to our master-data model and we will auto generate that business logic,” Ghai says. “What you would have to drag and drop as a data engineer, we will auto generate, because we know the schemas at the sources, and we know the target model schema. We can put together the business logic.”
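In other words, because both the source schemas and the target master-data model are known, the system can propose the field-level mappings a data engineer would otherwise build by hand. The toy example below illustrates that idea only; the field names and the name-similarity heuristic are invented for this sketch and say nothing about Informatica’s actual logic.

```python
# Toy illustration of auto-proposed source-to-master field mappings.
# Field names and the similarity heuristic are invented for this example.
from difflib import get_close_matches

source_schema = ["cust_name", "cust_email", "acct_number", "created_dt"]
master_model = ["customer_name", "customer_email", "account_id", "created_date"]

def propose_mappings(source_fields, target_fields):
    """Pair each source field with its closest-named target field, if any."""
    proposals = {}
    for field in source_fields:
        match = get_close_matches(field, target_fields, n=1, cutoff=0.4)
        proposals[field] = match[0] if match else None
    return proposals

for src, tgt in propose_mappings(source_schema, master_model).items():
    print(f"{src} -> {tgt}")
```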
The result is to “radically simplify master data management,” Ghai says. Instead of an MDM project that takes 12 to 16 months from ragged start to master data glory, CLAIRE GPT learns from Informatica’s massive repository of historical MDM data and uses GPT-3.5 (and other LLMs) to generate suggestions, cutting project time to just weeks.
For example, one of Informatica’s customers (a car maker) previously employed 10 data engineers for more than two years to develop 200 classifications of proprietary data types within their data lake, Ghai says.
“We pointed our auto classification against their data lake and within minutes we generated 400 classifications,” he says. “So the 200 that they had identified, [plus] another 200 different [ones]. What would have taken their 10 data engineers another two years to develop, we just automatically did it.”
CLAIRE GPT will also provide a new way for users to interact with Informatica’s suite of tools. For example, Ghai says a customer could give CLAIRE GPT the following order: “CLAIRE, connect to Salesforce. Aggregate customer account data on a monthly basis. Address data quality inconsistencies with date format. Load into Snowflake.”
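A prompt like that would presumably be translated into a conventional extract-transform-load pipeline behind the scenes. The sketch below shows, in plain pandas, the kind of four-step flow such an instruction describes; the function names, sample data, and connector stubs are hypothetical and are not Informatica, Salesforce, or Snowflake APIs.

```python
# Hypothetical pipeline matching the prompt: extract, fix dates, aggregate, load.
import pandas as pd
from dateutil import parser

def extract_from_salesforce() -> pd.DataFrame:
    """Placeholder for a Salesforce extract; real connectors are not shown."""
    return pd.DataFrame({
        "account_id": ["A1", "A1", "A2"],
        "amount": [100.0, 250.0, 75.0],
        "close_date": ["2023-05-01", "05/15/2023", "2023-06-02"],  # mixed formats
    })

def fix_date_formats(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize inconsistent date strings to ISO-8601 dates."""
    df["close_date"] = df["close_date"].apply(lambda s: parser.parse(s).date())
    return df

def aggregate_monthly(df: pd.DataFrame) -> pd.DataFrame:
    """Roll account amounts up to calendar months."""
    df["month"] = pd.to_datetime(df["close_date"]).dt.to_period("M")
    return df.groupby(["account_id", "month"], as_index=False)["amount"].sum()

def load_into_snowflake(df: pd.DataFrame) -> None:
    """Placeholder for a Snowflake load; here it just prints the result."""
    print(df)

load_into_snowflake(aggregate_monthly(fix_date_formats(extract_from_salesforce())))
```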
While it’s unclear if CLAIRE GPT will feature speech recognition or speech-to-text capabilities, that would seem to be just an implementation detail, as those challenges are not as great as the core data management challenges that Informatica is tackling.
“I think it’s a pretty transformative leap because…it makes data engineers, data analysts more productive,” Ghai says. “But it opens up prompt-based data management experiences to many more personas that have radically less technical skill sets….Anybody could write that prompt that I just described. And that’s the exciting part.”
CLAIRE GPT and CLAIRE Copilot, which will ship in Q3 or Q4 of this year, will also be used to automate other repetitive tasks in the data management game, such as debugging, testing, refactoring, and documentation, Informatica says. The goal is to position them as subject matter expert stand-ins, something akin to pair programming, Ghai says.
“Pair programming has its benefits with two people supporting each other and coding,” he says. “Data management and development can equally benefit from an AI assistant, and CLAIRE Copilot is that AI assistant delivering automation, insights, and benefits for data integration, for data quality, for master data management, for cataloging, for governance, as well as to democratize data through our data marketplace.”
When looking at the screen, CLAIRE users will see a lightning bolt next to the insights and recommendations, Ghai says. “If we identify data quality issues, we will surface those up as issues we’ve identified for a user, to then validate that yes, it is an issue,” he says. The user can then select CLAIRE GPT’s fix, if it looks good. This “human in the loop” approach helps to minimize possible errors from LLM hallucinations, Ghai says.
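That review gate is simple to picture: a suggested fix stays inert until a person approves it. The minimal sketch of such a human-in-the-loop check below is illustrative only and does not reflect how CLAIRE implements it.

```python
# Illustrative human-in-the-loop gate: an LLM-proposed fix is only applied
# after a reviewer confirms it. This is not Informatica's implementation.
from dataclasses import dataclass

@dataclass
class SuggestedFix:
    issue: str
    proposed_change: str
    accepted: bool = False

def review(fix: SuggestedFix, approved: bool) -> SuggestedFix:
    """Record the reviewer's decision; nothing is applied without approval."""
    fix.accepted = approved
    return fix

fix = SuggestedFix(
    issue="Inconsistent date formats in close_date",
    proposed_change="Rewrite all values as YYYY-MM-DD",
)
reviewed = review(fix, approved=True)
print("Applying fix" if reviewed.accepted else "Fix rejected; nothing changes")
```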
Informatica is using OpenAI’s GPT-3.5 to generate responses, but it’s not the only LLM, nor the only model at work. In addition to a host of traditional classification and clustering algorithms, Informatica is also working with Google’s Bard and Meta’s LLaMA for some language tasks, Ghai says.
“We have what we think of as a system of models, a network of models, and the path you go down depends on the data management operation,” he says. “It depends on the instruction, depends on whether it’s ingestion or ETL or data quality or classification.”
The company is also using models developed specifically for certain industries, such as financial services or healthcare. “And then we have local tenanted models that are for individual customers bespoke to their operations,” Ghai says. “That’s the magic of interpreting the instruction and then routing it through our network of models depending on the understanding of what is being asked and then what data management operations need to be conducted.”
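Conceptually, that is a routing layer: classify the incoming instruction, then dispatch it to the appropriate model. The toy router below illustrates the idea only; the registry entries and keyword heuristic are placeholders invented for this example, not a description of Informatica’s actual “network of models.”

```python
# Placeholder routing sketch: classify the instruction, then pick a model.
# Registry entries and keywords are invented and purely illustrative.
MODEL_REGISTRY = {
    "ingestion": "general-purpose LLM",
    "data_quality": "rule-suggestion model",
    "classification": "industry-tuned classifier",
    "etl": "pipeline-generation model",
}

def classify_operation(instruction: str) -> str:
    """Very rough keyword-based guess at the data management operation."""
    text = instruction.lower()
    if "quality" in text or "inconsisten" in text:
        return "data_quality"
    if "classify" in text:
        return "classification"
    if "connect" in text or "load" in text:
        return "ingestion"
    return "etl"

def route(instruction: str) -> str:
    return MODEL_REGISTRY[classify_operation(instruction)]

print(route("Address data quality inconsistencies with date format"))
```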
Related Items:
Has GPT-4 Ignited the Fuse of Artificial General Intelligence?
Informatica Raises $840 Million in NYSE IPO
Informatica Likes Its Chances in the Cloud