

Informatica today announced CLAIRE GPT, the latest release of its AI-powered data management platform in the cloud, as well as CLAIRE Copilot. The company claims that, by using large language models (LLMs) and generative AI, CLAIRE GPT will enable customers to reduce the time spent on common data management tasks such as data mapping, data quality, and governance by up to 80%.
Informatica has been using AI and machine learning technology since it launched its new flagship platform CLAIRE back in 2017. The company recognized early on that data management, in and of itself, is a big data problem, and so it adopted AI and ML technologies to spot patterns across its platform and generate useful predictions.
While some legacy PowerCenter users remain stubbornly on prem, plenty of Informatica customers have followed CLAIRE into the cloud, where they not only benefit from advanced, AI-powered data management capabilities, but also help Informatica to generate them.
According to Informatica, every month CLAIRE processes 54 trillion data management transactions, representing a wide array of ETL/ELT, master data management (MDM) matching, data catalog entry, data quality rule-making, and other data governance tasks. All told, CLAIRE holds 23 petabytes of data and is home to 50,000 “metadata-aware connections,” representing every operating system, database, application, file system, and protocol imaginable.
Now the longtime ETL leader is taking its AI/ML game to the next level with CLAIRE GPT, its next-generation data management platform. According to Informatica Chief Product Officer Jitesh Ghai, Informatica is able to leverage all that data in CLAIRE to train LLMs that can handle some common data quality and MDM tasks on behalf of users.
“Historically, AI/ML has been focused on cataloging and governance,” Ghai tells Datanami. “Now, in the cloud, all of that metadata and all of the AI and ML algorithms are expanded to support data integration workloads and make it simpler to build data pipelines, to auto identify data quality issues at petabyte scale. This was not done before. This is new. We call it DQ Insights, a part of our data observability capabilities.” DQ Insights will leverage LLMs’ generative AI capabilities to generate fixes for the data quality problems that it detects.
The company is also able to automatically classify data at petabyte scale, which helps it to generate data governance artifacts and write business rules for MDM tasks, which are other new capabilities. Some of these generative AI capabilities will be delivered via CLAIRE Copilot, which is part of CLAIRE GPT.
“What we’re doing now is enabling folks to look at, select the sources they want to master and point them to our master-data model and we will auto generate that business logic,” Ghai says. “What you would have to drag and drop as a data engineer, we will auto generate, because we know the schemas at the sources, and we know the target model schema. We can put together the business logic.”
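To make the idea of generating mapping logic from known schemas concrete, here is a minimal sketch that proposes source-to-target field mappings by name similarity. It is an illustration only, not Informatica's method: the function `propose_mapping`, the sample field names, and the 0.6 threshold are all hypothetical, and a plain string comparison from Python's standard `difflib` module stands in for whatever models CLAIRE GPT actually uses.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a field name and drop separators such as '_' or spaces."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def propose_mapping(source_fields, target_fields, threshold=0.6):
    """Suggest a target field for each source field (hypothetical helper).

    Returns a dict of source field -> best-matching target field, or None
    when no candidate reaches the (assumed) similarity threshold.
    """
    mapping = {}
    for src in source_fields:
        best, best_score = None, 0.0
        for tgt in target_fields:
            score = SequenceMatcher(None, normalize(src), normalize(tgt)).ratio()
            if score > best_score:
                best, best_score = tgt, score
        mapping[src] = best if best_score >= threshold else None
    return mapping


if __name__ == "__main__":
    # Hypothetical source columns and master-data model fields.
    source = ["Cust_Name", "EMAIL_ADDR", "zzz"]
    target = ["customer_name", "email_address", "phone"]
    for src, tgt in propose_mapping(source, target).items():
        print(f"{src} -> {tgt}")
```

Unmatched fields come back as `None` rather than being forced onto the nearest target, which leaves room for a person to review them.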
The result is to “radically simplify master data management,” Ghai says. An MDM project typically takes 12 to 16 months from ragged start to master data glory. Having CLAIRE GPT learn from Informatica’s massive repository of historical MDM data and then use GPT-3.5 (and other LLMs) to generate suggestions cuts the project time to just weeks.
For example, one of Informatica’s customers (a car maker) previously employed 10 data engineers for more than two years to develop 200 classifications of proprietary data types within their data lake, Ghai says.
“We pointed our auto classification against their data lake and within minutes we generated 400 classifications,” he says. “So the 200 that they had identified, [plus] another 200 different [ones]. What would have taken their 10 data engineers another two years to develop, we just automatically did it.”
CLAIRE GPT will also provide a new way for users to interact with Informatica’s suite of tools. For example, Ghai says a customer could give CLAIRE GPT the following order: “CLAIRE, connect to Salesforce. Aggregate customer account data on a monthly basis. Address data quality inconsistencies with date format. Load into Snowflake.”
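The two middle steps of that prompt, fixing inconsistent date formats and aggregating by month, might look roughly like the sketch below if written by hand. Everything in it is an assumption made for illustration: the record layout, the list of accepted date formats, and the helper names `parse_date` and `monthly_totals` are hypothetical, and the Salesforce and Snowflake connections are left out entirely.

```python
from collections import defaultdict
from datetime import datetime

# Assumed set of formats seen in the source data (hypothetical).
# Note: a value like 05/06/2023 is ambiguous; here it is read as month/day/year.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(raw: str) -> datetime:
    """Try each known format in turn and return the first successful parse."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def monthly_totals(records):
    """Sum the 'amount' of each record into a 'YYYY-MM' bucket."""
    totals = defaultdict(float)
    for rec in records:
        month = parse_date(rec["date"]).strftime("%Y-%m")
        totals[month] += rec["amount"]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical account records with three different date styles.
    records = [
        {"date": "2023-05-01", "amount": 100.0},
        {"date": "05/17/2023", "amount": 50.0},
        {"date": "03.06.2023", "amount": 25.0},
    ]
    print(monthly_totals(records))
```

Dates that match none of the listed formats raise an error instead of being silently dropped, so the inconsistency stays visible.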
While it’s unclear if CLAIRE GPT will feature speech recognition or speech-to-text capabilities, that would seem to be just an implementation detail, as those challenges are not as great as the core data management challenges that Informatica is tackling.
“I think it’s a pretty transformative leap because…it makes data engineers, data analysts more productive,” Ghai says. “But it opens up prompt-based data management experiences to many more personas that have radically less technical skill sets….Anybody could write that prompt that I just described. And that’s the exciting part.”
CLAIRE GPT and CLAIRE Copilot, which will ship in Q3 or Q4 of this year, will also find use automating other repetitive tasks in the data management game, such as debugging, testing, refactoring, and documentation, Informatica says. The goal is to position them as subject matter expert stand-ins, or something similar to pair programming, Ghai says.
“Pairs programming has its benefits with two people supporting each other and coding,” he says. “Data management and development equally can benefit from an AI assistant, and Claire Copilot is that AI assistant delivering automation, insights and benefits for data integration, for data quality, for master data management, for cataloging for governance, as well as to democratize data through the marketplace to our data marketplace.”
When looking at the screen, CLAIRE users will see a lightning bolt next to the insights and recommendations, Ghai says. “If we identify data quality issues, we will surface those up as issues we’ve identified for a user, to then validate that yes, it is an issue,” he says. The user can then select CLAIRE GPT’s fix, if it looks good. This “human in the loop” approach helps to minimize possible errors from LLM hallucinations, Ghai says.
Informatica is using OpenAI’s GPT-3.5 to generate responses, but it’s not the only LLM, nor the only model at work. In addition to a host of traditional classification and clustering algorithms, Informatica is also working with Google’s Bard and Meta’s LLaMA for some language tasks, Ghai says.
“We have what we think of as a system of models, a network of models, and the path you go down depends on the data management operation,” he says. “It depends on the instruction, depends on whether it’s ingestion or ETL or data quality or classification.”
The company is also using models developed specifically for certain industries, such as financial services or healthcare. “And then we have local tenanted models that are for individual customers bespoke to their operations,” Ghai says. “That’s [the] magic of interpreting the instruction and then routing it through our network of models depending on the understanding of what is being asked and then what data management operations need to be conducted.”
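One simple way to picture this kind of routing is a lookup that prefers a customer-specific model, then an industry model, then a general one. The sketch below is an assumption-laden illustration, not a description of Informatica's system: the registry contents, the model names, the scope labels, and the `route` function are all hypothetical.

```python
from typing import Optional

# Hypothetical registry: (operation, scope) -> model name.
REGISTRY = {
    ("data_quality", "tenant:acme"): "acme-dq-model",
    ("data_quality", "industry:healthcare"): "healthcare-dq-model",
    ("data_quality", "general"): "general-dq-model",
    ("classification", "general"): "general-classifier",
}


def route(operation: str,
          industry: Optional[str] = None,
          tenant: Optional[str] = None) -> str:
    """Pick the most specific registered model for an operation.

    Assumed preference order: tenant model, then industry model, then general.
    """
    scopes = []
    if tenant:
        scopes.append(f"tenant:{tenant}")
    if industry:
        scopes.append(f"industry:{industry}")
    scopes.append("general")

    for scope in scopes:
        model = REGISTRY.get((operation, scope))
        if model is not None:
            return model
    raise KeyError(f"no model registered for operation {operation!r}")


if __name__ == "__main__":
    print(route("data_quality", tenant="acme"))
    print(route("data_quality", industry="healthcare", tenant="other"))
    print(route("classification", industry="healthcare"))
```

A real system would first have to interpret the instruction to decide which operation is being requested; this sketch starts from the point where that decision has already been made.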