Informatica today announced CLAIRE GPT, the latest release of its AI-powered data management platform in the cloud, as well as CLAIRE Copilot. The company claims that, by using large language models (LLMs) and generative AI, CLAIRE GPT will enable customers to reduce the time spent on common data management tasks such as data mapping, data quality, and governance by up to 80%.
Informatica has been using AI and machine learning technology since it launched its new flagship platform CLAIRE back in 2017. The company recognized early on that data management, in and of itself, is a big data problem, and so it adopted AI and ML technologies to spot patterns across its platform and generate useful predictions.
While some legacy PowerCenter users remain stubbornly on prem, plenty of Informatica customers have followed CLAIRE into the cloud, where they not only benefit from advanced, AI-powered data management capabilities, but also help Informatica to generate them.
According to Informatica, every month CLAIRE processes 54 trillion data management transactions, representing a wide array of ETL/ELT, master data management (MDM) matching, data catalog entry, data quality rule-making, and other data governance tasks. All told, CLAIRE holds 23 petabytes of data and is home to 50,000 “metadata-aware connections,” representing every operating system, database, application, file system, and protocol imaginable.
Now the longtime ETL leader is taking its AI/ML game to the next level with CLAIRE GPT, its next-generation data management platform. According to Informatica Chief Product Officer Jitesh Ghai, Informatica is able to leverage all that data in CLAIRE to train LLMs that can handle some common data quality and MDM tasks on behalf of users.
“Historically, AI/ML has been focused on cataloging and governance,” Ghai tells Datanami. “Now, in the cloud, all of that metadata and all of the AI and ML algorithms are expanded to support data integration workloads and make it simpler to build data pipelines, to auto-identify data quality issues at petabyte scale. This was not done before. This is new. We call it DQ Insights, a part of our data observability capabilities.” DQ Insights will leverage LLMs’ generative AI capabilities to generate fixes for data quality problems that it detects.
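To make the DQ Insights idea concrete, here is a rough, purely illustrative sketch of a data-quality check that flags values deviating from a column’s dominant date format and proposes a normalized fix. The column formats, function names, and fix logic are all invented for illustration; CLAIRE’s actual detection runs on trained models at petabyte scale, not hand-written regexes.

```python
import re

# Two hypothetical date formats a profiler might recognize.
DATE_PATTERNS = {
    "ISO": re.compile(r"^\d{4}-\d{2}-\d{2}$"),   # 2023-05-01
    "US":  re.compile(r"^\d{2}/\d{2}/\d{4}$"),   # 05/01/2023
}

def detect_date_issues(values):
    """Return (dominant_format, values that do not match it)."""
    counts = {name: 0 for name in DATE_PATTERNS}
    for v in values:
        for name, pat in DATE_PATTERNS.items():
            if pat.match(v):
                counts[name] += 1
    dominant = max(counts, key=counts.get)
    offenders = [v for v in values if not DATE_PATTERNS[dominant].match(v)]
    return dominant, offenders

def suggest_fix(value):
    """Propose an ISO-8601 rewrite for a US-style date (MM/DD/YYYY)."""
    m = re.match(r"^(\d{2})/(\d{2})/(\d{4})$", value)
    if m:
        mm, dd, yyyy = m.groups()
        return f"{yyyy}-{mm}-{dd}"
    return None

dates = ["2023-05-01", "2023-05-02", "05/03/2023", "2023-05-04"]
fmt, bad = detect_date_issues(dates)
print(fmt, bad, [suggest_fix(v) for v in bad])
```

The key point is the shape of the output: not a silent correction, but a detected issue plus a proposed fix that can be surfaced to a user.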
The company is also able to automatically classify data at petabyte scale, a capability that helps it generate data governance artifacts and write business rules for MDM tasks. Some of these generative AI capabilities will be delivered via CLAIRE Copilot, which is part of CLAIRE GPT.
“What we’re doing now is enabling folks to look at, select the sources they want to master and point them to our master-data model and we will auto generate that business logic,” Ghai says. “What you would have to drag and drop as a data engineer, we will auto generate, because we know the schemas at the sources, and we know the target model schema. We can put together the business logic.”
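The mapping auto-generation Ghai describes can be loosely sketched as matching source fields against a target master-data schema. The snippet below is an assumption-laden toy that proposes mappings by name similarity; the field names, threshold, and similarity measure are all illustrative stand-ins for CLAIRE’s schema-aware logic.

```python
from difflib import SequenceMatcher

def propose_mappings(source_fields, target_fields, threshold=0.6):
    """For each target field, pick the most similar source field by name."""
    mappings = {}
    for tgt in target_fields:
        best, score = None, 0.0
        for src in source_fields:
            s = SequenceMatcher(None, src.lower(), tgt.lower()).ratio()
            if s > score:
                best, score = src, s
        if score >= threshold:  # only propose confident matches
            mappings[tgt] = best
    return mappings

# Hypothetical source schema and target master-data model.
source = ["cust_name", "cust_email", "created_dt"]
target = ["customer_name", "customer_email"]
print(propose_mappings(source, target))
```

A real system would validate proposals against data profiles and lineage, but the drag-and-drop work Ghai mentions reduces, in essence, to generating such source-to-target pairs automatically.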
The result is to “radically simplify master data management,” Ghai says. Instead of an MDM project that takes 12 to 16 months from ragged start to master data glory, having CLAIRE GPT learn from Informatica’s massive repository of historical MDM data and then use GPT-3.5 (and other LLMs) to generate suggestions cuts the project time to just weeks.
For example, one of Informatica’s customers (a car maker) previously employed 10 data engineers for more than two years to develop 200 classifications of proprietary data types within their data lake, Ghai says.
“We pointed our auto classification against their data lake and within minutes we generated 400 classifications,” he says. “So the 200 that they had identified, [plus] another 200 different [ones]. What would have taken their 10 data engineers another two years to develop, we just automatically did it.”
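For a sense of what column classification means mechanically, here is a deliberately simple sketch that assigns a semantic type to a column by pattern-matching sampled values. The labels, regexes, and 80% threshold are invented; CLAIRE’s classifiers are trained models over metadata, not hand-written rules.

```python
import re

# Hypothetical semantic types and the patterns that recognize them.
CLASSIFIERS = {
    "email":    re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_phone": re.compile(r"^\d{3}-\d{3}-\d{4}$"),
    "iso_date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
}

def classify_column(sample_values, min_ratio=0.8):
    """Label a column if enough sampled values match one pattern."""
    for label, pat in CLASSIFIERS.items():
        hits = sum(1 for v in sample_values if pat.match(v))
        if hits / len(sample_values) >= min_ratio:
            return label
    return "unclassified"

print(classify_column(["a@b.com", "c@d.org", "e@f.net"]))
print(classify_column(["2023-01-01", "2023-02-02"]))
```

The car maker’s 200 proprietary types would each need their own recognizer; the claim in the article is that CLAIRE derives such recognizers automatically rather than having engineers write them.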
CLAIRE GPT will also provide a new way for users to interact with Informatica’s suite of tools. For example, Ghai says a customer could give CLAIRE GPT the following order: “CLAIRE, connect to Salesforce. Aggregate customer account data on a monthly basis. Address data quality inconsistencies with date format. Load into Snowflake.”
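One way to picture what happens to such a prompt is a translation into a structured pipeline specification. The sketch below parses Ghai’s example into ordered steps using naive keyword rules; the step names and keywords are assumptions made for illustration, since CLAIRE GPT uses LLMs for this interpretation, not string matching.

```python
def parse_prompt(prompt):
    """Turn a natural-language pipeline request into ordered step dicts."""
    steps = []
    for clause in prompt.split("."):
        clause = clause.strip().lower()
        if not clause:
            continue
        if "connect to" in clause:
            steps.append({"op": "source", "system": clause.split()[-1]})
        elif clause.startswith("aggregate"):
            steps.append({"op": "aggregate", "detail": clause})
        elif "data quality" in clause:
            steps.append({"op": "dq_fix", "detail": clause})
        elif clause.startswith("load into"):
            steps.append({"op": "sink", "system": clause.split()[-1]})
    return steps

prompt = ("CLAIRE, connect to Salesforce. Aggregate customer account data "
          "on a monthly basis. Address data quality inconsistencies with "
          "date format. Load into Snowflake.")
for step in parse_prompt(prompt):
    print(step)
```

The output is a four-step plan: a Salesforce source, a monthly aggregation, a data-quality fix, and a Snowflake sink, which is exactly the structure a pipeline engine would execute.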
While it’s unclear if CLAIRE GPT will feature speech recognition or speech-to-text capabilities, that would seem to be just an implementation detail, as those challenges are not as great as the core data management challenges that Informatica is tackling.
“I think it’s a pretty transformative leap because…it makes data engineers, data analysts more productive,” Ghai says. “But it opens up prompt-based data management experiences to many more personas that have radically less technical skill sets….Anybody could write that prompt that I just described. And that’s the exciting part.”
CLAIRE GPT and CLAIRE Copilot, which will ship in Q3 or Q4 of this year, will also find use automating other repetitive tasks in the data management game, such as debugging, testing, refactoring, and documentation, Informatica says. The goal is to position them as subject-matter-expert stand-ins, something similar to pair programming, Ghai says.
“Pair programming has its benefits with two people supporting each other and coding,” he says. “Data management and development equally can benefit from an AI assistant, and CLAIRE Copilot is that AI assistant delivering automation, insights and benefits for data integration, for data quality, for master data management, for cataloging, for governance, as well as to democratize data through our data marketplace.”
When looking at the screen, CLAIRE users will see a lightning bolt next to the insights and recommendations, Ghai says. “If we identify data quality issues, we will surface those up as issues we’ve identified for a user, to then validate that yes, it is an issue,” he says. The user can then select CLAIRE GPT’s fix, if it looks good. This “human in the loop” approach helps to minimize possible errors from LLM hallucinations, Ghai says.
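The human-in-the-loop pattern Ghai describes can be reduced to a very small contract: a suggested fix is applied only after a reviewer approves it. In this minimal sketch, the approval callback stands in for the user clicking past the lightning-bolt icon; all names are illustrative.

```python
def apply_with_review(record, suggested_fix, approve):
    """Apply suggested_fix to record only if the reviewer approves it."""
    if approve(record, suggested_fix):
        return {**record, **suggested_fix}   # accepted: merge the fix in
    return record                            # rejected: data untouched

row = {"id": 1, "signup_date": "05/03/2023"}
fix = {"signup_date": "2023-05-03"}          # an LLM-proposed normalization

accepted = apply_with_review(row, fix, lambda r, f: True)
rejected = apply_with_review(row, fix, lambda r, f: False)
print(accepted, rejected)
```

Because the model only ever proposes and never silently writes, a hallucinated fix costs a reviewer a click rather than corrupting the data.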
Informatica is using OpenAI’s GPT-3.5 to generate responses, but it’s not the only LLM, nor the only model, at work. In addition to a host of traditional classification and clustering algorithms, Informatica is also working with Google’s Bard and Meta’s LLaMA for some language tasks, Ghai says.
“We have what we think of as a system of models, a network of models, and the path you go down depends on the data management operation,” he says. “It depends on the instruction, depends on whether it’s ingestion or ETL or data quality or classification.”
The company is also using models developed specifically for certain industries, such as financial services or healthcare. “And then we have local tenanted models that are for individual customers bespoke to their operations,” Ghai says. “That’s magic of interpreting the instruction and then routing it through our network of models depending on the understanding of what is being asked and then what data management operations need to be conducted.”
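The “network of models” idea amounts to a routing layer: classify the incoming instruction, then dispatch it to the appropriate model. The sketch below is a keyword-based caricature of that routing; the model names, route table, and classifier are all assumptions, not CLAIRE’s actual components, which would themselves use ML to interpret the instruction.

```python
# Hypothetical route table: operation type -> model to invoke.
ROUTES = {
    "ingestion":      "general-llm",
    "etl":            "general-llm",
    "data_quality":   "dq-specialist-model",
    "classification": "classifier-model",
}

def detect_operation(instruction):
    """Crude stand-in for instruction interpretation."""
    text = instruction.lower()
    if "classif" in text:
        return "classification"
    if "quality" in text:
        return "data_quality"
    if "load" in text or "ingest" in text:
        return "ingestion"
    return "etl"

def route(instruction):
    return ROUTES[detect_operation(instruction)]

print(route("Fix data quality issues in the orders table"))
print(route("Classify columns in the customer lake"))
```

Industry-specific and per-tenant models would simply be additional entries in such a table, selected by customer and domain as well as by operation.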
Related Items:
Has GPT-4 Ignited the Fuse of Artificial General Intelligence?
Informatica Raises $840 Million in NYSE IPO
Informatica Likes Its Chances in the Cloud