Informatica today announced CLAIRE GPT, the latest release of its AI-powered data management platform in the cloud, as well as CLAIRE Copilot. The company claims that, by using large language models (LLMs) and generative AI, CLAIRE GPT will enable customers to reduce the time spent on common data management tasks such as data mapping, data quality, and governance by up to 80%.
Informatica has been using AI and machine learning technology since it launched its new flagship platform CLAIRE back in 2017. The company recognized early on that data management, in and of itself, is a big data problem, and so it adopted AI and ML technologies to spot patterns across its platform and generate useful predictions.
While some legacy PowerCenter users remain stubbornly on prem, plenty of Informatica customers have followed CLAIRE into the cloud, where they not only benefit from advanced, AI-powered data management capabilities, but also help Informatica to generate them.
According to Informatica, every month CLAIRE processes 54 trillion data management transactions, representing a wide array of ETL/ELT, master data management (MDM) matching, data catalog entry, data quality rule-making, and other data governance tasks. All told, CLAIRE holds 23 petabytes of data and is home to 50,000 “metadata-aware connections,” representing every operating system, database, application, file system, and protocol imaginable.
Now the longtime ETL leader is taking its AI/ML game to the next level with CLAIRE GPT, its next-generation data management platform. According to Informatica Chief Product Officer Jitesh Ghai, Informatica is able to leverage all that data in CLAIRE to train LLMs that can handle some common data quality and MDM tasks on behalf of users.
“Historically, AI/ML has been focused on cataloging and governance,” Ghai tells Datanami. “Now, in the cloud, all of that metadata and all of the AI and ML algorithms are expanded to support data integration workloads and make it simpler to build data pipelines, to auto identify data quality issues at petabyte scale. This was not done before. This is new. We call it DQ Insights, a part of our data observability capabilities.” DQ Insights will use the generative capabilities of LLMs to propose fixes for the data quality problems it detects.
The company can also automatically classify data at petabyte scale, which lets it generate data governance artifacts and write business rules for MDM tasks, two more of the new capabilities. Some of these generative AI capabilities will be delivered via CLAIRE Copilot, which is part of CLAIRE GPT.
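To make “automatically classify data” concrete, here is a minimal rule-based sketch in Python. It is not how CLAIRE works (Informatica has not published that); the labels, regular expressions, and the 80% match threshold are all assumptions chosen for illustration. A column receives a label only when most of its non-empty values match that label's pattern.

```python
import re

# Hypothetical rule set; real classifiers would use many more signals.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "us_phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
    # Vehicle identification number: 17 characters, letters I, O, Q excluded.
    "vin": re.compile(r"^[A-HJ-NPR-Z0-9]{17}$"),
}

def classify_column(values, threshold=0.8):
    """Return the label whose pattern matches the largest share of non-empty
    values, or None if no label reaches the threshold."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    best_label, best_share = None, 0.0
    for label, pattern in PATTERNS.items():
        share = sum(1 for v in non_empty if pattern.match(v)) / len(non_empty)
        if share > best_share:
            best_label, best_share = label, share
    return best_label if best_share >= threshold else None

print(classify_column(["1HGCM82633A004352", "JH4KA7561PC008269", ""]))
```

The threshold is the key design choice: it lets a column tolerate a few dirty values without being mislabeled by a single stray match.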
“What we’re doing now is enabling folks to look at, select the sources they want to master and point them to our master-data model and we will auto generate that business logic,” Ghai says. “What you would have to drag and drop as a data engineer, we will auto generate, because we know the schemas at the sources, and we know the target model schema. We can put together the business logic.”
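Because both the source schemas and the target model schema are known, a first pass at the mapping can be as simple as matching field names. The Python sketch below uses the standard library's `difflib.get_close_matches` for fuzzy name matching; it is a hypothetical stand-in for whatever Informatica actually does, and the field names and the 0.6 cutoff are invented for the example. Fields that match nothing are left as `None` for a data engineer to map by hand.

```python
from difflib import get_close_matches

def normalize(name: str) -> str:
    # Lowercase and drop separators so "Cust_Name" and "custname" compare equal.
    return "".join(ch for ch in name.lower() if ch.isalnum())

def suggest_mappings(source_fields, target_fields, cutoff=0.6):
    """Propose a source -> target field mapping by fuzzy name similarity.

    Returns a dict of source field -> best target field, or None if no
    target name is similar enough.
    """
    norm_targets = {normalize(t): t for t in target_fields}
    mappings = {}
    for src in source_fields:
        match = get_close_matches(normalize(src), list(norm_targets), n=1, cutoff=cutoff)
        mappings[src] = norm_targets[match[0]] if match else None
    return mappings

# Hypothetical source columns and master-data model fields.
source = ["Cust_Name", "EMAIL_ADDR", "phone_no", "LoyaltyTier"]
target = ["customer_name", "email_address", "phone_number", "birth_date"]
print(suggest_mappings(source, target))
```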
The result is to “radically simplify master data management,” Ghai says. An MDM project that once took 12 to 16 months to go from ragged start to master data glory can be cut to just weeks, because CLAIRE GPT learns from Informatica’s massive repository of historical MDM data and then uses GPT-3.5 (and other LLMs) to generate suggestions.
For example, one of Informatica’s customers (a car maker) previously employed 10 data engineers for more than two years to develop 200 classifications of proprietary data types within their data lake, Ghai says.
“We pointed our auto classification against their data lake and within minutes we generated 400 classifications,” he says. “So the 200 that they had identified, [plus] another 200 different [ones]. What would have taken their 10 data engineers another two years to develop, we just automatically did it.”
CLAIRE GPT will also provide a new way for users to interact with Informatica’s suite of tools. For example, Ghai says a customer could give CLAIRE GPT the following order: “CLAIRE, connect to Salesforce. Aggregate customer account data on a monthly basis. Address data quality inconsistencies with date format. Load into Snowflake.”
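The middle two steps of that prompt, fixing inconsistent date formats and aggregating account data by month, are ordinary transformations. The Python sketch below shows them on hypothetical in-memory records; the Salesforce and Snowflake connections are deliberately omitted, and the list of accepted date formats is an assumption. It is not the pipeline CLAIRE GPT would generate.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical formats observed in the source; a real source needs its own list.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")

def parse_date(raw: str) -> datetime:
    """Try each known format in order; fail loudly on anything unrecognized."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")

def monthly_totals(rows):
    """Sum amounts per (account, YYYY-MM) after normalizing the dates."""
    totals = defaultdict(float)
    for account_id, raw_date, amount in rows:
        month = parse_date(raw_date).strftime("%Y-%m")
        totals[(account_id, month)] += amount
    return dict(totals)

rows = [
    ("ACME", "2023-03-05", 100.0),
    ("ACME", "03/28/2023", 50.0),
    ("ACME", "02-Apr-2023", 75.0),
    ("GLOBEX", "2023-03-15", 20.0),
]
print(monthly_totals(rows))
```

Raising on an unknown format, rather than silently dropping the row, mirrors the article's point that detected quality issues should be surfaced to a person.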
While it’s unclear if CLAIRE GPT will feature speech recognition or speech-to-text capabilities, that would seem to be just an implementation detail, as those challenges are not as great as the core data management challenges that Informatica is tackling.
“I think it’s a pretty transformative leap because…it makes data engineers, data analysts more productive,” Ghai says. “But it opens up prompt-based data management experiences to many more personas that have radically less technical skill sets….Anybody could write that prompt that I just described. And that’s the exciting part.”
CLAIRE GPT and CLAIRE Copilot, which will ship in Q3 or Q4 of this year, will also be used to automate other repetitive data management tasks, such as debugging, testing, refactoring, and documentation, Informatica says. The goal is to position them as stand-ins for subject matter experts, or something similar to pair programming, Ghai says.
“Pair programming has its benefits with two people supporting each other and coding,” he says. “Data management and development equally can benefit from an AI assistant, and CLAIRE Copilot is that AI assistant delivering automation, insights and benefits for data integration, for data quality, for master data management, for cataloging, for governance, as well as to democratize data through our data marketplace.”
When looking at the screen, CLAIRE users will see a lightning bolt next to the insights and recommendations, Ghai says. “If we identify data quality issues, we will surface those up as issues we’ve identified for a user, to then validate that yes, it is an issue,” he says. The user can then select CLAIRE GPT’s fix, if it looks good. This “human in the loop” approach helps to minimize possible errors from LLM hallucinations, Ghai says.
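The “human in the loop” pattern Ghai describes boils down to one rule: a generated fix changes nothing until a person approves it. The Python sketch below illustrates that rule; the `SuggestedFix` type, the example fixes, and the `approve` callback (standing in for a user clicking accept in the UI) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SuggestedFix:
    """A hypothetical machine-generated fix awaiting human review."""
    issue: str                      # short description shown to the reviewer
    apply: Callable[[dict], dict]   # returns a corrected copy of the record

def review_and_apply(record: dict, suggestions, approve):
    """Apply only the fixes the reviewer approves; rejected fixes are skipped."""
    applied = []
    for fix in suggestions:
        if approve(fix):
            record = fix.apply(record)
            applied.append(fix.issue)
    return record, applied

suggestions = [
    SuggestedFix("email has stray whitespace and capitals",
                 lambda r: {**r, "email": r["email"].strip().lower()}),
    SuggestedFix("country is not an ISO code",
                 lambda r: {**r, "country": {"usa": "US"}.get(r["country"].lower(), r["country"])}),
]

record = {"email": "  Alice@Example.COM ", "country": "usa"}
# Stand-in for a person accepting only the first suggestion.
approved_issues = {"email has stray whitespace and capitals"}
fixed, applied = review_and_apply(record, suggestions, lambda fix: fix.issue in approved_issues)
print(fixed, applied)
```

Each fix returns a new record instead of mutating the input, so the original data stays intact if a reviewer later reverses a decision.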
Informatica is using OpenAI’s GPT-3.5 to generate responses, but it’s not the only LLM, nor the only model at work. In addition to a host of traditional classification and clustering algorithms, Informatica is also working with Google’s Bard and Meta’s LLaMA for some language tasks, Ghai says.
“We have what we think of as a system of models, a network of models, and the path you go down depends on the data management operation,” he says. “It depends on the instruction, depends on whether it’s ingestion or ETL or data quality or classification.”
The company is also using models developed specifically for certain industries, such as financial services or healthcare. “And then we have local tenanted models that are for individual customers bespoke to their operations,” Ghai says. “That’s the magic of interpreting the instruction and then routing it through our network of models depending on the understanding of what is being asked and then what data management operations need to be conducted.”
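One simple way to picture the “network of models” is a router that picks the most specific model available for an operation: a customer's own tenanted model first, then an industry model, then a general one. The Python sketch below assumes that precedence order; every model name and registry entry in it is hypothetical, and Informatica has not described its routing logic in this detail.

```python
from typing import Optional

# Hypothetical registries; identifiers are illustrative only.
GENERAL_MODELS = {
    "ingestion": "general-ingestion-model",
    "etl": "general-etl-model",
    "data_quality": "general-dq-model",
    "classification": "general-classifier",
}
INDUSTRY_MODELS = {
    ("healthcare", "data_quality"): "healthcare-dq-model",
    ("financial_services", "classification"): "finserv-classifier",
}
TENANT_MODELS = {
    ("acme-motors", "classification"): "acme-motors-classifier",
}

def route(operation: str, tenant: Optional[str] = None,
          industry: Optional[str] = None) -> str:
    """Pick a model for an operation, most specific first:
    tenant-specific, then industry-specific, then general."""
    if tenant and (tenant, operation) in TENANT_MODELS:
        return TENANT_MODELS[(tenant, operation)]
    if industry and (industry, operation) in INDUSTRY_MODELS:
        return INDUSTRY_MODELS[(industry, operation)]
    try:
        return GENERAL_MODELS[operation]
    except KeyError:
        raise ValueError(f"no model registered for operation {operation!r}") from None

print(route("classification", tenant="acme-motors"))
```

In a real system, the step that turns a natural-language instruction into an `operation` label would itself be a model; here it is assumed to have already happened.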