Informatica Claims 80% Speedup for Data Management Tasks with LLMs
Informatica today announced CLAIRE GPT, the latest release of its AI-powered data management platform in the cloud, as well as CLAIRE Copilot. The company claims that, by using large language models (LLMs) and generative AI, CLAIRE GPT will enable customers to reduce the time spent on common data management tasks such as data mapping, data quality, and governance by up to 80%.
Informatica has been using AI and machine learning technology since it launched its new flagship platform CLAIRE back in 2017. The company recognized early on that data management, in and of itself, is a big data problem, and so it adopted AI and ML technologies to spot patterns across its platform and generate useful predictions.
While some legacy PowerCenter users remain stubbornly on prem, plenty of Informatica customers have followed CLAIRE into the cloud, where they not only benefit from advanced, AI-powered data management capabilities, but also help Informatica to generate them.
According to Informatica, every month CLAIRE processes 54 trillion data management transactions, representing a wide array of ETL/ELT, master data management (MDM) matching, data catalog entry, data quality rule-making, and other data governance tasks. All told, CLAIRE holds 23 petabytes of data and is home to 50,000 “metadata-aware connections,” representing every operating system, database, application, file system, and protocol imaginable.
Now the longtime ETL leader is taking its AI/ML game to the next level with CLAIRE GPT, its next-generation data management platform. According to Informatica Chief Product Officer Jitesh Ghai, Informatica is able to leverage all that data in CLAIRE to train LLMs that can handle some common data quality and MDM tasks on behalf of users.
“Historically, AI/ML has been focused on cataloging and governance,” Ghai tells Datanami. “Now, in the cloud, all of that metadata and all of the AI and ML algorithms are expanded to support data integration workloads and make it simpler to build data pipelines, to auto identify data quality issues at petabyte scale. This was not done before. This is new. We call it DQ Insights, a part of our data observability capabilities.” DQ Insights will leverage LLMs’ generative AI capabilities to generate fixes for the data quality problems it detects.
The company is also able to automatically classify data at petabyte scale, which helps it generate data governance artifacts and write business rules for MDM tasks, both of which are new capabilities. Some of these generative AI capabilities will be delivered via CLAIRE Copilot, which is part of CLAIRE GPT.
“What we’re doing now is enabling folks to look at, select the sources they want to master and point them to our master-data model and we will auto generate that business logic,” Ghai says. “What you would have to drag and drop as a data engineer, we will auto generate, because we know the schemas at the sources, and we know the target model schema. We can put together the business logic.”
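Ghai's point is that once both the source schemas and the target master-data model are known, the field-level mapping logic can be generated rather than dragged and dropped by hand. Informatica's actual generation is LLM-driven and proprietary; the following is only a minimal illustrative sketch of the idea, using a hypothetical name-canonicalization match between a source schema and a target model:

```python
# Illustrative sketch: proposing source-to-target MDM field mappings
# automatically, given both schemas. All schemas and field names below
# are invented; this is not Informatica's implementation.

def normalize(name: str) -> str:
    """Canonicalize a field name for comparison (drop separators, lowercase)."""
    return name.replace("_", "").replace("-", "").lower()

def auto_map(source_schema: dict, target_schema: dict) -> dict:
    """Propose source -> target field mappings where canonical names match."""
    targets = {normalize(t): t for t in target_schema}
    mapping = {}
    for field in source_schema:
        key = normalize(field)
        if key in targets:
            mapping[field] = targets[key]
    return mapping

source = {"cust_name": "str", "Email-Address": "str", "phone": "str"}
target = {"CustName": "str", "EmailAddress": "str", "Region": "str"}

print(auto_map(source, target))
# {'cust_name': 'CustName', 'Email-Address': 'EmailAddress'}
```

In practice an LLM can go well beyond exact-name matching, inferring mappings from field semantics, but the generate-then-review shape is the same.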
The result is to “radically simplify master data management,” Ghai says. Instead of an MDM project taking 12 to 16 months from ragged start to master data glory, CLAIRE GPT learns from Informatica’s massive repository of historical MDM data and uses GPT-3.5 (and other LLMs) to generate suggestions, cutting the project time to just weeks.
For example, one of Informatica’s customers (a car maker) previously employed 10 data engineers for more than two years to develop 200 classifications of proprietary data types within their data lake, Ghai says.
“We pointed our auto classification against their data lake and within minutes we generated 400 classifications,” he says. “So the 200 that they had identified, [plus] another 200 different [ones]. What would have taken their 10 data engineers another two years to develop, we just automatically did it.”
CLAIRE GPT will also provide a new way for users to interact with Informatica’s suite of tools. For example, Ghai says a customer could give CLAIRE GPT the following order: “CLAIRE, connect to Salesforce. Aggregate customer account data on a monthly basis. Address data quality inconsistencies with date format. Load into Snowflake.”
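An instruction like the one above amounts to an ordered pipeline spec: source, transformation, quality rule, target. How CLAIRE GPT actually parses prompts is not public; the toy sketch below, with invented stage names and a naive keyword match, just shows the kind of structure such a prompt decomposes into:

```python
# Toy sketch: decomposing a natural-language data management prompt into
# ordered pipeline stages. Stage names and parsing logic are hypothetical,
# not Informatica's implementation (which uses LLMs, not keyword matching).

def parse_instruction(prompt: str) -> list:
    """Split a prompt into sentences and tag each with a pipeline stage."""
    stages = {
        "connect": "source",
        "aggregate": "transform",
        "data quality": "quality",
        "load": "target",
    }
    steps = []
    for sentence in prompt.split("."):
        sentence = sentence.strip()
        if not sentence:
            continue
        for keyword, stage in stages.items():
            if keyword in sentence.lower():
                steps.append({"stage": stage, "action": sentence})
                break
    return steps

prompt = ("CLAIRE, connect to Salesforce. Aggregate customer account data "
          "on a monthly basis. Address data quality inconsistencies with "
          "date format. Load into Snowflake.")
for step in parse_instruction(prompt):
    print(step["stage"], "->", step["action"])
```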
While it’s unclear if CLAIRE GPT will feature speech recognition or speech-to-text capabilities, that would seem to be just an implementation detail, as those challenges are not as great as the core data management challenges that Informatica is tackling.
“I think it’s a pretty transformative leap because…it makes data engineers, data analysts more productive,” Ghai says. “But it opens up prompt-based data management experiences to many more personas that have radically less technical skill sets….Anybody could write that prompt that I just described. And that’s the exciting part.”
CLAIRE GPT and CLAIRE Copilot, which will ship in Q3 or Q4 of this year, will also find use automating other repetitive tasks in the data management game, such as debugging, testing, refactoring, and documentation, Informatica says. The goal is to position them as subject matter expert stand-ins, or something similar to pair programming, Ghai says.
“Pair programming has its benefits with two people supporting each other and coding,” he says. “Data management and development equally can benefit from an AI assistant, and CLAIRE Copilot is that AI assistant delivering automation, insights and benefits for data integration, for data quality, for master data management, for cataloging, for governance, as well as to democratize data through our data marketplace.”
When looking at the screen, CLAIRE users will see a lightning bolt next to the insights and recommendations, Ghai says. “If we identify data quality issues, we will surface those up as issues we’ve identified for a user, to then validate that yes, it is an issue,” he says. The user can then select CLAIRE GPT’s fix, if it looks good. This “human in the loop” approach helps to minimize possible errors from LLM hallucinations, Ghai says.
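The mechanics of that human-in-the-loop approach are simple: the LLM only proposes, and nothing touches the data until a person accepts the suggestion. The sketch below is a minimal illustration of that validate-then-apply gate, with invented record and suggestion shapes (it is not Informatica's code):

```python
# Minimal human-in-the-loop sketch: LLM-suggested data quality fixes are
# applied only after a reviewer accepts them, filtering out hallucinated
# suggestions. Record and suggestion structures are hypothetical.

def apply_reviewed_fixes(records, suggestions, accepted_ids):
    """Apply only the suggestions whose ids the human reviewer accepted."""
    fixed = dict(records)
    for sid, (record_id, field, new_value) in suggestions.items():
        if sid in accepted_ids:  # rejected suggestions never reach the data
            fixed[record_id] = {**fixed[record_id], field: new_value}
    return fixed

records = {1: {"date": "31/12/2022"}, 2: {"date": "2022-12-30"}}
suggestions = {
    "s1": (1, "date", "2022-12-31"),  # plausible date-format fix, accepted
    "s2": (2, "date", "2022-01-30"),  # spurious suggestion, rejected
}
print(apply_reviewed_fixes(records, suggestions, accepted_ids={"s1"}))
# {1: {'date': '2022-12-31'}, 2: {'date': '2022-12-30'}}
```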
Informatica is using OpenAI’s GPT-3.5 to generate responses, but it’s not the only LLM, nor the only model at work. In addition to a host of traditional classification and clustering algorithms, Informatica is also working with Google’s Bard and Meta’s LLaMA for some language tasks, Ghai says.
“We have what we think of as a system of models, a network of models, and the path you go down depends on the data management operation,” he says. “It depends on the instruction, depends on whether it’s ingestion or ETL or data quality or classification.”
The company is also using models developed specifically for certain industries, such as financial services or healthcare. “And then we have local tenanted models that are for individual customers bespoke to their operations,” Ghai says. “That’s magic of interpreting the instruction and then routing it through our network of models depending on the understanding of what is being asked and then what data management operations need to be conducted.”
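The routing Ghai describes, general models plus industry- and tenant-specific ones, resembles a registry lookup that prefers the most specific model available for a request. The following is a hedged sketch of that pattern with entirely invented model names and operations, not Informatica's actual router:

```python
# Hedged sketch of routing a request through a "network of models":
# prefer a tenant-specific model, then an industry-specific one, then a
# general model for the operation. All registry entries are invented.

def route(operation: str, industry: str = None, tenant: str = None) -> str:
    """Return the most specific model registered for this request."""
    registry = {
        ("ingestion", None, None): "general-ingestion-llm",
        ("data_quality", None, None): "general-dq-llm",
        ("data_quality", "healthcare", None): "healthcare-dq-llm",
        ("data_quality", "healthcare", "acme"): "acme-bespoke-dq-llm",
    }
    # Fall back from most specific key to least specific.
    for key in [(operation, industry, tenant),
                (operation, industry, None),
                (operation, None, None)]:
        if key in registry:
            return registry[key]
    raise ValueError(f"no model registered for {operation}")

print(route("data_quality", "healthcare", "acme"))  # acme-bespoke-dq-llm
print(route("data_quality", "finance"))             # general-dq-llm
print(route("ingestion"))                           # general-ingestion-llm
```

In a production system the dispatch key would itself come from an LLM interpreting the instruction, but the specificity-first fallback is the core of the pattern.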