Data Quality Top Obstacle to GenAI, Informatica Survey Says
A new survey of data leaders by Informatica points to data quality as the number one obstacle to implementing generative AI. The number of data management tools that companies use, along with the fact that a large fraction of companies are juggling 1,000 or more separate data sources, is also weighing on GenAI initiatives.
According to Informatica’s CDO Insights 2024 report, which is based on a survey of 600 data leaders at large companies around the world, 45% of companies have already implemented GenAI in some form, while another 53% plan to implement it (with 36% saying they will do so within two years). That leaves just 2% of firms saying GenAI isn’t for them, a remarkably low number for a technology that most people didn’t know existed 14 months ago.
However, having success with GenAI is not as easy as signing up for an OpenAI account and letting the GPT rip. While today’s pre-trained large language models (LLMs) are much easier to work with than natural language processing (NLP) tech of the past, having good data is still critical to making it all work, whether one is training a model from scratch, fine-tuning a pre-built model, or prompting an LLM at runtime. Bad data will torpedo a GenAI project just as effectively as it will sink any type of AI or ML project.
Sure enough, Informatica’s survey found that 42% of the data leaders who are currently deploying GenAI or planning to (a group that accounts for about 588 of the 600 people who took the survey) cited data quality as the top obstacle to GenAI success. Data quality was followed by data privacy and protection, AI ethics, the quantity of data available for training and fine-tuning language models, and AI governance as other GenAI concerns, according to the report, which was released last week.
Investment in these data management staples is running high among Informatica’s survey base. Indeed, the Redwood City, California company reports that a full 100% of survey participants say they are investing in data management capabilities to support their data strategies and priorities, an excellent sign if there ever was one.
There was an added silver lining in that 100% figure for Informatica, which sells a suite of data management tools spanning data integration and ETL, data quality, data catalog, data governance, master data management, data observability, and API and application integration. The company found that 58% of survey-takers were using five or more tools for their data management work. What’s more, 49% of survey-takers said the bulk of these data management tools were not available as cloud-hosted services. (Informatica, of course, sells a unified suite of data management tools under the Intelligent Data Management Cloud banner.)
More data typically equals more insight and a better signal. But more sources also mean more complexity: according to Informatica, two out of five firms say they’re dealing with 1,000 or more data sources, and nearly 80% of those surveyed say they expect the number of data sources to increase in 2024.
It’s not surprising, then, that 39% of data leaders named improving the reliability and consistency of data for GenAI use cases as a priority for 2024. The same share, 39%, cited fostering a data-driven culture and higher data literacy as a goal for 2024, followed by improving governance over data and data processes at 38%.
That data management has emerged as a key enabling factor for GenAI doesn’t surprise Jitesh Ghai, Informatica’s chief product officer.
“Unsurprisingly, generative AI implementation and the data strategies needed to do so successfully continue to dominate bandwidth for most data leaders, regardless of region or vertical,” Ghai says in a press release. “While there remains a myriad of technical and organizational hurdles that these leaders must navigate, it’s clear investments in holistic, highly integrated data management capabilities are the key to unlock the vast potential of GenAI and empower enterprises to take full control of their ever-expanding data estates.”