May 23, 2019

Faulty Data is Stalling AI Projects

Tens of billions will be spent this year on AI development, but those efforts continue to be stymied by ratty data that has undermined model training efforts and burned through project budgets.

That’s the sobering conclusion of a vendor survey of data scientists, AI technologists and business executives that uncovered widespread problems with data quality, specifically with the data labeling required to train AI models. The result is that most AI projects are stalled, with little to show for early and substantial investments.

The survey, released Thursday (May 23) by AI training data specialist Alegion, found that despite heavy investment in focused AI and machine learning projects (most respondents said they have four or fewer projects in development), 78 percent of those projects have slowed at some stage before deployment.

The primary culprits are data quality and labeling challenges, prompting many early movers either to develop an in-house solution or to outsource the data labeling needed to move machine learning projects into production.

“The nascency of enterprise AI has led more than half of the surveyed companies to label their training data internally or build their own data annotation tool,” the survey found. “Unfortunately, 8 out of 10 companies indicate that training AI/ML algorithms is more challenging than they expected, and nearly as many report problems with projects stalling.”

Hence, Alegion, which commissioned the survey and specializes in crowdsourcing machine learning steps like data labeling, emphasizes that 71 percent of development teams ultimately outsource those tasks.
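Whether labeling is done in-house or outsourced, a standard quality gate is inter-annotator agreement: have two people label the same sample and measure how far their agreement exceeds chance. The sketch below is a minimal illustration using scikit-learn's Cohen's kappa; the annotator lists and the 0.6 threshold are hypothetical, not drawn from the survey.

```python
# Minimal labeling-QA sketch: inter-annotator agreement via Cohen's kappa.
# The two annotator lists are hypothetical illustration data.
from sklearn.metrics import cohen_kappa_score

# Labels assigned to the same 10 items by two annotators (hypothetical data).
annotator_a = ["cat", "dog", "dog", "cat", "bird", "dog", "cat", "bird", "dog", "cat"]
annotator_b = ["cat", "dog", "cat", "cat", "bird", "dog", "dog", "bird", "dog", "cat"]

# Kappa corrects raw percent agreement for agreement expected by chance;
# values near 1.0 suggest reliable labels, values near 0 suggest noise.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")

if kappa < 0.6:  # a conventional (and debatable) cutoff, assumed here
    print("Agreement is weak -- labeling guidelines may need revision.")
```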

Along with a lack of prepped data and of the human resources needed to accurately label data sets, two-thirds of respondents cited bias or errors in the data as the biggest challenge in training their AI models.
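Label errors of the kind respondents describe can often be surfaced automatically before they derail training. One common heuristic, sketched below with scikit-learn on a synthetic dataset (the data, model, and 0.2 threshold are illustrative assumptions, not any vendor's production method), is to flag examples whose recorded label a cross-validated model confidently disputes.

```python
# Minimal sketch of flagging likely label errors with out-of-fold predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Toy dataset with some labels deliberately flipped to simulate annotation noise.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
noisy_idx = rng.choice(len(y), size=25, replace=False)
y_noisy = y.copy()
y_noisy[noisy_idx] = 1 - y_noisy[noisy_idx]

# Out-of-fold probabilities keep each prediction honest: no example is
# scored by a model that trained on it.
probs = cross_val_predict(LogisticRegression(max_iter=1000), X, y_noisy,
                          cv=5, method="predict_proba")

# Flag items where the model assigns low probability to the recorded label.
label_prob = probs[np.arange(len(y_noisy)), y_noisy]
suspects = np.where(label_prob < 0.2)[0]
print(f"{len(suspects)} suspect labels flagged for human review")
```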

In addition to data quality, the survey also sheds light on the data quantities required to achieve confidence in AI models. Asked to estimate on a scale ranging from 100,000 to more than 10 million data points, 43 percent of respondents said they need up to 1 million data items to achieve “production-level model confidence.” Meanwhile, 72 percent said model confidence would require labeled data totaling more than 100,000 items.
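Self-reported estimates aside, teams can gauge how much labeled data a given task actually needs empirically, by plotting a learning curve and watching where validation accuracy plateaus. A minimal sketch, again using scikit-learn with a synthetic stand-in dataset:

```python
# Minimal learning-curve sketch: evaluate the model at growing training sizes
# to estimate where more labeled data stops paying off. Dataset and model
# are illustrative; a real project would substitute its own.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

sizes, _, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:>5d} labeled examples -> cv accuracy {score:.3f}")
# If the curve is still climbing at the largest size, more labels should help;
# if it has flattened, additional labeling budget buys little.
```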

Coming up with that much labeled data has proven problematic, prompting 81 percent of those polled to conclude that training AI models has turned out to be more difficult than expected.

Those unforeseen difficulties are spawning a cottage industry of data labeling specialists like Alegion to help fill the gaps as AI first movers struggle to get machine learning projects off the ground. The Austin-based company pitches a platform that integrates machine intelligence to scale data labeling efforts and convert raw data into “model-ready training data.”

There appears to be growing demand for such tools as projected AI spending soars. According to market tracker IDC, global AI investments are expected to more than double through 2022, to $79.2 billion. U.S. companies currently account for two-thirds of the estimated $35.8 billion in AI spending this year, led by the banking and retail sectors.
