May 22, 2019

Data Management: Still a Major Obstacle to AI Success


Data is the lifeblood of AI. Without good data, machine learning algorithms have no way to determine a normal distribution of activities, occurrences, or events. However, only about one in five businesses has data that’s fit for AI and is actually using it for that purpose, according to a new report from Figure Eight.

On Tuesday, Figure Eight (formerly CrowdFlower) released “The State of AI and Machine Learning,” a new report based on surveys of 300 individuals who are involved in AI and machine learning projects. The study highlighted both the progress that companies have made in AI and the obstacles that remain.

The lack of well-organized, cleansed, and annotated data is a major impediment to AI at this point in time, according to the survey, which found that organizations’ data catalogs were all over the map.

While only 2% of survey respondents said their data was “completely unusable,” the state of organizations’ data was far from ideal. The report found that data was in varying states of disarray (at least from an AI-usability point of view), with some organizations having data that was organized but not accessible, or accessible but not annotated.

AI practitioners spend much of their time on data management tasks, according to Figure Eight’s new report

Only 21% of respondents indicated that their data was both ready for AI (that is, organized, accessible, and annotated) and being used for that purpose. Another 15% reported that their data was organized, accessible, and annotated, but was either not being used or was being used for other business purposes.

Data annotation – that is, labeling the data – is a key component of machine learning. Without well-annotated data, data scientists have no way to train supervised machine learning algorithms to identify specific examples of behaviors or events.

Figure Eight, which is a provider of data annotation services, also found that technical and business-oriented AI practitioners continue to spend a large amount of time manually preparing data. Nearly one-third of the respondents report spending 25% to 49% of their time on data management, cleaning, and/or labeling tasks, while another 29% report spending 50% to 74% of their time on those tasks. More than 10% spend 75% to 99% of their time on those tasks.

At least one survey respondent reported spending 100% of their time on data prep tasks. That group accounted for less than 1% of respondents, but it was not empty.

The biggest bottleneck to succeeding with AI is a lack of technical resources and qualified people, the survey found; 24% reported that as being the top issue. That was closely followed by data management, which was cited as the number one bottleneck by 21% of respondents; a lack of data (18%); a lack of executive or management buy-in (17%); a lack of budget (11%); and a lack of technical tools (3%). The survey reported 6% of respondents had no AI initiative in place.
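As a quick sanity check on the figures above, the short Python sketch below tallies the reported shares. The percentages come from this article; the dictionary name, the labels, and the grouping of "data management" with "lack of data" as data-related are illustrative assumptions, not categories defined in Figure Eight's report.

```python
# Top-bottleneck shares as reported in the article (percent of respondents).
# The final entry is respondents with no AI initiative, not a bottleneck.
bottlenecks = {
    "lack of technical resources and qualified people": 24,
    "data management": 21,
    "lack of data": 18,
    "lack of executive or management buy-in": 17,
    "lack of budget": 11,
    "lack of technical tools": 3,
    "no AI initiative in place": 6,
}

# The shares should account for every respondent.
total = sum(bottlenecks.values())

# Rank the answers from most to least cited.
ranked = sorted(bottlenecks.items(), key=lambda item: item[1], reverse=True)

# Assumption: treat the two data-centric answers as a single group.
data_related = bottlenecks["data management"] + bottlenecks["lack of data"]

print(f"Total accounted for: {total}%")
print(f"Most cited: {ranked[0][0]} ({ranked[0][1]}%)")
print(f"Data-related answers combined: {data_related}%")
```

Under that grouping, the two data-related answers together outweigh the single most-cited bottleneck, which fits the article's headline theme.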

Text and time-series data are the most popular sources of data for machine learning, according to Figure Eight’s new report

The sweet spot for AI spending appears to be between $11,000 and $500,000, which is where a majority of survey respondents say their organization’s AI budgets lie. Nearly one-third (29%) say their AI budget is less than $10,000, while 18% are spending more than $500,000, according to the survey.
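The "majority" claim can be checked from the two shares given. The sketch below assumes the three budget bands cover all respondents, which the article does not state outright; the variable names are illustrative.

```python
# Budget shares as reported in the article (percent of respondents).
under_10k = 29
over_500k = 18

# Assumption: the three budget bands cover every respondent,
# so the middle band is whatever remains.
middle_band = 100 - under_10k - over_500k

# A majority means more than half of respondents.
is_majority = middle_band > 50

print(f"Implied share in the $11,000 to $500,000 band: {middle_band}%")
print(f"Majority: {is_majority}")
```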

The survey identified some other interesting aspects of data science. For starters, text is the most common data source for machine learning models, as it’s used by 74% of survey-takers, followed by time-series data (59%) and still images (37%). Product or SKU information is used by 32%, while 26% of survey-takers report using video and sensor data.

Interestingly, about 49% of technical AI practitioners say they believe their organizations are behind the curve when it comes to AI, while 51% say they are not. However, 60% of those working on the business side of AI say their organizations are behind, while 40% think they are not.

Figure Eight, which was acquired by Australian data annotation company Appen, is sharing a copy of its “The State of AI and Machine Learning” report on its website.
