February 2, 2022

Taming the ‘White Whale’ of Unstructured Data

Unstructured data comprises a large chunk of all business data, and according to IDC, the figure could reach 80% by 2025. Data that is not collected or stored in an organized way, specifically language data in the form of text from emails, PDFs, and other documents, is a valuable and often underutilized resource.

Companies are finding ways to tap into this potential treasure trove through AI-powered natural language processing (NLP) and natural language understanding (NLU) tools. One firm working in this area is known for its AI platform for language understanding.

The company has just released a report, “Harnessing the Power of Unstructured Data with NLP and NLU.” The report, prepared by The AI Journal, features the results of an October 2021 Sapio Research survey of 116 CDOs from the U.S. and Europe and reports on how data teams are using AI to mine their unstructured data for actionable insights.

The ‘White Whale’ of the Business World

The company’s founder and chief technology officer, Marco Varone, calls unstructured data the “white whale of the business world” in the report’s introduction: it represents a majority of business data, yet tapping its potential is challenging due to global language differences, industry-specific jargon, and a lack of structure in how the data is compiled and stored.

A graph from the report showing various stages of NLP and NLU adoption.

The company asserts that NLP and NLU technology are the keys to this challenge, but only 8% of data teams have completed the necessary plans and projects to fully benefit from this technology. Thirty-four percent of teams have started implementing an NLP plan, and 24% are still solely in the planning stages.

One reason the report cites for this lag in adoption is a critical lack of the data skills needed to build and implement AI programs, even with in-house training or external recruitment of skilled employees. Companies without clear-cut AI plans are more likely to seek training as a primary method of coaching employees in specific areas like AI (51%), NLP (41%) and NLU (35%), and “companies that have made definitive AI plans but have not yet activated them are more likely to look for external expertise (58%) versus upskilling methods.”

A Principal Concern

Another insight is that 96% of CDOs named “delivering business impact through AI” as a principal concern, and 91% wish to gain value from their unstructured data in order to make that impact.

AI-powered analytics tools are designed to meet this goal, and the report states that “their ability to use the breadth of unstructured data to help organizations know themselves better and become more efficient is a revolutionary step towards becoming a data-led business.”

Deciding which NLU software options to start with was also an important consideration. Cloud-based solutions such as AWS and Google were the choice for 34% of companies, while another 34% used open-source tools like Hugging Face and OpenNLP. A further 44% used platforms, including machine learning and hybrid NL solutions. There are pros and cons with each choice, which the report lists and analyzes in detail.

Benefits Extend Beyond Cost Savings

A graph that shows how companies are measuring AI adoption-related ROI.

Finally, the report considers how organizations are measuring the return on their investment in AI, not just in terms of cost savings, but also team efficiency, risk management, speed-to-value, and effects on revenue. To gain these benefits, the company recommends selecting time-tested, dependable vendors, adopting a hybrid approach that combines symbolic AI and machine learning, and initially focusing on a single business case when establishing an AI plan.

Overall, the report emphasizes the importance of prioritizing technology solutions for taming and gaining value from unstructured data as a way to stand out from competitors.

“To make the most of unstructured data, AI and NLP must be priorities,” said Varone. “However, historical approaches to AI and NLP no longer suffice. To succeed, you need the right approach, the right expertise, and a focus on the right data. Natural language understanding is the answer to these broad language data challenges.”

Related Items:

10 NLP Predictions for 2022

We Need to Prepare for Tomorrow’s AI Job Impacts Now

Unstructured Data Growth Wearing Holes in IT Budgets