July 1, 2014

Survey: Variety, Not Volume, Stymies Data Scientists

The diversity of data types, not sheer data volume, is the biggest challenge facing data scientists, according to a new survey of big data practitioners.

One consequence, warns the survey by computational database specialist Paradigm4, is that data variety is causing frustrated scientists to “leave data on the table.” Of the 111 data scientists responding to the survey, fully 71 percent said big data made analytics more difficult. Data variety rather than volume was most often cited as the primary reason.

The survey also found that 36 percent of respondents said it takes too much time to glean insights from data sets that are too big to move to analytics software. “The increasing variety of data sources is forcing data scientists into shortcuts that leave data and money on the table,” Paradigm4 CEO Marilyn Matz said in a statement releasing the survey findings.

“The focus on the volume of data hides the real challenge of data analytics today. Only by addressing the challenge of utilizing diverse types of data will we be able to unlock the enormous potential of analytics,” Matz argued.

The Hadoop platform also came in for some hard knocks from data scientists. The survey found that 48 percent of respondents have used Hadoop or Spark, a faster processing engine that can run on Hadoop clusters. Of those, 76 percent said the tools were too slow, required too much effort to program, or had other limitations.

Nearly half of respondents complained that it is becoming harder to fit their data into relational database tables. “Incorporating the diverse data types into analytical workflows is a major pain point for data scientists using traditional relational database software,” the survey warned. One consequence: 39 percent of those surveyed reported more job stress.

For complex analytics, the survey found, data scientists are being forced to move large volumes of stored data to dedicated mathematical and statistical computing software. That step takes time and requires additional coding that “adds no analytical value and impedes productivity,” the survey found.
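The data-movement step the survey describes is easy to picture. As an illustration (not drawn from the survey itself), here is a minimal, self-contained Python sketch of that workflow, pulling rows out of a database into pandas for offline statistical work; the table and column names are hypothetical:

```python
# Hypothetical example of the "move stored data to stats software" step.
# The readings table and its columns are illustrative assumptions, not
# anything described in the Paradigm4 survey.
import sqlite3

import pandas as pd

# Stand-in for the stored data: an in-memory SQLite table of sensor readings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT, ts TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("a", "2014-03-01", 1.0),
        ("a", "2014-03-02", 2.0),
        ("b", "2014-03-01", 3.0),
        ("b", "2014-03-02", 5.0),
    ],
)

# Step 1: extract -- copy the stored data out of the database. At real
# scale, this transfer is the time-consuming step the survey complains about.
df = pd.read_sql_query("SELECT sensor_id, ts, value FROM readings", conn)
conn.close()

# Step 2: reshape -- glue code that adds no analytical value, only massaging
# rows into the matrix layout the statistics library expects.
matrix = df.pivot_table(index="ts", columns="sensor_id", values="value")

# Step 3: only now does the analysis itself begin, e.g. correlations
# across sensors computed outside the database.
print(matrix.corr())
```

Steps 1 and 2 are the transfer and reshaping code the survey says “adds no analytical value”; only step 3 is actual analysis.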

The survey echoes other recent reports about the “fragmentation” of data.

One proponent of “information compression” techniques recently argued that another part of the big data problem is the way “knowledge” is represented in computers.

For example, the researcher cited the long list of image formats such as GIF and JPEG. “This jumble of different formalisms and formats for knowledge is a great complication in the processing of big data,” argued data researcher Gerry Wolff of CognitiveResearch.org.

Despite these challenges, the survey did identify some positive trends. For example, 59 percent of respondents said their company was already using complex analytics to sift through big data. An additional 31 percent said they plan to do so within the next two years.

The bottom line, according to the Paradigm4 survey results, is that “the ability to effectively use diverse data sources is proving to be a competitive differentiator in many industries.”

Paradigm4’s survey of 111 data scientists was conducted by independent research firm Innovation Enterprise in March and April 2014.

Related items:

Can the ‘SP Machine’ Straighten Out Big Data

Apache Spark: 3 Real-World Use Cases
