July 11, 2016

Can self-service data preparation change the game for big data and analytics?



“Since we pioneered the self-service data preparation space, our singular mission is to eliminate that very challenge, and truly unlock the value of BI investments. This important research indicates that the strong demand we have seen in the last three years for our enterprise-grade self-service data preparation platform is just the tip of the iceberg.” – Prakash Nanduri, CEO and Co-founder of Paxata

Prakash Nanduri, CEO and Co-founder of Paxata, shared his perspective on the findings: “It is not surprising to me that more than a third (37%) of the people in the study indicated dissatisfaction with their ability to find relevant data and understand how to use it appropriately for BI and analytics. While companies are drowning in data, business teams are thirsting for information they can actually use.”

According to respondents, inefficiency when working with data is a major obstacle to meeting their business insight goals. The study showed:

  • Manual data preparation burdens analyst resources. The majority of research respondents said that 61-80% of analysts’ time in their organizations is spent on manual data preparation processes. This affects associated headcount costs by reducing the total capacity available for analytic projects.
  • Reliance on IT creates a drag on business responsiveness. Nearly 50% of analysts rely on IT for the first step of data preparation tasks. The largest group of respondents said IT takes two to six days to fulfill a data preparation request; 18% said it takes one to two weeks, and the same percentage said it takes three to four weeks. Weeks lost to data preparation request backlogs mean insights can arrive too late for organizations to become data-driven.
  • Too much time spent on data preparation rework. Data preparation is primarily ad hoc: most research participants reported that they perform ad hoc data preparation every single time, while only 4% said data preparation is entirely productionalized. Repeating menial tasks drains analyst resources, and tribal knowledge held by individuals keeps other analysts from understanding the data. Ultimately, massive value leaks occur because only small portions of the data can be explored and analyzed, or because inconsistencies in data preparation muddy insights.
  • Data quality continues to plague organizations. To make matters worse, 86% of the respondents were not fully satisfied with the quality of their data, and 94% of research participants are not very satisfied with processes for addressing data duplication.

The survey also revealed the growing need to adopt a modern data infrastructure, as well as a connected information layer, to replace aging ETL systems. The report showed:

  • Self-service data preparation transforms data integration strategies. While 66% of respondents said they are reliant or very reliant on their existing ETL systems, this number is expected to decrease as more companies adopt self-service data preparation.
  • Data preparation lets businesses see beyond their four walls. One third (33%) of research participants are either somewhat dissatisfied or not satisfied with their organization’s ability to integrate non-corporate data with corporate data for use in BI and analytics projects. Paxata’s own research found that 60% of data comes from outside corporate systems. IT estimates that 85% of the data being used is company-generated and stored in corporate systems. Business users, by contrast, indicate that a large percentage of the data they use does not come from corporate systems: personal, public, and premium third-party data used to enrich and add context. This data is not stored in corporate systems because it changes so often or is used once and never again.


To get full access to the report, visit