Survey: Excel Remains Go-To Data Prep Tool
Skyrocketing data volumes and a complex mix of data types are bogging down data preparation and processing efforts, according to a recent overview of enterprise data quality.
The snapshot released by data prep vendor Paxata surprisingly found that about two-thirds of the organizations it surveyed late last year are still relying on tools like Excel spreadsheets to help ingest and profile data. A similar percentage of businesses that have deployed more sophisticated data quality models also continue to rely on Excel, the survey found.
The fallback to tried-and-true data prep tools like spreadsheets reflects the relatively slow uptake of data quality or self-service business tools that vendors such as Paxata are pitching. The company’s survey of nearly 300 executives and IT managers found that only 15 percent have deployed a “mature data quality model.” Forty-four percent either have no data quality blueprint or have just begun implementing one.
Further complicating matters is the mix of data types and sources: 63 percent of respondents said most data comes from internal, or “first party,” sources. A similar share of the data (64 percent) was classified as structured. However, the amount of external data coming from second- and third-party sources is growing to as much as 37 percent of new data.
Hence, the Paxata survey concludes that data prep tools must “support interactivity with both structured and unstructured data.” Added Nenshad Bardoliwalla, Paxata’s co-founder and chief product officer: These tools “also must ingest and prepare large volumes of data and allow business users, as well as technical staff members, to become more fully engaged in data quality initiatives.”
As a first step toward a “mature” data quality strategy, the survey noted that organizations are confronting underlying issues through greater use of data lakes and public cloud storage. The survey found that 84 percent of those polled are using the public cloud to store at least some data, while nearly one quarter said they will store up to 80 percent in data lakes over the next 12 months.
The data prep survey was conducted in November 2017, collecting responses from 290 executives and managers at companies with more than $100 million in annual revenues.
Paxata, Redwood City, Calif., has worked with partners such as Cisco Systems (NASDAQ: CSCO) to push its data prep tools into enterprises. The company is among a growing list of data preparation tool vendors that also includes Alteryx, ClearStory Data, Datawatch, Lavastorm and Trifacta. Boston-based Lavastorm was acquired by privately held Infogix Inc. in March.