October 5, 2017

The Role of Self-Service Data Preparation in Analytics Modernization

Wei Zheng


It’s no secret that data is playing an increasingly important role not only in today’s business environment but also in society as a whole. In a May 2017 article, The Economist laid out why data has overtaken oil as the world’s most valuable resource. There is little doubt that, moving forward, the most successful organizations will be the ones best able to use the constantly growing diversity of data being generated.

To this end, the businesses I interact with on a regular basis – both large-scale Fortune 500 organizations and fast-growing upstarts – are extremely focused on modernizing their approach to analytics in order to stay ahead of their competition. Each organization refers to this modernization effort differently. I have heard it described as becoming data-driven, datafication or adopting big data. The consistent element across organizations is that it involves updating the following:

Storage & processing infrastructure
Data management & analytics applications
People & processes for driving value from data

At a high level, there are two consistent goals across nearly every organization’s analytics modernization efforts: use more data and derive value faster. Businesses need to be able to incorporate more and more data into their analytics processes regardless of the origin, shape and size of the data. They then need to turn that expanding variety of data into something valuable for their business, faster. Talk to any organization and you will hear that this is no easy task.

These two goals – more data, faster – are exactly what this relatively new breed of self-service data preparation solutions delivers. It’s been widely documented that data preparation is the biggest bottleneck in any analytics project, taking up over 80% of the end-to-end process. Data preparation solutions target the area of the analytics process that has the most room to improve.

Data prep is a bottleneck for big data (3dkombinat/Shutterstock)

This is especially true as organizations continue to expand the scope of their analytics efforts by incorporating a wider variety of new or unfamiliar data sources. Prior to doing any analysis, each data source has to be unboxed, prepared and joined together with existing data to be leveraged downstream for visualization, statistics or machine learning. Each data source expands the amount of preparation required.
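To make the "unbox, prepare and join" step concrete, here is a minimal sketch in plain Python. It is only an illustration: the customer records, the column names (customer_id, monthly_spend) and the helper functions prepare and join_sources are hypothetical, and a real self-service tool would do this interactively rather than in hand-written code.

```python
import csv
import io

# Hypothetical existing data set, keyed by customer ID.
existing = {
    "C001": {"name": "Acme Corp", "region": "West"},
    "C002": {"name": "Globex", "region": "East"},
}

# Hypothetical new source: raw CSV with inconsistent key formatting
# and a non-numeric value in a numeric column.
raw_new_source = """customer_id,monthly_spend
 c001 ,1200
C002,not available
C003,450
"""


def prepare(row):
    """Clean one raw row: normalize the key and parse the numeric field."""
    key = row["customer_id"].strip().upper()
    try:
        spend = float(row["monthly_spend"])
    except ValueError:
        spend = None  # keep the row, but mark the value as missing
    return key, spend


def join_sources(existing_rows, raw_text):
    """Join prepared rows onto the existing data; collect keys with no match."""
    joined, unmatched = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        key, spend = prepare(row)
        if key in existing_rows:
            joined.append(
                {"customer_id": key, **existing_rows[key], "monthly_spend": spend}
            )
        else:
            unmatched.append(key)
    return joined, unmatched


joined, unmatched = join_sources(existing, raw_new_source)
```

Even in this toy case, one new source brings three separate preparation concerns: key normalization, value parsing and unmatched records. That is why each additional source adds to the preparation workload.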

Traditional approaches to data preparation were not designed to handle today’s requirements for speed and diversity. The assortment of data sources that made up a traditional ETL data pipeline 10 years ago didn’t change nearly as fast or as frequently as the sources in today’s pipelines do. Waiting a few months to add a few new attributes to an analysis is unthinkable for many modern organizations, yet it was commonplace only a few years ago.

In order to go fast, you need to be able to experiment and fail fast. Setting up technology and processes that not only enable but encourage rapid, constant iteration on analysis is critical to meeting the goal of more data, faster. Data preparation tools give end users immediate and continuous feedback on how each transformation they craft would affect the data set they are working on. Analysts can see results while they are crafting their data preparation logic, rather than seeing the impact of their transformations only when a full job run completes.
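The feedback loop described here can be approximated by applying a candidate transformation to a small sample and inspecting the before/after pairs before committing to a full run. The sketch below is an assumption about how such a preview might look in plain Python; the preview and split_full_name functions and the sample rows are hypothetical and do not represent any particular product’s implementation.

```python
from itertools import islice


def preview(rows, transform, sample_size=5):
    """Apply `transform` to the first few rows only.

    Returns (before, after) pairs for rows that succeed and
    (row, error) pairs for rows the transformation fails on.
    """
    results, failures = [], []
    for row in islice(rows, sample_size):
        try:
            results.append((row, transform(row)))
        except Exception as exc:  # surface the problem instead of aborting
            failures.append((row, repr(exc)))
    return results, failures


def split_full_name(row):
    """Candidate transformation: split a full name into first and last name."""
    first, last = row["full_name"].split(" ", 1)
    return {**row, "first_name": first, "last_name": last}


# Hypothetical sample of the data set being prepared.
rows = [
    {"full_name": "Ada Lovelace"},
    {"full_name": "Grace Hopper"},
    {"full_name": "Prince"},  # single-word name breaks the naive split
]

results, failures = preview(rows, split_full_name)
```

Here the preview surfaces the single-word name as a failure right away, so the analyst can adjust the logic before a full job run rather than after it.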

By changing the workflow, data preparation solutions are also able to open the process up to a wider set of users. Data manipulation was once limited to technical individuals who knew how to code or members of a company’s IT team who were familiar with mapping-based ETL products. Self-service data preparation products leverage the latest techniques in machine learning, human-computer interaction and data visualization to make exploring and preparing data intuitive enough that any data analyst can finally manage their own data wrangling.

As a growing number of businesses adopt self-service data preparation as part of modernizing their analytics processes, we’ll see the realm of what can be achieved with data expand. One inspiring example of an organization using data preparation to address some of society’s most challenging problems is the Centers for Disease Control and Prevention (CDC). By using a number of new data management and analytics technologies, including self-service data preparation, its team was able to identify the influence opioid usage was having on an outbreak of HIV in Indiana and take the appropriate actions to stop onward transmission.

We’re just getting started with what’s possible. In the coming months and years, I am excited to see self-service data preparation solutions get into the hands of more users at a growing number of organizations and dramatically expand what we’re able to achieve with data.

About the author: Wei Zheng is VP of products at Trifacta. She combines her passion for technology with experience in enterprise software to define and shape Trifacta’s product offerings. Having founded several startups of her own, Wei believes strongly in innovative technology that solves real-world business problems. Most recently, she led product management efforts at Informatica, where she helped launch several new solutions, including its Hadoop and data virtualization products.

What’s Driving the Data Prep Market? TDWI Digs In

Why Big Data Prep Is Booming

Applications: Enterprise Analytics

Technologies: Middleware

Sectors: Government, Healthcare

Vendors: Trifacta

Tags: data cleansing, data preparation, datafication
