The Role of Self-Service Data Preparation in Analytics Modernization
It’s no secret that data is playing an increasingly important role not only in today’s business environment but also in society as a whole. In a May 2017 article, The Economist laid out why data has overtaken oil as the world’s most valuable resource. There is little doubt that, moving forward, the most successful organizations will be the ones best able to use the constantly growing diversity of data being generated.
To this end, the businesses I interact with on a regular basis – both large-scale Fortune 500 organizations and fast-growing upstarts – are extremely focused on modernizing their approach to analytics in order to stay ahead of their competition. Each organization refers to this modernization effort differently. I have heard it described as becoming data-driven, datafication or adopting big data. The consistent element across organizations is that it involves updating the following:
- Storage & processing infrastructure
- Data management & analytics applications
- People & processes for driving value from data
At a high level, there are two consistent goals across nearly every organization’s analytics modernization efforts – utilize more data and derive value faster. Businesses need to be able to incorporate more and more data into their analytics processes regardless of the origin, shape and size of the data. They then need to turn that expanding variety of data into something valuable for their business, and do so faster. As any organization will tell you, this is no easy task.
These two goals – more data, faster – are exactly what this relatively new breed of self-service data preparation solutions provides. It’s been widely documented that data preparation is the biggest bottleneck in any analytics project – taking up over 80% of the end-to-end process. Data preparation solutions target the area of the analytics process that has the most room to improve.
This is especially true as organizations continue to expand the scope of their analytics efforts by incorporating a wider variety of new or unfamiliar data sources. Prior to doing any analysis, each data source has to be unboxed, prepared and joined together with existing data to be leveraged downstream for visualization, statistics or machine learning. Each data source expands the amount of preparation required.
Traditional approaches to data preparation were not designed to handle today’s requirements for speed and diversity. The assortment of data sources that made up a traditional ETL data pipeline 10 years ago didn’t change nearly as fast or as frequently as the sources in today’s pipelines do. Waiting a few months to add a few new attributes to an analysis is unthinkable for many modern organizations, yet it was commonplace only a few years ago.
In order to go fast, you need to be able to experiment and fail fast. Setting up technology and processes that not only enable but encourage rapid, constant iteration on analysis is critical to meeting the goal of more data, faster. Data preparation tools give end users immediate and continuous feedback on how every transformation they craft would affect the data set they are working on. Analysts can see results while they’re crafting their data preparation logic, rather than seeing the impact of their transformations only when a full job run completes.
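To make the feedback loop described above concrete, here is a minimal Python sketch of previewing a transformation on a small sample before committing to a full run. It is an illustration only, not how any particular product works: the `preview` helper, the `normalize_state` step and the sample rows are all hypothetical names and data invented for this example.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def preview(
    rows: List[Row],
    transform: Callable[[Row], Row],
    sample_size: int = 5,
) -> List[Tuple[Row, Row]]:
    """Apply `transform` to the first `sample_size` rows only.

    Each row is copied first, so the original data is left untouched.
    Returns (before, after) pairs for side-by-side inspection.
    """
    return [(row, transform(dict(row))) for row in rows[:sample_size]]


def normalize_state(row: Row) -> Row:
    """Hypothetical cleanup step: trim whitespace and upper-case 'state'."""
    row["state"] = row["state"].strip().upper()
    return row


# Hypothetical sample data, invented for illustration.
rows = [
    {"id": "1", "state": " in "},
    {"id": "2", "state": "IN"},
    {"id": "3", "state": "ky"},
]

pairs = preview(rows, normalize_state, sample_size=2)
changed = sum(1 for before, after in pairs if before != after)

for before, after in pairs:
    print(repr(before["state"]), "->", repr(after["state"]))
print(f"{changed} of {len(pairs)} sampled rows would change")
```

The point of the sketch is the workflow rather than the specific cleanup step: the analyst sees a before-and-after comparison on a handful of rows, adjusts the logic, and repeats, without waiting for a job over the whole data set to finish.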
By changing the workflow, data preparation solutions are also able to open the process up to a wider set of users. Data manipulation was once limited to technical individuals who knew how to code or members of a company’s IT team who were familiar with mapping-based ETL products. Self-service data preparation products leverage the latest techniques in machine learning, human-computer interaction and data visualization to make exploring and preparing data intuitive enough that any data analyst can manage their own data wrangling.
As a growing number of businesses adopt self-service data preparation as part of modernizing their analytics processes, we’ll see the realm of what can be achieved with data expand. One inspiring example of an organization using data preparation to tackle some of society’s most challenging problems is the Centers for Disease Control and Prevention. By using a number of new data management and analytics technologies, including self-service data preparation, their team was able to identify the influence opioid usage was having on an outbreak of HIV in Indiana and take the appropriate actions to stop onward transmission.
We’re just getting started with what’s possible. In the coming months and years, I am excited to see self-service data preparation solutions get into the hands of more users at a growing number of organizations and dramatically expand what we’re able to achieve working with data.
About the author: Wei Zheng is VP of products at Trifacta. She combines her passion for technology with experience in enterprise software to define and shape Trifacta’s product offerings. Having founded several startups of her own, Wei believes strongly in innovative technology that solves real-world business problems. Most recently, she led product management efforts at Informatica, where she helped launch several new solutions including its Hadoop and data virtualization products.