Data Quality Study Reveals Business Impacts of Bad Data
If your data warehouse is starting to look like Miss Havisham’s decaying mansion, you may have a data quality problem.
A new survey of 500 data professionals, conducted by the makers of the open source data quality tool Great Expectations, found that 77% have data quality issues, and 91% report that those issues are hurting their company's performance. Only 11% reported having no data quality problems at all.
“Poor data quality and pipeline debt create organizational friction between stakeholders, with consequences like degraded confidence,” said Abe Gong, CEO and co-founder of Superconductive, the company that makes Great Expectations. “This survey made it clear that data quality issues are prevalent and they’re harming business outcomes.”
As a component of data governance and management, data quality is a measure of a dataset's overall integrity and fitness for use. In a recent blog post from Great Expectations, author Sam Bail lists six dimensions of data quality:
- Accuracy: Does the data accurately reflect reality?
- Completeness: Is all the data that’s required for the use case available?
- Uniqueness: Is the data free from unwanted duplicates?
- Consistency: Is the data free from conflicting information?
- Timeliness: Is the data sufficiently recent for the required use case?
- Validity: Does the data adhere to an expected format?
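Several of these dimensions lend themselves to simple programmatic checks. The following is a minimal illustrative sketch in plain Python (not the Great Expectations API; the function names and sample records are invented for illustration) that tests completeness, uniqueness, and validity over a list of records:

```python
import re

def check_completeness(records, required_fields):
    """Completeness: every required field is present and non-empty."""
    return all(
        record.get(field) not in (None, "")
        for record in records
        for field in required_fields
    )

def check_uniqueness(records, key_field):
    """Uniqueness: no duplicate values in the key field."""
    keys = [record[key_field] for record in records]
    return len(keys) == len(set(keys))

def check_validity(records, field, pattern):
    """Validity: every value in the field matches an expected format."""
    return all(re.fullmatch(pattern, record[field]) for record in records)

users = [
    {"id": "u1", "email": "ann@example.com"},
    {"id": "u2", "email": "bob@example.com"},
]

print(check_completeness(users, ["id", "email"]))             # True
print(check_uniqueness(users, "id"))                          # True
print(check_validity(users, "email", r"[^@]+@[^@]+\.[^@]+"))  # True
```

The remaining dimensions, accuracy and timeliness, usually require comparison against an external reference (ground truth or a freshness timestamp) and are harder to express as self-contained checks.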
Data that falls short on these dimensions can cause problems for organizations: it can make it "difficult or impossible to see a 'single view' of an end-user or service, lower productivity, obscure reliable performance metrics, and overwhelm development teams and budgets with data migration tasks," the company said.
According to the survey, data practitioners attributed low-quality data to a lack of documentation (31%), a lack of tooling (27%), and a lack of understanding between teams (22%). As a result, too much time is spent on data preparation, causing major delays for production and analytics teams.
Additionally, the study found that fewer than half of respondents reported high trust in their company's data, and 13% disclosed that their trust was low. These distrustful professionals pointed to broken apps or dashboards, poor decisions made on unreliable data, the lack of a shared understanding of metrics, and data siloed across systems, all of which can cause conflict and discord among teams.
Data quality initiatives can help, and they usually begin with a comprehensive assessment of the current state of the data. With that baseline, companies can define rules, or "expectations," that catch data quality discrepancies, and then continue to monitor systems and pipelines across the entire organization. Of those surveyed, 89% said company leadership supported their data quality endeavors, and 52% believed their leaders placed high trust in the importance of data quality. The study notes that these efforts included having a data quality plan scoped and budgeted (22%), using a dedicated data quality tool (19%), checking data manually (14%), and building in-house systems (15%).
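The define-and-monitor loop described above can be sketched as a small rule engine: each "expectation" is a named predicate over the dataset, and a run produces a pass/fail report. This is a hypothetical illustration, not Great Expectations' actual interface; the rule names and sample records are invented:

```python
def run_expectations(records, expectations):
    """Apply each named expectation (a predicate over the whole dataset)
    and return a pass/fail report keyed by expectation name."""
    return {name: check(records) for name, check in expectations.items()}

orders = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": -5.00},  # violates the non-negative rule
]

expectations = {
    "order_id_unique": lambda rs: len({r["order_id"] for r in rs}) == len(rs),
    "amount_non_negative": lambda rs: all(r["amount"] >= 0 for r in rs),
}

report = run_expectations(orders, expectations)
print(report)  # {'order_id_unique': True, 'amount_non_negative': False}
```

In a real pipeline, a report like this would be generated on every batch of incoming data, so a failing expectation surfaces immediately rather than after a dashboard breaks.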
Great Expectations bills itself as an “open-source tool for defeating pipeline debt through data testing, documentation, and profiling” with a mission “to revolutionize the speed and integrity of data collaboration.” The company raised $40 million in Series B funding this past February with plans to enhance the open source version and develop a paid Great Expectations Cloud version with a suite of collaborative orchestration tools for managing data quality.
“Data quality is critical to facilitate the making of decisions with confidence across the organization, enabling a singular understanding of what that data means and what it’s being used for. That’s why support for data quality efforts should be found at every level of an organization, from data scientists and engineers to the C-suite and board who have confidence in outcomes for decision-making,” Gong said.