November 2, 2023

Data Observability ROI: 5 Key Areas to Construct a Compelling Business Case

Farnaz Erfan


Data observability is a transformative solution that empowers organizations to harness the full potential of their data by identifying, troubleshooting, and resolving data issues in real time. However, quantifying the return on investment (ROI) for this new technology can be challenging.

There are, however, five key areas where data observability's ROI can be quantified and built into a compelling business case, allowing you to measure tangible benefits and justify the investment.

1. Reducing the Cost of Homegrown Solutions

Data engineering teams often invest substantial hours developing, maintaining, and validating data quality rules. The complexity of data pipelines and the need for validation from multiple sources further complicate the process, especially when data is not neatly structured. Homegrown solutions also typically lack machine learning capabilities and struggle with anomaly detection.

Measuring the Impact: To calculate ROI, consider the following cost drivers:

  • Number of engineers for development and maintenance.
  • Full-time equivalent (FTE) costs per engineer per year.

It’s essential to recognize that constructing a data observability system, much like any operational software, requires engineering (in this case, data engineering) resources, data science expertise for modeling and constructing anomaly detection, a dedicated quality assurance team, and DevOps engineers responsible for deploying the solution and ensuring its seamless operation.

Although the salary ranges for these specialized roles may vary, for the sake of simplification, we can compute an average across all team members.


Formula: ROI = (total # of engineers to build + # of engineers to maintain) * FTE ($)
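As a rough illustration, here is a minimal Python sketch of this calculation. The team sizes and the blended FTE cost are hypothetical placeholders, not figures from this article:

```python
# Hypothetical figures -- substitute your own team sizes and loaded FTE cost.
engineers_to_build = 3      # data engineering, data science, QA, DevOps
engineers_to_maintain = 2   # ongoing upkeep of rules and anomaly models
avg_fte_cost = 180_000      # assumed average fully loaded cost per engineer/year ($)

homegrown_cost = (engineers_to_build + engineers_to_maintain) * avg_fte_cost
print(f"Annual cost of the homegrown solution: ${homegrown_cost:,}")
# -> Annual cost of the homegrown solution: $900,000
```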

2. Reducing Indirect Infrastructure Costs

Validating data in databases or data warehouses through queries – a common practice among many teams – can significantly increase costs, especially when these systems are billed based on usage. This includes costs related to the increased number of queries, storage of historical data quality metrics, and cloud management and hosting. Due to these cost considerations, many organizations opt to validate and monitor only samples, resulting in limited data quality improvements and incomplete results.

Specific data observability solutions are designed with comprehensive data quality analysis, storage, and hosting capabilities integrated within the platform. This approach eliminates the need to offload these services onto the monitored systems, effectively mitigating the associated expenses. Moreover, this approach offers scalability, enabling the detection of data quality issues across the entirety of the data rather than relying solely on samples.

Measuring the Impact: Break down these costs into:

  • % of database overage fees related to validation queries.
  • % of extra storage costs for retaining historical data quality metrics.
  • % of increased cloud hosting expenses to support data quality at scale.

Formula: ROI = (annual data warehouse costs * % overage related to data validation queries) + (annual storage costs * % overage in storing historical data quality metrics) + (annual cloud infrastructure costs * % overage in hosting data quality at scale).

In many organizations, infrastructure costs are often consolidated with a single vendor who offers comprehensive services, including data warehousing, storage, and cloud hosting. In such cases, calculating ROI involves multiplying the total infrastructure costs by a percentage (typically between 10% and 20%) to represent the increased impact of data quality monitoring. For instance, if an organization’s annual cloud data warehouse expenses cost $1 million, implementing data quality and observability could yield an indirect impact of 10%, equivalent to $100,000 annually.
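A minimal Python sketch of the same arithmetic; every spend figure and overage percentage below is an illustrative assumption to be replaced with your own numbers:

```python
# Illustrative annual spend and overage percentages -- replace with your own.
warehouse_cost = 1_000_000   # annual data warehouse spend ($)
storage_cost = 200_000       # annual storage spend ($)
cloud_cost = 300_000         # annual cloud infrastructure spend ($)

validation_overage = 0.10    # share of warehouse spend from validation queries
storage_overage = 0.05       # share of storage spend for historical DQ metrics
hosting_overage = 0.05       # share of cloud spend for hosting DQ at scale

indirect_cost = (warehouse_cost * validation_overage
                 + storage_cost * storage_overage
                 + cloud_cost * hosting_overage)
print(f"Indirect infrastructure cost of data quality: ${indirect_cost:,.0f}")
# -> $125,000 with these assumptions

# If spend is consolidated with a single vendor, the simpler 10-20% factor
# described above applies instead:
# indirect_cost = total_infrastructure_cost * 0.10
```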

3. Lowering Incident Management Cases

Incident management is typically a reactive response to data quality issues. Shifting towards proactive prevention through data observability is ideal but not always possible. When data quality issues lead to inaccuracies in downstream systems, affecting business applications or even reaching customers, business teams often get involved in identifying, investigating, and resolving problems. This impacts the broader organization and should be factored into the ROI analysis of data observability solutions.

Measuring the Impact: Data teams often categorize incident management based on severity levels. For instance, one company classifies its data incidents as follows:

  • Small issues:
      ○ Quantity: 0-1 per sprint
      ○ Time to resolve: 2-3 days
      ○ People involved: 1
  • Medium incidents:
      ○ Quantity: 3-4 per quarter
      ○ Time to resolve: 3-4 days
      ○ People involved: 2
  • Critical incidents:
      ○ Quantity: 1-2 per year
      ○ Time to resolve: 5-10 days
      ○ People involved: 10

To simplify, you can group incidents together and calculate averages across all cost drivers:

  • Average # of incidents per year.
  • Average time to resolve incidents in hours.
  • Average hourly cost to correctly detect and remediate these issues.

Formula: ROI = (Average # of incidents per year) * (Average time to detect and resolve the incident in hours) * (Average hourly cost)
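A minimal Python sketch of this calculation, seeded with midpoints from the severity breakdown above; the two-week sprint cadence, 8-hour workday, and $100/hour blended cost are assumptions:

```python
# Midpoints taken from the severity classes above; the 26 two-week sprints
# per year, 8-hour workday, and $100/hour blended cost are assumptions.
HOURS_PER_DAY = 8
HOURLY_COST = 100  # blended hourly cost to detect and remediate ($)

incidents = [
    # (incidents per year, avg days to resolve, people involved)
    (0.5 * 26, 2.5, 1),  # small: 0-1 per sprint
    (3.5 * 4, 3.5, 2),   # medium: 3-4 per quarter
    (1.5, 7.5, 10),      # critical: 1-2 per year
]

annual_cost = sum(
    per_year * days * HOURS_PER_DAY * people * HOURLY_COST
    for per_year, days, people in incidents
)
print(f"Annual incident management cost: ${annual_cost:,.0f}")
# -> roughly $194,400 with these assumptions
```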

4. Creating Trusted Data for Better Decision-Making

While the first three ROI areas have focused primarily on cost savings, the last two delve into the potential revenue increases from implementing data observability.

Determining how much revenue improvement is directly attributed to data observability can be complex. For instance, if data observability enhances customer data quality and leads to improved retention, it’s not solely due to observability; other factors like the competence of your staff or the recent product enhancements may have come into play.

To calculate ROI, define the problem scope and gauge data observability’s potential impact on improvements.

Measuring the Impact: Define the problem statement, the problem’s baseline value, and the improvement portion that can be attributed to data observability. Let’s break it down with an example.

  • Problem Statement: “Inaccurate data hinders our [business objective, e.g., customer retention].”
  • Baseline Value: “Inaccurate data results in annual costs of $X for the organization.”
  • Addressable Scope: “We anticipate improving this by Y%, recognizing that some revenue loss is inherent to our business due to factors beyond data quality.”
  • Expected Improvement from Data Observability: “We expect a Z% of the improvement can be attributed to a data observability solution.”

Formula: ROI = Baseline value ($X) * Addressable Scope (Y%) * Expected Improvement (Z%)
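A minimal Python sketch of the formula; the $X, Y%, and Z% inputs below are hypothetical:

```python
# Hypothetical inputs for the formula above.
baseline_value = 2_000_000   # $X: annual cost of inaccurate data to the business
addressable_scope = 0.40     # Y%: portion of that loss realistically improvable
observability_share = 0.50   # Z%: share of improvement attributable to observability

expected_impact = baseline_value * addressable_scope * observability_share
print(f"Expected annual impact of data observability: ${expected_impact:,.0f}")
# -> $400,000 with these assumptions
```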

It’s important to note that while data observability contributes to this improvement, it’s just one element among several factors affecting data quality. Other factors include enhancing team skills, refining processes, conducting thorough research, and integrating complementary tools alongside data observability.

5. Accelerating Time to Value of Data Products

Data products are gaining popularity, but their success relies on high-quality data. Data observability promises a systematic way of detecting and identifying data issues in a timely manner. This approach not only accelerates time-to-market for data products but also establishes real-time analysis and remediation processes to ensure the reliability of these products when accessed by consumers.

Measuring the Impact: To calculate the impact on data products, evaluating the time-to-market delay resulting from data quality and consistency issues is essential. Some data observability tools offer a low-code, no-code interface that fosters collaboration between business and technical users. This accelerates the development and testing of data quality, helping you reach revenue goals more rapidly. These tools use machine learning (ML) to assess data quality and identify outliers and anomalies, streamlining a process that would otherwise be time-consuming and reliant on guesswork.

Furthermore, these observability platforms harness historical data trends to detect unexpected data issues in real-time. This real-time monitoring capability empowers product and engineering teams to ensure the continuous health and reliability of data products, contributing to revenue growth.

Formula: ROI = Annual revenue of data products * time-to-market delay due to bad data (expressed as a fraction of the year)
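A minimal Python sketch of this estimate. The revenue figure and delay are hypothetical, and the delay is expressed as a fraction of the year so the units work out:

```python
# Hypothetical figures: annual data product revenue and the launch delay
# attributed to bad data, expressed as a fraction of the year.
annual_revenue = 5_000_000         # annual revenue from data products ($)
delay_weeks = 6                    # time-to-market delay due to bad data
delay_fraction = delay_weeks / 52  # delay as a fraction of a year

revenue_at_risk = annual_revenue * delay_fraction
print(f"Revenue at risk from delayed time to market: ${revenue_at_risk:,.0f}")
# -> about $576,923 with these assumptions
```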

Closing Thoughts

These are just five areas where data observability yields substantial business benefits. While not all cases may apply to every organization, each plays a vital role in realizing the potential value of data observability. When developing your business case, review this framework with your executive teams and consider all cost drivers and revenue-generating opportunities. Document and break down the total ROI into a clear timeline for implementation. Data observability isn’t just an expenditure; it is an investment. It reduces the time and resources spent on troubleshooting and correcting data issues, lowers infrastructure costs, accelerates data products, and ultimately helps you grow revenue.

About the author: Farnaz Erfan is the founding head of growth at Telmai, a provider of observability tools. Farnaz is a product and go-to-market leader with over 20 years of experience in data and analytics. She has spent her career driving product and growth strategies in startups and enterprises such as Telmai, Marqeta, Paxata, Birst, Pentaho, and IBM. Farnaz holds a bachelor of science in computer science from Purdue University and spent the first part of her career as a software engineer building data products.
