June 26, 2023

In Search of Trustworthy Data Products


A great way to streamline access to data is through the creation of data products. Instead of just unleashing raw data upon data scientists, data products provide a more refined and governed approach. But without a certain level of built-in quality, data users won’t trust the data product.

The concept of a data product can be traced back to 2012, when DJ Patil, who later served as the Chief Data Scientist for the United States, published his book “Data Jujitsu: The Art of Turning Data into Product.” According to Patil, a data product is something “that facilitates an end goal through the use of data.”

Fast forward to 2019, and we find data products being closely tied to the data mesh concept spearheaded by Zhamak Dehghani, who we named a Datanami 2022 Person to Watch. According to Dehghani’s original article defining data mesh, “Domain data teams must apply product thinking … to the datasets that they provide; considering their data assets as their products and the rest of the organization’s data scientists, ML and data engineers as their customers.”

Thomas H. Davenport, Randy Bean, and Shail Jain pushed the data product ball a bit further in a 2022 Harvard Business Review article, in which they define a data product as “an attempt to create reusable datasets that can be analyzed in different ways by different users over time to solve a particular business problem.”

The HBR authors further refined the term by differentiating between data products that are suitable for reuse, and analytics products that feature built-in analytics or AI capabilities. In either case, companies that adopt data products will want to consider creating a new position to oversee their creation and use: a data product manager.

One of the jobs of the data product manager is to ensure that the data is high quality. Anthony Deighton, who was recently promoted to the title of data products general manager at Tamr, sees parallels between product managers in general software development and data product managers.


“A product manager thinks about the customer’s goals and maps that into product features,” he says. “A data product manager is very similar. He thinks about business goals and maps those into data features. What capability do I need to build in the data to support this business goal? It’s very analogous.”

Tamr, which was co-founded by legendary computer scientist and database creator Mike Stonebraker, has recently taken up the data product baton. The company was originally founded to productize the “data tamer system” that Stonebraker helped develop at MIT and wrote about back in 2012, to help tame the data mess that exists at many large companies with many data silos. (“The more databases you create, the more silos you’ve created. So maybe Tamr is Mike’s way of helping to fix the problem he created by creating more databases in the world,” Deighton says.)

A recent Tamr study suggests that companies are looking to data products to help get a handle on problems with customer data. It found that 69% of respondents cited “business value” as a key metric for measuring the success of data products, second to user experience.

“That’s pretty interesting,” Deighton says, “because rather than thinking about technical measures like uptime or performance or data volumes, they’re talking about connecting it to the business value that they’re providing to their customers or to their partners.”

The survey also found that about three-quarters of respondents said companies that want to solve customer data problems should develop a data product strategy that focuses on business value, rather than other measures. That doesn’t surprise Deighton, who joined Tamr a few years ago after many years at Qlik.

“Data is a really important asset. [But] it’s a mess,” he says. “It’s what is affecting customer relationships. You’re a business owner and you’re saying ‘If I had a better handle on my customer data, I could provide better service or products or whatever to my customers.’ That’s an acute business challenge.”

A data silo (TFoxFoto/Shutterstock)

Tamr uses machine learning and AI to help companies find errors in their data. The company’s technology has been developed to scale entity resolution capabilities across many silos, thereby helping companies get a cleaner and clearer picture of their assets and providing a better starting point for downstream AI and analytics use cases.
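For readers unfamiliar with entity resolution, the basic idea can be illustrated with a minimal sketch. This is not Tamr's method, which the article describes only as machine learning based. It is a simple string-similarity matcher built on Python's standard `difflib` module, and the record fields, sample company names, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(silo_a, silo_b, threshold=0.85):
    """Pair each record in silo_a with its most similar record in silo_b.

    A pair is kept only if the similarity reaches the (hypothetical) threshold.
    """
    matches = []
    for rec_a in silo_a:
        best = max(
            silo_b,
            key=lambda rec_b: similarity(rec_a["name"], rec_b["name"]),
            default=None,
        )
        if best is not None and similarity(rec_a["name"], best["name"]) >= threshold:
            matches.append((rec_a["id"], best["id"]))
    return matches


# Two hypothetical silos that describe overlapping customers differently.
billing = [{"id": "B1", "name": "Acme Corp."}, {"id": "B2", "name": "Globex, Inc."}]
support = [{"id": "S7", "name": "ACME Corp"}, {"id": "S9", "name": "Initech LLC"}]

print(match_records(billing, support))
```

Pairwise comparison like this grows quadratically with the number of records, which is one reason scaling entity resolution across many silos is a hard problem.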

“Our view from a data products perspective is can we automate that process and use machine learning as a mechanism of putting the machine in charge of that work, not relying on humans,” Deighton says. “The idea that we can turn over tasks that previously we might have thought required humans to a computer is becoming much more mainstream.”

As customer data volumes grow, quality and trust problems tend to compound. Instead of relying on data to automate more customer interactions, companies must fall back on manual interaction methods until they can cleanse the data (which itself is often a time-consuming manual task).

It screams for a better approach.

“Customer data is where your primary interface point to your customer bumps up against this post-apocalyptic dumpster fire which is your data,” Deighton says, “and we’ve all experienced this.”

We experience it when we call a horizontally integrated telecommunications company to address a concern with our account, and are repeatedly handed to other departments, where we repeat our concerns, and are handed off again. We experience it when we call a hospital for test results, and the nurse says “oh that’s a whole different system.”

“We experience this viscerally all the time,” Deighton says. “And what this survey is showing is that organizations experience it as well. It doesn’t feel good if you work somewhere and you’re trying to do a good job of providing a great customer experience and you literally don’t have access to the data. That’s equally frustrating as being the person on the other side of that conversation who can’t get the question answered or the prescription filled or whatever is the problem.”

There’s no escaping Conway’s Law, which states that the design of a system will reflect the organization that built it and how it communicates. Conway’s Law leads to specialization as well as compartmentalization of systems. Deighton’s corollary to that law, which he calls Deighton’s Law, states that data in an organization also reflects the way the organization is structured.

“If you organize your company by product…you will then have customer data organized by those product silos,” he says. “Or if you organized geographically, if you have Europe and the U.S. and you have Northeast and Southeast in the U.S., you will organize your customer data geographically. If you organize your business by go-to-market segments, then you have large enterprise versus mid markets, you will have your data organized that way.”

But companies can start to chip away at Deighton’s Law with a data product strategy. “The idea behind data products is can we manifest a view of that data that sits on top of the organization structures and the data structures that sit beneath them,” he says.

Movement of data to the cloud–to giant data warehouses, data lakes, and data lakehouses–has helped to eliminate some of the technical barriers that maintained data in isolated silos. The physical separation has been eliminated in some cases, which is a big help. What’s preventing progress now is the companies’ organizational structure, and that’s where the data product strategy can help.

“What we see customers doing is that they will take data from these multiple silos and then they’ll have two tables in the data lake but they’re not related and they haven’t figured out how to bring them together,” Deighton says. “It’s the data product that brings them together.

“The warehouse or the data lake is a necessary but not sufficient condition to success,” he continues. “It’s an important enabler, but the data product strategy is what enables you to see across those silos, whether those silos exist in your data lake or in the systems of record.”
