December 16, 2021

5 Steps to Data-Based Decisions

Devavrat Shah


Organizations in virtually every industry are now awash in more data than they know what to do with. But how can they take all of that information and use it to arrive at new insights that help to improve operations and chart a path forward? The exact journey from data, to insight, to decision-making will look slightly different for every organization. But my observations of best practices across industries reveal a common architecture to the process.

Organizations seeking to leverage data science for strategic decision-making should follow these five steps:

1. Understand the Structure in Data

Consider the retail sector. Anyone who opens up an e-commerce shop on a platform like Shopify almost instantly begins to collect data – information about transactions from different channels, suppliers, inventory, customer reviews, and other sources. Now let’s say a retailer wants to better understand its customers, including the things they like, the things they don’t, and what factors influence their buying decisions. Answering this type of (seemingly simple) question can quickly get complicated, as customers have different preferences and different ways of interacting with a brand. Some may purchase only one product, while others are loyal repeat shoppers. But by conducting a careful analysis at both the micro and macro levels, organizations can begin to get a holistic picture of their customer base, which will later help to inform predictions about what sorts of products will perform well in the future.
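The micro-to-macro analysis described above can be sketched in a few lines of code. This is a minimal illustration, not a production pipeline: the transaction data and the "repeat vs. one-off" segmentation rule are assumptions chosen for the example.

```python
from statistics import mean

# Hypothetical transaction log: (customer_id, order_value) pairs.
transactions = [
    ("c1", 20.0), ("c1", 35.0), ("c1", 15.0),   # loyal repeat shopper
    ("c2", 120.0),                              # one-off big spender
    ("c3", 10.0), ("c3", 12.0),                 # repeat shopper, low spend
    ("c4", 45.0),                               # one-off purchase
]

# Micro level: build a per-customer profile (list of order values).
profiles = {}
for cid, value in transactions:
    profiles.setdefault(cid, []).append(value)

# Segment customers by behavior: repeat buyers vs. one-off buyers.
segments = {}
for cid, values in profiles.items():
    kind = "repeat" if len(values) > 1 else "one-off"
    segments.setdefault(kind, []).append(mean(values))

# Macro level: size and average order value of each segment.
summary = {k: (len(v), round(mean(v), 2)) for k, v in segments.items()}
print(summary)
```

Even this toy summary surfaces the holistic picture the step calls for: how many customers fall into each behavioral segment, and how their spending differs.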


2. Use Structure to Create Prediction Models

This is the part of the process where the retailer from Step 1 takes all of its data and begins to ask specific questions, such as: “How will shirt #27 sell in the coming spring?” After data scientists bring existing data into a more structured form, they can apply methods of prediction to answer such questions. It’s important for stakeholders to bring nuance to this aspect of the process. For instance, red shirts may be selling well, but perhaps there’s been a noticeable trend downward in the past week. By analyzing historical data, the organization might discover that this downturn is merely a predictable seasonal hiccup, or due to unpredictable external events (such as the sudden appearance of a worldwide pandemic). Often, people ask the question, “How much data is sufficient?” But really, that’s the wrong question to be asking. Instead, organizations should ask themselves, “Given the data available, which method of making predictions is the right one?”
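A question like "how will shirt #27 sell in the coming spring?" can be answered, in its simplest form, by a seasonal baseline built from historical data. The sketch below assumes invented weekly sales figures; it also illustrates the nuance discussed above, since one season (2020) is visibly depressed by an external shock.

```python
from statistics import mean

# Hypothetical weekly unit sales of "shirt #27" over four spring weeks,
# for three past years (all numbers assumed for illustration).
past_springs = {
    2019: [40, 42, 45, 44],
    2020: [20, 18, 22, 21],   # pandemic-depressed season
    2021: [38, 41, 43, 40],
}

# Naive seasonal forecast: predict each week of next spring as the
# average of that same week across past years. A more careful model
# would down-weight anomalous seasons like 2020 rather than averaging
# them in -- which is exactly the kind of judgment call stakeholders
# should bring to this step.
weeks = zip(*(past_springs[y] for y in sorted(past_springs)))
weekly_forecast = [round(mean(w), 1) for w in weeks]
season_total = round(sum(weekly_forecast), 1)

print(weekly_forecast, season_total)
```

The point of the exercise is not the specific numbers but the framing: given this data, a same-week seasonal average is a defensible method, whereas a method that ignores seasonality would not be.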

3. Understand the Dynamics in the Data

In the popular imagination, data science is often a very static process: You take your data, you run it through your neural network, and you make your predictions. But in reality, data in the worlds of retail and finance (and other sectors where data science is particularly valuable) is constantly changing. Further, the data is sometimes biased by choices made by the organizations collecting it. Savvy data science teams must account for all of these variables. To touch again on retail: the data coming into organizations inevitably changed quite a bit from February 2020 (before COVID hit) to January 2021 (the height of the pandemic), and then again several more times since then as the crisis has waxed and waned. Therefore, it is essential to account for such dynamics while developing an understanding of the data, as well as when building predictive models.
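One simple way to account for such dynamics is to check whether newly arriving data still looks like the data a model was trained on. The sketch below uses invented daily order counts for the two periods mentioned above and flags drift when the new mean sits far outside the old distribution; the three-standard-deviation threshold is an assumption, not a universal rule.

```python
from statistics import mean, stdev

# Hypothetical daily order counts for two periods (numbers assumed).
feb_2020 = [110, 115, 108, 120, 112, 118, 109]
jan_2021 = [180, 175, 190, 185, 178, 188, 182]

def shifted(old, new, threshold=3.0):
    """Flag drift when the new period's mean lies more than `threshold`
    standard deviations away from the old period's mean."""
    z = abs(mean(new) - mean(old)) / stdev(old)
    return z > threshold

print(shifted(feb_2020, jan_2021))   # the distribution has clearly moved
```

A check like this, run continuously, tells the team when an existing model's assumptions no longer hold and retraining or re-weighting is needed.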

4. Employ Modern Machine Learning Techniques


It’s one thing to process numerical data from spreadsheets to make data-based predictions and decisions. But now imagine that the numbers in those spreadsheet cells become unstructured objects such as images and text. To make use of this sort of unstructured data, organizations need to leverage modern machine learning (ML) methods. Doing so vastly increases the amount and variety of data that organizations can use to inform their predictions. Easily quantified metrics such as historical sales figures are, of course, very useful. But unstructured data from online customer reviews, or even social media interactions, can paint a much more comprehensive picture of what is happening in the marketplace, and why.
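The core move with unstructured data is turning it into numbers a model can consume. Modern ML does this with learned representations, but even a crude lexicon score shows the idea. The reviews and word lists below are invented for illustration.

```python
from collections import Counter
import re

# Hypothetical customer reviews (assumed text).
reviews = [
    "Love this shirt, great fit and great color",
    "Color faded after one wash, poor quality",
    "Great value, would buy again",
]

# Tiny hand-picked sentiment lexicons -- a stand-in for what a trained
# language model would learn automatically from far more data.
POSITIVE = {"love", "great", "value"}
NEGATIVE = {"poor", "faded"}

def score(text):
    """Crude sentiment score: positive minus negative word counts."""
    words = Counter(re.findall(r"[a-z']+", text.lower()))
    return sum(words[w] for w in POSITIVE) - sum(words[w] for w in NEGATIVE)

scores = [score(r) for r in reviews]
print(scores)
```

Each review is now a number that can sit alongside sales figures in a prediction model, which is precisely how unstructured data broadens the picture of the marketplace.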

5. Develop an Effective Scenario Analysis (aka Recommendation) System

Finally, data science teams need to bring everything together to make sequential decisions, at scale, in real time. Such decision-making inevitably involves human decision makers who need help performing “scenario analysis” and who need “recommendations” grounded in historical data. Recommendation systems are the answer to this need.

Typically, recommendation systems have been associated with e-commerce applications such as “if you like this, you may like that.” However, they have wide-ranging applications beyond e-commerce, including entertainment and wayfinding apps, finance, policy making, matching markets (ride hailing, online dating, gig markets, and more), and, more generally, anything that requires help beyond searching. To illustrate both the challenge and the opportunity of creating an effective recommendation system, consider a simple example: Amazon maintains an inventory of millions of products and has hundreds of millions of customers. That’s an enormous number of variables to juggle, on both sides of the recommendation engine. But by getting this piece right, the company is able to increase revenues and better satisfy its customers.
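The “if you like this, you may like that” idea can be sketched with item-based collaborative filtering: items are scored as similar when the same customers buy them. The purchase data below is invented, and real systems at Amazon's scale use far more sophisticated, approximate methods; this is only the core intuition.

```python
from math import sqrt

# Hypothetical user-item purchase data (assumed for illustration).
purchases = {
    "alice": {"shirt", "jeans", "belt"},
    "bob":   {"shirt", "jeans"},
    "carol": {"jeans", "belt"},
    "dave":  {"shirt", "hat"},
}

def cosine(a, b):
    """Cosine similarity between two items over their sets of buyers."""
    buyers_a = {u for u, items in purchases.items() if a in items}
    buyers_b = {u for u, items in purchases.items() if b in items}
    if not buyers_a or not buyers_b:
        return 0.0
    return len(buyers_a & buyers_b) / sqrt(len(buyers_a) * len(buyers_b))

def recommend(user):
    """Rank items the user hasn't bought by similarity to items they own."""
    owned = purchases[user]
    candidates = {i for items in purchases.values() for i in items} - owned
    return sorted(candidates,
                  key=lambda c: max(cosine(c, o) for o in owned),
                  reverse=True)

print(recommend("bob"))
```

For "bob", the belt ranks above the hat because belt buyers overlap heavily with jeans buyers, while the hat shares only one buyer with anything bob owns. Scaling this same logic to millions of products and hundreds of millions of customers is exactly the engineering challenge the example describes.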

Viewed from the outside, data science seems like a very mysterious practice. But then, so does the practice of space travel, or building skyscrapers, or performing open-heart surgery. Like these other practices, data science is governed by rules that can help guide organizations and individuals to success.

About the author: Devavrat Shah, lead instructor of MIT Professional Education’s Applied Data Science Program, is a professor in the Department of Electrical Engineering and Computer Science at MIT. He is a member of the Laboratory for Information and Decision Systems (LIDS) and the Operations Research Center (ORC), and the Director of the Statistics and Data Science Center (SDSC) in IDSS. His research focuses on the theory of large complex networks, including network algorithms, stochastic networks, network information theory, and large-scale statistical inference.
