The Benefits of Building Predictive Analytics on Unified Customer Data
Predictive customer lifetime value (CLV) is a key element in modern marketing analytics, allowing marketers to prioritize the customers with the highest predicted business value. The most popular data science approach to predicting CLV is the extended Pareto/NBD (EP/NBD) model, a generative model that leverages a few summary statistics about customer transactions: the frequency of repeat purchases, the total customer age, the recency of the most recent purchase, and the historical average order value. Despite using only a few signals and being over fifteen years old, the EP/NBD model has maintained strong relative performance in a recent comparison of several CLV prediction approaches.
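To make those four summary statistics concrete, here is a minimal sketch of how they can be computed from a raw transaction log. The transaction data and field layout are hypothetical; production pipelines would compute these per customer across millions of rows.

```python
from datetime import date

# Hypothetical transaction log for one customer: (purchase_date, order_value)
transactions = [
    (date(2021, 1, 5), 42.00),
    (date(2021, 3, 12), 55.50),
    (date(2021, 7, 30), 31.25),
]
observation_end = date(2021, 12, 31)  # end of the observation window

dates = sorted(t[0] for t in transactions)
first, last = dates[0], dates[-1]

# The four summary statistics EP/NBD-style models consume:
frequency = len(transactions) - 1                # number of repeat purchases
recency = (last - first).days                    # customer age at last purchase
T = (observation_end - first).days               # total customer age
monetary = sum(v for _, v in transactions) / len(transactions)  # avg order value

print(frequency, recency, T, round(monetary, 2))
```

Everything beyond these four numbers (what was bought, where, in response to which campaign) is discarded, which is exactly the limitation the rest of this article is about.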
There have been many attempts to substantially improve CLV prediction via more sophisticated modeling techniques (SVMs, boosted decision trees, and neural networks), but these models still rely on the time series of past customer transactions as their primary data signal. Further improvements to CLV prediction, and to predictive analytics generally, are more likely to come from exploiting new sources of customer data than from modeling techniques or feature engineering. To quote Rule #41 from Google's Rules of Machine Learning: "When performance plateaus, look for qualitatively new sources of information to add rather than refining existing signals."
Luckily, following Rule #41 isn't too hard, since modern businesses collect more types of data than ever about their customers and their business interactions. For instance, retail businesses typically collect location information, itemized product purchases, and marketing campaign responses across an increasing number of channels. The diversity of data sources has grown so significantly that retailers and other direct-to-consumer businesses leverage dedicated customer data platforms (CDPs) to unify this data. Intuitively, these data sources can benefit prediction quality in many ways. Customers who purchase a specific product may be more likely to churn and not return. Customers living close to a high-performing store might have a higher lifetime spend. A customer who clicks on a marketing email and spends time browsing a catalogue may be more likely to make an in-store purchase.
While the significance of any of these kinds of data may vary from business to business, in total this explosion of customer data will have a substantial impact on how we approach predictive analytics. For instance, in experiments across multiple retailers and experimental settings, we found more than a 15% average improvement in CLV prediction (measured by root mean squared error) from using a diverse set of data signals over a model using only historical transactions.
Here are some of the key benefits to building on unified customer data.
Training on Unified Customer Data
One consequence of collecting customer data across multiple channels is that events associated with a given customer may be split across different records. For example, an in-store and an online purchase from the same customer may not be resolved to a single unified customer profile if there is no shared customer primary key (e.g., an in-store purchase may not be associated with an email address). This identity resolution failure has many important consequences; one of them is that it compromises the quality of predictive analytics. If you train a model to predict a customer's future spend on inaccurate historical information, that inaccuracy carries through to the quality of the predictive model. Based on a sample of real-world data from retailers, we found that 53% of total historical CLV spend is misattributed, meaning the spend by a single customer has been incorrectly split across multiple customer records.
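To illustrate the misattribution problem, the sketch below merges customer records that share any identifier using a simple union-find, then re-aggregates spend per unified profile. The records and identifier keys are hypothetical, and real identity resolution involves far more than exact key matching; this only shows why unresolved records understate per-customer spend.

```python
# Hypothetical customer records: (record_id, {identifier: value}, spend)
records = [
    ("r1", {"email": "pat@example.com"}, 120.0),               # online purchase
    ("r2", {"loyalty_id": "L-77"}, 80.0),                      # in-store purchase
    ("r3", {"email": "pat@example.com", "loyalty_id": "L-77"}, 40.0),
]

parent = {r[0]: r[0] for r in records}

def find(x):
    """Find the root of x's profile, with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Union any two records that share an identifier value.
seen = {}  # (key, value) -> first record_id that carried it
for rid, ids, _ in records:
    for kv in ids.items():
        if kv in seen:
            union(rid, seen[kv])
        else:
            seen[kv] = rid

# Aggregate spend per unified profile.
totals = {}
for rid, _, spend in records:
    root = find(rid)
    totals[root] = totals.get(root, 0.0) + spend

print(totals)  # all three records resolve to one profile totaling 240.0
```

Without the resolution step, this customer would appear as three profiles spending 120, 80, and 40, and a CLV model trained on those labels would learn from systematically deflated targets.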
A common refrain in machine learning is GIGO (Garbage In Garbage Out), referring to the idea that no matter how sophisticated your model or algorithm is, the efficacy of your approach is limited if built upon poor data quality. Beyond ensuring the training data quality is good, building predictive analytics on unified customer data allows you to also add new kinds of information to your models.
Customer Demographics & Product Features
Several classes of features, or signals, can improve CLV and churn prediction quality. There are many, but here are a few examples:
- Customer Attributes: Non-transactional information about who the customer is: where they are located (and whether that is near a retail location), what channel they were acquired through, as well as age, name, and gender information.
- Product Attributes: For businesses with available catalogue information, look at the specific products purchased and how “sticky” each has been (in terms of aggregate return orders from customers who have bought this or similar products).
- Transaction History: Broadly the same inputs EP/NBD uses: the frequency and recency of transactions, as well as past average order values.
- EP/NBD Predictions: Use the predictions from this popular baseline to understand how much marginal value the other feature groups add.
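One way to run this kind of ablation is to represent each group as a toggleable block of columns and build feature vectors from chosen subsets. The sketch below is a minimal illustration with made-up field names; encoding categorical fields and the models themselves are omitted.

```python
# Hypothetical per-customer raw data, organized by feature group.
customer = {
    "attributes": {"acquired_channel": "email", "distance_to_store_km": 3.2},
    "products": {"sticky_category_share": 0.6},
    "transactions": {"frequency": 4, "recency_days": 35, "avg_order_value": 52.0},
    "epnbd": {"predicted_clv": 310.0},
}

def build_features(cust, groups):
    """Flatten the selected feature groups into one namespaced row."""
    row = {}
    for g in groups:
        for name, value in cust[g].items():
            row[f"{g}.{name}"] = value
    return row

full = build_features(customer, ["attributes", "products", "transactions", "epnbd"])
txn_only = build_features(customer, ["transactions"])
print(len(full), len(txn_only))
```

Training one model per subset of groups and comparing their errors is exactly the experiment summarized in the figures that follow.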
Next, you’ll want to understand how prediction quality changes as you use different subsets of the groups above. As an example, we looked at CLV prediction, measuring prediction quality via root mean squared error (RMSE). The impact and interaction of these feature groups on RMSE can be summarized by the figure below (lower RMSE is better):
One key takeaway from this graph is that more feature groups improve prediction quality. There is no feature group whose performance cannot be improved by adding signals from a different group. This reinforces the idea that using a broader range of signals is important.
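For reference, RMSE itself is straightforward to compute. The predictions and actuals below are made up purely to show the mechanics of comparing a transaction-only model against one using all feature groups.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted CLV."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# Hypothetical 12-month CLV actuals vs. two models' predictions.
actual             = [100.0, 0.0, 250.0, 40.0]
txn_only_model     = [140.0, 30.0, 180.0, 60.0]
all_features_model = [110.0, 10.0, 230.0, 45.0]

print(rmse(actual, txn_only_model))      # higher error
print(rmse(actual, all_features_model))  # lower error
```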
We saw a similar trend with churn prediction: the task of predicting whether a customer will make a purchase in a given forecasting horizon (e.g., this quarter or year). For this task, we measured prediction quality using the F1 score on the event that the customer returned (higher is better):
In addition to the “more features are better” takeaway from before, another is that using customer attribute features alone (without seeing a single transaction) performs reasonably well, although it does not yield the best performance. A model built purely from customer attributes can help prioritize marketing actions for newly acquired customers before their first transaction; traditional CLV approaches, by contrast, require at least one.
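For completeness, here is how the F1 score on the “customer returned” event is computed. The labels below are invented for illustration; 1 means the customer made a purchase within the forecasting horizon.

```python
def f1_score(y_true, y_pred):
    """F1 on the positive class (1 = customer returned in the horizon)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical churn labels and predictions for six customers.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

print(f1_score(y_true, y_pred))
```

F1 balances precision and recall, which matters here because return events are often imbalanced: a model that predicts “everyone returns” can look accurate while being useless for targeting.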
There’s No Data Like All Your Data
While there are interesting discussions to be had about the modeling technology one uses for predictive analytics, we believe those choices matter much less than being able to leverage all the data you’ve collected about who your customers are and the interactions they’ve had with your business. Doing this will require predictive analytics stacks capable of flexibly leveraging a broad range of data businesses have worked hard to collect, manage, and unify.
About the author: Aria Haghighi is vice president of data science at Amperity, where he is responsible for leading the company’s data science team to expand core capabilities in identity resolution. He has more than 15 years of technology experience playing key advisory and leadership roles in both startup and enterprise companies. Most recently, Aria was Engineering Manager at Facebook where he was responsible for leading the Newsfeed Misinformation team, which uses machine learning and natural language processing to improve the integrity of content on the platform and tackle the prevalence of fake news, hoaxes, and misinformation. Haghighi has also held leadership and technical roles at some of the world’s biggest tech companies including Apple, Microsoft, and Google.
Editor’s note: This story is based on an Amperity white paper, titled “Predicting Customer Lifetime Value with Unified Data,” that is available here.