October 31, 2014

A Look Into the Magic Ball: How To Harness Big Data For Predictions

Dmitri Williams

“Big Data” is the flavor of the moment, but it’s not always clear what the term means. I’ve worked with dozens of companies, and more often than not “big data” means the kitchen sink approach, i.e. let’s just collect everything in case we need it.

That’s fair enough, but using big data intelligently involves more than just a big database and tons of stored information. It means collecting the right kinds of data that answer specific questions addressing specific business needs — and when you collect the right information, big data can provide invaluable insight into your user base and help you predict the future actions and behaviors of your users. It’s all doable, but it requires planning and smart decisions up front that take into account the nature of your customers.

You may be familiar with the popular TV show House. The principal character, Dr. Greg House, is a curmudgeonly mentat of a medical analyst who never trusts his patients to tell him the truth. Weighed down by their own egos, cultures, superstitions and fears, House's patients regularly lie to themselves and to him, obscuring key facts of their illness and making diagnosis difficult. House teaches us that listening to your patients may not always be the best way to help them.

In a similar light, listening to users talk about your product is flawed in the same way.

First, there’s the problem of sampling. Listening to users who scream the loudest doesn’t yield a scientifically representative sample of all users. Their complaints are proof that a phenomenon exists, but not that it’s common, or even important.

Second, these people may be flat out wrong in their observations—not because they're evil, stupid or angry (though those are always possible), but because they are a poor gauge of even their own behavior. This is why it's often better to observe than to ask. Today, we can observe through big data analysis, but it's hardly a new phenomenon. Eugene Webb found this almost 50 years ago in his work on "unobtrusive measures" (1966), when he discovered that it's a lot better to watch people do something and measure it yourself than to ask them to recall it.

Webb’s classic case looked at the popularity of museum exhibits. When asked which exhibit they liked the best, museum visitors would consistently give answers that made them sound intelligent and wise. For example, many would say they liked the exhibit on atomic structures, even though this exhibit room was consistently devoid of visitors. The visitors were clearly lying to feel better about themselves and appear wiser.

So how could the museum learn the truth without following the guests around?

Webb’s simple approach was novel and revolutionary. His team collected “unobtrusive” data by counting the number of nose smudges on the glass cases and the wear of the floor tiles in front of the exhibits. Those turned out to be far better gauges of actual popularity.

It’s a simple insight, and one that can be brought to bear in the realm of big data. In the world of gaming, for example, game logs represent an analyst’s dream for data quality and purity. These logs yield a flawless record of every action, transaction and interaction players make (if a game is instrumented intelligently). And this information is what powers good big data.

The next step is gathering and interpreting this data intelligently. On the simplest level, any company needs to understand what they’re working with before they can predict what will happen in the future: how many customers it has, how much they’ve spent, and what they’ve done.

In the gaming industry, we use basic metrics like Daily Active Users (DAU), Average Revenue Per User (ARPU) and K-Factor (an indication of how viral a game is). These are invaluable for understanding aggregate trends, and if they can be augmented with add-on functionality like AB testing and segmentation, they can become quite handy indeed. For example, if an A group gets one kind of content (or mechanic, or CRM intervention, etc.) and a B group another, then the ARPUs of those two groups can be compared to see which performed better.
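The ARPU comparison above is simple enough to sketch in a few lines. This is an illustrative example only; the group names and revenue figures are invented for the sake of the arithmetic.

```python
# Hypothetical A/B test: group A saw the original store layout,
# group B saw a new one. All numbers are made up for illustration.

def arpu(total_revenue, active_users):
    """Average Revenue Per User: total revenue divided by active users."""
    return total_revenue / active_users

group_a = {"revenue": 5400.0, "users": 12000}
group_b = {"revenue": 6100.0, "users": 11800}

arpu_a = arpu(group_a["revenue"], group_a["users"])
arpu_b = arpu(group_b["revenue"], group_b["users"])

print(f"ARPU A: ${arpu_a:.3f}, ARPU B: ${arpu_b:.3f}")
print("Winner:", "B" if arpu_b > arpu_a else "A")
```

In practice you'd also want a significance test before declaring a winner, but the core comparison is just this division.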

To get that historical data, “instrumenting,” or making sure these metrics are actually recorded, is critical.

Analytics companies will tell you which events to capture and how to report them — and it's pretty straightforward. For example, when an event happens—say a user logs in—the system needs to be able to say "User 3482 logged in at time XX:XX:XX." The analytics company supplies a piece of code that fires this event off, typically to the cloud, where algorithms and reporting are applied and more advanced calculations, like predictions, become possible. This code is supplied in a library that essentially says "wherever you have your log-in event happening, put this line of code here." If the events are already instrumented, this process shouldn't take more than a day, at most.
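A minimal sketch of what that instrumentation call looks like. `AnalyticsClient` and `track` are illustrative names, not any real vendor's SDK; a real client would batch events and POST them to the vendor's servers rather than hold them in memory.

```python
# Sketch of event instrumentation with a hypothetical analytics client.
import json
import time

class AnalyticsClient:
    def __init__(self):
        self.queue = []  # events buffered before being shipped to the cloud

    def track(self, user_id, event_name, **properties):
        """Record one event: 'User 3482 logged in at time XX:XX:XX.'"""
        event = {
            "user_id": user_id,
            "event": event_name,
            "timestamp": time.time(),
            **properties,
        }
        self.queue.append(json.dumps(event))  # in practice, POSTed in batches

analytics = AnalyticsClient()
# "Wherever you have your log-in event happening, put this line of code here":
analytics.track(3482, "login")
```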

Obviously, if your analytics software requires data for a metric you haven’t instrumented for, you’ll need to work that into your development cycle. For example, if you want to know how many players are on level 8 and you don’t collect an event like “advanced from level 7 to 8,” that metric isn’t going to show up. But in general, this isn’t a complicated process and simply requires time and planning.

So, clearly, having a solid foundation of historical player data is hugely important. It supplies you with data to populate the familiar basic metrics of DAU, ARPU, ARPPU, churn rate and many others that can provide insight into your user base. But these metrics have one key limitation: they’re based on historical data and therefore reactive by their very nature.

To be proactive and make the most out of the data you’re collecting, you need to peer into the future using big data and predictive analytics. It isn’t easy, but it’s not magic either. With today’s advances in big data analytics, the ability to accurately predict player behavior is a reality. The science can get confusing, but here’s a fairly simple way of understanding it.

Let’s say a computer watches all of the events that happen in a game and it starts to recognize patterns. Some patterns are repeated, while others aren’t. When the patterns are repeated, the machine “learns” that and starts looking for that pattern to occur again, this time starting to make a prediction about what is going to come next.

Say the computer sees A-B-C-D, over and over. After a while it recognizes it. Then it sees A-B-C, and you ask it what is going to happen next. It says “D,” of course, but it can also tell you how likely this prediction is to be correct. How can it do that? Well, when it looked into the past, it wasn’t always A-B-C-D. Occasionally it was A-B-C-Q. So the computer also starts to understand likelihood, and can tell you how often that guess has turned out right. That’s the prediction. No magic, really.
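The A-B-C-D idea above can be written down as a toy frequency count: tally what followed each prefix in the history, then predict the most common continuation along with how often it held. The sequences below are invented to match the example.

```python
# Toy next-symbol predictor: mostly A-B-C-D, occasionally A-B-C-Q.
from collections import Counter

history = ["ABCD"] * 9 + ["ABCQ"]

def predict_next(prefix, sequences):
    """Return (most likely next symbol, probability) after `prefix`."""
    followers = Counter(
        seq[len(prefix)] for seq in sequences
        if seq.startswith(prefix) and len(seq) > len(prefix)
    )
    symbol, count = followers.most_common(1)[0]
    return symbol, count / sum(followers.values())

symbol, prob = predict_next("ABC", history)
print(symbol, prob)
```

Asked what follows A-B-C, it answers "D" with 90% confidence, exactly the "how often has that guess turned out right" logic described above.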

Previously, predictive insights were gleaned from classical statistical models like correlations, ANOVAs, and regressions. Those tools are serviceable: sometimes you get an r-squared of .46 and you're reasonably confident given what you had to work with. But to push predictive power well beyond that, we have to leave the world of old-school statistical modeling and get with big data.

With big data, you can now have models that hit 60%, 80%, sometimes 90%+ accuracy levels. Of course, it takes a different toolbox to interpret this new information, but approaches like this can give companies the power to turn predictive insight into targeted promotions or player interventions. In the gaming industry, this means using big data to predict figures like churn rate and a player's future spending (lifetime value), metrics that have always been elusive, with high confidence. And when all is said and done, with the systematic use of computer science in the big data era, it doesn't take a magic eight ball to see into the future.
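What "80% accuracy" means is simply: of the players the model flagged, how many actually behaved as predicted. A small scoring sketch, with invented labels (1 = churned, 0 = stayed):

```python
# Illustrative only: scoring a churn model's predictions against
# what players actually did. Labels here are invented.

actual    = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1]

hits = sum(a == p for a, p in zip(actual, predicted))
accuracy = hits / len(actual)
print(f"Churn prediction accuracy: {accuracy:.0%}")
```

Real evaluations would also look at precision and recall, since churners are usually a small minority, but raw accuracy is the number most often quoted.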

About the author: Dmitri Williams (PhD, University of Michigan) is the CEO, sensei, and co-founder of Ninja Metrics, Inc. Dmitri is a 15-year veteran of games and community research, and a world-recognized leader in the science of online metrics and analysis. The author of more than 40 peer-reviewed articles on gamer psychology and large-scale data analysis, Dmitri has had his work featured on CNN, Fox, the Economist, the New York Times, and most major news outlets. He has testified as an expert on video games and gamers before the U.S. Senate, and is a regular speaker at industry and academic conferences. Dmitri moonlights as a healer and raid leader, and plays a wicked Ashe in League of Legends. He loves data, and believes more of it, used intelligently, makes the world a better place.

Related Items:

‘What Is Big Data’ Question Finally Settled?

Dispelling Predictive Analytics Myths

Predictive Modeling Meets Patient Behavior