Deep Learning Is About to Revolutionize Sports Analytics. Here’s How
Imagine being able to model your opponents’ defense to optimize your attack in soccer, or to determine the best placement of players for grabbing rebounds in basketball. These are some of the predictive systems that data scientists are just now starting to build, and they’re using deep learning techniques to do it.
Statistics have always played a role in sports, and they’ve become even more prevalent lately, with measures like wins above replacement (WAR) in baseball, expected point value (EPV) in basketball, and similar measures in soccer and hockey. But today, the most interesting work in sports analytics is happening around the optimization of player positioning for team sports.
Much of this player movement data is collected by a Chicago-based company called STATS (Sports Team Analysis and Tracking Systems). Along with the Elias Sports Bureau, STATS is about as trusted a name in sports data as you can get. If you ask Google or Siri for the latest score in the Padres-Nationals game, those services will get that data from STATS.
Five years ago, STATS installed a series of cameras in each NBA arena, as well as in a number of soccer stadiums in Europe. The cameras track the movement of each player and the ball as part of the firm’s SportVU system, at a frequency of 25 frames per second.
With thousands of games per year, that data quickly adds up to a big data problem. The responsibility for making sense of this mass of geospatial and time-series data falls to Patrick Lucey, who heads data science at STATS, and his team of PhDs.
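To get a feel for how quickly the tracking data accumulates, here is a rough back-of-the-envelope sketch in Python. Only the 25 frames-per-second rate comes from the figures above. The 48-minute regulation game, the count of 10 players plus the ball, and the function name are assumptions made for illustration, and stoppages and overtime are ignored.

```python
FRAMES_PER_SECOND = 25  # sampling rate cited for the tracking cameras


def positions_per_game(tracked_objects: int, game_minutes: float) -> int:
    """Count the position samples recorded during one game of live play."""
    frames = int(game_minutes * 60 * FRAMES_PER_SECOND)
    return frames * tracked_objects


# Assumption: 10 players on court plus the ball, 48 minutes of regulation play.
nba_samples = positions_per_game(tracked_objects=11, game_minutes=48)
print(nba_samples)  # 792000
```

Multiply that by every game in a season and the "big data" label starts to look reasonable, even before any context is attached to the raw coordinates.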
As Lucey explains, turning this raw data into actionable insights is not easy, and is something that STATS has invested considerable resources into solving.
“How do we make sense of all that data is a fundamental machine learning problem,” Lucey tells Datanami. “You have box scores in baseball and it’s really nicely segmented. But for continuous sports like basketball and soccer, how can we contextualize that data and ask very specific questions and get answers in understanding team play? We’re very good at doing that.”
Machine learning is a critical technique because the player and ball movement data is completely unstructured and lacking any context. A scorekeeper may note that a striker took a shot on goal, but the results of that shot are binary: either it went in or it didn’t.
STATS sees SportVU as a way to fill in the critical details and to recreate the story of the game.
“We want to know where that shot lied on that spectrum. Where does it lie between 0 and 1?” Lucey says. “To query that in SQL, you actually have to create that data point. That data point doesn’t exist. Using machine learning, you can have these new high-level data points, which are very valuable.”
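One way to picture such a derived data point is a function that turns the context of a shot into a number between 0 and 1. The Python sketch below is purely illustrative: the input features, the weights, and the function name are all invented here, and nothing in it reflects how STATS actually builds its models. A real system would learn the weights from tracking data rather than have them picked by hand.

```python
import math


def shot_probability(distance_m: float, angle_deg: float,
                     defenders_nearby: int) -> float:
    """Map shot context to a score between 0 and 1 with a logistic function."""
    # Hypothetical, hand-picked weights; a trained model would estimate these.
    score = 1.0 - 0.12 * distance_m + 0.03 * angle_deg - 0.6 * defenders_nearby
    return 1.0 / (1.0 + math.exp(-score))


# Two made-up shots: one close and unmarked, one distant and crowded.
close_open = shot_probability(distance_m=8, angle_deg=40, defenders_nearby=0)
far_crowded = shot_probability(distance_m=28, angle_deg=15, defenders_nearby=2)
print(round(close_open, 2), round(far_crowded, 2))
```

Once a value like this is stored alongside each shot, it becomes something an analyst can filter and aggregate on, which the raw coordinates alone do not allow.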
Ghosts in the Game
STATS has created a large amount of intellectual property around the problem of converting raw geospatial data into useful data, and has come up with novel solutions. As Lucey notes, crunching the data is not as straightforward as one might wish.
“In soccer, there are 3.5 billion permutations in how those [10 field] players can be switched around,” he says. “When you include the opposition, there are more permutations than there are atoms in the universe. So just doing an exhaustive search, it’s not possible” to get player formations.
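The scale problem can be seen in miniature with a toy version of the task: deciding which player fills which role in a formation. The Python sketch below simply tries every ordering and keeps the one with the smallest total distance. The coordinates, role positions, and function name are made up for illustration, and this brute-force search is not a method described by STATS. The loop runs once per ordering, so its cost grows factorially with the number of players, which is why it only works for tiny cases.

```python
import math
from itertools import permutations


def best_role_assignment(players, roles):
    """Try every ordering of roles and keep the one with least total distance."""
    best_cost, best_order = float("inf"), None
    for order in permutations(range(len(roles))):
        # order[i] is the index of the role given to player i.
        cost = sum(math.dist(players[i], roles[r]) for i, r in enumerate(order))
        if cost < best_cost:
            best_cost, best_order = cost, order
    return best_order, best_cost


# Hypothetical (x, y) positions for four players and four formation roles.
players = [(10, 50), (30, 20), (30, 80), (60, 50)]
roles = [(60, 52), (28, 78), (12, 50), (32, 22)]

order, cost = best_role_assignment(players, roles)
print(order)  # (2, 3, 1, 0)
print(math.factorial(len(roles)))  # 24 orderings checked for four players
```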
Lucey recently described an approach for teasing player formation out of raw geospatial data in a paper he co-authored with other researchers from Disney Research and the California Institute of Technology, titled “Data-Driven Ghosting using Deep Imitation Learning.”
The central thesis revolves around the creation of a “ghosting” model that can accurately learn and subsequently predict how actual professional soccer players move on offense and defense. The model was based on a recurrent neural network that was trained, in an unsupervised fashion, on 17,400 soccer sequences across 100 games. The training took several hours, according to the paper, which won second place at the MIT Sloan Sports Analytics Conference in March of this year.
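The core idea of imitation, learning a rule from recorded behavior and then replaying it in new situations, can be shown with something far simpler than the paper’s model. The Python sketch below swaps the recurrent neural network for a one-variable least-squares line, and the demonstration numbers and function names are invented for illustration. It should be read as a toy stand-in, not as the approach the researchers used.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


# Made-up demonstrations: the ball's x-position and where a defender stood.
ball_x = [10.0, 20.0, 30.0, 40.0, 50.0]
defender_x = [8.0, 14.0, 19.0, 26.0, 31.0]

slope, intercept = fit_line(ball_x, defender_x)


def ghost_defender_x(ball_position: float) -> float:
    """Predict where the 'ghost' would stand for an unseen ball position."""
    return slope * ball_position + intercept


print(ghost_defender_x(35.0))
```

A recurrent network plays the same role as the fitted line here, but it conditions on the recent movement of every player and the ball rather than on a single number.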
The visual nature of the ghosting system could provide direct benefits to teams that invest in it. Coaches and players would be able to watch a replay of a particular play, and see how the ghost defenders respond to their opponents’ attack.
“Given an attacking play by a team, how did that team defend, or how should they have defended?” Lucey says. “How should they have moved to minimize the likelihood that that team scores? This is where we get into the realm of deep neural nets and deep imitation learning. It’s very similar to what Google and DeepMind did for AlphaGo.”
The NBA’s Toronto Raptors built a ghosting system four years ago, but the model was created manually in a rules-based manner, which took an inordinate amount of time and money. With the advent of deep neural nets, such a system can be created at a fraction of the cost of manually building it.
“For a lot of time we can come up with measure which can describe coarse behavior,” Lucey says. “But now what we’re able to do is actually synthesize what teams should have done at the tracking level.”
More Data, Please
The ghosting approach shows the potential for modeling the movement of players, and the subsequent optimization of those movements. But already, limits have been encountered.
“We are the forefront of machine learning and artificial intelligence, because we’re able to do these sorts of things. But to do it effectively, we need more data,” Lucey says. “Once you start slicing and dicing the data, you don’t have enough examples.”
There may be 20 to 30 examples of a particular type of play with a particular group of players, but the models will need many more to be effective. “That’s where the rubber hits the road,” the Australian native says. “We’re just starting to do these very precise, specific things.”
Silicon Valley firms like Google and Facebook may get most of the glory for their AI research, but as Lucey explains, there’s a lot of fundamental research going on in the sporting world, too.
“I think some people don’t think about sports in that regard, but it’s definitely at the forefront,” he says. “Sports basically represents the most interesting data out there, I think. It’s adversarial, it’s multi-agent, it’s very fine-grained, and there’s just so much of it. I don’t think you can find a richer or more interesting data set, especially with the tracking data we have in basketball and soccer.”
A paradigm shift is underway in the field of sports analytics, and STATS hopes to be at the center of it. Recent advances in deep learning are giving coaches and professional athletes insights that previously required watching hundreds of hours of game film. With ghosting systems, teams can show players how to position themselves to maximize the potential for a positive outcome.
“The goal here is to help teams, or whoever we work with, find their winning edge,” Lucey says. “It’s correlation versus causation, but basically everybody is doing this now, and if you’re not doing it, you’re at a severe disadvantage.”