Giving Big Data a Sporting Chance
With new information collection techniques adding millions of rows of statistics to the virtual box score, improving a sports team is rapidly becoming a big data problem. A panel led by Jason Bailey, HP Vertica Senior Market Manager, discussed the current and future place of big data in solving the problems of the sporting world.
In sports, just as in enterprise, a remarkable marketing campaign can occasionally turn a profit on a substandard product. However, there are few marketing substitutes for a successful product, and, unlike in business, the sports consumer defines success by two distinct metrics: wins and losses. As such, tracking, collecting, and processing data from a system like Stats LLC’s SportVU is well worth the effort if that data helps find competitive advantages.
“We’re capturing the location data in our system but we also have the play-by-play in our database, and so we’re combining the two,” said panelist and Stats LLC VP of Strategy and Development Brian Kopp. “We knew we had to have the context so we had all the X-Y coordinates for the players, X-Y-Z for the ball, 25 times per second, we link that together with the play-by-play.”
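The linking step Kopp describes can be pictured as a timestamp join between two datasets. The snippet below is a minimal sketch, not Stats LLC’s implementation: the record layouts (`TrackingFrame`, `PlayEvent`) and the choice to match each play to the nearest frame by elapsed game time are assumptions made for illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass

FRAME_RATE_HZ = 25  # SportVU samples positions 25 times per second


@dataclass
class TrackingFrame:
    t: float  # seconds elapsed in the game (hypothetical field)
    players: dict[str, tuple[float, float]]  # player id -> (x, y) in feet
    ball: tuple[float, float, float]  # (x, y, z) in feet


@dataclass
class PlayEvent:
    t: float  # seconds elapsed when the play was logged (hypothetical field)
    description: str


def attach_frames(events, frames):
    """Pair each play-by-play event with the tracking frame closest in time.

    Assumes `frames` is non-empty and sorted by t.
    """
    times = [f.t for f in frames]
    paired = []
    for ev in events:
        i = bisect_left(times, ev.t)
        # The closest frame is one of the two neighbours of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(frames)]
        best = min(candidates, key=lambda j: abs(times[j] - ev.t))
        paired.append((ev, frames[best]))
    return paired
```

Binary search keeps the join cheap even though a 25 Hz feed produces tens of thousands of frames per game; a production system would more likely join on official game-clock and period fields.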
A basketball player can be anywhere on the 4,700 square feet of a standard NBA floor, and more often than not, that player is in motion. As such, it is essential to capture the location data and put it into context, something SportVU does with advanced high-resolution tracking cameras.
“Once you’ve gone through that process,” said Kopp on what happens after the location data is contextualized, “it allows you to do three different things. First it allows you to automate things that you can see with your eyes.” This means automatically counting the number of touches, dribbles, and so on that occur during a basketball game.
“The second area,” Kopp continued, “is we’re providing more context to data points you have, things like passing.” According to Kopp, the assist is a flawed metric for judging how well a player distributes the ball. A better metric is the effective field goal percentage on shots that result from that player’s passes. (No, it doesn’t roll off the tongue quite as easily as ‘assist’, but it will do for now.) Carmelo Anthony happens to have led the league in that category over the last two years.
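Effective field goal percentage itself is a standard formula, eFG% = (FGM + 0.5 × 3PM) / FGA, which credits a made three as one and a half made twos. Restricting it to shots that follow a given player’s passes could look like the sketch below. The `Shot` record and its `passer` field are hypothetical, and the panel did not say how Stats LLC decides which pass a shot came from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shot:
    shooter: str
    passer: Optional[str]  # player whose pass led directly to the shot (hypothetical)
    made: bool
    three: bool


def effective_fg_pct(shots):
    """eFG% = (FGM + 0.5 * 3PM) / FGA; returns None when there are no attempts."""
    attempts = len(shots)
    if attempts == 0:
        return None
    made = sum(s.made for s in shots)
    made_threes = sum(s.made and s.three for s in shots)
    return (made + 0.5 * made_threes) / attempts


def passing_efg(shots, passer):
    """eFG% on every shot, made or missed, that came off `passer`'s pass."""
    return effective_fg_pct([s for s in shots if s.passer == passer])
```

Unlike assists, which count only made baskets, this measure also charges the passer for misses, so it rewards passes that create good looks rather than passes that happen to precede a make.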
“And then the third category,” Kopp said, “is things you haven’t been able to capture in the past and that gets into spacing and dynamic movements, not just speed and distance.” Some interesting insights have already come out of this analysis of movement. For example, at a talk at the MIT Sloan Sports Analytics Conference in March, Rajiv Maheswaran of the University of Southern California used SportVU data to show that one of the most efficient shots in basketball is an uncontested three-pointer from the corner, where ‘uncontested’ means the closest opponent is five or more feet away.
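The ‘uncontested’ rule reduces to a nearest-neighbour distance on a single tracking frame. The sketch below assumes positions are in feet and taken from the frame at the moment of release, and that whether a shot is a corner three has already been determined from its location; the dictionary keys are hypothetical.

```python
import math

UNCONTESTED_FT = 5.0  # threshold from the talk: nearest opponent 5+ feet away


def closest_defender_distance(shooter_xy, defenders_xy):
    """Euclidean distance (feet) to the nearest defender; needs at least one defender."""
    sx, sy = shooter_xy
    return min(math.hypot(dx - sx, dy - sy) for dx, dy in defenders_xy)


def is_uncontested(shooter_xy, defenders_xy, threshold=UNCONTESTED_FT):
    return closest_defender_distance(shooter_xy, defenders_xy) >= threshold


def points_per_shot(shots):
    """Average points per attempt, grouped by (corner_three, uncontested)."""
    totals = {}
    for shot in shots:
        key = (shot["corner_three"], is_uncontested(shot["shooter"], shot["defenders"]))
        pts, n = totals.get(key, (0, 0))
        totals[key] = (pts + shot["points"], n + 1)
    return {key: pts / n for key, (pts, n) in totals.items()}
```

Comparing points per shot across the four groups is one simple way to check a claim like “uncontested corner threes are among the most efficient shots”; the talk’s actual methodology may differ.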
Suddenly, the mystery behind the success of this year’s New York Knicks, who won the 7th-most games in the league despite losing several key players from last year’s 13th-rated team, unravels. They take more uncontested corner threes than anyone else in basketball, and they employ Carmelo to distribute those threes (and to generate them; he also won this year’s scoring title). As a result, they are the second-best team in the Eastern Conference and have the best shot at knocking off the top-rated Miami Heat before the NBA Finals. That second-place finish translates into playoff game revenue, increased merchandise sales, and more brand chatter on social media. What could be simpler?
Of course, there is a danger in trying too hard to capitalize on a market inefficiency. Metrics that correlate with team success are not always predictive. It is not enough to know that a team shoots better from three-point range; the next step is understanding why it does, a process that over the next few years will likely involve coaches and analysts mistaking correlation for causation.
According to Kopp, the statistics and metrics that will really drive the next level of innovation in sports are based on collection and analysis methods that are either in their infancy or have yet to be developed. “The more complicated things we’re working on are identifying through data algorithms screen-and-rolls and different play types and different defensive formations and how that could be automated through tracking data and the algorithms we’re writing.”
The specific financial rewards of winning a major championship are unclear, as they vary from market to market. New York would likely derive more revenue and value from an NBA title than Oklahoma City, simply because it has a larger local base to sell to. However, beyond the direct payouts the NBA gives the champions, a title breeds further financial rewards that make the investment required to implement an analytics system worthwhile.
Moving from sports play to sports revenue, not all big data analytics in sports comes in the form of measuring and optimizing performance. The National Football League is a multi-billion-dollar moneymaking machine, one that can only benefit from the fine-tuning a proper big data analytics platform provides.
Consumer interaction and feedback are important for any business, especially in sports. Making fans feel that they both understand and are a part of the action leads to further fan participation and, as a result, increased revenue. From a big data perspective, that effort has two parts: making the new advanced performance metrics accessible, and integrating social media.
According to panelist Neil Paine, User Affairs Coordinator at Sports Reference and an ESPN columnist, the stat sheets that reporters are handed during media timeouts at a basketball game represent only about a hundred of the millions of data rows attached to that game. Collecting, processing, and analyzing that data requires advanced camera and big data technology that basketball franchises are just beginning to embrace.
Letting the statistically inclined access and query that data on their own promotes the ‘behind-the-scenes’ feel that many sports fans crave. “One of the important things philosophically for our company is that we want to make it easy for the average fan to index these things and look up any piece of information that they want and cross reference the broad datasets with other broad datasets,” said Paine, referring to Sports Reference’s effort to publicize the data generated by systems like SportVU.
“We have a thing called the Play Index which allows people to go in and run these queries and essentially it’s a way of letting people who don’t even know what SQL or a database server at all is run database queries without even knowing it,” Paine continued. The effort is not an insignificant one: many vendors in the big data space are still trying to make SQL queries accessible to the average business user. For fan interaction’s sake, however, it remains important.
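Sports Reference did not describe how the Play Index works internally. As a rough illustration of the general idea, a search form can collect filters and translate them into a parameterized SQL query behind the scenes, so the fan never writes SQL. The `player_games` table, its columns, and the filter format are hypothetical, and SQLite stands in for whatever database the real site uses.

```python
import sqlite3

# Hypothetical whitelists: form fields a fan may filter on, and allowed comparisons.
ALLOWED_COLUMNS = {"player", "season", "points", "assists"}
ALLOWED_OPS = {"=", ">=", "<="}


def build_query(filters):
    """Turn (column, op, value) triples from a web form into parameterized SQL.

    Column names and operators cannot be bound as parameters, so they are
    checked against whitelists; values are always bound with '?'.
    """
    clauses, params = [], []
    for column, op, value in filters:
        if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPS:
            raise ValueError(f"unsupported filter: {column} {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = f"SELECT player, season, points, assists FROM player_games WHERE {where}"
    return sql, params


def run_form_search(conn, filters):
    sql, params = build_query(filters)
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE player_games (player TEXT, season INTEGER, points INTEGER, assists INTEGER)")
    conn.executemany(
        "INSERT INTO player_games VALUES (?, ?, ?, ?)",
        [("A", 2013, 40, 3), ("B", 2013, 12, 11), ("A", 2012, 30, 5)],
    )
    # A fan picks "season = 2013" and "points >= 30" from drop-down menus.
    print(run_form_search(conn, [("season", "=", 2013), ("points", ">=", 30)]))  # [('A', 2013, 40, 3)]
```

Binding every user-supplied value as a parameter, and whitelisting the parts that cannot be bound, is what makes it safe to expose a database to anonymous web visitors this way.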
Finally, a growing portion of such analytics is the social side. Two months ago, Datanami looked at how HP helped the NFL track social sentiment, particularly with regard to Twitter, over the season and especially during the Super Bowl.
The NCAA looked to do something similar during the NCAA Tournament. Those who watched streams of games on CBSsports.com may have noticed a built-in Twitter tracker/aggregator, which provided statistics on average and peak tweets per minute for a particular game as well as showing tweets from verified media members and relevant selections from the general public.
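The two statistics the tracker displayed, average and peak tweets per minute, come from bucketing tweet timestamps by minute. The sketch below assumes the tweets for a game have already been collected as `datetime` timestamps; how CBSsports.com gathered or filtered them was not described, and averaging over every minute of the window (including quiet ones) is a design choice of this example.

```python
from collections import Counter
from datetime import datetime


def tweets_per_minute(timestamps):
    """Return (average, peak) tweets per minute over the span covered.

    Minutes with no tweets count as zero, so the average is taken over every
    minute from the first tweet to the last, not just the busy ones.
    """
    if not timestamps:
        return 0.0, 0
    buckets = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    first, last = min(buckets), max(buckets)
    span_minutes = int((last - first).total_seconds() // 60) + 1
    average = len(timestamps) / span_minutes
    peak = max(buckets.values())
    return average, peak
```

Averaging only over minutes that contain tweets would inflate the figure during slow stretches of a game, which is why the empty minutes are kept in the denominator here.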
While this type of information is certainly useful to fans, giving them a better sense of the magnitude of the event they are participating in, it is especially helpful to institutions like the NCAA in deciding whom to center their marketing campaigns around.
For example, according to Bailey, Michigan guard Trey Burke garnered more Twitter attention than anyone else, including every player on the eventual national champion, Louisville. The NCAA could use that information in various ways, such as featuring images of Burke in its advertisements for Michigan’s Final Four game against Syracuse and for the championship game.
Burke ended up being named National Player of the Year, an award that was perhaps foreshadowed by his social media performance. After all, many of those who vote on the award are prominent drivers of the sports narrative. In other words, they are media members and coaches who post their reports and opinions on Twitter, and college basketball fans form their views largely around these posts.
In a sports world where better-performing teams have a better chance of winning a championship, analyzing data to optimize a team’s strategy and performance could make a statistically significant difference. This is the future of big data in sports.