Big Data in the Big Game
On Sunday, the San Francisco 49ers will meet the Baltimore Ravens in the 47th edition of the nation’s greatest televised showcase: the Super Bowl.
When Vince Lombardi built his Green Bay Packers team that won the very first Super Bowl 46 years ago, he looked for players who would look good in Packer green. Forty-six years later, teams like the 49ers are starting to follow baseball’s example and use data analytics to help build their roster. It stands to reason that one of the first teams to fully embrace an analytical approach to evaluating player performance is one located near Silicon Valley and led by Stanford’s former head coach.
The game will be the most-watched television program of the year. The National Football League is already the most popular sports league in the United States, but this game presents an opportunity to showcase its offerings to the greater populace. While the NFL is often considered only as a sporting entity, it is important to remember that it is also a multi-billion dollar business. And like any other multi-billion dollar business, it needs to use analytics to connect with its fans and please its customers.
People watch commercials and view advertisements throughout the year. Yet shelling out for an ad during the Super Bowl virtually guarantees that you will reach your target audience.
Increasingly, big data and data analytics permeate the football game and the advertisers that help pay for it. Today we run down some of the areas where big data and data analytics have infiltrated America’s favorite sport and biggest spectacle.
SAP Spurs Player Development
It can be difficult to effect change in professional sports scouting departments due to biases in human observation. Many scouts have been around for decades and have developed ‘gut instincts,’ which they use to write definitive reports after watching a few games, workouts, and interviews.
Take the following pre-draft scouting report on Panthers quarterback Cam Newton. “Can provide an initial spark, but will quickly be dissected and contained by NFL defensive coordinators, struggle to sustain success and will not prove worthy of an early investment. An overhyped, high-risk, high-reward selection with a glaring bust factor, Newton is sure to be drafted more highly than he should and could foreclose a risk-taking GM’s job and taint a locker room.”
After Newton’s first season in the league, in which he won Rookie of the Year honors after setting numerous rookie passing records while galvanizing a hapless football team, the report looked fairly inaccurate. However, six games into the 2012 season, defenses appeared to have figured out Newton just as the report predicted, and the Panthers’ general manager (GM) had been fired. At that point, a year and a half later, the report seemed rather prescient. That is, until Newton rebounded and posted an even better statistical season than in 2011.
The overall point is that while scouting reports submitted by human observers can be helpful, they are not a particularly good predictor of success.
As such, teams like the 49ers are turning more to advanced analytics powered by systems like SAP HANA to evaluate player performance. “There’s massive amounts of data coming from lots of different places and one of those is player information,” said John Schweitzer, Senior VP and General Manager for SAP Analytics. “They have historical information and social information as they learn more about the investments they’re going to make in player acquisitions. Specifically what the Niners are looking to do is use information in an SAP Analytics platform with football player scouting and development.”
It took one enterprising and somewhat desperate general manager to recognize the value of building a baseball team around numbers for the decades-old practice of sabermetrics to gain a foothold in Major League Baseball. That team was the Oakland Athletics, and they managed to make the playoffs with a payroll hovering around $40 million. The average MLB payroll comes in at about $100 million. It is thus no surprise that the San Francisco 49ers are in a position to win football’s greatest prize after using data analytics to help build their roster.
Schweitzer hopes that what the 49ers are doing will become standard across the league, leveling the playing field. “Our aspirations are to take a great use case like this and make it a standard.” He hinted that future announcements regarding other teams are on the horizon.
Bringing Analytics to Fantasy Football
SAP is also helping the 49ers integrate a fantasy football-centric ‘improved fan experience’ into the team’s new Santa Clara stadium, scheduled to open for the start of the 2014 season.
Fantasy football is one of the most popular online games in the world. For those unfamiliar with the game, regular football fans can ‘draft’ teams of NFL players at the beginning of a season. Those players accumulate stats, which translate to points. The fan with the most points wins.
Each week, every fantasy football player performs his or her own version of ‘predictive analytics’ to set their roster for the coming week. SAP is working with the NFL to put some meat behind those analytics for fans and users so they can have access to the most intricate and telling statistics.
“Our intent in certain ways is not just to bring in structured data, but unstructured data from the social world as well,” Schweitzer said while discussing how SAP plans to offer fans both statistical and sentiment analysis in playing fantasy football. According to Schweitzer, the game is a big data challenge in leveraging both structured and unstructured data, with information coming from social media sites as well as historical databases.
Fantasy football is an intensely competitive market, with ESPN and Yahoo garnering the two highest participation bases. Both of those platforms offer advanced analysis of their own, even if some of it is part of a premium, subscription-based package. Schweitzer hopes SAP can help NFL.com rise to the challenge by offering those advanced analytics for free, with the focus on delivering consumers to the website.
The NFL is not the only major American sports league partnering with SAP for cloud-based statistical analytics. The NBA has also grabbed SAP HANA to present advanced statistics to fans on their website for both fantasy and research purposes.
Collecting New Ad Metrics
It costs about $4 million to secure a 30-second spot on America’s most watched television program of the year. As such, a company shelling out that kind of money for only 30 seconds had better make the most of it. Not only must the commercial be humorous and memorable such that it sticks in the minds of Americans, it must also perform along several metrics designed around the advent of massive social media.
“It’s a whole different measurement game, because of the availability of new technology (think Second Screen), big data analytics, and a plethora of metrics that offer the ability for deep dive and quantitative analysis of the Super Bowl ads,” said George Musi of DG MediaMind.
Traditional television metrics, such as Nielsen ratings and reach, are growing irrelevant in the face of increasing internet presence. Musi listed approximately 40 measures through which companies should be evaluating their marketing campaigns, including several social media-based numbers such as likes, follows, favorites, and pins.
Gathering that social media information is a valuable yet data-intensive exercise, but it is essential for a company spending $4 million on a 30-second ad in the Super Bowl to find a return on their investment.
Football: An Intriguing Predictive Analytics Use Case
Football provides one of the more interesting use cases for predictive analytics as it is one of the hardest sports to predict from week to week. While 90 players participate in an average NFL game, the sport is not subject to the large group principles that course through enterprise analytics or election analysis.
Nate Silver correctly projected all 50 states in last year’s presidential election. As a result, he has become a cult hero of sorts for the data analysts across the world that jump up and down and tell anyone who will listen that they have something important to share. And yet, when using data analytics methods to project this year’s Super Bowl before the start of the playoffs, he came up with the New England Patriots and the Seattle Seahawks.
In terms of predictability, elections are easy, baseball is harder, football is hardest. The reason for this is fairly simple: sample sizes. When Silver aggregated all of the poll data, he had a working sample of tens of thousands of voters from each state—a sample likely to be quite representative of the whole.
The baseball regular season is played over 162 games. After so many games, the cream will almost certainly rise to the top and the dregs will sink to the bottom.
Meanwhile, football’s regular season is but sixteen games long. Usually those sixteen games and the three or four ensuing playoff games are enough to determine a champion that seems like a champion, but not always. Last season, the New York Giants became the first team in NFL history to win the Super Bowl after sporting a negative point differential during the regular season. The Giants made the playoffs in the first place by overcoming a relatively weak NFC East division with nine wins and seven losses. They then turned in two impressive performances in the playoffs before reaching the Super Bowl by virtue of two crucial late-game fumbles from San Francisco punt returner Kyle Williams.
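The sample-size argument can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes every game is an independent coin flip with a fixed win probability, which real schedules are not, and the 60 percent "true" win probability is a made-up figure rather than anything measured. Under those assumptions, the spread of a team's observed winning percentage is roughly 0.12 over 16 games versus roughly 0.04 over 162, and a genuinely good team is far more likely to finish at .500 or worse in the shorter season.

```python
from math import comb, sqrt

def win_fraction_std(p, games):
    """Standard deviation of the observed win fraction over `games`
    independent games, each won with probability p (an assumption)."""
    return sqrt(p * (1 - p) / games)

def prob_at_most_half(p, games):
    """Exact binomial probability of winning no more than half the games."""
    cutoff = games // 2
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(cutoff + 1)
    )

TRUE_WIN_PROB = 0.6  # hypothetical "good team", not a measured value

for label, games in (("football", 16), ("baseball", 162)):
    spread = win_fraction_std(TRUE_WIN_PROB, games)
    unlucky = prob_at_most_half(TRUE_WIN_PROB, games)
    print(f"{label}: {games} games, spread {spread:.3f}, "
          f"chance of .500 or worse {unlucky:.3f}")
```

The same two functions can be rerun with other win probabilities to see how quickly the noise shrinks as the schedule grows.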
Football is also grayer than baseball and elections from an outcome standpoint. There were two statistically worthwhile outcomes to consider in the 2012 election: a vote for Romney or a vote for Obama. Baseball likewise has fairly concrete outcomes. A strikeout is bad, a hit is good, a homerun is better. There exist some neutral outcomes, a sacrifice fly or bunt for example, but for the most part a play has a clear beneficiary.
Football is a little less clear-cut. Take a 3-yard running play on first-and-ten, for example. On one hand, the preliminary objective in football is to gain ten yards over four plays. Four three-yard running plays would obviously do the trick. However, most teams would choose to kick the ball away, either through a punt or field goal, when faced with a 4th-and-one. Thus, that three-yard gain has neither a clearly positive nor a clearly negative outcome, making it difficult to grade.
Enter Advanced Statistics
To combat the game’s unpredictability, a significant number of advanced metrics have been developed over the last few years by sources like Football Outsiders and ESPN. Football Outsiders is a site akin to Baseball Prospectus (Silver worked for Prospectus before entering the election analysis game) devoted to producing more insightful statistics. Instead of judging an NFL offense by how many yards or points it has gained, FO uses DVOA, or Defense-adjusted Value Over Average.
Drawing from a massive historical database going back 20 years, FO has been able to use its advanced statistics to determine important points of causation. For example, a graphic commonly displayed by sports networks during a football broadcast shows that a team that runs the ball 30 or more times wins significantly more than it loses. Some may draw the conclusion that running the football is the key to success.
In reality, teams run the ball as a result of winning. Teams that are ahead in the fourth quarter will often run the ball to kill the clock on the trailing team.
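A toy simulation shows how that reversed causation can produce the broadcast graphic on its own. Everything in this sketch is an assumption made for illustration: the carry ranges, the 80 percent chance that the team leading after three quarters goes on to win, and the function names are all hypothetical and not drawn from Football Outsiders data. Early rushing attempts are generated with no connection to the result; only the fourth-quarter play calling depends on who is ahead.

```python
import random

def simulate_games(n_games=20000, seed=0):
    """Return (early_carries, total_carries, won) for one team per game."""
    rng = random.Random(seed)
    games = []
    for _ in range(n_games):
        # Carries through three quarters: unrelated to the score (assumption).
        early = rng.randint(15, 25)
        # Coin flip for who leads entering the fourth quarter (assumption).
        leading = rng.random() < 0.5
        # The leader usually holds on (made-up 80 percent figure).
        won = rng.random() < (0.8 if leading else 0.2)
        # Leaders run to kill the clock; trailing teams mostly pass.
        late = rng.randint(8, 12) if leading else rng.randint(2, 6)
        games.append((early, early + late, won))
    return games

def win_rate(games, keep):
    """Share of wins among the games for which keep(early, total) is true."""
    outcomes = [won for early, total, won in games if keep(early, total)]
    return sum(outcomes) / len(outcomes)

games = simulate_games()
print("30+ total carries:", round(win_rate(games, lambda e, t: t >= 30), 3))
print("under 30 carries: ", round(win_rate(games, lambda e, t: t < 30), 3))
print("20+ early carries:", round(win_rate(games, lambda e, t: e >= 20), 3))
```

In this setup, teams with 30 or more total carries win far more often than they lose, while teams that ran heavily in the first three quarters win about half the time, which is what one would expect if the carries are an effect of leading rather than a cause of it.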
ESPN has also gotten into the advanced analytics game, coming out with a statistic called Total QBR, which assigns a game score of 0-100 to a quarterback based on their play. The rating comes under criticism since the sports network has not made public its formula. According to ESPN, however, there appears not to be a formula, but a data system that matches every play a quarterback makes to a historical database to determine the play’s effectiveness and clutch factor.
Improving Player Safety
Despite being America’s favorite game today, football faces an ominous future. President Obama recently stated that he would think long and hard before allowing a son of his to play football. Super Bowl winning quarterback Kurt Warner has said he would not let his son play football. Ravens safety Bernard Pollard predicts that football will either be unrecognizable or extinct in 30 years.
These statements arise from a startling brain disorder called chronic traumatic encephalopathy, or CTE, caused by frequent concussions or even non-concussive blows to the head. The conversations and concerns have peaked in recent years, particularly after Junior Seau, a linebacker hero for those who grew up watching football in the 1990s, committed suicide in May 2012. Upon examination, it turned out that Seau, who had no reported history of concussions, definitively had CTE.
As a result, Seau’s family joined thousands of others in a mounting class-action lawsuit against the NFL, in which the plaintiffs claim the league was negligent in protecting its players.
If football is to survive as the nation’s most popular sport for years to come and not get buried under the weight of lawsuits and concerned parents pulling their kids away from the sport and toward safer options like soccer and basketball, head trauma healthcare will have to improve significantly.
Healthcare is a huge opportunity area for big data, particularly when it comes to head and brain injuries. The National Institutes of Health, which are funding the CTE studies, have invested in several big data initiatives, including those that would facilitate a data sharing environment that would be crucial to brain research going forward.
Final Word and Super Bowl Pick
Big data and data analytics are seeping into more facets of society every day. Sports, football, and the Super Bowl are no exception.
From player performance tracking to smart advertisement buying, predicting game outcomes, and improving player health, the data opportunity is there. From when the pre-game coverage starts at 10am on Sunday until the Lombardi Trophy is handed to a Harbaugh brother sometime around 10pm, the big data will be buzzing behind the scenes.
With that in mind, we leave you to enjoy this weekend’s unofficial American holiday. But first, your Official Datanami Super Bowl Pick: San Francisco 28, Baltimore 17.