March 16, 2017

Sports Follies Exemplify Need for Instant Analysis of Streaming Data

Girish Pancha


Michael Lewis’ book “Moneyball” tells the well-known story of how the Oakland A’s used data analytics to gain a competitive advantage in Major League Baseball (MLB) in the late 80s and early 90s. In the 25 years since, all major sports, including basketball, football, hockey and soccer, have embraced analytics as a key to success; it has become part of the ethos of sports.

Yet, for all the breakthroughs people have made using data to gain an edge, the work is only half done. Today’s analysis relies primarily on evaluating historical trends for future use. It has not taken advantage of the real-time information that could be made available from the game as it is in play, through sensors on the players and equipment or from cameras with computer vision capabilities. Businesses as diverse as Blood Centers of America, the City of Chicago and McDonald’s are finding advantages in using real-time data to make intelligent decisions, from ad personalization to preventative maintenance and marketing automation. The real-time revolution can change the way the games we love are coached and played.

A glaring example of this chasm is the fact that, in this day and age, NFL coaches still rely on laminated play sheets to make critical in-game decisions. Those play sheets certainly distill wisdom from the historical data, but they only address real-time variables that the coach can track “in his head,” like down and distance, score, weather, success of recently run plays and perceived health of his players.

The same approach exists in all the major sports, and it helps explain some of the “head-scratchers” we’ve seen from coaches, from Pete Carroll’s decision to eschew the run in the waning moments of the 2014 Super Bowl to Joe Maddon’s use of closer Aroldis Chapman in a blowout 2016 World Series Game 6, up to and including the most recent Super Bowl, when Kyle Shanahan called consecutive ill-fated pass plays when the Falcons were already in range of a field goal that would seal the championship.

Moves like these certainly expose the danger, in the heat of the moment, of being less than fully committed to analytics. Even if you believe that intuition should trump the data in the end, isn’t it better if the coach knows the real-time magnitude of the risk being taken?

There are no decisions more agonizing for armchair football coaches and more fun for Monday-morning sports reporters and fans than 4th down attempts. Laypeople have taken to analyzing those decisions but, of course, it’s easier to dissect these decisions with the benefit of hindsight and a lack of professional pressure.

Interestingly, football coaches’ explanations for their 4th-down decisions illustrate that they actually have a bias toward real-time data when they make their mental calculations. According to Brian Burke, owner of, coaches suffer from a fallacy known as “Base Rate Neglect.”

“How successful they were on the two short-yardage plays earlier in the game, the nagging injury to the left guard, and the success his defense has had so far that day. What they always ignore is the base rate of success. But it’s the most important piece of information. It’s the one thing you’d want to know as a head coach in such a situation. All that other stuff affects the equation at the margins, but unless you know the base rate they’re practically useless.”

In short, they rely on their “gut” or “intuition” for these decisions, but that is merely a code word for a bias toward real-time data. Yet these coaches have only what their eyes and ears can tell them, and a human brain (puny compared to the power of real-time machine-learning algorithms) that is not equipped to conduct the necessary analysis of this data in the limited time available.
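To make the base-rate point concrete, here is a minimal sketch of one way an algorithm could weigh the historical base rate against what has happened in the current game. It assumes a simple Beta-Binomial model, which is a choice made for illustration and not something Burke or any team is known to use. The 60 percent base rate, the prior strength of 50 and the two in-game successes are all hypothetical numbers.

```python
from dataclasses import dataclass


@dataclass
class ConversionEstimate:
    """Beta distribution over the chance that a 4th-down attempt succeeds."""
    alpha: float  # pseudo-count of successes
    beta: float   # pseudo-count of failures

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def prior_from_base_rate(base_rate: float, strength: float) -> ConversionEstimate:
    """Encode the historical base rate as `strength` pseudo-observations."""
    return ConversionEstimate(base_rate * strength, (1.0 - base_rate) * strength)


def update(est: ConversionEstimate, successes: int, failures: int) -> ConversionEstimate:
    """Fold in this game's results (conjugate Beta-Binomial update)."""
    return ConversionEstimate(est.alpha + successes, est.beta + failures)


# Hypothetical numbers, for illustration only.
base_rate = 0.60  # assumed historical success rate for this situation
prior = prior_from_base_rate(base_rate, strength=50)

# The coach remembers two successful short-yardage plays earlier today.
posterior = update(prior, successes=2, failures=0)

print(f"base rate only:         {prior.mean:.3f}")
print(f"after in-game evidence: {posterior.mean:.3f}")
```

Under these assumed numbers, two good plays move the estimate by less than two percentage points. That is the quote's argument in miniature: the in-game details adjust the answer at the margins, while the base rate does most of the work.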

So, what would some of the major U.S. sports look like if coaches and teams had the tools to ingest and analyze data to make changes on the fly? What would it look like if computers and algorithms dictated in-game sports decisions?

Swing, Batter! (Or Don’t)

In baseball, historical batch data can tell us how many runs we may expect a team to score given the number of outs and runners on base. We also have information about where pitchers are most likely to throw their pitches, when batters are most likely to swing given the count and type of pitch, and the batting average for those types and locations of pitches.
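As a rough illustration of the batch side of this picture, the sketch below builds a small run-expectancy table: the average number of runs scored through the end of the inning, keyed by outs and runners on base. The records are made-up toy data, not real MLB figures, and the stolen-base break-even calculation at the end is just one assumed way such a table might be used.

```python
from collections import defaultdict

# Each record: (outs, runners on base, runs scored from then until the inning ended).
# Bases are written first-second-third, so "1--" means a runner on first only.
# Toy data invented for illustration.
history = [
    (1, "1--", 0), (1, "1--", 1), (1, "1--", 0), (1, "1--", 1),
    (1, "-2-", 1), (1, "-2-", 0), (1, "-2-", 2), (1, "-2-", 1),
    (2, "---", 0), (2, "---", 0), (2, "---", 1), (2, "---", 0),
]


def run_expectancy(records):
    """Average runs scored through the end of the inning for each game state."""
    totals = defaultdict(lambda: [0, 0])  # state -> [total runs, occurrences]
    for outs, bases, runs in records:
        entry = totals[(outs, bases)]
        entry[0] += runs
        entry[1] += 1
    return {state: runs / count for state, (runs, count) in totals.items()}


table = run_expectancy(history)

# Runner on first, one out: is a steal attempt worth it?
stay = table[(1, "1--")]     # no attempt
success = table[(1, "-2-")]  # runner reaches second
failure = table[(2, "---")]  # runner is thrown out

# Success probability at which attempting and staying put have equal expected runs.
break_even = (stay - failure) / (success - failure)
print(f"steal break-even success rate: {break_even:.2f}")
```

With real historical data the table would cover every combination of outs and base runners, and the same lookup would apply to bunts, intentional walks and other choices.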

Baseball continues to extend its lead over other sports in the field of analytics

But as any avid baseball fan is aware, we also have real-time data at our fingertips, including pitch speed, location and break, not to mention any real-time predictive analysis based on that data, that is not used by coaches.

In our imagined near-future world of baseball, the coach is given probabilities combining historical information with how the pitcher and batter are performing that day and, for the pitcher, in that very inning. If the slider is breaking less or the curve ball is hanging, or if the umpire is giving the high strike more than usual, then that changes the favorability of different options versus solely using a historical baseline. Some of the mistakes that might be avoided are throwing the wrong pitch, leaving the pitcher in too long, or becoming predictable in pitching and in swinging (or taking), which makes it harder to keep your opponent off balance. You could even evaluate performance in a bullpen warmup session to decide when and which relievers to bring in.

Slapshot! (Or Wristshot)

Low-scoring sports like soccer and hockey may get the strongest positive effect from the addition of real-time analysis, since each goal is so valuable. Information exists that can inform a coach about performance based on factors that lead to goals, but the low-scoring nature of these sports means a lucky goal or lucky save can leave the scoreboard belying the state of play. As a result, metrics that assess the game at the level of an individual player or group of players better indicate individual and team performance.

Soccer teams are beginning to track players’ speed on the field

Tools based on historical analysis exist to show how well certain combinations of players perform together by looking at figures such as expected goals and the team’s percentage of total shots attempted versus its opponents. Myriad other tools and websites show how well teams perform throughout the season and which players are playing well or poorly. Often, during games or over the course of multiple games, coaches will change their player combinations and play calling in an attempt to spark more goals or better defense.

What if, during a game, coaches could get instant recommendations on player combinations, tactics and substitutions, or on which plays to run, based on how each player on both teams was performing, as measured by speed, agility and distance travelled? Such a system could potentially detect things like physical fatigue, loss of concentration (mental fatigue) or even minor injuries to help coaches adjust to the situation at a point in time. Some teams have begun to adopt forms of in-game analysis, but there is still more to be done.
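As a sketch of what detecting fatigue from a stream might involve, the example below keeps a per-player baseline from early readings and flags the player when the average of the most recent readings falls well below it. The class name, the window sizes, the 85 percent threshold and the speed readings are all assumptions made for illustration; a production system would presumably use richer signals and a real streaming pipeline.

```python
from collections import deque
from statistics import fmean


class FatigueMonitor:
    """Flags a player whose recent average speed drops well below an early baseline."""

    def __init__(self, baseline_n=5, window=3, drop_ratio=0.85):
        self.baseline_n = baseline_n        # readings used to set the baseline
        self.drop_ratio = drop_ratio        # flag when recent < ratio * baseline
        self.baseline_samples = []
        self.recent = deque(maxlen=window)  # sliding window of latest readings

    def observe(self, speed):
        """Consume one reading; return True if the player looks fatigued."""
        if len(self.baseline_samples) < self.baseline_n:
            self.baseline_samples.append(speed)
            return False
        self.recent.append(speed)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough recent readings yet
        return fmean(self.recent) < self.drop_ratio * fmean(self.baseline_samples)


# Hypothetical sprint-speed readings (meters per second) arriving one at a time.
stream = [8.0, 8.2, 7.9, 8.1, 7.8, 7.9, 7.6, 7.0, 6.5, 6.2]

monitor = FatigueMonitor()
flags = [monitor.observe(speed) for speed in stream]
print(flags)
```

Because each reading is processed as it arrives, the flag can be raised during the game and not in a post-match report. In this made-up stream the monitor stays quiet until the final reading.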

Pass, Punt or Kick

In our reimagination of football, gone is the laminated play card, replaced by an iPad with a list of a few high-probability plays, their expected yardage gain and the variance around that average. The coach still has discretion to call a risky or safe play, and the decision is made with full knowledge of the distribution of possible outcomes based on both history and the realities of the current situation. Who knows, maybe Pete Carroll still calls the pass play simply for the element of surprise, but at least it’s a choice that takes the available data into account.
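The list on that iPad could be produced by something as simple as the following sketch, which ranks plays by the chance of gaining the yards needed, given each play's expected gain and spread. The play names and numbers are invented, and treating yardage as normally distributed is a crude simplifying assumption; real play outcomes are far lumpier.

```python
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class PlayOption:
    name: str
    mean_yards: float  # expected gain
    sd_yards: float    # spread around that expectation

    def prob_at_least(self, yards_needed: float) -> float:
        """P(gain >= yards_needed), treating the gain as roughly normal."""
        return 1.0 - NormalDist(self.mean_yards, self.sd_yards).cdf(yards_needed)


# Hypothetical play sheet; names and numbers are invented for illustration.
options = [
    PlayOption("inside run", mean_yards=3.0, sd_yards=1.5),
    PlayOption("quick slant", mean_yards=5.0, sd_yards=6.0),
    PlayOption("deep post", mean_yards=9.0, sd_yards=16.0),
]

yards_to_go = 2.0
ranked = sorted(options, key=lambda p: p.prob_at_least(yards_to_go), reverse=True)

for play in ranked:
    print(f"{play.name:12s} expected {play.mean_yards:4.1f} yds, "
          f"P(convert) = {play.prob_at_least(yards_to_go):.2f}")
```

With these assumed numbers, the play with the lowest expected gain is the most likely to convert, because its outcomes are tightly clustered. Seeing both the average and the spread is what lets the coach choose between the safe call and the risky one deliberately.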

Sports is a Business; Real-Time Analysis Is a Business Tool

There is no doubt that among those reading this article there are many who see this sort of intrusion of technology into the games we love as sacrilege because it takes the human element out of the sport. This perspective forgets all the improvements technology has already made to human and team performance, from sports nutrition to coach/player communications systems to instant replay and goal line technology. Real-time recommendations to coaches are simply one more step in what is a continually competitive sports landscape. In the same way that the use of analytics on historical data expanded the basis of competition among the emerging group of analytics believers, real-time analysis will create a new basis for competition for whoever can formulate the best algorithms and machine-learning techniques to populate the best play options on the iPad.

Lastly, we should never forget that sports are a game, meaning that regardless of how a decision is made, its success (and indeed the decision made) cannot be separated from beliefs about what the opponent will do.

Beyond the philosophical arguments, professional sports are an industry that depends on fostering the highest level of competition possible to ensure best-in-class play that attracts avid and loyal fans. In the same way traditional businesses increasingly leverage real-time data to improve and automate decision-making to gain a leg up in the market or better serve their customers, teams are under pressure to squeeze every drop of performance out of their players and coaches. In 10 years I suspect the use of real-time assistance will be commonplace in at least one sports league; in 20 years any objections will be as antiquated as the laminated play card.

About the author: Girish Pancha, the founder and CEO of StreamSets, is a data industry veteran who has spent his career developing products that address the challenge of providing integrated information as a mission-critical, enterprise-grade solution. Before co-founding StreamSets, Girish was an early employee and chief product officer at Informatica, where he was responsible for the company’s entire product portfolio. Girish also previously co-founded Zimba, a developer of mobile applications providing real-time access to corporate information, which he led to a successful acquisition. Girish began his career at Oracle, where he led the development of Oracle’s BI platform.