November 4, 2020

Another Failure Undermines Trust in Polls

When it comes to American political polling, it turns out Big Data isn’t nearly big enough.

Hence, as we again learned this week, earnest attempts to predict the outcome of a national election from an extremely small data set long past its expiration date are becoming an exercise in futility.

Granted, the pandemic-driven early voting skewed most models, making it that much harder for pollsters to gauge turnout and preferences among key demographic sectors such as reliable senior voters. Still, critics of current polling practices concluded the unexpectedly tight 2020 presidential election reflects flawed models and a paucity of reliable data.

Case in point is Joe Biden’s roughly 20,000-vote lead in Wisconsin, a vital battleground state. Various pre-election forecasts had Biden handily winning the pandemic-ravaged state. Among the few polls mirroring the actual outcome, the final Marquette University Law School poll released on Oct. 28 gave Biden a 5-point edge, with a +/- 4.3 percent margin of error.
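That +/- 4.3-point figure follows from the standard margin-of-error formula for a sample proportion. A minimal sketch (the sample size of ~520 is back-solved from the reported margin for illustration, not taken from the Marquette poll itself):

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Margin of error for a simple random sample proportion.

    n: sample size
    p: observed proportion (0.5 is the worst case, the usual default)
    z: critical value (1.96 corresponds to 95% confidence)
    """
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative: roughly 520 respondents yield a worst-case margin of
# error of about +/- 4.3 points. A 5-point lead with that much noise
# on each candidate's share leaves the race statistically close.
print(round(100 * margin_of_error(520), 1))  # ~4.3
```

Note the margin applies to each candidate's share separately, so the uncertainty on the *gap* between them is roughly double the headline figure.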

As noted by the Milwaukee Journal Sentinel, Biden’s projected margin of victory in the Dairy State remained virtually unchanged since May, despite an ongoing public health emergency and associated economic turmoil. (Wisconsin reported more than 5,700 new Covid-19 cases on Election Day, a one-day record, surpassing 50,000 active cases statewide.)

Seeking to fill the yawning trust gap on political polling, AP VoteCast promotes its election surveys as “polling data you can trust.” Its methodology was developed by the wire service and NORC at the University of Chicago.

The consortium attempts to cast a wider net, for example contacting potential voters in battleground states in the eight days before this week’s election. While its surveys have a smaller margin of error, AP VoteCast primarily sought to ferret out voters’ attitudes on the coronavirus, its economic impact and other hot-button campaign issues rather than forecasting statewide election results.

As we’ve reported, national polling outfits have largely failed to accurately forecast the outcome of the last two presidential elections. That failure is prompting a fundamental reassessment of how polls are conducted in an age of disinformation and hyper-partisanship.

“While we have some theories on what influences voters, we have no fine-grained understanding of why people vote the way they do, and what polling data we have is relatively sparse,” Zeynep Tufekci, an associate professor at the University of North Carolina, noted in the New York Times the day after the polls closed.

Unlike, say, data-rich weather forecasting, election polling suffers from a very small sample size. “Since many models use polls from the beginning of the modern primary era in 1972, there are a mere 12 examples of past presidential elections with dependable polling data,” Tufekci noted.

“That means there are only 12 chances to test assumptions and outcomes, though it’s unclear what in practice that would involve.”
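Tufekci’s point about 12 elections can be made concrete with a back-of-the-envelope calculation (the hit rate below is hypothetical, chosen only to illustrate the scale of the problem):

```python
import math

# Hypothetical: a model that correctly called 10 of the 12 elections
# since 1972 still has an accuracy estimate with an enormous
# uncertainty band, simply because n = 12 is so small.
n, hits = 12, 10
p_hat = hits / n
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of a proportion
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"accuracy {p_hat:.2f}, 95% CI roughly ({low:.2f}, {high:.2f})")
```

The normal-approximation interval here even spills past 1.0, itself a warning that 12 observations are too few for the approximation to behave, let alone for distinguishing a good forecasting model from a lucky one.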

The emerging consensus among data and social scientists seems to be that desk-bound pollsters need to pound the pavement in order to more accurately gauge the mood of a deeply divided U.S. electorate.

Recent items:

2020 Election: Five Ways to Improve the Accuracy of Polls

Systemic Data Errors Still Plague Presidential Polling

Six Data Science Lessons from the Epic Polling Failure
