August 2, 2021

Still Wanted: (Much) Better COVID Data


It wasn’t supposed to happen this way. With three effective COVID vaccines, we were supposed to be on the tail-end of the pandemic by now. But that hasn’t happened. Infection rates are increasing quickly as the Delta variant spreads, and officials are pondering new mask requirements and even lockdowns. What’s more, the data is still a mess.

With rare exception, data has been a major obstacle to the COVID response in the United States since the pandemic began in February 2020. Up and down the spectrum, the response has been hindered by a variety of data problems.

The CDC’s first COVID test was faulty, setting back the US response by weeks. Subsequent tests were better, but false positive rates are still stubbornly high. Track-and-trace methodologies have been flawed and understaffed. What’s more, hospitals and government agencies at all levels use a smorgasbord of data collection, analysis, and reporting techniques, which damages the ability of decision-makers to get a clear understanding of what is actually happening on the ground.

The poor COVID data supply chain creates additional downstream problems. Without a good real-world baseline, predictive models can’t give accurate forecasts of the coronavirus’s spread. That ultimately hinders the ability of public officials to make good decisions to fight the spread of SARS-CoV-2.

Numerous groups have tried to rectify the problem. Datanami has reported how vendors in the big data space, including Snowflake, Talend, and AtScale, have dedicated resources to try to straighten out certain data elements. Johns Hopkins University’s COVID-19 Dashboard became the de facto standard for case counts and death counts. The New York Times also stepped up to function as a clearinghouse for data about the pandemic.

A ‘Messy Patchwork’

In March 2020, editors at the monthly magazine The Atlantic were surprised by the lack of good data about the pandemic, and pledged to do something about it by creating The COVID Tracking Project.

“Throughout the pandemic, COVID-19 testing and outcomes data have been a messy patchwork,” wrote Jonathan Gilmour, who worked on data infrastructure for the project, in a May 2021 blog post. “States frequently changed how, what, and where they reported data. In the absence of clear federal guidelines, states were largely left to figure out how to publish data without help.”

State-reported vs. federally-reported test count differences for states that define tests the same way as the federal government but have >5% differences in testing counts. (Image source: COVID Tracking Project)

Ultimately, volunteers with the COVID Tracking Project gave more than 20,000 hours of their time to getting clean and consistent data. The project was surprisingly manual, as automation would have led to an increase in data errors. Amid constantly shifting state and county health dashboards, the project relied on a rather rudimentary screenshot tracking system to give it the “ground truth” it needed to accurately maintain time-series data.

“Seeking out and manually entering each data point gave us a detailed understanding of the data that we would not have been able to develop had we automated data collection,” Gilmour wrote. “We knew when new data points were added and came across caveats and notes posted on state data pages…. Additionally, we learned what was normal and what was abnormal on a state by state basis, enabling us to make informed decisions when handling reporting anomalies.”

The COVID Tracking Project stopped collecting data in March, when it seemed that vaccines would get us out of the woods. Back then, you will remember, the vaccination effort was in overdrive, and millions of Americans were vaccinated every week. That led to a sharp reduction in infections, by any measure.

However, infections are on the rise again, as vaccine hesitancy and the extremely contagious Delta variant have combined for a summer COVID surge that looks remarkably like last summer’s numbers (if they can be believed).

That puts public health officials in an awkward position. Unfortunately, the data hasn’t improved much since the epidemic started. With vaccinations waning, do they have enough information to impose new mask mandates or lockdowns?

‘Unmanaged Data’

Analytics expert Tom Davenport co-authored an August 2020 MIT Sloan Management Review story about the shoddy state of COVID data. The authors wrote: “One is forced to conclude that the data needed to manage the COVID-19 pandemic is effectively unmanaged. This is an acute problem, demanding urgent, professional attention.”

(Matt Bannister/Shutterstock)

Well, it’s August 2021 now, and not much has changed when it comes to the data, according to Davenport. Indeed, the opportunity to improve the situation appears to have passed us by.

“A lot of the energy for changing the data environment sort of evaporated when the vaccines became available and people thought ‘We’re out of the woods on that one. Maybe the next one. But we’ve got plenty of time to address that issue,’” Davenport told Datanami last week.

Even last summer, before the predicted autumn surge (which turned out to be worse than predicted), there wasn’t much urgency to improve the data situation. The chief data officer (CDO) position at the Centers for Disease Control and Prevention (CDC) went unfilled during the beginning of the pandemic until Alan Sim, who led the data science team at a D.C.-area defense contractor, took the job in December 2020.

“We thought it was too late to do too much with the current pandemic,” Davenport said. “But now that we see how long it’s stretched out, it probably wasn’t too late. There was time to make some changes, if we had been so inclined.”

The biggest problem with the response to the pandemic is a lack of centralized authority and standards, Davenport said. Tracking and tracing of COVID contacts, a tried-and-true approach for understanding how the virus spreads, has not been successful, he added. Across the board, there have been failures to gather the data needed to get in front of the disease.

“We basically treated it as if it was a local disease that every state could address on their own,” he said. “I don’t think we understand the dynamics of the disease well enough to do a good job of establishing policies [for] masking and certainly not lockdowns.”

A Call for Federal Data Standards

While the CDC theoretically had plans for collecting and assessing data with state and local health authorities at a more integrated level, that apparently never came to pass, according to Davenport. “Clearly, if we’re going to put the CDC in charge of fighting pandemics in the US, then I think we’re going to need to have some clear data standards and processes,” he said.

But there are headwinds, starting with the CDC, which historically has not been that oriented toward data, he said. “They’ve done a good job of fighting pandemic in other countries, Ebola and so on,” he said. “But we haven’t had any pandemic in the US, so I guess it’s not surprising that we had a bad approach to data management for it.”

Tom Davenport is the President’s Distinguished Professor of Information Technology and Management at Babson College, the co-founder of the International Institute for Analytics, a Fellow of the MIT Initiative for the Digital Economy, and a Senior Advisor to Deloitte Analytics.

Davenport admitted he’s not an expert in the law, but from his perspective as an expert in analysis, there certainly was (and is) a need for greater standardization and centralization. Even without today’s starkly divided political climate, Davenport has his doubts about whether one could get 50 states to agree on anything. That leaves us in the same boat as we are today, with a cacophony of disparate public health agencies moving independently of one another, leaving the data–i.e. the statistical record of events–to be interpreted in different ways.

What’s needed, he said, is a clear set of federal data standards for relevant events, such as what constitutes a case of COVID and what constitutes a death. This would minimize the double-counting and other errors that not only give policy makers an incorrect view into the record, but also undermine confidence in public health authorities. This policy could be enforced with a law that says independent state and local health departments that want federal dollars to help fight a pandemic would have to abide by the federal standards, he said.

“I don’t know what it would take,” Davenport said. “I hope that there are some people in the Biden administration who are thinking about this. But I haven’t seen any announcement of any policy changes related to data thus far.”
