December 3, 2020

How Data Can Help Optimize COVID-19 Vaccine Distribution


We are just weeks away from the first COVID-19 vaccinations being administered to individuals, a remarkable feat of biochemistry and drug development. But with only 40 million doses expected to be available in the United States, the vaccine will be in short supply for several months. How then can we get the biggest impact from this limited supply? Data analytics and data science can help provide an answer.

The Centers for Disease Control and Prevention (CDC) bears ultimate responsibility for the distribution of two COVID-19 vaccines that are before the Food and Drug Administration (FDA). A decision is expected from the FDA on December 10 regarding Pfizer-BioNTech’s application for an emergency use authorization to distribute its vaccine, while approval (or rejection) of the Moderna vaccine will be announced on December 17.

Assuming the FDA approves both vaccines (which seems like a good bet at this point, considering they both are purported to be at least 94% effective), the big decision in the weeks and months to come is who gets them. Both vaccines require two doses, which means the 40 million doses should be enough to inoculate 20 million people against COVID-19, just a fraction of the total population of 328 million.
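The supply arithmetic above can be checked in a few lines. This is a minimal sketch using only the figures quoted in the article; the function name `coverage` is an illustrative choice, not part of any agency's tooling.

```python
# Figures quoted in the article (December 2020).
DOSES_AVAILABLE = 40_000_000
DOSES_PER_PERSON = 2          # both vaccines require two doses
US_POPULATION = 328_000_000

def coverage(doses, doses_per_person, population):
    """Return (people who can be fully vaccinated, share of the population)."""
    people = doses // doses_per_person
    return people, people / population

people, share = coverage(DOSES_AVAILABLE, DOSES_PER_PERSON, US_POPULATION)
print(f"{people:,} people, {share:.1%} of the population")
# prints "20,000,000 people, 6.1% of the population"
```

At roughly 6% coverage, the initial supply is far smaller than any single priority group discussed below, which is why the allocation question matters.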

The vaccine equation is loaded with a multitude of variables, says Ray Falcione, who’s the vice president of the US public sector for OmniSci, a San Francisco, California-based analytics firm.

“When you think about vaccine distribution there’s a ton of different factors that go into it,” he says. “When I’m looking at CDC’s deployment plan, it’s everything from coordinating federal, state, and local entities, tribal nations, how many [doses] are available, who you get that to, depending on who is the most vulnerable population.”

The CDC will need to figure out how it wants to attack the problem, and then come up with a strategy to execute the plan. The CDC could prioritize elderly people with chronic conditions, which is the population that has been most impacted by the coronavirus. There were over 58 million people in the United States aged 65 or over in 2018.


Frontline healthcare professionals, including those who are treating people with COVID-19, are also among those who could be first in line to get a vaccine. According to the CDC, there are 18 million healthcare workers in the United States. Other groups have also been disproportionately impacted by the novel coronavirus, including minority groups. Will the CDC prioritize vaccinations for them? Certain urban ZIP codes have also seen much more of the virus than others (although the fall surge is now reaching rural areas too).

The CDC isn’t sharing its methods at this point, but if OmniSci were to be assigned the contract, it would start by gathering data from CMS, the Centers for Medicare and Medicaid Services.

“There’s some public data that we would put together initially, which is mostly from CMS about where are the Medicare beneficiaries with which types of chronic conditions, and what is the geographic variation in that,” says Chuck Cogar, OmniSci’s director for the federal civilian sector.

The second thing he would do is analyze where those people live versus the centers that are providing the vaccine. “Is there an access-to-care issue, particularly in rural areas?” Cogar says. “Do they have proper access, or where do we need additional resources to get vaccine to the people who need it most?”
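One simple way to frame the access-to-care question Cogar describes is to measure how far each population area is from its nearest vaccination site and flag the ones beyond some threshold. The sketch below is an assumption-laden illustration, not OmniSci's method: the function names, the 50 km threshold, and the coordinates are all hypothetical, and straight-line distance stands in for real travel time.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def flag_access_gaps(areas, sites, max_km=50.0):
    """Return (area name, km to nearest site) for areas farther than max_km.

    areas and sites are lists of (name, lat, lon) tuples; sites must be non-empty.
    The max_km threshold is a hypothetical cutoff chosen for illustration.
    """
    gaps = []
    for name, lat, lon in areas:
        nearest = min(haversine_km(lat, lon, s_lat, s_lon) for _, s_lat, s_lon in sites)
        if nearest > max_km:
            gaps.append((name, round(nearest, 1)))
    return gaps

# Made-up coordinates for illustration only; not real beneficiary or site data.
areas = [("area_near", 0.0, 0.1), ("area_far", 2.0, 0.0)]
sites = [("site_1", 0.0, 0.0)]
print(flag_access_gaps(areas, sites))
```

With these made-up inputs only `area_far` is flagged, since it sits roughly 222 km from the lone site while `area_near` is about 11 km away. A real analysis would weight each area by its count of high-risk beneficiaries from the CMS data.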

It’s unclear whether the CDC will use OmniSci’s GPU-based database to help come up with its vaccine distribution plan (although the agency reportedly is using GPUs for genomics-related research). But according to Falcione, the software could be quite useful for the task.

“When you have all these data sets, it becomes a big data problem,” he says. “We’re talking significant amounts of data that can’t otherwise be ingested into a traditional platform, and then being able to visualize and inspect that data and being able to make decisions in a timely way…You need a system that can ingest multiple data sets and visualize it in near real-time.”

Syndromic Surveillance

Another way to approach the problem is to focus vaccine distribution on the areas that are having the biggest challenges containing the spread of coronavirus. One company that’s been actively involved in monitoring the spread of the virus is Premier, which provides cloud-based clinical decision support services used by 300,000 physicians around the country.

During normal times, Premier develops technology that helps physicians make good decisions at the point of care. It does this by using its machine learning and natural language processing (NLP) technology to interpret the doctors’ notes entered into the electronic health record (EHR) system, or even by analyzing X-rays.


As COVID-19 spread earlier this year, the company found another way to use it.

“We took that technology and we quickly pivoted it to syndromic surveillance,” says Leigh Anderson, president of performance services for Premier. “We said, if we can do it for imaging, we can obviously do it for COVID-19. And we can do it in the ambulatory stage, because what we found was the ambulatory space predicts future waves of COVID-19 in the acute care space.”

The system essentially takes unstructured data entered into EHRs and turns it into an early warning system for COVID-19, which Premier’s customers could use to ensure they have enough personal protective equipment (PPE) and other supplies on hand to deal with the predicted demand.

“By looking at the information across all the EMRs, we were able to look at it at a state level and a regional level,” Anderson says. “And what we found was we were able to predict what was happening in the hospital, in the ER, from a resource perspective, a couple of weeks ahead of time.”

As vaccines and other therapies hit the market, knowing where the infection will peak in the next 10 to 14 days could help inform where to move supplies to have the greatest impact. This is the promise of having a data-based approach, Anderson says.
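The article does not describe how Premier measures that lead time. As a hedged illustration of the general idea, the sketch below estimates how many days an early signal leads a later one by trying each candidate lag and keeping the one with the highest Pearson correlation. The function names and the synthetic wave are hypothetical; this is a swapped-in textbook technique, not Premier's system.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def best_lead_days(ambulatory, hospital, max_lag=28):
    """Return (lag, correlation) where ambulatory[t] best matches hospital[t + lag]."""
    best_lag, best_r = 0, -1.0
    for lag in range(max_lag + 1):
        shifted = hospital[lag:]                 # hospital counts `lag` days later
        early = ambulatory[:len(shifted)]        # aligned ambulatory counts
        r = pearson(early, shifted)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

# Synthetic data for illustration only: one wave, with hospital demand
# constructed to trail the ambulatory signal by exactly 14 days.
def wave(t):
    return 100.0 * math.exp(-((t - 50) / 10.0) ** 2)

ambulatory = [wave(t) for t in range(120)]
hospital = [wave(t - 14) for t in range(120)]

lag, r = best_lead_days(ambulatory, hospital)
print(f"estimated lead: {lag} days")
```

Because the synthetic hospital series is the ambulatory series delayed by 14 days, the search recovers a 14-day lead. Real surveillance data is noisy and regionally uneven, so an estimate like this would need validation before it informed where supplies are moved.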

“There’s a ton of data out there. There’s a lot of people doing a lot of things with it. But how do you make it actionable?” Anderson says. “From my perspective, it’s making data focused and very actionable–that’s not easy to get to, because most of the data in the EMR is not easily interpreted.”

According to Anderson, Premier is working with its member hospitals, as well as the FDA and the CDC, to ensure that its clients are ready to do their part in the distribution of the vaccine. However, there are other considerations around the vaccines themselves, including how they are stored and distributed, that need to be taken into account in the CDC’s distribution plans. The Pfizer vaccine needs to be stored at minus 70 degrees Celsius, while the Moderna vaccine only needs minus 20 degrees Celsius. Figuring out the transportation logistics for distributing the vaccines requires careful forethought.

While the Pfizer vaccine will have the lowest storage temperature requirement of any vaccine ever distributed in the United States, it shouldn’t pose a major obstacle, according to Anderson.

“The good news is we’ve been dealing with temperature-based logistics with vaccines for quite a while,” he says. “Distribution points have been having temperature controlled environments for quite a while. Even though it’s complicated, to be fair, I think the supply chain side of the house can handle it.”

Related Items:

New Algorithm Aims to Optimize Vaccine Distribution

AI Model Detects Asymptomatic COVID-19 from a Cough 100% of the Time