ASF Keynotes Showcase How HPC and Big Data Have Pervaded the Pandemic
Last Thursday, a range of experts joined the Advanced Scale Forum (ASF) in a rapid-fire roundtable to discuss how advanced technologies have indelibly transformed the way humanity responded to the COVID-19 pandemic. The roundtable, held near the one-year mark of the first lockdowns in North America, opened with a session from Ari Berman, CEO of BioTeam.
“It’s so easy to focus on the bad things we hear about the remarkable and really unfortunate numbers of people who have died from this, the huge numbers of people who’ve been infected from it, we talk about these new more infectious variants, et cetera,” Berman said – but, he added, there were major success stories in the pandemic, too: collaborations and technology deployments that will save “millions of lives.”
NIH Keynote: Creating a Coordinated Data Approach to Help Address COVID-19
With that, the roundtable launched into its first keynote, delivered by the National Institutes of Health’s Susan Gregurick, who serves as associate director for data science and director of the NIH’s Office of Data Science Strategy.
“We’ve been working for almost a year now to sprint ahead to collect and enhance SARS-CoV-2 data – clinical data, structural data, genomics data – to address the pandemic,” Gregurick said. “The first thing that we tried to do – and we did successfully – was to get different types of at-home, point-of-care clinical testing technologies out into the hands of our citizens.”
This program – called RADx – ranges from preparing for high-throughput COVID-19 testing to engaging underserved populations through community-engaged implementation projects, and it’s one of several data-driven projects run inside the NIH. The NIH has also, for instance, been working on its Collaboration to Assess Risk and Identify Long-Term Outcomes (CARING) for Children with COVID Program.
Still, the NIH needed to develop a longer reach. They worked with the National COVID Cohort Collaborative (N3C), which integrates electronic healthcare record data on COVID-19, augmenting it with “an incredibly rich set of data from vulnerable populations.” As of a few months ago, the N3C had multiple millions of participants contributing data to hundreds of ongoing projects and collaborators. (The data is accessible in a cloud archive.)
The NIH also worked with the All of Us Research program – which collects longitudinal COVID-19 health outcome data alongside phenotypic and serological data – and the BioData Catalyst, which provides data from clinical trials and observational studies such as those that evaluated hydroxychloroquine early in the pandemic.
Soon enough, the NIH found itself serving as an aggregator of a wide range of data from various sources – and having to grapple with the logistical implications of coordinating both the data and access to the data across a wide range of interested parties.
“Making all this work together across many different projects really does require some efforts in data harmonization,” Gregurick said. “We’ve been tackling this in two different ways: … common data elements and mapping to data models. In some cases it’s a development of curation strategies within the data hub, … in other cases it’s at the point of collection and really collaborating with our data coordination centers.”
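To make the idea of common data elements concrete, here is a minimal Python sketch of how records from two collection sites might be mapped onto one shared model. It is an illustration only, not the NIH's or the N3C's actual pipeline: the `CommonRecord` fields, the two site mappings, and every source field name are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


# Hypothetical common data elements (CDEs); not the NIH's actual definitions.
@dataclass(frozen=True)
class CommonRecord:
    patient_id: str
    test_result: str  # one of "positive", "negative", "unknown"
    age_years: int


# For each CDE: which source field feeds it, and how to convert the value.
FieldMap = Dict[str, Tuple[str, Callable[[Any], Any]]]


def normalize_result(value: Any) -> str:
    """Collapse site-specific result codes into one shared vocabulary."""
    text = str(value).strip().lower()
    if text in {"pos", "positive", "detected", "1"}:
        return "positive"
    if text in {"neg", "negative", "not detected", "0"}:
        return "negative"
    return "unknown"


# Hypothetical site A already reports age in years.
SITE_A_MAP: FieldMap = {
    "patient_id": ("pid", str),
    "test_result": ("covid_pcr", normalize_result),
    "age_years": ("age", int),
}

# Hypothetical site B reports age in months, so the mapping converts units.
SITE_B_MAP: FieldMap = {
    "patient_id": ("PatientID", str),
    "test_result": ("Result", normalize_result),
    "age_years": ("AgeMonths", lambda months: int(months) // 12),
}


def harmonize(record: Dict[str, Any], field_map: FieldMap) -> CommonRecord:
    """Map one source record onto the common model using a per-site mapping."""
    values = {
        cde: convert(record[source_field])
        for cde, (source_field, convert) in field_map.items()
    }
    return CommonRecord(**values)


site_a_row = {"pid": 17, "covid_pcr": "POS", "age": "42"}
site_b_row = {"PatientID": "b-9", "Result": "Not Detected", "AgeMonths": 30}

harmonized = [
    harmonize(site_a_row, SITE_A_MAP),
    harmonize(site_b_row, SITE_B_MAP),
]
```

A mapping table like this could be applied inside a data hub after records arrive, or the same common fields could be enforced when the data is first collected, which roughly mirrors the two approaches Gregurick describes.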
You can read the rest of this story at HPCwire.