January 15, 2020

NOAA Updates Its Massive Climate Dataset

Oliver Peckham

As Australia burns, understanding and mitigating the climate crisis is more urgent than ever. Now, by leveraging the computing resources at the National Energy Research Scientific Computing Center (NERSC), the U.S. National Oceanic and Atmospheric Administration (NOAA) has updated its 20th Century Reanalysis Project (20CR) dataset, giving researchers access to a massive, high-resolution repository of climate estimates dating back to 1806.

NOAA’s 20CR project, now in its 13th year, uses a variety of data, including surface pressure, sea surface temperature, and sea ice observations, to reconstruct daily records (and in some cases, near-hourly records) of the global climate. To reach back 214 years, 20CR had to get creative – incorporating, for instance, weather logbooks from 19th-century ships.

The resulting output includes estimates of a wide variety of climate variables, including cloud cover, temperature, winds, moisture and solar radiation, serving as a major boon for researchers in a wide variety of disciplines. “This tool also lets us quantitatively compare today’s storms, floods, blizzards, heat waves, and droughts to those of the past and figure out whether or not climate change is having an effect,” said Gil Compo, a scientist with the Cooperative Institute for Research in Environmental Sciences (CIRES) at NOAA who leads the 20CR project, in an interview with NERSC.

The new iteration of 20CR, called 20CRv3, includes “millions more” observations than were incorporated in previous versions, with an emphasis on earlier time periods. “The atmospheric estimates from 20CRv3, as well as their uncertainties, are much more reliable than those from the previous reanalysis, particularly in the 19th century,” said Laura Slivinski, a CIRES meteorologist and lead author of a recent paper highlighting the improvements in 20CR. “We’re more certain about how much we know, and where we need to know more.”

The improved scale of the model necessitated major compute power. NOAA turned to NERSC, using 600 million CPU hours (an order of magnitude higher than the time needed by previous iterations) on 6,600 cores of its Cori KNL (Knights Landing Phi-based) supercomputer. “We needed many more NERSC hours because we now have 80 ensemble members at four times the horizontal resolution, 2.5 times the vertical resolution, and two times the temporal frequency,” Compo said. “We used to be on a 2-degree latitude-by-longitude grid; we’re now on a 0.7-degree latitude-by-longitude grid, which means we went from a grid spacing of about 140 miles to 48 miles. So yes, we are doing more science.”
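The resolution and cost figures Compo quotes can be sanity-checked with a quick back-of-envelope calculation. The sketch below assumes one degree of latitude spans roughly 69 miles (~111 km); the constant and helper function are illustrative, not drawn from the 20CRv3 source data.

```python
# Back-of-envelope check of the quoted 20CRv3 resolution figures.
# Assumes 1 degree of latitude ~ 69 miles (~111 km).

MILES_PER_DEGREE_LAT = 69.0

def grid_spacing_miles(degrees: float) -> float:
    """Approximate north-south spacing of a latitude-longitude grid."""
    return degrees * MILES_PER_DEGREE_LAT

old_spacing = grid_spacing_miles(2.0)   # ~138 miles ("about 140" in the quote)
new_spacing = grid_spacing_miles(0.7)   # ~48 miles

# Rough per-member cost multiplier from the quoted changes:
# 4x horizontal resolution, 2.5x vertical resolution, 2x temporal frequency.
cost_factor = 4 * 2.5 * 2

print(f"old spacing: {old_spacing:.0f} mi, new spacing: {new_spacing:.0f} mi")
print(f"per-member cost multiplier: ~{cost_factor:.0f}x")
```

The per-member multiplier alone is roughly 20x; combined with running 80 ensemble members, it is easy to see why the new run consumed an order of magnitude more CPU hours than its predecessors.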

After the Cori KNL supercomputer at NERSC processed NOAA’s 21 petabytes of input data, the resulting 20CRv3 dataset was stored on NERSC’s High Performance Storage System, where it is now publicly available. “The research opportunities this work makes available are almost boundless,” Compo continued. “We’re throwing open the door to lost history and inviting scientists to pour through.” 

“This dataset will keep getting better as we unlock more observations from historical archives,” Compo said. “It’s really a virtuous circle.”