September 11, 2014

What’s Driving the Explosion of Government Data?

Leo Leung

Most people already know that the world produces massive amounts of data every day, and government organizations are a huge source of that data. For instance, it is estimated that next year the Department of Energy will create 300 terabytes of data per day from analyzing light sources, and by 2020, the department will create 15 petabytes of data per year analyzing high-energy physics.

That’s a lot of data. But where does all this data come from? The amount of data the government is producing is so substantial because it comes from a variety of sources that consume a lot of storage space, from simulations to genetic sequences and physical observations to video. This data is typically used for raw analysis, but also for sharing data sets and research findings to drive additional analysis, both in public and private sectors.

High-resolution, high-speed photography, high-resolution video, and digital audio also account for much of the data load. These media tools are advancing material analysis, national security, climatology and more – in fact, NOAA has an entire list of tools it uses on its site.

Government Data Is Off the Scale

The scale of the data collected by government organizations is truly massive. Here are just a few examples of government organizations and projects that produce data on this scale:

  • Advanced genomic sequencers like Illumina’s HiSeq X Ten will generate terabytes of data per day;
  • Berkeley Lab’s Advanced Light Source (ALS) experiments could generate over 100 terabytes a day, and by some estimates over 300 terabytes a day;
  • NOAA’s former CIO, Joe Klimavicz, predicted that by 2020, with tools like satellites and sonar, NOAA could be collecting as much as 800 terabytes of data a day and storing over 100 exabytes.
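To put the daily rates quoted in this article on a common footing, here is a minimal sketch that converts them to yearly volumes. It assumes decimal units (1 petabyte = 1,000 terabytes) and a 365-day year; the function and variable names are illustrative only and do not come from any agency's tooling.

```python
# Assumption: decimal storage units, 1 PB = 1,000 TB.
TB_PER_PB = 1000


def tb_per_day_to_pb_per_year(tb_per_day, days_per_year=365):
    """Convert a sustained daily rate in terabytes to petabytes per year."""
    return tb_per_day * days_per_year / TB_PER_PB


# Daily rates quoted in the article; the labels are ours.
daily_rates_tb = {
    "ALS (lower estimate)": 100,
    "ALS (higher estimate)": 300,
    "NOAA (2020 prediction)": 800,
}

for source, rate in daily_rates_tb.items():
    yearly_pb = tb_per_day_to_pb_per_year(rate)
    print(f"{source}: {rate} TB/day is about {yearly_pb:.1f} PB/year")
```

Under these assumptions, a sustained 300 terabytes a day works out to roughly 110 petabytes a year, and 800 terabytes a day to roughly 292 petabytes a year.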

Big Data with Big Benefits

All of this data means nothing if you can’t use it. The petabytes and exabytes of data need to be stored, protected, and easily accessible. Moreover, organizations need to be able to leverage both new and archived data with a limited IT budget. With today’s new infrastructure technologies, such as software-defined storage running on super-dense hardware, whether hosted internally or public-facing, this is entirely possible. As noted in this recent Datanami article, government organizations like the Department of Commerce are making more data available to the public with a goal of driving US economic growth.

To do this, we need to address the traditional boundaries of storage. By connecting storage directly within the supercomputer to scale-out networked storage, government organizations can handle more than 20 times the data capacity. In fact, Los Alamos National Laboratory (LANL) made that move in response to its growing data storage needs.

LANL is tasked with ensuring the safety of the US nuclear stockpile, which requires the lab to run complex computer simulations. LANL’s supercomputer-based simulations run for months on end and generate several petabytes of data. To maintain an active archive of these test results, LANL needed a scalable software-defined storage strategy with cost-effective data protection using erasure coding. As a result, LANL deployed Scality’s RING software to store the up to 500 petabytes of data it expects its computer simulations to generate.
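To illustrate why erasure coding is considered cost-effective at this scale, the sketch below compares the raw capacity needed to protect 500 petabytes with erasure coding against keeping full replicas. The 9+3 layout and the three-copy baseline are hypothetical values chosen for illustration, not LANL's or Scality's actual configuration, and the overhead formula assumes a maximum-distance-separable code such as Reed-Solomon.

```python
def raw_capacity_erasure_coded(usable_pb, data_fragments, parity_fragments):
    """Raw capacity for a k+m erasure code.

    Each object is split into k data fragments plus m parity fragments,
    so raw storage is usable * (k + m) / k. With an MDS code such as
    Reed-Solomon, any m fragments can be lost without losing data.
    """
    return usable_pb * (data_fragments + parity_fragments) / data_fragments


def raw_capacity_replicated(usable_pb, copies):
    """Raw capacity when every object is stored as full copies."""
    return usable_pb * copies


usable_pb = 500  # figure from the article

# Hypothetical layouts, for illustration only.
ec_raw = raw_capacity_erasure_coded(usable_pb, data_fragments=9, parity_fragments=3)
rep_raw = raw_capacity_replicated(usable_pb, copies=3)

print(f"9+3 erasure coding: {ec_raw:.0f} PB raw, survives 3 lost fragments")
print(f"3-way replication:  {rep_raw:.0f} PB raw, survives 2 lost copies")
```

Under these assumed parameters, erasure coding needs about 667 petabytes of raw capacity where three full copies would need 1,500 petabytes, while tolerating more simultaneous failures.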

In the end, government agencies and organizations are going to need to figure out what they want to do with all of this data they are generating. With the right infrastructure in place these organizations can milk the data for all it’s worth, and with that data possibly change the world.


About the Author: Leo Leung is vice president of marketing at Scality, where his responsibility extends from corporate positioning to overall content strategy across all channels. Leo was previously at Oxygen Cloud and EMC, and writes about Scality and the technology market on the company’s blog and LinkedIn. Leo has an M.B.A. from the Tepper School of Business and a B.A. from Tufts University. His Twitter handle is @lleung.

Related Items:

Software-Defined Storage Takes Off As Big Data Gets Bigger

Climate Researchers Crunch Data on Weather Extremes

Astronomical Algorithm Powers Data Analytics Startup