September 16, 2014

Tape Gets Second Wind as Big Data Mounts

Think tape is dead in our big data world? Think again. This week, the National Center for Supercomputing Applications (NCSA) announced that it bought 20PB of tape capacity to expand the world’s biggest data archive. Meanwhile, the LTO Program has plotted out a roadmap for the next decade that will eventually see a single LTO tape cartridge storing 120 TB of data.

While tape is looked down upon in our speed-obsessed culture, the old standby has proven its relevance in the biggest enterprise deployments time and time again. In fact, if you have extreme data retention needs, a modern tape environment may be the best game in town.

That’s the conclusion one would draw from the NCSA, which two years ago selected Spectra Logic to supply a massive tape installation for Blue Waters, the Cray cluster installed at the University of Illinois at Urbana-Champaign. The installation, completed about a year ago, consisted of four Spectra Logic T-Finity libraries equipped with 244 IBM TS1140 enterprise tape drives, providing more than 300 PB of storage capacity, roughly equivalent to 10 percent of all the words ever spoken in the entire existence of humankind.

The IBM TS1140 is a high-end tape drive that is faster, holds more data, and has fewer errors than LTO gear.

“A year ago it was the largest archive in the world, supporting between 300 PB to 500 PB of data,” explains Spectra Logic executive vice president of worldwide sales Brian Grainger. “This most recent 20 PB expansion is in line with their growth expectations of users and customers.”

Blue Waters is used by researchers in academia and industry, and is called upon to power everything from seismic and atmospheric simulations to models of the cosmos and complex biological systems. Scientific discoveries are routinely made using the Blue Waters resource, and big data is a huge part of that.

Before a simulation is run on Blue Waters, the source data is loaded from the Spectra Logic tape archive and cached into a 10 PB tier of disk that’s co-located in the Spectra Logic libraries. The data then moves, via an 8Gbps Fibre Channel connection, from the speedy disk tier into Blue Waters, which sports 1.5 PB of memory and can perform 13 quadrillion calculations per second.

In the commercial and enterprise space, the LTO Program continues to set the bar for big and fast enterprise tape technology. Last week the group, which is led by IBM, Hewlett-Packard, and Quantum, announced that it added two more generations to its roadmap, assuring users that plans are in place to ship LTO Gen 9 and 10 at some point in the future.

With LTO Gen 9, a single cartridge will sport 26 TB of native capacity, and a drive will be able to write at speeds up to 708 MB per second. The sizes and speeds increase with LTO Gen 10, which will offer 48 TB of native capacity and an 1,100-MB-per-second transfer rate.
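
To put those roadmap figures in perspective, here is a rough sketch of how long it would take to fill a single cartridge at the quoted native speeds. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a drive that sustains its top rate for the whole write, which real workloads rarely manage; the function name is made up for illustration.

```python
# Rough fill-time estimate for the LTO Gen 9 and Gen 10 figures quoted above.
# Assumptions (not from the LTO Program): decimal units and a drive that
# sustains its maximum native rate for the entire write.

TB = 10**12  # bytes per terabyte (decimal)
MB = 10**6   # bytes per megabyte (decimal)


def hours_to_fill(capacity_tb, rate_mb_per_s):
    """Return the hours needed to write a full cartridge at a constant rate."""
    seconds = (capacity_tb * TB) / (rate_mb_per_s * MB)
    return seconds / 3600


# (generation, native capacity in TB, native write speed in MB/s)
roadmap = [(9, 26, 708), (10, 48, 1100)]

for gen, capacity_tb, rate in roadmap:
    hours = hours_to_fill(capacity_tb, rate)
    print(f"LTO Gen {gen}: about {hours:.1f} hours to fill {capacity_tb} TB")
```

Under those assumptions, a Gen 9 cartridge takes a little over 10 hours to fill and a Gen 10 cartridge a little over 12, so capacity is growing somewhat faster than speed.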

When LTO Gen 10 drives ship, which will likely happen anywhere from eight to 12 years from now given the 24- to 36-month release cycle, data will be flowing 10 times faster than it does under current LTO Gen 6 gear. (The LTO Gen 7 gear, by the way, is due out in late 2015, according to the LTO Program.)
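
The eight-to-12-year estimate follows from simple arithmetic: four generations separate Gen 6 from Gen 10, and each takes 24 to 36 months. A minimal sketch of that calculation, with illustrative names that are not from the LTO Program:

```python
# Back-of-the-envelope check of the Gen 10 timeline, using only the
# figures in the article: Gen 6 is current, one generation ships every
# 24 to 36 months. Names here are illustrative.

CURRENT_GEN = 6
CYCLE_MONTHS = (24, 36)  # shortest and longest release cycle


def years_until(target_gen, current_gen=CURRENT_GEN, cycle_months=CYCLE_MONTHS):
    """Return (earliest, latest) years until target_gen ships."""
    generations = target_gen - current_gen
    shortest, longest = cycle_months
    return generations * shortest / 12, generations * longest / 12


earliest, latest = years_until(10)
print(f"LTO Gen 10: roughly {earliest:.0f} to {latest:.0f} years out")
```

The same function puts Gen 9 roughly six to nine years out.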

That should be fast enough for all but the biggest and most demanding backup and archiving jobs. “There is this misconception in the market that tape is not as fast as disk,” says Spectra Logic’s Grainger. “Typically where the bottleneck is, where people may have the perception that one technology [is] a little slower than the other, is the network or the server.”

Of course, if you look at the specs, even the slowest spinning disk is going to deliver more I/O than the fastest tape drives, such as the IBM TS1140s. But in the real world, poor network architectures and the inability of applications to saturate links during backup jobs contribute to the perception that tape is slow. If you need fast random access to data, disk will always outperform tape, but for writing big data sets that don’t need to be accessed repeatedly, tape still has a lot going for it.

The world’s biggest clusters, such as Blue Waters, are adequately designed to maximize the capacity of tape, but Grainger sees a world where tape will continue to make its mark at smaller commercial installations too.

“No one’s deleting anything anymore,” he says. “And we can’t just ignore the fact that data is growing at a rate that nobody anticipated… There’s no other technology that performs as well as tape does, that’s inexpensive, and has the shelf life.”

Related Items:

Spectra Looks to Drive Tape Storage Into Hadoop

Tape Archives at Work in HPC Environments