November 5, 2012

Tape Archives at Work in HPC Environments

The Blue Waters project is designed to meet the compute-intensive, memory-intensive, and data-intensive needs of a wide range of scientists and engineers. Scientists will use the Blue Waters supercomputer for a diverse set of applications such as improving understanding of hurricanes and tornadoes, analyzing complex biological systems, studying the evolution of the universe and simulating complex engineered systems like the power distribution system in airplanes and automobiles.

The National Center for Supercomputing Applications (NCSA) selected Spectra Logic T-Finity tape libraries to provide 100% of the near-line data storage needed for the Blue Waters supercomputer based on scalability, data integrity, enterprise technology, reliability and density.

Spectra Logic Fig 1

Bill Kramer, deputy director of the Blue Waters project at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign, says, “The Spectra Logic T-Finity met our rigorous requirements with its high enterprise-level performance, ready data accessibility and massively scalable capacity. We are confident it will provide our user community with fast, reliable access to the massive volumes of critical data stored within Blue Waters’ petascale near-line file repository.”

Spectra T-Finity tape libraries give the Blue Waters project the ability to keep all near-line data accessible in an active repository, perform automated data integrity verification on the data store, and deliver sustained read/write rates of up to 2.2 PB per hour.
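To put the quoted rate in more familiar units, the short sketch below converts 2.2 PB per hour to GB per second and estimates how long one full pass over the 380 PB repository described later in this article would take. It assumes decimal units and a constant sustained rate, which a real workload would rarely hold; the function names are illustrative only.

```python
# Unit conversion for the sustained rate quoted in the article.
# Assumption: decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes).

PB = 10**15  # bytes
GB = 10**9   # bytes


def pb_per_hour_to_gb_per_s(rate_pb_per_hour: float) -> float:
    """Convert a rate in PB/hour to GB/second."""
    return rate_pb_per_hour * PB / 3600 / GB


def hours_to_stream(volume_pb: float, rate_pb_per_hour: float) -> float:
    """Hours needed to read or write a volume at a constant sustained rate."""
    return volume_pb / rate_pb_per_hour


if __name__ == "__main__":
    print(f"{pb_per_hour_to_gb_per_s(2.2):.0f} GB/s")
    print(f"{hours_to_stream(380, 2.2) / 24:.1f} days for one pass over 380 PB")
```

Under these assumptions the rate works out to roughly 611 GB per second, and a single pass over the full year-two repository would take about a week.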

Extremely Scalable

Designed to expand as data sets grow, Spectra libraries can be reconfigured quickly and easily to meet the data center’s evolving needs while keeping storage cost-effective.

The modern architecture of T-Finity allows the product to scale to meet individual requirements. In its first year of operation, Blue Waters will use four 19-frame T-Finity tape libraries to store 253 petabytes (PB) of raw data. In year two, the operation will scale to 380 PB by adding two more 19-frame T-Finity libraries, making the Blue Waters system one of the world’s largest active file repositories stored on tape media. T-Finity scales to 400,800 LTO slots or 304,920 TS1140 technology slots in a library complex, resulting in a theoretical capacity of 2.5 EB with LTO-6 and 3.6 EB with TS1140.
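The capacity figures above can be roughly reproduced with simple arithmetic. The sketch below is a back-of-the-envelope check, not vendor sizing guidance: the slot counts and the 253 PB starting point come from the article, while the per-cartridge capacities and compression ratios are assumptions taken from the published LTO-6 and TS1140 specifications (2.5 TB native at 2.5:1, and 4 TB native at 3:1).

```python
# Back-of-the-envelope check of the capacity figures quoted in the article.
# Slot counts come from the article; cartridge capacities and compression
# ratios are ASSUMPTIONS based on published LTO-6 and TS1140 specifications.

TB_PER_EB = 1_000_000  # decimal units: 1 EB = 10^6 TB

SLOTS = {"LTO-6": 400_800, "TS1140": 304_920}
NATIVE_TB = {"LTO-6": 2.5, "TS1140": 4.0}    # assumed native TB per cartridge
COMPRESSION = {"LTO-6": 2.5, "TS1140": 3.0}  # assumed compression ratios


def complex_capacity_eb(media: str, compressed: bool = True) -> float:
    """Theoretical capacity of a fully populated library complex, in EB."""
    ratio = COMPRESSION[media] if compressed else 1.0
    return SLOTS[media] * NATIVE_TB[media] * ratio / TB_PER_EB


def scaled_capacity_pb(base_pb: float, base_libraries: int, libraries: int) -> float:
    """Scale raw capacity linearly with the number of identical libraries."""
    return base_pb * libraries / base_libraries


if __name__ == "__main__":
    for media in SLOTS:
        native = complex_capacity_eb(media, compressed=False)
        packed = complex_capacity_eb(media)
        print(f"{media}: native {native:.2f} EB, compressed {packed:.2f} EB")
    print(f"Six libraries: {scaled_capacity_pb(253, 4, 6):.1f} PB")
```

With these assumptions the compressed totals come to about 2.5 EB for LTO-6 and about 3.66 EB for TS1140, which suggests the quoted theoretical capacities are compressed rather than native figures. Scaling 253 PB from four libraries to six gives 379.5 PB, in line with the 380 PB planned for year two.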

To accommodate customers’ floor space needs, the T-Finity is designed with an any-frame, any-location approach: master, media and drive frames can be configured in any arrangement to build a library that serves unique requirements. With T-Finity, Blue Waters storage administrators do not need to schedule library downtime to add capacity to their tape library. Service frames are added on both ends of the library to support any necessary hot replacement of robotic assemblies during operation. The service bays also offer bulk entry and exit capability and allow frames to be added for more capacity without powering off the library.

Highest Storage Density

The Blue Waters 380 PB repository will feature an integrated storage environment that is the equivalent of a stack of books over 26 times the distance from the earth to the moon. Spectra’s storage density, built on the TeraPack architecture, saves data center space and gives Spectra libraries the smallest footprint and the highest density in the industry. T-Finity provides up to a 71% improvement in data center floor space utilization. No other library on the market scales this much and this easily with a modern design that fits into any data center’s physical layout.

Data Integrity Verification

Ensuring data is accessible and retrievable when it’s needed is essential to business operations. NCSA must have the assurance of healthy data upon retrieval, regardless of where it’s stored. Spectra provides that assurance with automated Data Integrity Verification (DIV) and the BlueScale management interface, whose suite of standard features allows administrators to actively check data once it is written to tape.

Spectra libraries include features that check not only the integrity of data once it is written to tape, but also the health of the tape itself. These tools are distinctive because they can run automatically, on a schedule, and without requiring a separate partition. Each process is performed by the library, independent of the application that is used to read and write data to the tape.

PreScan ensures that tapes are usable and can accept data. This function checks each imported tape and verifies the tape can be written to, scanning the tape for potential issues including broken or dislodged leader, poor media health and write-protected status.

QuickScan confirms that a single track can be read. It scans the tape in one direction, reading the length of one track, to give a rapid indication of the integrity of the data written.

PostScan checks an entire tape to ensure all sectors can be read and confirms there are no media errors on the tape by reading the entire length of the tape up to the end of the recorded data.

Enterprise Technology

T-Finity is an intelligent library that combines a modern design with a highly flexible hardware architecture, giving users options to meet enterprise performance requirements. To help NCSA meet its requirements, Spectra supplied the Blue Waters project with TS1140 technology tape drives, which meet the project’s performance goals and offer an economical option for long-term data retention. TS1140 technology tape drives provide the robust reliability and availability necessary in 24×7 duty cycle environments.

“Integration of the T-Finity with IBM’s enterprise TS1140 Technology tape drives was a critical component to address the needs of the Blue Waters project. Given Spectra’s support of TS1140 Technology and proven storage solutions, Spectra Logic was clearly the ideal solution to meet Blue Waters’ high performance, data-intensive storage needs,” said Joe Fannin, president of NET Source.

High Availability & Reliability

Designed for the most uptime-sensitive, data-intensive clients, T-Finity’s fault-tolerant, redundant-component design delivers 99.99% hardware reliability. Highly available hardware paired with superior library, drive, media and health monitoring tools provides unmatched data integrity and availability. These features are exactly what the Blue Waters project needed in order to meet the compute-intensive, memory-intensive, and data-intensive needs of the international community of scientists and engineers NCSA serves.
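As a rough illustration of what a "four nines" figure means in practice, the sketch below converts a percentage into minutes per year. It assumes the 99.99% figure can be read as availability over a full 24×7 year, which the article does not state explicitly; the function name is illustrative.

```python
# What a "four nines" figure implies, ASSUMING it is interpreted as
# availability over a full year of 24x7 operation (not stated in the article).

HOURS_PER_YEAR = 365 * 24


def annual_downtime_minutes(availability: float) -> float:
    """Minutes per year a system is unavailable at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    print(f"{annual_downtime_minutes(0.9999):.1f} minutes per year")
```

Under that reading, 99.99% corresponds to a little under an hour of unavailability per year.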

The Blue Waters online file system is three to five times faster than any other in use today, with a transfer rate greater than 1 TB per second. Accessibility to that data is essential. Molly Rector, CMO at Spectra Logic, states, “TS technology tape drives are 5 orders of magnitude more reliable than disk, at 10⁻¹⁹ of undetected bit error rates.” [1]

Spectra Logic Fig 2

“Drives used in all enterprise tape libraries support an extremely low error bit rate, with anywhere from 10⁻¹⁷ to 10⁻¹⁹, depending on the drive type, with SATA drive error rate at 10⁻¹⁵,” storage analyst Curtis Preston writes. “While 10⁻¹⁵ may look really close to 10⁻¹⁷, it’s not. When it’s bits we’re talking about, it’s the difference between 113 TB and 11.1 PB! It means you are 100 times more likely to have bad data on a SATA disk than you are on an LTO-5 tape drive, and 10,000 times more likely than if the data is stored on a T10000C or TS1130 drive!” [2]
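The arithmetic behind this comparison is easy to check. The sketch below reads the quoted rates as one undetected bad bit per 10^15, 10^17 or 10^19 bits read and assumes errors are independent; it then converts those rates into the volume of data read per expected error and applies them to a single full read of the 380 PB repository. The function names are illustrative.

```python
# Rough arithmetic behind the bit-error-rate comparison quoted above.
# ASSUMPTION: a rate of 10^-N means one undetected bad bit per 10^N bits
# read, with errors independent of one another.

BITS_PER_BYTE = 8
TIB, PIB = 2**40, 2**50  # binary units, which match the quoted figures


def data_per_error_bytes(exponent: int) -> float:
    """Average bytes read per undetected bit error at a rate of 10^-exponent."""
    return 10**exponent / BITS_PER_BYTE


def relative_likelihood(worse_exponent: int, better_exponent: int) -> int:
    """How many times more likely an error is at the worse rate."""
    return 10 ** (better_exponent - worse_exponent)


def expected_errors(volume_bytes: float, exponent: int) -> float:
    """Expected undetected bit errors when reading volume_bytes once."""
    return volume_bytes * BITS_PER_BYTE / 10**exponent


if __name__ == "__main__":
    print(f"10^-15: one error per {data_per_error_bytes(15) / TIB:.1f} TiB")
    print(f"10^-17: one error per {data_per_error_bytes(17) / PIB:.1f} PiB")
    print(f"10^-15 vs 10^-17: {relative_likelihood(15, 17)}x")
    print(f"10^-15 vs 10^-19: {relative_likelihood(15, 19)}x")
    repository = 380e15  # 380 PB in bytes (decimal units)
    for exp in (15, 17, 19):
        errors = expected_errors(repository, exp)
        print(f"10^-{exp}: {errors:.2f} expected errors per full read")
```

The 113 TB and 11.1 PB figures in the quote correspond to binary units (about 113.7 TiB and 11.1 PiB); in decimal units the same rates work out to 125 TB and 12.5 PB. At repository scale the gap is stark: under these assumptions a full read of 380 PB would expect about 3,040 undetected bit errors at 10^-15, but only about 0.3 at 10^-19.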

To help Blue Waters achieve even greater reliability, Spectra libraries provide tools to manage media, drives and hardware more efficiently across the library.

Media Lifecycle Management (MLM) maintains the integrity and availability of data by monitoring and reporting on over 40 metrics throughout the life of each tape. MLM removes the question of media reliability, ensuring the safety of data by providing continuous assessments of the media’s health. MLM also allows users to proactively migrate data from older media before its health becomes questionable.

Drive Lifecycle Management (DLM) is the perfect complement to MLM. This feature tracks and reports on expected utilization thresholds and other health variables, and proactively notifies administrators so that drives may be replaced prior to a hard failure.

Library Lifecycle Management (LLM) delivers information on the health of vital library hardware components. It tracks usage statistics such as the number of moves, cycles, and distance moved, then compares that data to expected useful life metrics for individual components. LLM reduces the uncertainty around timing decisions, such as when to replace batteries and filters or service transports, to keep the library performing in top condition.
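The general idea behind this kind of lifecycle tracking, comparing accumulated usage against an expected useful life and flagging components that are nearing it, can be sketched in a few lines. The example below is purely illustrative and is not Spectra's implementation or the BlueScale API: the class, the function, the 80% threshold and all of the numbers are hypothetical.

```python
# Illustrative sketch of usage-versus-expected-life tracking.
# NOT Spectra's implementation: all names, numbers and the threshold
# below are hypothetical.

from dataclasses import dataclass


@dataclass
class ComponentUsage:
    name: str
    used: float           # accumulated usage (moves, cycles, hours, ...)
    expected_life: float  # expected useful life in the same unit

    @property
    def life_consumed(self) -> float:
        """Fraction of expected useful life already used."""
        return self.used / self.expected_life


def due_for_service(components: list[ComponentUsage], threshold: float = 0.8) -> list[str]:
    """Names of components at or past the threshold, most worn first."""
    flagged = [c for c in components if c.life_consumed >= threshold]
    flagged.sort(key=lambda c: c.life_consumed, reverse=True)
    return [c.name for c in flagged]


if __name__ == "__main__":
    library = [
        ComponentUsage("robot transport A", used=1_700_000, expected_life=2_000_000),
        ComponentUsage("air filter", used=950, expected_life=1_000),
        ComponentUsage("battery", used=300, expected_life=1_500),
    ]
    print(due_for_service(library))
```

Flagging components before they reach their expected life, rather than after a failure, is what allows replacement to be scheduled instead of handled as an outage.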

Summary

NCSA had very precise requirements for the Blue Waters project and needed a near-line/archive environment that would be highly reliable, accessible, scalable, dense and fast. With features that met its capacity and performance requirements, the T-Finity tape library was the ideal fit. With its focus on scalability, data integrity tools, enterprise technology, reliability, accessibility and density, Spectra remains the industry leader in HPC archives.

[1] Bruckner, Rich. “The Rich Report: Interview with Molly Rector and Michelle Butler.” http://www.spectralogic.com/common/podcasts/SpectraBlueWatersPodcast.mp3
[2] Preston, Curtis. “Have we put Tape out to Pasture too soon?” Backup Central Blog, April 21, 2011.
