October 30, 2017

The Top Five Reasons to Use Multi-Tier Storage for Managing Scientific Data

According to International Data Corporation (IDC), in a white paper sponsored by Seagate, the amount of data in the world will reach 163 zettabytes by 2025.

Nowhere is data more important than in scientific research. In research, data is the fuel that powers insight, discovery, and innovation. The availability of more data, coupled with high-performance computing technology, is enabling scientists to perform more complex analysis, which is helping lead to breakthrough discoveries.

But as the volume of scientific data reaches petabyte levels, it creates a storage challenge. Research institutions can no longer manage their storage the way they did ten years ago. Storage strategies must change. Here are five reasons research institutions should use multi-tier storage to manage scientific data.

Scalability to accommodate large-scale growth: Thanks to advances in technology, genomic sequencing has become faster and more affordable. As a result, researchers are running more sequencing operations and generating more data. Many institutions have seen data volume reach multiple petabyte levels.

A multi-tier storage strategy blends different storage technologies—like flash or high-performance disk, tape, and cloud—into a seamless infrastructure. Storage nodes can be added quickly when additional capacity is needed. And that enables research institutions to easily manage their growing demand.

High performance to meet demanding scientific workflows: Scientific workflows are compute-intensive and place heavy demands on computer systems. Leveraging high-performance computing power means more data can be analyzed in less time, accelerating the research process.

Storage infrastructure plays a significant role in the performance of computing environments. Achieving high performance requires an infrastructure capable of fast I/O operations without bottlenecks.

A multi-tier storage infrastructure enables research organizations to leverage high-performance disk or flash storage for active files—those files that are part of an active project or are undergoing computational analysis—to maximize the performance of their computing environment.

Shared access to support collaboration: Technology has made it possible for hundreds of scientists to work together on projects and to share information. But scientists may use different client operating systems (Linux, macOS, or Microsoft Windows), reside in different locations, or use different access methods to connect to the storage infrastructure.

A multi-tier storage infrastructure can be designed to support simultaneous access to data files, multiple access methods, and different operating systems.

Backup protection to safeguard research data: When data reaches the petabyte level, traditional data backup operations are unable to handle the volume. And installing secondary storage arrays just to replicate data is an expensive backup solution.

Using a multi-tier approach, data can be protected more economically. Policies can be established so that critical data sets are copied to another disk array, or to a lower cost form of media like tape or cloud. This ensures data is protected, using the most affordable storage tier, and can be restored quickly in case of a hardware failure.
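To make the policy idea concrete, here is a minimal Python sketch of copying critical data sets to a lower-cost tier. The directory layout, the `.critical` marker-file convention, and the function name are illustrative assumptions, not features of any particular storage product; real multi-tier systems apply such policies internally.

```python
import shutil
from pathlib import Path

def replicate_critical(primary: Path, archive: Path, tag: str = ".critical") -> list:
    """Copy every data set marked with a `tag` file from the primary tier
    to a lower-cost archive tier (e.g. a disk array, tape, or cloud gateway)."""
    copied = []
    for marker in primary.rglob(tag):          # find data sets flagged as critical
        dataset = marker.parent
        dest = archive / dataset.relative_to(primary)
        dest.mkdir(parents=True, exist_ok=True)
        for f in dataset.iterdir():
            if f.is_file():
                shutil.copy2(str(f), str(dest / f.name))  # preserve timestamps/metadata
                copied.append(dest / f.name)
    return copied
```

In practice, the "archive" path would front a cheaper medium, so the copy doubles as an affordable backup that can be restored after a hardware failure.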

Cost management to deliver storage capacity in a cost-effective way: Data files age over time and become inactive. On average, about 70-80 percent of stored data files are not actively used. Storing inactive files on high-performance storage media is expensive and unnecessary.

A multi-tier storage infrastructure is a better approach. As files age or become inactive, they are moved off higher-priced storage tiers and archived on lower-cost media. This enables research institutions to optimize their storage costs while still delivering the total storage capacity they need to support their research operations.
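The age-based migration described above can be sketched in a few lines of Python. This is a simplified illustration using file access times; the tier paths, the 180-day threshold, and the function itself are assumptions for the example, whereas production systems (hierarchical storage managers) do this transparently and leave a stub behind so files can be recalled.

```python
import os
import shutil
import time
from pathlib import Path

def migrate_inactive(fast_tier: Path, cold_tier: Path, max_idle_days: float = 180) -> list:
    """Move files not accessed within `max_idle_days` from the fast tier
    (flash/high-performance disk) to a lower-cost tier (tape or cloud)."""
    cutoff = time.time() - max_idle_days * 86400
    moved = []
    for f in sorted(fast_tier.rglob("*")):
        if f.is_file() and f.stat().st_atime < cutoff:   # inactive file
            dest = cold_tier / f.relative_to(fast_tier)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(dest))
            moved.append(dest)
    return moved
```

Note that relying on access times assumes the filesystem records them (many are mounted `noatime`); real tiering policies often track activity in a separate database instead.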

For more information on the benefits of multi-tier storage, visit