July 1, 2013

Achieving Low Latency: The Fall and Rise of Storage Caching

Tony Afshary

The caching of content from disk storage to high-speed memory is a proven technology for reducing read latency and improving application-level performance. The problem with traditional caching, however, is one of scale: Random access memory, typically used for caches, is limited to Gigabytes, while hard disk drive-based storage exists on the order of Terabytes. The three orders of magnitude difference in scale puts a practical limit on the potential performance gains. Flash memory has now made caching beneficial again owing to its combination of low latency (far closer to memory than to hard disk drives) and high capacity (on a par with hard disk drives).

A Brief History of Cache

The caching of data from slower media to faster ones has existed since the days of mainframe computing, and made its debut on PCs shortly after they entered the market. Caching also exists at multiple levels and in different locations, from the L1 and L2 cache built into processors to the dynamic RAM (DRAM) caching in the controllers used with storage area networks (SANs) and network-attached storage (NAS).

The long, widespread use of caching is a testament to its benefit: dramatically improving performance in a transparent and cost-effective manner. For example, PCs constantly cache data from the hard disk drive (HDD) to main memory to improve input/output (I/O) throughput. I/O to main memory takes about 100 nanoseconds, while I/O to a fast-spinning HDD takes around 10 milliseconds, a difference of five orders of magnitude.

In this example, the cache works by moving the data and/or software currently being accessed (the so-called “hot data”) from the HDD to main memory. The operating system’s file subsystem makes these movements constantly and automatically using algorithms that detect hot data to improve the “hit rate” of the cache. With such behind-the-scenes transparency, the only thing a user should ever notice is an improvement in performance after adding more DRAM.
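Why the hit rate matters so much can be shown with simple arithmetic. The sketch below uses the two latency figures quoted above (about 100 nanoseconds for main memory and about 10 milliseconds for an HDD) and computes the average latency per I/O as a weighted mix of hits and misses. The function name and the sample hit rates are illustrative assumptions, not measurements.

```python
# Latency figures quoted in the article; the hit rates below are illustrative.
DRAM_LATENCY_S = 100e-9   # about 100 nanoseconds per I/O to main memory
HDD_LATENCY_S = 10e-3     # about 10 milliseconds per I/O to a fast-spinning HDD


def average_latency(hit_rate: float,
                    cache_latency: float = DRAM_LATENCY_S,
                    backing_latency: float = HDD_LATENCY_S) -> float:
    """Expected latency per I/O when a fraction `hit_rate` is served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_latency + (1.0 - hit_rate) * backing_latency


if __name__ == "__main__":
    for hit_rate in (0.0, 0.5, 0.9, 0.99):
        print(f"hit rate {hit_rate:.0%}: {average_latency(hit_rate) * 1e3:.3f} ms")
```

Even at a 90 percent hit rate the average is still roughly 1 millisecond, because the remaining misses pay the full HDD penalty. This is why a cache too small to hold the hot data delivers limited gains.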

The data deluge impacting today’s datacenters, however, is causing traditional DRAM-based caching to become less effective. The reason is that the amount of memory possible in a server or a caching appliance is only a small fraction of the capacity of even a single disk drive. Because datacenters now store multiple Terabytes or even Petabytes of data, and I/O rates are increasing with more applications being run on virtualized servers, the performance gains from traditional forms of caching are becoming increasingly insufficient.

Fortunately, there is now a way to overcome the limitation imposed by traditional DRAM-based caching: flash memory.


Figure 1: Flash memory fills the void in both latency and capacity between main memory and fast-spinning hard disk drives.


Cache in a Flash

As shown in Figure 1, flash memory breaks through the cache-size limitation of DRAM, again making caching an effective and economical means of accelerating application-level performance. Another important advantage over DRAM is that flash memory is non-volatile, enabling it to retain stored information even when not powered.

NAND flash memory-based storage solutions typically deliver the highest performance gains when the flash cache is placed directly in the server on the high-performance Peripheral Component Interconnect Express® (PCIe) bus. Even though flash memory has a higher latency than DRAM, PCIe-based flash cache adapters deliver superior performance for two reasons. The first is the significantly higher capacity of flash memory, which substantially increases the hit rate of the cache. Indeed, with some flash adapters now supporting multiple Terabytes of solid state storage, there is often sufficient capacity to store entire databases or other datasets as hot data.

The second reason involves the location of the flash cache: directly in the server on the PCIe bus. With no external connections and no intervening network to a SAN or NAS (which is also subject to frequent congestion), the hot data is accessible in a flash (pun intended).

Intelligent caching software running on the host server detects hot data blocks and caches these to the flash cache. As shown in Figure 2, the caching software is located between the file system and the storage device drivers. Direct-attached storage (DAS) and SAN use existing drivers; the flash cache card has a Message Passing Technology (MPT) driver. As hot data “cools,” the caching software automatically replaces it with hotter data.

Figure 2: The intelligent caching software operates between the server’s file system and the device drivers to provide transparency to the applications.


The intelligent caching software normally gives the highest priority to highly random, small I/O block-oriented applications, such as those for databases and on-line transaction processing (OLTP), as these stand to benefit the most. The software detects hot data by monitoring I/O activity to find the specific ranges of logical block addresses (LBAs) that are experiencing the most reads and/or writes, and moves these into the cache.
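The LBA-monitoring idea can be sketched in a few lines. The example below is not the vendor's algorithm; it is a minimal illustration that groups accessed LBAs into fixed-size regions, counts I/Os per region, and reports the busiest ones. The region size, the function name and the sample trace are all hypothetical.

```python
from collections import Counter

REGION_BLOCKS = 2048  # hypothetical region size, in logical blocks


def hottest_regions(accessed_lbas, top_n=2, region_blocks=REGION_BLOCKS):
    """Return the start LBAs of the `top_n` most frequently accessed regions."""
    counts = Counter(lba // region_blocks for lba in accessed_lbas)
    return [region * region_blocks for region, _ in counts.most_common(top_n)]


# Illustrative trace: LBAs touched by recent reads and writes.
trace = [10, 15, 4100, 4101, 4200, 4300, 9000, 12, 4150]
print(hottest_regions(trace))  # -> [4096, 0]
```

A production implementation would also have to age the counters so that regions can cool over time; that step is omitted here for brevity.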

By contrast, because applications with sequential read and/or write operations benefit very little from caching, these are given a low priority. The reason is that 6 Gigabit/second (Gb/s) Serial-Attached SCSI (SAS) and Serial ATA (SATA) HDDs can achieve a satisfactory throughput of up to 3000 Megabytes/second (MB/s), and roughly double that with 12 Gb/s SAS.

Most PCIe flash adapters contain at least two SSD modules to support RAID (Redundant Array of Independent Disks) configurations. In the unprotected RAID 0 mode, data is striped across both SSD modules, creating a larger cache. In the protected RAID 1 mode, data is mirrored across the SSD modules so that in the event one fails, the other has a complete copy.
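The trade-off between the two modes can be made concrete with a simplified sketch. It assumes a two-module adapter and block-granular striping (real adapters stripe in larger chunks), and both function names are hypothetical.

```python
def place_block(block_index: int, raid_level: int, modules: int = 2) -> list[int]:
    """Return the SSD module numbers that hold a given cache block."""
    if raid_level == 0:
        return [block_index % modules]      # striping: one copy, alternating modules
    if raid_level == 1:
        return list(range(modules))         # mirroring: a copy on every module
    raise ValueError("only RAID 0 and RAID 1 are sketched here")


def usable_capacity_gb(module_gb: float, raid_level: int, modules: int = 2) -> float:
    """Cache capacity visible to the caching software."""
    return module_gb * modules if raid_level == 0 else module_gb


print(place_block(5, raid_level=0))   # -> [1]
print(place_block(5, raid_level=1))   # -> [0, 1]
```

With RAID 0 every block lives on exactly one module, so the full capacity of both is usable but a module failure loses data. With RAID 1 the usable capacity is halved in exchange for a complete second copy.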

Any data written to the flash cache must also be written to primary DAS or SAN storage, and there are two ways this can occur. In Write Through mode, any data written to flash is simultaneously written to primary storage. Because most applications will wait for confirmation that a write has been completed before proceeding, this increases I/O latency. In Write Back mode, data is written only to an SSD or, when using mirroring, to both SSDs, allowing write operations to be completed substantially faster. All writes are then persisted to primary storage when the data cools and is replaced in the cache. Write Through mode can safely use a RAID 0 configuration of the flash cache; Write Back mode should employ a RAID 1 configuration for adequate data protection.
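The difference between the two write modes can be illustrated with a toy model. The class below is a hypothetical sketch, not the vendor's caching software: dictionaries stand in for the flash cache and for primary storage, and eviction is triggered by hand rather than by a cooling algorithm.

```python
class FlashCacheSketch:
    """Toy block cache illustrating Write Through and Write Back modes."""

    def __init__(self, primary: dict, write_back: bool):
        self.primary = primary      # stands in for DAS or SAN storage
        self.write_back = write_back
        self.cache = {}             # stands in for the flash cache
        self.dirty = set()          # blocks not yet persisted to primary storage

    def write(self, lba: int, data: bytes) -> None:
        self.cache[lba] = data
        if self.write_back:
            self.dirty.add(lba)         # acknowledge now, persist later
        else:
            self.primary[lba] = data    # persist before acknowledging

    def evict(self, lba: int) -> None:
        """Drop a block that has cooled, persisting it first if it is dirty."""
        if lba in self.dirty:
            self.primary[lba] = self.cache[lba]
            self.dirty.discard(lba)
        self.cache.pop(lba, None)


hdd = {}
wb = FlashCacheSketch(hdd, write_back=True)
wb.write(7, b"new")
print(7 in hdd)     # -> False: the only copy is in the flash cache
wb.evict(7)
print(hdd[7])       # -> b'new': persisted when the block cooled
```

Between the write and the eviction, the flash cache holds the only copy of the block, which is why mirroring matters in Write Back mode.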


Benchmark Test Results

LSI® has conducted extensive testing of application acceleration solutions under different scenarios to assess improvements in I/O operations per second (IOPS), transactions per second, user response times and other performance metrics. For I/O-intensive applications, these tests reveal improvements in performance ranging from a factor of 3 to an astonishing factor of 100. Reported here are the results of one such test.

This particular test evaluates both the response times and transactional throughput of a MySQL OLTP application using the SysBench system performance benchmark. The basic configuration is a dedicated server with DAS consisting entirely of HDDs. The flash cache is a 100 Gigabyte Nytro™ MegaRAID® 8100-4i PCIe adapter with the Nytro XD intelligent caching software running in the host. Four different flash cache configurations are used based on a combination of write modes (Write Through or Write Back) and RAID levels (0 or 1). 

Figure 3: Response times (in milliseconds) were reduced by 65 percent using the flash cache in Write Back mode with RAID 1 protection.


The “No SSD” results shown in Figures 3 and 4 are for the baseline configuration using HDDs with no flash cache. In Write Through (WT) mode, all write operations are made directly to the HDDs, which limits the performance gains to only about 20 percent. In Write Back (WB) mode, writes are made to the flash cache, resulting in a response time improvement of up to 80 percent, as shown in Figure 3. But because data protection is prudent with WB mode (as no protection would require using transaction logs to recover from an SSD failure), a more realistic improvement would be 65 percent for the flash cache configured with RAID 1 protection. 

Figure 4: Transactions per second increased by a factor of 3 using the flash cache in Write Back mode with RAID 1 protection.


As with response times, transactions per second (TPS) throughput rates improve dramatically when the flash cache is used for both reads and writes. And for some applications, the benefit of the 5-times improvement in TPS shown in Figure 4 might outweigh the exposure from a lack of data protection, particularly given the high reliability of flash memory. But even with RAID 1 protection, TPS throughput increases by a factor of 3 over the “No SSD” configuration.
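The response-time and throughput figures are consistent with each other under a simple assumption: if the benchmark keeps a fixed number of requests in flight, throughput scales inversely with response time. The sketch below applies that assumption (which the test report does not state explicitly) to the two Write Back results; the function name is hypothetical.

```python
def throughput_gain(response_time_reduction: float) -> float:
    """Throughput multiplier if throughput scales as 1 / response time."""
    if not 0.0 <= response_time_reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - response_time_reduction)


for label, reduction in (("WB, unprotected", 0.80), ("WB, RAID 1", 0.65)):
    print(f"{label}: about {throughput_gain(reduction):.1f}x")
# -> WB, unprotected: about 5.0x
# -> WB, RAID 1: about 2.9x
```

An 80 percent reduction in response time corresponds to about 5 times the throughput, and a 65 percent reduction to a little under 3 times, in line with the TPS results quoted above.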

These tests show that even a relatively modest amount of flash cache (100 Gigabytes) can deliver meaningful performance gains. Tests with 800 Gigabytes of flash reveal an improvement of up to 30 times in SAN environments for some applications.


The size of a cache relative to the size of the data store is a key determining factor in its ability to improve performance. This is the reason DRAM-based caches, limited to Gigabytes of capacity, have become less effective under the growing data deluge. With SSDs and PCIe flash adapters now supporting Terabytes of capacity, the size of the cache becomes considerably greater relative to the data store, which makes caching proportionally more effective.
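The effect of cache size on hit rate can be explored with a small simulation. The sketch below replays a synthetic trace through a least-recently-used (LRU) cache of varying capacity. LRU is used only because it is simple; it is not necessarily the replacement policy of any real caching product, and the 80/20 skew of the workload is an assumption chosen for illustration.

```python
import random
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Replay `trace` (a sequence of block IDs) through an LRU cache of `capacity` blocks."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)        # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used block
    return hits / len(trace)


def skewed_trace(n_ios, n_blocks, seed=0):
    """Synthetic workload: 80% of I/Os go to the hottest 20% of blocks (an assumption).

    Requires n_blocks >= 2 so that both the hot and cold ranges are non-empty.
    """
    rng = random.Random(seed)
    hot = max(1, n_blocks // 5)
    return [rng.randrange(hot) if rng.random() < 0.8 else rng.randrange(hot, n_blocks)
            for _ in range(n_ios)]


if __name__ == "__main__":
    trace = skewed_trace(n_ios=50_000, n_blocks=10_000)
    for capacity in (10, 100, 1_000, 5_000):
        print(f"cache = {capacity / 10_000:.1%} of data store: "
              f"hit rate {lru_hit_rate(trace, capacity):.1%}")
```

For a given trace, an LRU cache never loses hits when its capacity grows, so the printed hit rates rise (or stay flat) as the cache covers a larger share of the data store.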

Another determining factor is the nature of the target application. I/O-intensive applications that involve random read/write access stand to benefit substantially, while those accessing data sequentially, especially in large blocks, stand to benefit little, if at all.

The final determining factor is the caching software’s ability to maximize the hit rate by accurately identifying the hot spots in the data, as these are constantly changing for applications with random I/O operations. Most do a fairly effective job, and the larger flash cache capacity now makes this a less critical factor.

Although a flash cache inevitably offers at least some improvement in performance, the extent of the gain might not be cost-justifiable. Fortunately there are free tools available that can predict the performance gains possible on a per-application basis. These tools employ intelligent caching algorithms, similar to what is actually used in the cache, to evaluate access patterns and provide an estimate of the likely improvement in performance.

The opportunity to achieve substantial gains, combined with the ability to quantify the potential benefit in advance of making any investment, makes flash caching solutions an option worthy of serious consideration in virtually any datacenter today.


About the Author

Tony Afshary is the Business Line Director for Nytro Solutions Products at LSI’s Accelerated Solutions Division. In this role, he is responsible for Product Management & Product Marketing for LSI’s Nytro Family of enterprise flash-based storage, including PCIe-based Flash, utilizing seamless and intelligent placement of data to accelerate datacenter applications.

