2012 – SSDs and the Rise of Fast Data

Hadoop-based clusters that co-located x86 processors with cheap spinning disks epitomized the JBOD (Just a Bunch Of Disks) approach to big data storage. It was a rather inelegant, brute-force way to create a single namespace for storing petabytes of data.

Around 2012, another hot storage technology began to gain enterprise traction: solid-state drives (SSDs). If a Hadoop cluster was a Mack truck, then a collection of SSDs was a Ferrari. Yes, the SSDs couldn’t carry a whole lot, and they were a lot more expensive. But the SSDs would get you there a lot faster, and from an analytics perspective, it would be a lot more fun.

The emergence of drives based on NAND memory technology would help manufacturers build drives with higher capacity, albeit at the cost of slightly slower access times relative to drives built with NOR memory technology. Steady decreases in the price for SSDs, as well as an increase in capacity to around 500 GB thanks to NAND technologies, would position SSDs well for high-end analytic and transaction workloads.

This was also around the time when array makers building on faster NAND-based flash drives started adding advanced features, such as deduplication, compression, thin provisioning, snapshots, and replication. As these technologies went mainstream, they helped to drop the cost of solid-state arrays and make them competitive with traditional storage arrays.

On the vendor front, we saw the emergence of a new crop of all-flash storage vendors, including Pure Storage, which came out of stealth in 2011 and is today worth about $8 billion. Another vendor called XtremIO, which was founded in 2009 in Israel, was positioning itself to attack the big data market in 2012 when it was gobbled up by EMC for $430 million while its flash arrays were still in development. Fusion-io, a flash storage manufacturer based in Utah, went public in 2011 and was acquired by SanDisk in 2014 for about $1.1 billion.

Back in 2012, very few people were running SSDs based on the Non-Volatile Memory Express (NVMe) standard, which had been released just the year before. But over time, NVMe drives, which plug straight into the PCIe bus and therefore bypass the SAS or SATA interfaces used by previous-generation SSDs (not to mention traditional spinning disks), would become the dominant type of SSD.

With the advent of multi-level cell (MLC) and 3D NAND architectures, SSD capacities keep climbing. 1TB and 2TB NAND drives are fairly common, although the drives do carry a price premium relative to spinning disk. At the far end of the spectrum, the largest NAND drives today can hold more than 100TB of data, which is about 20 times the capacity of the biggest SSDs from 2012.

SSDs today threaten to topple spinning disk from its place in the storage hierarchy. With data access rates that are an order of magnitude better than HDDs, today’s NVMe drives are proving extremely popular for the most demanding applications. The biggest data sets are still stored on spinning disk, but many real-time data analytics systems use NVMe drives to store data. Because they can also access data far faster than SATA- and SAS-connected SSDs, NVMe drives are able to keep up with demanding analytics workloads.

Today’s SSDs occupy the sweet spot between cold archival storage (spinning disk or tape) and the hottest data (RAM). As data volumes grow and the windows of opportunity for making business decisions shrink, companies need a versatile storage medium that can satisfy a range of requirements. That honor is increasingly falling to the SSD, which has become the storage workhorse for fast analytics.
