Storage in the Exabyte Era
Back in 2010, a petabyte was considered a lot of data for a single company to have. Today a petabyte is child’s play, and the biggest enterprises are managing data stored in the exabyte range. Finding economical ways to store all that data, let alone manage it in a way that lets enterprises get value from it through analytics, will require all the tricks we’ve learned up to this point, as well as a few new ones.
Getting a firm grasp on the amount of data that is generated and stored every day, week, month, and year is a notoriously difficult task. It’s been said that Facebook generates 4PB of data per day, most of which is photos and videos. A single connected car generates over 120TB of data per month, while the entire fleet of wearable devices will generate 335PB of data per month in 2020, according to Statista.
While the biggest individual companies have hundreds of petabytes or possibly more than an exabyte of stored data, the total for everybody is well into the zettabytes. In November 2018, IDC forecast that the “global datasphere” (or the amount of data generated) would measure 33ZB. The company also stated that, from 2018 to 2025, the storage industry would ship more than 22ZB of capacity across all media types, with about 2ZB shipped in 2020 and close to 5ZB in 2025. What’s more, IDC predicted that 59% of the shipped storage capacity would come from hard disk drives (HDDs).
Clearly, storage in the exabyte age won’t start out looking dramatically different from the recent past. There will be a healthy dose of those familiar HDDs whirring in their cases, and tape is also not going away anytime soon. But the exabyte age will start to look different as newer technologies infiltrate the storage sphere and data generation patterns demand different approaches to storing and processing data.
The public cloud is growing quickly at the moment, as companies explore new modes of supporting business processes, transition IT spending from Capex to Opex, and drive digital transformation strategies (including big data analytics and ML). Last year Gartner pegged the entire worldwide public cloud market as a $182.4-billion business in 2018, growing at a 12.6% CAGR to $331 billion by 2022. A good chunk of that (though not all of it) is from storage.
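Growth figures like these are easy to sanity-check. The sketch below computes a compound annual growth rate from a start value, an end value, and a number of compounding periods. The function name cagr is purely illustrative, and the period counts are assumptions, since the text does not say which base year the quoted rate uses. With the two dollar figures above, four periods (2018 to 2022) implies roughly 16% per year, while five periods implies a rate close to the quoted 12.6%.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# Market sizes in billions of dollars, taken from the paragraph above.
START_2018 = 182.4
END_2022 = 331.0

# Assumption: the number of compounding periods is not stated in the text,
# so both plausible readings are computed here.
rate_four_periods = cagr(START_2018, END_2022, 4)
rate_five_periods = cagr(START_2018, END_2022, 5)
```

Which reading is correct depends on how the analyst defined the forecast window, so treat the result as a consistency check rather than a correction.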
While cloud hyperscalers are strategically buying scads of spinning disk to deliver the massive capacity and efficiency that their storage customers demand, enterprises are investing in faster and denser NVMe storage for more tactical storage applications, IDC says. “The growth in endpoint and edge storage will favor solid state, while the core continues to have a voracious appetite for the economical bytes that hard disk drives and tape provide,” the analyst firm says in its 2025 DataSphere report.
TrendForce, a market research firm, estimates that Seagate owns 41% of the market for HDDs, followed by Western Digital at 38% and Toshiba at 21%. While HDDs’ share of the overall storage market is declining, and the total number of drives is declining too, the total capacity of HDDs is growing steadily, and is forecast to continue growing for the foreseeable future.
During its recent Field Days event, Western Digital shared how it’s looking to continue pushing the limits of what HDDs can offer. The company is already using helium-filled drives to bolster capacity. Currently, it’s using Shingled Magnetic Recording (SMR), which partially overlaps adjacent data tracks on the platter like roof shingles, to boost areal density.
Both Seagate and Western Digital are looking to ship 50TB drives, and eventually 100TB drives beyond that. Western Digital shared details of its strategy, which is to leverage energy-assisted perpendicular magnetic recording (ePMR) technologies, such as Microwave-Assisted Magnetic Recording (MAMR) and Heat-Assisted Magnetic Recording (HAMR), to boost areal density.
“Currently our game plan is … we will deliver SMR and CMR together,” said Carl Che, vice president of HDD Technology for Western Digital. “We have two product generations planned. Beyond that we’ll go to the new energy-assisted technology. We really think with the new technology we’ll be able to reach 50TB in the second half of the decade.”
Western Digital is also working on an architectural innovation called “Zoned Storage” that it thinks will enable better scale and efficiency of storage for both traditional HDD and new NAND technologies.
Essentially, by standardizing the randomization layer that’s built into traditional HDD controllers (and which exists on NAND controllers too), the company is able to create “zoned” areas in the media that are basically distinct namespaces. This technique reduces cost by reducing the need for DRAM resources for each NAND drive, boosts the efficiency of storage, reduces latency, and increases drive endurance by cutting write amplification.
“It allows the customer to control where the data gets located on the SSD,” says Richard New, Western Digital’s vice president of research. “It allows the customer to extract more value from the device by improving the software stack and the interface.”
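To make the idea of host-controlled placement concrete, here is a minimal in-memory sketch of the general zoned-storage model: each zone accepts only sequential appends at its write pointer, and space is reclaimed by resetting a whole zone. This is an illustration under stated assumptions, not Western Digital’s implementation or any real device interface. The class names ZonedDevice, Zone, and ZoneFullError are hypothetical.

```python
class ZoneFullError(Exception):
    """Raised when an append is attempted on a zone with no free blocks."""


class Zone:
    """A fixed-capacity region that only accepts sequential appends."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self._blocks = []

    @property
    def write_pointer(self):
        # Index of the next block to be written; it only moves forward
        # until the zone is reset.
        return len(self._blocks)

    def append(self, block):
        """Write one block at the write pointer and return its offset."""
        if self.write_pointer >= self.capacity:
            raise ZoneFullError("zone is full; reset it before writing again")
        self._blocks.append(block)
        return self.write_pointer - 1

    def read(self, offset):
        if not 0 <= offset < self.write_pointer:
            raise IndexError("offset is beyond the write pointer")
        return self._blocks[offset]

    def reset(self):
        """Discard the zone's contents and rewind the write pointer to 0."""
        self._blocks.clear()


class ZonedDevice:
    """Hypothetical device exposing independent zones to the host."""

    def __init__(self, zone_count, zone_capacity):
        self.zones = [Zone(zone_capacity) for _ in range(zone_count)]

    def append(self, zone_id, block):
        # The host, not the drive, decides which zone receives the data.
        return self.zones[zone_id].append(block)

    def read(self, zone_id, offset):
        return self.zones[zone_id].read(offset)

    def reset(self, zone_id):
        self.zones[zone_id].reset()


# Usage: keep short-lived and long-lived data in separate zones so the
# short-lived zone can be reclaimed without copying long-lived data.
device = ZonedDevice(zone_count=2, zone_capacity=4)
device.append(0, b"temporary log entry")
device.append(1, b"archived video segment")
device.reset(0)  # zone 1 is untouched
```

Because the host groups data with similar lifetimes into the same zone, reclaiming space does not require the drive to relocate still-valid data, which is the intuition behind the lower write amplification described above.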
In terms of architecture, the shift to cloud storage and hybrid computing arrangements is changing how all the storage pieces fit together. Enterprises today are demanding the separation of storage and compute, thereby allowing them to grow both separately. This software-defined storage trend has been ongoing for years, but it’s picking up speed as the market has decided that it wants to consume enterprise IT resources using containers.
Much of the growth in data generation (and data storage requirements) comes from unstructured data. Enterprises are on pace to triple the amount of unstructured data stored in file or object storage systems over the next four years, according to Gartner’s recent Magic Quadrant for Distributed File System and Object Storage. What’s more, it found that 40% of infrastructure and operations (I&O) leaders will implement at least one hybrid cloud storage architecture by 2024, up from just 10% last year.
Perhaps the industry’s most poorly kept secret is that much of the unstructured data exists in the form of videos. “Video, as you all know, is driving a tremendous amount of storage,” Yusaf Jamal, Western Digital’s senior vice president of devices and platforms business, said at his company’s recent Storage Field Day.
Jamal also identified the rollout of the 5G wireless network as a potential game-changer in data storage and access patterns. “In addition to the core applications like video, we have connected endpoints happening,” he says. “As technology is adapting to how we live in our society, how our health is transitioning to digital health, how our factories are transitioning to smart factories, we have these connected endpoints generating more and more amounts of data that we have to consume.”
All this activity is pushing up the estimates for the amount of data that the world will be generating. IDC’s current estimate is that the human race will be sitting on a pile of 103ZB of data by 2023. That creates an extraordinarily rich opportunity for the whole industry, Jamal says.
“As we start looking at how businesses transform themselves and start to get more insights from data, just generating data isn’t enough,” Jamal says. “Having the ability to communicate that data at high speed over wireless and wired technology is a key factor to unlocking the insights that data provides.”