Disruptive Economics and Democratization of Data
It seems like only a few years ago that managing data growth was a high-priority objective for most users. What happened? The short answer is the same thing that happened to oil. Data is the new oil.
Before we learned how to use petroleum-based products effectively, oil was little more than a pollutant. It killed crops, poisoned animals, and made land worthless. Then we learned how to lower the cost of extracting it and refining it into kerosene for lighting, gasoline for cars and planes, and lubricants. Costs dropped, a competitive market evolved, and oil became the new gold.
That paradigm is now repeating itself with data. We have lowered the cost of storing, managing, and using data to the point where economics and the size of the problems we are tackling have made it an indispensable tool. Today, simulation and modeling save engineering firms time and money by enabling engineers to explore innovative alternatives, optimize designs, and lower costs before committing a design to prototyping or manufacturing. For example, changing the logic in an ASIC is expensive in time and money because the chip layout has to change, new masks have to be created, and then the semiconductor vendor has to build and package the new ASICs. This can take months and can result in schedule slippages and cost overruns. Simulating an ASIC’s logic typically takes hours and doesn’t involve any expense other than simulator time.
This disruptive drop in costs has made it affordable to build fraud detection systems that work in near real time. Stated differently, lowering the cost of fraud detection has lowered the threshold of acceptable fraud levels. Ditto the use of analytics and AI to build better decision support systems. In insurance, the democratization of data has enabled AI to improve claim approvals and identify new market opportunities. Auto insurance companies are now offering lower rates to customers who allow their driving patterns to be monitored. Health care companies can use advanced technologies to monitor your exercise activities. Facial recognition systems can help stop terrorists. Can anyone imagine trying to map the human genome, which consists of approximately 3 billion base pairs, without affordable storage and compute power? The possibilities are endless, limited only by our imagination and economics.
Because future use cases are unknowable, the ultimate value of data is unknowable until after it has been deleted. Acknowledging this creates a built-in bias toward retaining data that may be of value in the future. The resulting data growth prioritizes simplicity, scale, manageability, performance, cost, and nondisruptive refreshes in future storage infrastructure designs over the need to protect prior investments in hardware, training, policies, and procedures, and over fears of conversion cost overruns or failures.
Storing data that may be of value in the future means that it will be stale data until it is used, and probably afterwards as well. Holding costs down demands that any new storage infrastructure supporting data democratization projects utilize both expensive high-performance and low-cost high-capacity storage technologies to keep scale affordable.
As explored below, the order of magnitude (10X) difference in cost between high-performance (flash) and high-capacity (HDD) media certainly qualifies as disruptive economics. It leaves most users with no choice but to use both types of technology.
Flash vs HDD $/TB Cost Forecast
All-flash array (AFA) vendors claim that, because NAND flash memory prices are falling dramatically, it is only a matter of time before HDDs are retired into the mists of history. They may eventually be right, but not for a long time, because flash and disk vendors are both in a race to drive down costs.
Flash vendors are cutting costs by increasing the number of bits stored per cell, developing vertical (3D) manufacturing processes that consume less wafer real estate per cell than horizontal cell geometries, and focusing on increasing cell write cycle ratings. The visible result of these technology improvements has been an evolution of NAND flash chips from SLC (single-level cell or 1 bit/cell) to MLC (multilevel cell or 2 bits/cell) to TLC (triple level cell or 3 bits/cell), to QLC (quad-level cell or 4 bits/cell) if endurance and performance problems can be overcome.
HDD vendors are responding to these flash improvements with areal density increases, higher HDD MTBFs, and smaller form factors that decrease power requirements. Areal density increases are being driven by the development of shingled media, HAMR (Heat Assisted Magnetic Recording), and MAMR (Microwave Assisted Magnetic Recording). These areal density improvements are lowering bit costs by enabling the use of smaller form factor HDDs that reduce platter costs, a major contributor to HDD manufacturing costs; read/write heads are the other big contributor. Reducing the HDD form factor and RPMs also dramatically reduces the power consumed by the motor spinning the platters, which is the major power consumer in an HDD. (For the curious, motor power needs decrease with the square of platter velocity, which is proportional to D × RPM, with D being the platter diameter.)
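Taking the parenthetical rule of thumb at face value (motor power proportional to the square of D × RPM), a rough comparison between two drive geometries can be sketched as follows. The function name and the example diameters and spindle speeds are illustrative assumptions, not vendor figures, and real drives will deviate from this simplified scaling.

```python
def relative_motor_power(diameter, rpm, ref_diameter, ref_rpm):
    """Motor power relative to a reference drive, assuming power ~ (D * RPM)^2."""
    if min(diameter, rpm, ref_diameter, ref_rpm) <= 0:
        raise ValueError("diameters and spindle speeds must be positive")
    return ((diameter * rpm) / (ref_diameter * ref_rpm)) ** 2


# Hypothetical example: a 2.5-unit platter at 5,400 RPM compared with
# a 3.5-unit platter at 7,200 RPM (values chosen only for illustration).
ratio = relative_motor_power(2.5, 5400, 3.5, 7200)
print(f"relative motor power: {ratio:.2f}")
```

Under this simplified model the smaller, slower drive needs roughly 0.29 times the motor power of the reference drive, which is why form factor and RPM reductions matter so much to power budgets.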
Analysts who track NAND flash and HDD technology improvements, shipment rates, and price declines, and vendors that participate in both markets, are forecasting that raw NAND flash on a $/TB basis will remain approximately 10X more expensive than HDD capacity through 2028. Add that most users would agree that roughly two-thirds of their online data is stale, and it becomes very difficult to argue that stale data should be stored on expensive flash arrays. Indeed, the easier argument would be to increase the back-end flash-to-HDD capacity ratio to make it easier for hybrid array cache management algorithms to keep hot data in flash.
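The arithmetic behind this argument can be sketched in a few lines. This is a simplified model of raw media cost only: it normalizes HDD cost to 1 unit per TB, uses the 10X multiple and the two-thirds stale figure quoted above, and ignores data reduction, controllers, power, and other real-world factors. The function and constant names are hypothetical.

```python
def blended_cost_per_tb(hdd_cost_per_tb, flash_multiple, flash_fraction):
    """Average cost per TB of a pool that keeps `flash_fraction` of capacity on flash."""
    if not 0.0 <= flash_fraction <= 1.0:
        raise ValueError("flash_fraction must be between 0 and 1")
    flash_cost_per_tb = hdd_cost_per_tb * flash_multiple
    return (flash_fraction * flash_cost_per_tb
            + (1.0 - flash_fraction) * hdd_cost_per_tb)


HDD_COST = 1.0           # normalized: HDD costs 1 unit per TB (assumption)
FLASH_MULTIPLE = 10.0    # raw flash assumed ~10X the cost of HDD capacity
HOT_FRACTION = 1.0 / 3   # one-third hot data, two-thirds stale

all_flash = blended_cost_per_tb(HDD_COST, FLASH_MULTIPLE, 1.0)
hybrid = blended_cost_per_tb(HDD_COST, FLASH_MULTIPLE, HOT_FRACTION)
print(f"all-flash: {all_flash:.1f}x HDD cost, hybrid: {hybrid:.1f}x HDD cost")
```

Even in the generous case where every hot byte lives on flash, the blended media cost comes out at about 4 units per TB versus 10 for all-flash. A hybrid array that relies on caching would typically carry a smaller flash fraction and a correspondingly lower blended cost.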
The alternative to using hybrid arrays is to build a storage infrastructure with AFA and HDD storage arrays and either:
- Overlay those arrays with a layer of virtualization software that takes ownership of data placement, load balancing, and tiering (part of QoS) away from the arrays. That is, to create a virtualized storage infrastructure.
- Deploy an archiving solution that migrates stale data off expensive AFAs and onto low-cost hybrid or HDD-only arrays.
Virtualization software can be deployed in an in-band virtualization appliance, in the application server software stack, or in the arrays. While this approach has the appeal of being able to, at least in theory, optimize data placement at the infrastructure level rather than the array level and work with existing infrastructures, the market has not embraced in-band virtualization appliances or virtualization software running on application servers for a variety of reasons.
In-band virtualization appliances have fallen short because they add complexity, new failure modes, and cost; have the potential to expand failure domains; and lack intimacy with the virtualized arrays. Most in-band virtualization appliances present themselves to storage arrays as Windows servers and still rely upon array vendor-supplied tools to configure the array and troubleshoot performance problems. Server software virtualization instantiations have similar problems: they complicate the software stack, may require outages to install or update, may delay OS or hypervisor updates, and, like in-band virtualization appliances, still rely upon array vendor-supplied tools to configure and troubleshoot performance problems.
The latest variation on this theme is enabling arrays to tier their data directly to the cloud under the direction of policies and/or cloud-based analytics tools using standard APIs. This eliminates the complexity and cost issues associated with the other deployment models, but still leaves the usable availability exposures that tiering to the cloud creates. Whether the market will accept this variation remains an open question.
Archiving solutions can be deployed as on-premises integrated appliances or as software that utilizes on-premises, low-cost, high-capacity hybrid arrays, HDD-only arrays, or cloud storage. Both add architectural and vendor management complexity. Using archiving solutions also complicates the use of data analytics and AI/ML because accessing archived data necessitates restaging the data onto primary storage, which makes performance unpredictable, and eventually de-staging it back onto the archive storage.
Operational simplicity places priorities on minimizing the number of different storage array architectures incorporated into the new storage infrastructure, lowering the skill levels needed to efficiently manage the storage infrastructure, and the ability to accommodate a wide variety of workloads and growth without adding complexity.
Having an ad hoc collection of arrays with different performance profiles, functionality, and capacities supporting a portfolio of applications creates so many different array/application pairings that the number of permutations becomes impossible to manage efficiently. This favors the use of multi-PB hybrid arrays because it minimizes the number of arrays that must be managed and the number of permutations that must be worked through in pairing applications with storage solutions. It also reduces the probability of belatedly discovering performance problems caused by hosting application data on an array that was never previously used by the application.
Managing Storage Arrays
Self-managing arrays that take ownership of back-end data placement accommodate data growth more easily than arrays that storage administrators must tune by hand. This is because they are working 24×7 to optimize performance and cost effectiveness, and they deliver a more consistent performance experience by responding to changes in workloads in real time.
AFAs and modern hybrid arrays with flash- or DRAM-centric data flows are examples of self-managing arrays and feedback from users confirms that they are delivering great ease-of-use. Adding cloud-based monitoring, reporting, and performance modeling increases usable availability by advising storage administrators of missing microcode updates that could cause outages and anticipating potential performance bottlenecks caused by organic growth or plans to add new high-impact applications. Together, these technologies free lots of staff resources.
Storage consolidation also improves staff productivity. Managing fewer bigger arrays is easier than managing lots of smaller arrays. Having fewer bigger arrays simplifies disaster/recovery by increasing the likelihood of an application’s data being hosted in a single array rather than spanning multiple arrays and reducing the number of array-based replication technologies that storage administrators must master. It also naturally minimizes the stranded capacity problem that can increase costs in many large infrastructures by consolidating stranded capacity into usable amounts of storage.
Large, high-end multi-controller storage arrays have an inherent usable availability advantage over active/active dual-controller midrange arrays and scale-out architectures that depend upon locality of reference to meet service level objectives. If a 4-controller array loses a single controller it nominally loses 25% of its throughput capabilities; a 3-controller array losing a single controller nominally loses 33% of its throughput; and a 2-controller array losing a single controller nominally loses 50% of its throughput capabilities.
In reality the losses are usually somewhat less due to MP factors unless the array is forced to change from write-back to write-through cache management, in which case the throughput loss could be much greater. Dual-controller arrays that implement active/passive or active/standby data flows can, absent microcode bugs, always meet their service level objectives, but they constrain performance and throughput to the capabilities of a single active controller.
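The nominal loss figures quoted above follow from assuming that throughput is spread evenly across all active controllers. A minimal sketch of that calculation, with a hypothetical function name and none of the MP-factor or cache-mode effects just described, might look like this:

```python
def nominal_throughput_loss(controllers, failed=1):
    """Fraction of throughput lost when `failed` of `controllers` go offline,
    assuming load is spread evenly across active/active controllers."""
    if controllers < 1:
        raise ValueError("an array needs at least one controller")
    if not 0 <= failed <= controllers:
        raise ValueError("failed must be between 0 and the controller count")
    return failed / controllers


for count in (4, 3, 2):
    loss = nominal_throughput_loss(count)
    print(f"{count}-controller array: {loss:.0%} nominal throughput loss")
```

The loop prints 25%, 33%, and 50% for the 4-, 3-, and 2-controller cases, matching the nominal figures in the text.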
For scale-out arrays throughput losses that result from controller failures are much more difficult to estimate because they may implement federated or distributed topologies. Internode bandwidth and latency further complicate efforts to estimate throughput losses.
Lowering Frequency of Repair Activities and Power and Maintenance Costs
Multi-petabyte scale storage infrastructures built with large high-end arrays have an inherent advantage over infrastructures built with scale-out storage arrays or lots of dual controller midrange arrays because they require fewer controllers. This lowers power and cooling requirements and frequency of repair activities which could eventually translate into lower maintenance costs.
Of course, actual costs may vary because the discounts that a user can negotiate are influenced by a large number of factors including: the strength of vendor lock-ins, existing relationships, prior investments, emotional commitments to the specific technologies, fear of migrations, negotiation skills, etc.
Storage is the New Oil
Fulfilling the promise of data democratization and making the term “new oil” into more than a clever marketing slogan demands that users make cost-effectiveness, automation, simplicity, elasticity, and non-disruptive infrastructure refreshes part of their infrastructure visions. The following recommendations will help transform these visions into realities.
- Apply the KISS (Keep It Simple, Stupid) principle to the design of your storage infrastructure.
- Take the planned service lives of on-premises storage into account to create additional opportunities for improving cost effectiveness. More specifically:
  - Do not design an infrastructure for forecasted growth rates beyond two refresh cycles.
  - Do not build a storage infrastructure with more than 30 percent of headroom beyond worst-case capacity growth forecasts.
- Count on continuing technology improvements to reduce the cost and risk of changing storage technologies and vendors.
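As a rough illustration of the sizing guidance in these recommendations, the sketch below computes an upper bound on the capacity to provision: the worst-case forecast at the end of two refresh cycles plus 30 percent headroom. Compound annual growth, the function name, and every input value are assumptions chosen for illustration, not figures from the article.

```python
def capacity_ceiling_tb(current_tb, worst_case_annual_growth,
                        years_per_refresh_cycle, refresh_cycles=2,
                        headroom=0.30):
    """Upper bound on capacity to provision: worst-case forecast at the end of
    the planning horizon plus headroom. Assumes compound annual growth."""
    if current_tb < 0 or worst_case_annual_growth < 0 or headroom < 0:
        raise ValueError("capacity, growth, and headroom must be non-negative")
    horizon_years = years_per_refresh_cycle * refresh_cycles
    forecast_tb = current_tb * (1.0 + worst_case_annual_growth) ** horizon_years
    return forecast_tb * (1.0 + headroom)


# Hypothetical inputs: 1,000 TB today, 25% worst-case annual growth,
# and a 4-year refresh cycle (so an 8-year planning horizon).
ceiling = capacity_ceiling_tb(1000.0, 0.25, 4)
print(f"do not plan for more than about {ceiling:,.0f} TB")
```

Anything sized beyond this ceiling is capacity bought before continuing technology improvements have had a chance to lower its cost.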
About the author: Stanley Zaffos is the senior vice president of product marketing at Infinidat, a provider of enterprise storage solutions. As a former research vice president at Gartner, Stanley oversaw the firm’s Magic Quadrants for General Purpose Storage Arrays and conducted research on hybrid and solid state or flash storage arrays, software-defined-storage, HCI, replication technologies, acquisition and asset management strategies.