Cloud Storage: A Brave New World
Traditionally, enterprise storage systems have been designed as appliances that attach to the enterprise network or live on the SAN. This mirrored the traditional approach to computing, in which applications live on dedicated (virtual or physical) servers.
Cloud computing changed the way we think about computing infrastructure. How a particular server is built matters little; what matters far more is how compute elements are connected and how their aggregate power is combined into a coherent, flexible, scalable computing system. For this reason, far more intellectual energy has gone into designing orchestration infrastructure (such as Kubernetes). Additionally, big-iron building blocks are uncommon in cloud environments, where finer-grained elements offer greater flexibility and scalability control.
When designing a cloud storage system, a similar approach becomes applicable. Rather than isolated, coarse-grained appliances, it is desirable to use cloud-native resources that are scattered through the cloud and combine their power and functionality to create an enterprise-class, cloud-native storage system. To realize the full benefit of a cloud environment, cloud storage should be designed around a different set of principles than those used in appliance-based systems.
New Paradigm, New Challenges
Storage brings to the table a set of challenges that compute and network infrastructure don’t have to cope with.
For decades, traditional storage systems have been designed using dedicated components such as battery-backed write buffers, multipath-enabled storage media, hardware-assisted RAID, and dedicated backplanes or switching infrastructure. These components, while critical to enterprise-class performance and reliability, are rarely available in clouds and cannot be easily integrated into them.
Giving up performance and reliability, however, is not an acceptable alternative. A true enterprise-class cloud storage system must forgo such dedicated hardware components without sacrificing reliability or performance.
Think Out of the Box
While struggling with the above challenges, cloud storage system design can benefit from some of the properties of cloud environments. When processing data to be stored, one is no longer limited to resources confined in a box. The cloud allows us to use its elastic compute and memory resources to the extent needed. Unused resources are not wasted; they can be used for other (non-storage) tasks.
It is also possible to consume more resources per transaction than typically consumed in an appliance, provided that this carries some benefit, such as better performance, better data management or better economies of scale.
In appliance-based systems, cost and performance optimization is achieved by hand-tailoring data paths and caches to get the best possible performance envelope out of the given (fixed) resources available in the box. Cloud systems, in contrast, require a different approach. Resources are not fixed; rather, they are elastic and can be expanded or shrunk on demand.
Performance Requires a New Method
While the latency of a given transaction is limited by the physics of the network and media, IOPS (input/output operations per second) and bandwidth can scale almost without bound. Care should be taken not to limit this scalability; given this, the desired approach to cost/performance optimization is to reduce the resources required to achieve a given amount of work.
At cloud scale, data caches are highly inefficient and offer limited benefit, if any. Instead, any available caching resource should be used to cache metadata. Here, the distributed nature of the cloud calls for some form of distributed, scalable metadata cache.
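One common way to build such a distributed metadata cache is consistent hashing, which spreads metadata keys across cache nodes and lets the cache grow or shrink without a central index. The sketch below is purely illustrative; the class and node names are hypothetical, not part of any particular product.

```python
import bisect
import hashlib

class MetadataCacheRing:
    """Toy consistent-hash ring mapping metadata keys to cache nodes.

    Each node is placed at several virtual positions on the ring so that
    keys spread evenly and adding/removing a node only remaps a fraction
    of the keys (illustrative sketch, not a production design).
    """

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            for i in range(vnodes):
                bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def node_for(self, metadata_key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        h = self._hash(metadata_key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = MetadataCacheRing(["cache-a", "cache-b", "cache-c"])
owner = ring.node_for("volume-42/inode/1337")  # deterministic placement
```

Because placement is a pure function of the key, any client can locate metadata without consulting a coordinator, which is what makes the cache horizontally scalable.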
Taking advantage of cloud resources allows true cloud storage systems to do more work on data ingestion (when new data is added to the system). This reduces the total amount of work in the long run by eliminating the need to later scan the data to perform tasks such as deduplication. The availability of multiple cores in cloud environments allows some of the ingest tasks to be carried out in parallel, reducing latency.
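Inline deduplication at ingest time can be sketched roughly as follows: split incoming data into chunks, fingerprint the chunks in parallel, and store only chunks whose fingerprint has not been seen before. The chunk size, store layout, and function names here are illustrative assumptions, not a description of any specific system.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4096
store = {}  # fingerprint -> chunk bytes (stands in for a shared chunk store)

def fingerprint(chunk: bytes) -> str:
    """Content hash used as the chunk's identity."""
    return hashlib.sha256(chunk).hexdigest()

def ingest(data: bytes) -> list:
    """Deduplicate at ingest: hash chunks in parallel, store new ones once."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor() as pool:            # fingerprinting is parallel
        fps = list(pool.map(fingerprint, chunks))
    recipe = []
    for fp, chunk in zip(fps, chunks):
        if fp not in store:                       # new content: store it once
            store[fp] = chunk
        recipe.append(fp)                         # duplicates cost a reference
    return recipe                                 # ordered fingerprints rebuild the data
```

Ingesting the same data twice adds nothing to the store the second time; only the list of fingerprints (the "recipe") is recorded, which is why paying this cost up front avoids a later full-data scan.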
Horizontal scaling is natural in cloud environments. Cloud storage can be perfectly scalable when designed based on lock-free and, to the extent possible, synchronization-free data structures that enable massively parallel, highly concurrent operation.
A New Approach to Storage Management and Data Management
Cloud computing automates many of the management tasks that control computing resources. Similarly, true cloud storage should automate storage resource management.
Ideally, an administrator should be able to specify the required SLA and QoS for an application, the desired presentation form for the data, and security-related settings (ACLs, encryption, etc.). Based on these, the system should automatically manage resources to meet the requirements.
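Such a declarative policy might look like the sketch below: the administrator states intent, and a scheduler picks the cheapest tier that satisfies it. All field names, tier definitions, and numbers are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass

@dataclass
class StoragePolicy:
    # Illustrative intent declaration; field names are not a real API.
    iops_target: int
    latency_ms_max: float
    presentation: str      # e.g. "block", "file", "object"
    encrypted: bool

def pick_tier(policy, tiers):
    """Toy scheduler: cheapest tier that meets the declared SLA, else None."""
    eligible = [t for t in tiers
                if t["max_iops"] >= policy.iops_target
                and t["latency_ms"] <= policy.latency_ms_max]
    return min(eligible, key=lambda t: t["cost"])["name"] if eligible else None

# Hypothetical tier catalog (relative cost units).
tiers = [
    {"name": "nvme", "max_iops": 500_000, "latency_ms": 0.2, "cost": 10},
    {"name": "ssd",  "max_iops": 100_000, "latency_ms": 1.0, "cost": 4},
    {"name": "hdd",  "max_iops": 2_000,   "latency_ms": 8.0, "cost": 1},
]
policy = StoragePolicy(iops_target=50_000, latency_ms_max=2.0,
                       presentation="block", encrypted=True)
# pick_tier(policy, tiers) selects "ssd": the cheapest tier meeting the SLA.
```

The administrator never names a device or a tier; placement falls out of the declared requirements, and re-evaluating the policy as load changes is what makes the management automatic.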
The way data is managed also must be changed. Cloud storage systems handle large pools of data. At the low level, these pools should be represented such that data does not depend on the front-end access methods used by applications. Given this, the traditional separation between primary and secondary storage becomes redundant.
When data in the pool is available to any application, in any presentation form, the physical location (or storage tier) of the data is automatically determined by the system according to QoS settings. Redundancy, disaster recovery and data mobility are all handled automatically by the system; primary and secondary storage simply become different tiers within a single system.
This eliminates the need to create unnecessary copies of data sets, as is often the case with Copy Data Management systems. There is no need to duplicate properties and requirements; data storage and data management can now live in the same framework.
Doing Data Access Differently
Cloud IT is home to a large variety of applications, many of which require different data access methods (block, file, object, etc.). Classifying a system by its access method (e.g., NAS) creates artificial, unnecessary boundaries between data pools that defeat two of the main goals of cloud: economy of scale and flexibility.
To avoid this, a true cloud system should be able to present data in multiple forms. This can be achieved by using smart data structures, which abstract data and present it in an access method-independent manner.
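The idea can be illustrated with a toy abstraction: a single underlying byte sequence exposed through block-, file-, and object-style views, none of which dictates how the data is stored. The class and method names below are hypothetical, purely to show the separation between storage and presentation.

```python
class DataObject:
    """One stored byte sequence, multiple access-method presentations."""
    BLOCK = 512  # illustrative logical block size

    def __init__(self, payload: bytes):
        self._payload = payload          # the single underlying representation

    def read_block(self, lba: int) -> bytes:
        """Block-style view: read one logical block by address."""
        off = lba * self.BLOCK
        return self._payload[off:off + self.BLOCK]

    def read_file(self, offset: int, length: int) -> bytes:
        """File-style view: read an arbitrary byte range."""
        return self._payload[offset:offset + length]

    def get_object(self) -> bytes:
        """Object-style view: fetch the whole value at once."""
        return self._payload
```

Because every view reads the same underlying data, an application can switch access methods, or several applications can use different ones, without copying the data into a separate block, file, or object silo.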
In order to take advantage of the many benefits of cloud environments on one hand, and overcome the challenges posed by cloud on the other hand, cloud storage systems must break away from the principles and guidelines used for traditional storage. A well-designed cloud-native storage system should be able to provide full enterprise-class performance and feature sets without giving up the benefits of cloud.
About the author: Nir Peleg is the CTO and co-founder of Ionir, which develops a Kubernetes-based storage system. Nir is responsible for the company’s strategic technology roadmap and intellectual property management. Prior to Ionir, Nir founded Reduxio and led the transition of its technology from Reduxio’s appliance-based product to Ionir’s software-defined, cloud-native storage. With over 30 years of industry experience, Nir was CTO and co-founder at Montilio, an innovative file server acceleration company, and founder, EVP R&D and CTO at Exanet, which built one of the world’s first distributed NAS systems. Nir was the first employee and chief architect of Digital Appliance, Larry Ellison’s massively parallel computing venture that eventually became Pillar Data Systems (acquired by Oracle). Nir holds over 20 U.S. patents and patents pending in the areas of computing, distributed storage, data deduplication and encryption.