Cloud Storage: A Brave New World
Traditionally, enterprise storage systems have been designed as appliances that attach to the enterprise network or live on the SAN. This mirrored the traditional approach to computing, in which applications live on dedicated (virtual or physical) servers.
Cloud computing changed the way we think about computing infrastructure. How a particular server is built matters little; what matters far more is how compute elements are connected and how their aggregate power is combined into a coherent, flexible, scalable computing system. For this reason, far more intellectual energy has gone into designing orchestration infrastructure (such as Kubernetes). Additionally, big-iron building blocks are uncommon in cloud environments, where finer-grained elements offer greater flexibility and scalability control.
When designing a cloud storage system, a similar approach becomes applicable. Rather than isolated, coarse-grained appliances, it is desirable to use cloud-native resources that are scattered through the cloud and combine their power and functionality to create an enterprise-class, cloud-native storage system. To realize the full benefit of a cloud environment, cloud storage should be designed around a different set of principles than those used in appliance-based systems.
New Paradigm, New Challenges
Storage brings to the table a set of challenges that compute and network infrastructure don’t have to cope with.
For decades, traditional storage systems have been designed using dedicated components such as battery-backed write buffers, multipath-enabled storage media, hardware-assisted RAID, and dedicated backplanes or switching infrastructure. These components, while critical to enterprise-class performance and reliability, are rarely available in clouds and cannot be easily integrated into them.
Giving up performance and reliability, however, is not an acceptable alternative. A true enterprise-class cloud storage system must forgo such dedicated hardware components without sacrificing reliability or performance.
Think Out of the Box
While struggling with the above challenges, cloud storage system design can benefit from some of the properties of cloud environments. When processing data to be stored, one is no longer limited to resources confined in a box. The cloud allows us to use its elastic compute and memory resources to the extent needed. Unused resources are not wasted; they can be used for other (non-storage) tasks.
It is also possible to consume more resources per transaction than typically consumed in an appliance, provided that this carries some benefit, such as better performance, better data management or better economies of scale.
In appliance-based systems, cost and performance optimization is achieved by hand-tailoring data paths and caches to get the best possible performance envelope out of the given (fixed) resources available in the box. Cloud systems, in contrast, require a different approach. Resources are not fixed; rather, they are elastic and can be expanded or shrunk on demand.
Performance Requires a New Method
While the latency of a given transaction is limited by the physics of the network and media, IOPS (input/output operations per second) and bandwidth can scale almost without bound. Care should be taken not to limit this scalability; given this, the desired approach to cost/performance optimization is to reduce the resources required to achieve a given amount of work.
At cloud scale, data caches are highly inefficient and offer limited benefit, if any. Instead, any available caching resource should be used to cache metadata. Here, the distributed nature of the cloud calls for some form of distributed, scalable metadata cache.
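One common way to build such a distributed metadata cache is consistent hashing, which spreads metadata keys across cache nodes and lets the cache grow or shrink without a central index. The sketch below is purely illustrative; the class and node names are hypothetical, not part of any particular product.

```python
import bisect
import hashlib

class MetadataCacheRing:
    """Toy consistent-hash ring mapping metadata keys to cache nodes.

    Each node is placed at several virtual positions on the ring so that
    keys spread evenly and adding/removing a node only remaps a fraction
    of the keys (illustrative sketch, not a production design).
    """

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            for i in range(vnodes):
                bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def node_for(self, metadata_key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        h = self._hash(metadata_key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = MetadataCacheRing(["cache-a", "cache-b", "cache-c"])
owner = ring.node_for("volume-42/inode/1337")  # deterministic placement
```

Because placement is a pure function of the key, any client can locate metadata without consulting a coordinator, which is what makes the cache horizontally scalable.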
Taking advantage of cloud resources allows true cloud storage systems to do more work on data ingestion (when new data is added to the system). This reduces the total amount of work in the long run by eliminating the need to later scan the data to perform tasks such as deduplication. The availability of multiple cores in cloud environments allows some of the ingest tasks to be carried out in parallel, reducing latency.
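Inline deduplication at ingest time can be sketched roughly as follows: split incoming data into chunks, fingerprint the chunks in parallel, and store only chunks whose fingerprint has not been seen before. The chunk size, store layout, and function names here are illustrative assumptions, not a description of any specific system.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4096
store = {}  # fingerprint -> chunk bytes (stands in for a shared chunk store)

def fingerprint(chunk: bytes) -> str:
    """Content hash used as the chunk's identity."""
    return hashlib.sha256(chunk).hexdigest()

def ingest(data: bytes) -> list:
    """Deduplicate at ingest: hash chunks in parallel, store new ones once."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    with ThreadPoolExecutor() as pool:            # fingerprinting is parallel
        fps = list(pool.map(fingerprint, chunks))
    recipe = []
    for fp, chunk in zip(fps, chunks):
        if fp not in store:                       # new content: store it once
            store[fp] = chunk
        recipe.append(fp)                         # duplicates cost a reference
    return recipe                                 # ordered fingerprints rebuild the data
```

Ingesting the same data twice adds nothing to the store the second time; only the list of fingerprints (the "recipe") is recorded, which is why paying this cost up front avoids a later full-data scan.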
Horizontal scaling is natural in cloud environments. Cloud storage can be perfectly scalable when designed based on lock-free and, to the extent possible, synchronization-free data structures that enable massively parallel, highly concurrent operation.
A New Approach to Storage Management and Data Management
Cloud computing automates many of the management tasks that control computing resources. Similarly, true cloud storage should automate storage resource management.
Ideally, an administrator should be able to specify the required SLA and QoS for an application, the desired presentation form for the data, and security-related settings (ACLs, encryption, etc.). Based on these, the system should automatically manage resources to meet the requirements.
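Such a declarative policy might look like the sketch below: the administrator states intent, and a scheduler picks the cheapest tier that satisfies it. All field names, tier definitions, and numbers are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass

@dataclass
class StoragePolicy:
    # Illustrative intent declaration; field names are not a real API.
    iops_target: int
    latency_ms_max: float
    presentation: str      # e.g. "block", "file", "object"
    encrypted: bool

def pick_tier(policy, tiers):
    """Toy scheduler: cheapest tier that meets the declared SLA, else None."""
    eligible = [t for t in tiers
                if t["max_iops"] >= policy.iops_target
                and t["latency_ms"] <= policy.latency_ms_max]
    return min(eligible, key=lambda t: t["cost"])["name"] if eligible else None

# Hypothetical tier catalog (relative cost units).
tiers = [
    {"name": "nvme", "max_iops": 500_000, "latency_ms": 0.2, "cost": 10},
    {"name": "ssd",  "max_iops": 100_000, "latency_ms": 1.0, "cost": 4},
    {"name": "hdd",  "max_iops": 2_000,   "latency_ms": 8.0, "cost": 1},
]
policy = StoragePolicy(iops_target=50_000, latency_ms_max=2.0,
                       presentation="block", encrypted=True)
# pick_tier(policy, tiers) selects "ssd": the cheapest tier meeting the SLA.
```

The administrator never names a device or a tier; placement falls out of the declared requirements, and re-evaluating the policy as load changes is what makes the management automatic.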
The way data is managed also must be changed. Cloud storage systems handle large pools of data. At the low level, these pools should be represented such that data does not depend on the front-end access methods used by applications. Given this, the traditional separation between primary and secondary storage becomes redundant.
When data in the pool is available to any application, in any presentation form, the physical location (or storage tier) of the data is automatically determined by the system according to QoS settings. Redundancy, disaster recovery and data mobility are all handled automatically by the system; primary and secondary storage simply become different tiers within a single system.
This eliminates the need to create unnecessary copies of data sets, as is often the case with Copy Data Management systems. There is no need to duplicate properties and requirements; data storage and data management can now live in the same framework.
Doing Data Access Differently
Cloud IT is home to a large variety of applications, many of which require different data access methods (block, file, object, etc.). Classifying a system by its access method (e.g., NAS) creates artificial, unnecessary boundaries between data pools that defeat two of the main goals of cloud: economy of scale and flexibility.
To avoid this, a true cloud system should be able to present data in multiple forms. This can be achieved by using smart data structures, which abstract data and present it in an access method-independent manner.
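The idea can be illustrated with a toy abstraction: a single underlying byte sequence exposed through block-, file-, and object-style views, none of which dictates how the data is stored. The class and method names below are hypothetical, purely to show the separation between storage and presentation.

```python
class DataObject:
    """One stored byte sequence, multiple access-method presentations."""
    BLOCK = 512  # illustrative logical block size

    def __init__(self, payload: bytes):
        self._payload = payload          # the single underlying representation

    def read_block(self, lba: int) -> bytes:
        """Block-style view: read one logical block by address."""
        off = lba * self.BLOCK
        return self._payload[off:off + self.BLOCK]

    def read_file(self, offset: int, length: int) -> bytes:
        """File-style view: read an arbitrary byte range."""
        return self._payload[offset:offset + length]

    def get_object(self) -> bytes:
        """Object-style view: fetch the whole value at once."""
        return self._payload
```

Because every view reads the same underlying data, an application can switch access methods, or several applications can use different ones, without copying the data into a separate block, file, or object silo.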
In order to take advantage of the many benefits of cloud environments on one hand, and overcome the challenges posed by cloud on the other hand, cloud storage systems must break away from the principles and guidelines used for traditional storage. A well-designed cloud-native storage system should be able to provide full enterprise-class performance and feature sets without giving up the benefits of cloud.
About the author: Nir Peleg is the CTO and co-founder of Ionir, which develops a Kubernetes-based storage system. Nir is responsible for the company’s strategic technology roadmap and intellectual property management. Prior to Ionir, Nir founded Reduxio and led the transition of its technology from Reduxio’s appliance-based product to Ionir’s software-defined, cloud-native storage. With over 30 years of industry experience, Nir was CTO and co-founder at Montilio, an innovative file server acceleration company, and founder, EVP R&D and CTO at Exanet, which built one of the world’s first distributed NAS systems. Nir was the first employee and chief architect of Digital Appliance, Larry Ellison’s massively parallel computing venture that eventually became Pillar Data Systems (acquired by Oracle). Nir holds over 20 U.S. patents and patents pending in the areas of computing, distributed storage, data deduplication and encryption.