Fast Object Storage: Meeting the Demands of Modern Data
Object storage is expanding beyond the cheap-and-deep, slow-and-cold archive tier it has been known as for decades into a new form of next-gen primary storage: fast object storage. Get ready, because it’s seeing a surge in demand.
Fast, flash-based object storage retains the flexibility, availability and durability of traditional object storage while offering higher throughput and lower latency. The need for fast object storage became clear as cloud-native applications adopted object storage as their default persistence layer, and application design shifted to align with cloud-native concepts often built around Amazon S3. That shift pointed to a future in which some applications would demand higher performance – a primary characteristic of modern data.
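To make the "persistence layer" idea concrete, here is a minimal sketch of the S3-style put/get interface that cloud-native applications program against. This is an illustrative in-memory stand-in, not a real SDK: the class, bucket and key names are assumptions for the example (a real client such as boto3 would issue the same operations over HTTP to a service).

```python
# Illustrative in-memory stand-in for an S3-style object store.
# Objects are immutable blobs addressed by (bucket, key), optionally
# carrying user-defined metadata stored alongside the data.

class ObjectStore:
    def __init__(self):
        self._buckets = {}

    def create_bucket(self, bucket):
        self._buckets.setdefault(bucket, {})

    def put_object(self, bucket, key, body, metadata=None):
        self._buckets[bucket][key] = {"body": body, "metadata": metadata or {}}

    def get_object(self, bucket, key):
        return self._buckets[bucket][key]

store = ObjectStore()
store.create_bucket("telemetry")
store.put_object("telemetry", "tower-42/2024-01-01.bin", b"\x00\x01",
                 metadata={"site": "tower-42"})
obj = store.get_object("telemetry", "tower-42/2024-01-01.bin")
```

The key design point is that the interface is flat and key-addressed: there are no directories to traverse and no file handles to hold open, which is what lets object stores scale out and lets applications treat them as a durable default persistence layer.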
Most data today is machine-generated and not always generated in-house, which makes it unpredictable. It is also multimodal, arriving as both files and objects. The modern data era, which is complex and massive, demands fast object storage – and over the next couple of years will demand fast file and fast object storage together, but more on that later. Three trends – massive data growth, rich metadata encapsulation and multi-dimensional performance – dominate considerations about modern data and underscore the need for a new kind of object capability, one that’s radically different from the disk-era origins of object stores:
1. Exponentially growing unstructured data sets aren’t just an archive. Machine learning, massive-scale data warehouses and other modern analytical tools have become vital to unlocking insights from the large volumes of unstructured data driven by the growth of sensor devices and machine-generated sources.
The once-popular trend of co-locating data and compute in a “data lake” has turned out to be an architectural dead-end. As public and private cloud architectures have emerged over the last decade, object storage has become the preferred platform for hosting these enormous volumes of data. By decoupling compute and storage infrastructure and relying on advances in modern networks, system architects can “right-size” specific workflows and offer fine-grained elasticity to their customers.
2. Metadata is essential to working with massive data. Storing all this data isn’t particularly useful if we can’t find it. An early architectural pattern emerged from database applications, where architects struggled to store binary blob data inside a traditional relational database. By placing the unstructured blobs on an external object store, the relational database becomes a powerful metadata store in the larger application architecture. We continue to see this pattern deployed in sectors as diverse as high-tech manufacturing, payment processing and software-as-a-service backends.
While relational databases are capable, flexible tools, decoupling the data and metadata can lead to consistency challenges. We’ve seen a growing set of customers looking to encapsulate metadata alongside the raw data in their object storage platforms. As a result, modern object platforms are providing rich access methods and support for increasingly sophisticated metadata-driven analysis.
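The database-plus-object-store pattern above can be sketched in a few lines: a relational table holds the searchable metadata plus a pointer (the object key) into the object store, which holds the raw blob. This is a hedged, minimal illustration – the dict-backed blob store, table schema and key format are assumptions for the example; a real deployment would use an S3-compatible service in place of the dict.

```python
import sqlite3

# Stand-in object store: key -> blob bytes (a real system would use an
# S3-compatible service here).
blobs = {}

# Relational metadata store: each row points at a blob via its object key.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE scans (id INTEGER PRIMARY KEY, device TEXT, object_key TEXT)")

def ingest(device, data):
    # Write the blob to the object store, then record its key and
    # searchable attributes in the relational metadata store.
    key = f"scans/{device}/{len(blobs)}.bin"
    blobs[key] = data
    db.execute("INSERT INTO scans (device, object_key) VALUES (?, ?)", (device, key))
    return key

ingest("sensor-a", b"raw-scan-bytes")

# A metadata query locates the right blob without scanning any blob data.
(key,) = db.execute(
    "SELECT object_key FROM scans WHERE device = ?", ("sensor-a",)
).fetchone()
payload = blobs[key]
```

The consistency challenge mentioned above is visible in the sketch: the insert into `blobs` and the insert into `scans` are two separate writes, so a failure between them can orphan a blob or dangle a pointer – which is precisely why encapsulating metadata with the object in one platform is attractive.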
3. Modern data moves fast across many sites. Whether you’re building log-analytics systems for cybersecurity threat detection, data warehouses for rocket telemetry metrics or tick-data stores for quantitative finance, your infrastructure must account for several performance considerations. Low storage latency for newly arrived data, immense bandwidth for bulk analysis and data movement, and high metadata throughput to identify the right data at the right time are three that are likely top of mind.
Beyond Fast Object is Unified Fast File and Object
Fast file storage is nothing new, but combining fast file and fast object in the same system is, architecturally speaking, groundbreaking – and it is the future. Traditional architectures can deliver high performance for either small or large files, and for either sequential or random file workloads.
But modern data requires all of the above at the same time – the one thing you can count on is that there is no single workload pattern to optimize for. That means you need multi-protocol, multi-dimensional access and an architecture that can deliver performance across different kinds of data, capture and analyze that data in real time, and handle future unknowns as workloads continue to evolve. Consider the following real-world examples of organizations applying a unified fast file and object approach to their modern data challenges:
A telecommunications company collects tens of terabytes of telemetry and capacity data every 15 minutes from cell towers to validate signal quality as well as available capacity on the main network. Its ability to collect and process the data yields insights – in some instances in near real time – so it can take action to adjust for inefficiencies.
Cogo Labs is an incubator that depends on data analytics in its high-stakes world of spinning up profitable companies fast. Cogo leverages data to identify business opportunities, run internal “what if” experiments and accelerate the revenue streams of its internet startups.
Unified fast file and object is a new, emerging category of storage that supports such use cases. Its fundamental characteristic is that it combines fast file and fast object in the same storage system. In today’s market, most platforms are either file or object, and are either low performance or, in the case of file, fast only for specific data profiles. True unified fast file and object storage should deliver multi-dimensional performance that can handle diverse workloads, scale seamlessly to enable consolidation, and be simple to deploy and manage.
Simply put, the needs of modern unstructured data require a new approach to data storage – one that supports small and large files and objects with multi-dimensional performance and treats all data sets as first-class citizens.
About the author: Brian Gold is an engineering director at Pure Storage and part of the founding team for FlashBlade, Pure’s scale-out, all-flash file and object storage platform. He’s contributed to nearly every part of the FlashBlade architecture and development from inception to production. Brian received a PhD from Carnegie Mellon University, focusing on computer architecture and resilient computing.