Why Infrastructure Flexibility is Critical to the Data Lifecycle
It’s not a coincidence that we associate the word lifecycle with data. From ingestion to transformation and archival – data is alive. It is the fuel that drives the modern enterprise and the lifeblood that flows through every part of it.
The data world has become increasingly complex and confusing, however. We have everything from data friction and data drift to data gravity and data exhaust, enough terms to fill an Urban Data Dictionary. Each represents a different data management challenge, but all share a common denominator: the need to monetize so-called big data in new and more agile ways.
“Big Data” Is A Misnomer
Why include the modifier “so-called” before the words big data? Because big data is a misnomer. Yes, as data accumulates and evolves, it can grow to be gigantic: petabytes today, zettabytes and beyond tomorrow. But “big” is only one part of the problem. More interestingly, the applications of big data can be quite small.
Consider a retailer using purchasing data to create a 360-degree view of a single customer. That retailer is collecting information on that customer’s unique likes, dislikes, or buying patterns (often derived from their “data exhaust”), and developing a marketing profile based on that information. While it’s true that the retailer is pulling from a potentially large pile of data, they’re applying those insights to a microcosm of an individual customer’s buying habits.
But There’s Definitely A Data Downpour
Doing this quickly and successfully can require a great deal of flexibility. Twenty years ago, data flowed in one direction and through specific conduits, just like a drainage system. Structured data systems such as mining cubes and business intelligence (BI) platforms handled the flow admirably. But that steady flow has turned into a downpour, and attempting to manage this storm with outdated tools is not only expensive but untenable. Instead, organizations must now use a combination of agile processes, tools, and infrastructure to manage their data flows.
Let’s begin with the processes. Similar to DevOps, DataOps teams live by the mantra “collaborate, iterate, and fail fast.” The idea is to have data scientists, cloud or infrastructure admins, data security engineers, and data integration specialists collaborate and iterate on common solutions, with the goal of optimizing analytics platforms and infrastructure. Working closely together, they can catch errors and course correct early on, saving money and application development time, and enabling enterprises to extract optimal value from data-centric applications.
These teams need flexible tools and infrastructure to manage data, however big, different, or fast it may be. One possible route is toward public clouds. These can provide a great deal of flexibility but could bring the added burdens of data gravity and vendor lock-in, which could be costly in the long run, given cloud egress costs.
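To see why data gravity and egress costs compound, consider a rough back-of-the-envelope sketch. Every figure in it is a hypothetical placeholder (the starting data size, the growth rate, and the per-gigabyte egress rate are assumptions for illustration, not any provider’s actual pricing). The point is only that the cost of moving everything out rises in step with the data itself.

```python
def egress_cost(data_tb: float, rate_per_gb: float) -> float:
    """Cost of moving `data_tb` terabytes out of a cloud.

    `rate_per_gb` is a hypothetical flat price per gigabyte; real
    pricing is usually tiered and varies by provider and region.
    """
    return data_tb * 1000 * rate_per_gb  # 1 TB treated as 1,000 GB


def projected_exit_costs(initial_tb: float, annual_growth: float,
                         rate_per_gb: float, years: int) -> list[float]:
    """Cost of a full data exit at the start of each year,
    assuming the data set grows by `annual_growth` every year."""
    costs = []
    size_tb = initial_tb
    for _ in range(years):
        costs.append(egress_cost(size_tb, rate_per_gb))
        size_tb *= 1 + annual_growth
    return costs


if __name__ == "__main__":
    # Hypothetical inputs: 100 TB today, 50% yearly growth, 0.05 per GB.
    for year, cost in enumerate(projected_exit_costs(100, 0.5, 0.05, 3)):
        print(f"year {year}: full exit would cost about {cost:,.0f}")
```

The longer the data stays and grows, the more expensive it becomes to leave, which is the lock-in effect described above.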
Alternatively, organizations may wish to consider deploying software-defined storage (SDS) on flexible on-prem infrastructure. This gives them the self-service convenience of the public cloud while letting them maintain control of their own destinies. They can scale storage and compute resources up or down as their data needs change and easily manage competing analytics stakeholders across the enterprise. In addition, by standardizing on open, they can automate the management of storage workloads and the movement of data from one platform to another without having to modify their applications.
SDS can also support organizations as they move from virtual machines to containers by giving them the ability to run container-native storage. By their nature, containers are ephemeral and lack the persistent storage that most applications require. Some Kubernetes platforms with integrated storage provide persistent storage, so stateful data can continue to exist even after the container is spun down. This can be done without submitting a storage provisioning request to a data manager, reducing a process that previously could take days or weeks to mere hours. More importantly, since many enterprises run container-based workloads in hybrid and multicloud environments, they need a consistent software-defined storage layer irrespective of where their applications reside, which gives the business greater flexibility and choice.
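As a rough illustration of what self-service storage can look like on Kubernetes, the sketch below builds a PersistentVolumeClaim, the standard Kubernetes object an application team uses to request storage that outlives any single container. The claim name, size, and storage class (`analytics-data`, `50`, `sds-block`) are hypothetical placeholders. Which storage classes exist depends entirely on how a given cluster and its storage layer are set up.

```python
import json


def build_pvc_manifest(name: str, size_gi: int, storage_class: str) -> dict:
    """Return a Kubernetes PersistentVolumeClaim manifest as a dict.

    `storage_class` must match a StorageClass defined on the target
    cluster; the value used below is a made-up example.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume is mounted read-write by one node.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


if __name__ == "__main__":
    # Hypothetical request: 50 GiB from a storage class named "sds-block".
    manifest = build_pvc_manifest("analytics-data", 50, "sds-block")
    print(json.dumps(manifest, indent=2))
```

Because Kubernetes accepts manifests in JSON as well as YAML, the printed output could be saved to a file and submitted by the developer directly, assuming the cluster has a matching storage class with dynamic provisioning enabled.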
Managing This New Lifecycle Requires New Methods
The way applications are developed must evolve in step with the data lifecycle. Developers should have the freedom to shift application development on a dime. At the same time, they should be able to cultivate and manage all types of data, big and small, and make it truly work for their organizations’ unique use cases.
To achieve these goals, organizations must eschew the old ways of data management and turn to new methods that can keep data flowing. Implementing flexible processes, supported by an equally flexible set of tools and infrastructure, can be the key to a happy and fulfilling data lifecycle.
About the Author: Irshad Raihan is director of storage product marketing at Red Hat, the world’s leading provider of open source software solutions.