Back to Basics: Big Data Management in the Hybrid, Multi-Cloud World
The promised fruits of advanced analytics and AI are driving organizations to get control of their data. They understand, correctly, that without solid data management and governance regimens in place, becoming a data-driven company is more of a pipe dream than a realistic goal. The trouble is, the rapid shift to hybrid and multi-cloud data environments is presenting a real roadblock when it comes to big data management.
Managing big data is challenging in the best of circumstances–think on-prem environments, where the data sources are fairly limited, are well-known, and change infrequently. But when IT professionals are asked to manage sprawling clusters of constantly changing data across multiple clouds and on-prem environments, they’re basically confronting the modern big data nightmare.
Infoworks CEO Buno Pati sees this scenario playing out over and over again among his clients, which include big names like Macy’s, Aflac, CVS, and PepsiCo.
“The problem as we see it in most of these large enterprises is they have thousands of data sources and lots of ideas of how to use the data to benefit the business, but they’ve kind of been stuck with the tools and development teams used in the past,” Pati says. “That has been a challenge for them. In particular what they need to be able to do is deliver the data to the right place at the right time, without creating hundreds of separate data projects.”
Cloud platforms like AWS, Microsoft Azure, and Google Cloud offer compelling advantages, including pre-built data platforms that offer a range of management and analytics capabilities, not to mention limitless storage and compute. Companies are clamoring to take advantage of these cloud capabilities, but it’s a mixed blessing, Pati says.
“What has added a level of complexity to that is the hybrid multi-cloud environment in which, frankly, every one of our customers is operating,” he says. “None of them are single cloud. None of them are on-prem only. There’s a mixture of things. That’s also here to stay.”
Building the systems to manage and govern the data in highly varied environments is possible. The digital natives–companies like Google–have the engineers with the technical chops to build systems that can do the heavy data lifting needed before the analysts, data scientists, and AI specialists can do their thing, Pati says.
But the average Global 2000 company lacks these skills. Infoworks’ approach to this challenge is to build a single data management platform that can touch these respective environments. The Palo Alto, California company says its enterprise data operations and orchestration platform handles various aspects of data management, including data onboarding and governance, data transformation and modeling, and data pipeline development and deployment.
“We provide a layer of abstraction where people can develop stuff without writing a single line of code,” Pati says. “The basic functionalities of the platform are on-board data, prepare data, and operationalize the data, but do that in a way that serves a hybrid multi-cloud environment.”
Because Infoworks’ product was developed to work with the various public cloud environments, it eliminates the need for customers to build those integrations themselves. “You can deliver the data to any platform you wish–Google, Databricks–because every one of those applications or use cases has a best place to run,” Pati says. “There’s a reason why you want to do things on Snowflake versus Databricks and vice versa.”
Pati says one of Infoworks’ clients, a global food and beverage maker, recently moved its big data workloads from a Microsoft Azure HDI (Hortonworks) environment to Databricks. It took the company just 15 weeks to migrate 2,200 workloads to the new platform, versus the 12 months it had budgeted.
“Since then, they’ve been just remarkably productive,” Pati says. “They have seen over 500% growth in production jobs, and 870% growth in workflows year over year. They’re calling this their enterprise-wide integrated data fabric, a corporate-wide effort.”
Pati says the data market has aligned with the vision laid out years ago by Infoworks founder Amar Arsikere. A Google engineer who built some of the first products on BigTable, Arsikere, who is Infoworks’ CTO and chief product officer, envisioned a solution that could eliminate much of the technical complexity involved with moving, transforming, and managing data.
“It’s not about point tools and coding. It’s about platforms and automation,” Pati says. “It’s not a single cloud solution, as much as Azure would like you to think so and AWS would like you to think so. It’s multi-cloud and hybrid. Gartner is really pushing, hey don’t buy cloud specific tools. Get it from an independent software vendor because you’re going to be hybrid multi-cloud and you don’t want to be jammed.”
Data Stewards in the Spotlight
Another third-party vendor helping customers navigate the modern big data landscape is Zaloni. Matthew Monahan, Zaloni’s director of product management, says the rapid expansion of data lakes is a major factor contributing to customer struggles.
“We made this evolution from data warehouses, which were very focused on known use cases and were very robust systems, to data lakes, where all of the data is thrown into one location,” Monahan says. “That’s great for big volumes of data, but it’s very difficult to manage.”
But managing data in a single data lake is easier than managing what customers face today, which is a collection of lakes. The best way to deal with this big data complexity is to create a framework that looks at all of the data assets in a holistic manner, and allows data governance policies to be applied across the lakes, he says.
“What you need is one strategy,” Monahan says. “You need one framework around which you can build your data governance approach, so everybody is doing it the same way.”
In addition to data management software that can work across these heterogeneous systems, Zaloni sees the data steward emerging to fill an important role in this new distributed data world. The combination of data stewards and governance software will be critical to helping customers get value from their data.
“Data stewards generally are not builders. They’re not your deep technical folks. That would be your data engineers,” Monahan says. “We have a pilot program going on for a new offering in AWS specifically focused on governance to provide that abstraction layer that data stewards need on top of the technical layer. It works across multiple cloud environments–on prem, hybrid cloud, etc. That’s what I think we’re going to see more and more of over the next year or two.”
Enabling access to trusted and governed data remains a challenge, and one that Zaloni is helping its customers tackle as their cloud footprints grow. The goal–giving customers access to all of their data in a trusted, secured, and integrated manner–may never be reached, but it’s a worthy endeavor, Monahan says.
“You never give up on the dream,” he says. “We acknowledge that there’s always going to be more data coming in. There’s always going to be new data being produced, and we’re not always going to have 100% of it. But that’s always the goal. The goal is to get us as close as we can and to continue to get us as close as we can, so when you go into the platform and you say, ‘Show me all the PII data, show me all the places you have a Social Security number,’ that you have reasonably good confidence you got it all.”
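To make the “show me all the places you have a Social Security number” request concrete, here is a minimal Python sketch of a scan across several data stores. It is not how Zaloni or any other vendor implements PII discovery. The catalog layout, the source and table names, and the sample rows are all hypothetical, and the pattern is a deliberately simplified SSN shape, so every hit is a candidate for a data steward to review rather than confirmed PII.

```python
import re

# Simplified SSN shape: three digits, two digits, four digits, hyphen-separated.
# It also matches look-alike identifiers, so hits are candidates, not confirmed PII.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_columns(catalog):
    """Return sorted (source, table, column) triples whose values look like SSNs.

    `catalog` is a hypothetical structure: {source: {table: [row_dict, ...]}}.
    """
    hits = set()
    for source, tables in catalog.items():
        for table, rows in tables.items():
            for row in rows:
                for column, value in row.items():
                    # Only string values are checked in this sketch.
                    if isinstance(value, str) and SSN_PATTERN.search(value):
                        hits.add((source, table, column))
    return sorted(hits)


# Hypothetical sample data standing in for an on-prem lake and a cloud lake.
SAMPLE_CATALOG = {
    "on_prem_lake": {
        "customers": [
            {"name": "A. Example", "tax_id": "123-45-6789"},
            {"name": "B. Example", "tax_id": "987-65-4321"},
        ],
    },
    "cloud_lake": {
        "orders": [{"order_id": "1001", "note": "call 555-0100"}],
        "support_tickets": [{"body": "Customer gave SSN 111-22-3333 by phone"}],
    },
}

if __name__ == "__main__":
    for source, table, column in find_ssn_columns(SAMPLE_CATALOG):
        print(f"{source}.{table}.{column}")
```

The point of the sketch is the shape of the problem rather than the regular expression: one rule is applied the same way to every lake, and the answer comes back as a single list of locations. That is the “one framework” idea Monahan describes, reduced to a few lines.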
The continued explosion of data presents both opportunities and challenges. Thanks to the expansion of cloud computing, the barrier to entry for advanced analytics and machine learning is lower than ever. These factors are fueling a surge in data-driven activity. However, without good data management, governance, and integration programs in place, all the analytics and machine learning in the world won’t help you, which is why vendors like Infoworks and Zaloni are succeeding by helping customers focus on big data fundamentals.