August 25, 2022

Cloud Migrations Negatively Impacting Data Estates, Capital One Says


We’re in the midst of a massive migration of data to the cloud at the moment, driven in large part by the promises of advanced analytics and AI and the competitive advantages they can bring. However, before getting this big data payoff, companies must effectively manage their exploding data estates in the cloud, and that’s where things get interesting, according to a new Forrester report commissioned by Capital One, which has its share of cloud migration battle scars.

Several interesting tidbits came out of Forrester Consulting’s new report, which is titled “New Data Management Models Are Essential To Operate In The Cloud” and is based on a survey of 157 data decision makers in North America.

For starters, the cloud journey is still nascent in most shops. While public clouds are growing quickly, nearly 75% of data decision-makers tell Forrester they haven’t yet begun to manage the bulk of their companies’ data in the cloud.

More than half of the companies surveyed (56%) tell Forrester that they are managing their data in a centralized manner, which would require stitching all of the data together into one silo using data integration and ETL tools. Two in 10 (19%) say they run a decentralized data shop, while only 15% are taking a federated approach, the report says.

Historically, most companies have used a single data management tool vendor for the bulk of their data management needs. That's still largely the case today. But over the next 24 months, the share of companies using multiple data management vendors to satisfy different data needs is predicted to climb to nearly 40%, Forrester says.

(Source: Forrester report for Capital One)

Another data hurdle: the data is a mess (which won’t come as a surprise to regular Datanami readers). Forrester’s report identifies widespread instances of poor data quality, a lack of data cataloging, difficulty understanding the data, and a lack of data observability.

Every company would like to have a well-governed data estate, but reality somehow intervenes, and the result is that the majority of companies struggle in this department. Forrester reports that 82% of survey-takers say they have confusing data governance policies, and 80% struggle to govern data at scale and suffer from a lack of entitlements and role-based access to data.

Cost is also a big holdup to effectively managing a cloud data estate. Forrester says 82% of the folks who participated in the survey cite forecasting and controlling costs as challenges. "What was once meticulously planned and budgeted on-premises, is now unpredictable," the report says.

Finally, a lack of the right talent and skills is conspiring to prevent companies from fully leveraging their cloud data estates.

These findings are not surprising to Salim Syed, a vice president and head of engineering at Capital One Software.

Before helping build solutions in Capital One’s new software business (more on that in a bit), Syed was involved in the credit card company’s move to the cloud. That migration was ultimately successful, but not before it generated some painful lessons.

“These are things that we have felt,” Syed tells Datanami. “We experienced it when we went to the cloud.”

Capital One previously ran a Teradata data warehouse with about 500 TB of data in on-prem data centers. The company closed its last on-prem data center in 2020 and now relies on AWS and Snowflake clouds to run its 50 PB data lake/data warehouse.

McLean, Virginia-based Capital One Financial Corp. has about $420 billion in assets (DCStockPhotography/Shutterstock)

“One of the first data platforms we chose was Snowflake. This allowed us to really scale to our demand,” he says. “We have thousands of users running millions of queries, and we wanted a data platform that could just scale to meet our business’ demand.

“But the [consequence] with that kind of unlimited power and unlimited compute is you can go from data starved to data drunk very easily,” Syed continues. “You can end up blowing through all your credits if you don’t have proper governance, proper cost control measures in the way you’re provisioning your data platforms.”

Instead of turning to software vendors for a solution, Capital One dealt with the problem in house. It developed its own self-service tools that allowed line-of-business folks to manage their own data and provision compute resources when they needed them, while adhering to cost control and data governance requirements through "guardrails" built into the software, Syed says.

Capital One decided the software it built was good enough to sell, so in June, Capital One Software launched its first suite of tools for managing data in Snowflake, dubbed Slingshot.

Syed says Slingshot customers will appreciate having a single, integrated suite for managing their Snowflake data using a data mesh approach, as opposed to switching between a bunch of different tools.

“The data management industry doesn’t need disruption, but it needs simplification,” he says. “There are probably hundreds of companies that have vertical slices of data management solutions–one solution dealing with catalog, one lineage, one data quality, then you have data loading tools, data transformation tools.

Capital One follows data mesh principles to manage its cloud data estate and in its new shrink-wrapped software business (Image courtesy Zhamak Dehghani)

“What we found was that, when you’re building this federated data mesh platform with Capital One, what was really important was to build a solution that’s focused on experiences for certain personas, instead of trying to figure out what tools do I need to go and find and stitch it together.”

The cloud has largely solved the hardware scaling issue, providing infrastructure that is infinite, for all practical purposes. The availability of managed services in the cloud has also gotten customers out of the software and application framework maintenance business, which is another big plus.

As those hurdles to scale were eliminated and customers flooded into the cloud, new challenges emerged around data management and governance, which the industry is still grappling with, as Forrester's report demonstrates. Instead of reverting to the old top-down approach, which would mean re-centralizing the data and clamping down on self-service, Capital One uses data federation to keep data decentralized while applying a common set of tools and policies, an approach known today as data mesh.

“When you go to the cloud, you have that exponential explosion of data sets. And if you don’t have a good governance practice or data management practice, your data stays in darkness,” Syed says. “What we do is we build central policies and central tooling, but we give ownership to the lines of business, to the folks who really own the data and who know what the data means. And that has allowed us to scale in this new world.”


Editor’s note: This story has been corrected. Capital One had about 500 TB in its Teradata data warehouse, not multiple petabytes. The name of Capital One Software’s new data management software for Snowflake, Slingshot, was misspelled. Datanami regrets the errors.