May 26, 2021

Google Cloud Tackles Data Unification with New Offerings


Good data is hard to come by, and its absence has derailed more than one data initiative. But with a trio of product announcements at this week’s inaugural Data Cloud Summit–including the introductions of a data fabric called Dataplex, a data sharing repository called Analytics Hub, and a change data capture (CDC) solution called Datastream–Google Cloud is at least attacking the problem. The new offerings show a continued move toward enterprise-friendliness, a Gartner analyst says.

Getting good, clean, and consistent data continues to be a major challenge for companies and their data analytics and AI initiatives. With data spread out among various databases, data warehouses, and data lakes, getting a single view of the data can be extremely difficult. In fact, according to Gartner, poor data quality costs companies an average of $12.8 million per year, Google Cloud says.

To that end, Google Cloud unveiled three new offerings to address the problem, starting with Datastream, its new serverless CDC and data replication service.

Datastream enables customers to replicate data streams in real time from Oracle and MySQL databases into Google Cloud services, including BigQuery, Cloud SQL, Google Cloud Storage, and Cloud Spanner. The product, which is currently in preview, will eventually be widened to support additional on-prem databases, including Db2, Postgres, MongoDB, and others, according to a chart shared with Datanami.

Google Cloud Datastream is a CDC and data integration offering

Gartner analyst Sanjeev Mohan says Datastream will put Google Cloud into competition with other ETL and data integration providers, including Matillion, Fivetran, HVR, Striim, and Oracle’s GoldenGate. That’s a sign of how critical these data movement products are, he says.

“Will it get traction? The answer is, it depends on what is the ecosystem for the clients,” Mohan says. “Some of the new customers, like Vodafone, who are moving to GCP, I think this is a very good option. But if a client says, I’ve got AWS and…Google Cloud is not the only cloud, if they’re multi-cloud, they may look for a cloud-vendor neutral product because they need to have one product where they build pipelines.”

Google Cloud’s forthcoming data sharing offering, called Analytics Hub, is designed to let users share data and insights, including dynamic dashboards and machine learning models, in a secure manner with other people inside and outside of their organization, the company says. The offering, which is not yet available in preview but soon will be, is based on BigQuery’s existing and popular sharing capabilities, Google Cloud says.

Google Cloud Analytics Hub provides a way to securely share data

Secure data sharing is coming up more and more with enterprises, Mohan says. “The idea of data sharing is to be able to not make multiple copies of data but have a single copy of data, and share it in a secure manner,” he says.

Dataplex, meanwhile, is billed by Google Cloud as an “intelligent data fabric” that can provide “an integrated analytics experience.” The offering, which is currently in preview, will let users “rapidly curate, secure, integrate, and analyze their data at scale,” the company says. Dataplex includes automated data quality functionality for data scientists, as well as built-in AI and machine learning capabilities that allow companies to “spend less time wrestling” with systems and more time “using data to deliver business outcomes,” the company says.

Delivering a single view of data and analytics assets, no matter where they sit in the cloud, is a good idea that other cloud providers are also pursuing, Mohan says. Some independent software vendors, like Cloudera, are also pursuing it, he says. Dataplex works with a customer’s assets on Google Cloud, and eventually other clouds as well, such as through Google Cloud BigQuery Omni, which is supporting Azure today, he says.

“They are embracing this hybrid, multi-cloud space,” Mohan says. “But the problem with multi-cloud is how do you unify both your analytics and your data governance. You need to be able to see where the data came from and have a common lineage, so Dataplex is that integrated data management platform which can sit on top of a raw data lake or a data warehouse or even a database.”

Dataplex is a data fabric that unifies access to data and analytics across Google Cloud and outside repositories

Overall, Mohan likes where Google Cloud is headed. “I think they are starting to execute on a more enterprise-friendly, enterprise-ready strategy by unifying their data story,” he tells Datanami. “So they’re adding more capabilities. They’re simplifying the architecture through serverless. They’re able to further reduce complexity. Their billing models are also getting simplified in this process [with] pay as you go. So overall I think Google Cloud is starting to round out its data strategy for its customers to be more cohesive and enterprise friendly.”

Related Items:

Google Cloud Overhauls AI with Vertex Launch

Google Cloud Extends BigQuery to AWS, Azure

Google Cloud Unveils Slew of New Data Management and Analytics Services