Google Cloud Unveils Slew of New Data Management and Analytics Services
Google today unveiled a handful of new cloud services designed to simplify common tasks in the data analytics workflow, including a beta of a new data integration and ETL service called Cloud Data Fusion, the ability to use BigQuery from a spreadsheet interface, and the addition of TensorFlow machine learning capabilities to BigQuery ML, among others.
Google Cloud is the third-largest public cloud, behind Amazon Web Services and Microsoft Azure. But judging from the stream of announcements coming out of its Google Cloud Next show in San Francisco this week, the company isn’t content to be number three in a rapidly growing market.
Arguably the biggest announcement today is the unveiling of Cloud Data Fusion, which the company touts as a “fully-managed and cloud-native data integration service” that sports more than 100 out-of-the-box connectors and a “broad library” of open-source transformations.
Cloud Data Fusion is designed to allow users to “easily ingest and integrate data from various sources and transform that data, for example, blending or joining it with other data sources, before using BigQuery to analyze it,” writes Sudhir Hasbe, a director of product management for Google Cloud Platform, in a blog published today.
Users will be able to manage all their datasets and data pipelines from a single management console. Adding new datasets or pipelines to Cloud Data Fusion requires no coding, Google says, and can be accomplished by dragging and dropping icons in a Web browser. Google provides a collection of pre-built data transformation templates, or customers can build their own.
The service provides connectors for everything from on-premises mainframes and relational databases to other cloud systems. The company says the service “brings the capabilities of a seasoned data engineer to the rest of us, whether you need a little code or none at all.”
Robert Medeiros, an R&D architect with TELUS Digital, says Data Fusion lowers the barrier to entry for big data through its visual interface and pipeline abstraction. “This increased accessibility, combined with a growing collection of pre-built ‘connectors’ and transformations, translates to rapid results and in many cases allows data analysts and scientists to ‘self-serve’ without needing help from those with deep cloud or software engineering expertise,” Medeiros says in the blog post.
Cloud Data Fusion is based on CDAP, an open-source application originally developed by Cask Data, a company that Google acquired last year. CDAP was built to streamline development of applications on Hadoop, and Google now runs it atop Cloud Dataproc, its managed Hadoop and Spark service. Pricing for a Cloud Data Fusion instance starts at $1.80 per hour.
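To put the starting price in perspective, here is a rough back-of-the-envelope sketch in Python. Only the $1.80 hourly rate comes from Google's announcement. The 30-day month, the 22 working days, and the `instance_cost` helper are illustrative assumptions, and the figures ignore any other charges that may apply.

```python
from decimal import Decimal

# Starting hourly price quoted for a Cloud Data Fusion instance.
HOURLY_RATE_USD = Decimal("1.80")


def instance_cost(hours: int, hourly_rate: Decimal = HOURLY_RATE_USD) -> Decimal:
    """Return the instance charge in USD for the given number of hours."""
    return hourly_rate * hours


# Assumption: a 30-day month with the instance running around the clock.
always_on = instance_cost(24 * 30)

# Assumption: the instance runs 8 hours a day on 22 working days.
business_hours = instance_cost(8 * 22)

print(f"Always on:      ${always_on}")
print(f"Business hours: ${business_hours}")
```

Under those assumptions, an always-on instance at the starting rate works out to $1,296 a month, while one that runs only during business hours comes to $316.80.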
Google Cloud also announced the beta release of its new “connected sheets” functionality in Google Sheets, its spreadsheet offering. With connected sheets, customers can put the full power of BigQuery’s SQL engine to work directly from the spreadsheet interface.
“That means no row limits with this connected sheet,” Google’s Hasbe writes in the blog. “It works with the full dataset from BigQuery, whether that’s millions or even billions of rows of data. It also means you don’t need to learn SQL — you’re simply using regular Sheets functionality, including formulas, pivot tables, and charts, to do the analysis.”
One early connected sheets tester is AirAsia, an airline based in Malaysia. “Analysts and business users are able to create pivots or charts, leveraging their existing skills on massive datasets, without needing SQL,” says Nikunj Shanti, chief product officer at AirAsia, in the blog post.
Google is also enabling deep neural networks developed with TensorFlow to be built and executed directly inside BigQuery ML, the company announced.
BigQuery ML was launched last year, giving customers a way to use their existing SQL knowledge to build and run machine learning models. This year, Google is adding more advanced ML capabilities, including support for TensorFlow neural nets, k-means clustering, and matrix factorization.
Google made several other announcements today, including:
- AutoML Tables, a new service that allows users to automatically build and deploy machine learning models on structured, tabular data;
- Data Catalog, a new service that lets users search for and discover data assets stored in GCP;
- BigQuery DTS, a data transfer service that automates the movement of data from more than 100 popular cloud-based software-as-a-service (SaaS) applications into BigQuery;
- Cloud Dataflow SQL, a new service that lets users build their own Dataflow pipelines using the same SQL dialect used in BigQuery, and to merge Cloud Pub/Sub streams with files or tables for batch or stream-based processing;
- Dataflow Flexible Resource Scheduling (FlexRS), a new service that saves customers money by processing batch workloads that aren’t time-sensitive at off hours, when processing is cheaper;
- BigQuery BI Engine, a new in-memory processing system that lets users concurrently run complex analytics at interactive speeds, via plug-ins to Google Data Studio.
Google Cloud Next ’19 runs through tomorrow at the Moscone Center in San Francisco.