February 24, 2016

Google Releases Cloud Processor For Hadoop, Spark

George Leopold

(bluebay/Shutterstock.com)

Google took the wraps off its managed Apache Hadoop and Spark service this week, saying its cloud data processing platform is intended to reduce the cost and ease the management of processing big datasets.

Cloud Dataproc, which moved from beta testing to general availability on Monday (Feb. 22), is designed to quickly spin up clusters that can be resized from three to 300 nodes, according to Google (Nasdaq: GOOG, GOOGL). The automation tool is intended to shift users’ focus away from data processing and toward data analysis. Background cluster processing is the best way to achieve that balance, the search giant argues.

Google Cloud Dataproc entered beta testing last fall. Trial customers created clusters ranging in size from three to “thousands” of virtual CPUs, the company said. Dataproc clusters can be spun up as needed. Integration with Google Cloud makes Dataproc clusters independent of its storage platform.

Google said it has also integrated its BigQuery and Cloud Bigtable capabilities with Dataproc. It also can be used in conjunction with Google Cloud Dataflow for real-time batch and stream processing.

Several features were added to the processing platform during beta testing, including data “property tuning,” virtual machine metadata and tagging, and cluster versioning. This week’s release also included support for custom machine types, the company added. (Dataproc clusters are built on Google Compute Engine instances. Machine types define the virtualized hardware resources available to an instance.)

Reducing cost and complexity related to data processing are two key goals for Dataproc. “Using Spark and Hadoop should not break the bank [and] you should pay for what you actually use,” the cloud vendor stressed in a blog post. Hence, it is pricing Cloud Dataproc at 1 cent per virtual CPU in a cluster per hour.
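To put the quoted rate in concrete terms, the short Python sketch below estimates the Dataproc fee for a cluster. It models only the 1 cent per virtual CPU per hour figure cited above. The function name, its parameters and the sample cluster size are illustrative assumptions, and any charges for the underlying compute instances or storage are not included.

```python
from decimal import Decimal

# Rate quoted in the article: 1 cent per virtual CPU in a cluster per hour.
DATAPROC_RATE_PER_VCPU_HOUR = Decimal("0.01")


def dataproc_fee(nodes, vcpus_per_node, hours):
    """Estimate the Dataproc fee in dollars (hypothetical helper).

    Covers only the per-vCPU-hour rate cited in the article; other
    charges are outside the scope of this sketch.
    """
    if nodes < 0 or vcpus_per_node < 0 or hours < 0:
        raise ValueError("nodes, vcpus_per_node and hours must be non-negative")
    vcpu_hours = Decimal(nodes) * Decimal(vcpus_per_node) * Decimal(str(hours))
    # Round to whole cents for display.
    return (vcpu_hours * DATAPROC_RATE_PER_VCPU_HOUR).quantize(Decimal("0.01"))


# Assumed example: a 10-node cluster with 4 vCPUs per node running for 2 hours.
print(dataproc_fee(10, 4, 2))  # 80 vCPU-hours at $0.01 each -> 0.80
```

At this rate the fee scales linearly with cluster size and run time, so doubling either the vCPU count or the hours doubles the charge.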

Another goal is to speed up data processing using Hadoop and Spark. Google claimed Dataproc clusters start and stop operations in 90 seconds or less. Hence, users spend more time analyzing data than waiting on clusters.

Meanwhile, the cluster versioning feature provides access to stable versions of Spark and Hadoop, Google added.

This week’s release also includes image version 1.0.0 to support Hadoop 2.7.2, Spark 1.6.0, Hive 1.2.1 and Pig 0.15.0 releases. Google stressed that the provision of updated and native versions of Hadoop, Hive, Pig and Spark eliminates the need for new tools or APIs. It also means existing projects and ETL pipelines can be moved to its data processing platform without redevelopment.

Google also announced Dataproc support from third-party tool vendors and service partners. Tool partners include Arimo, Attunity, Looker, WANdisco and Zoomdata. New service partners include Moser, Pythian and Tectonic.

The release adds momentum to the enterprise shift toward Spark, which brings with it management challenges related to resource constraints and data silos. Hence, Google stressed that Cloud Dataproc is designed to increase availability while automating cluster administration.
