December 21, 2018

Google Updates Cloud Database, Developer Tools

(Semisatch/Shutterstock)

Google unleashed a batch of updated tools this week aimed at cloud-based big data and storage options along with the beta release of a developer tool designed to ease use of Apache Spark with the R programming language.

Google (NASDAQ: GOOGL) said Wednesday (Dec. 19) its integration of Spark and R would allow developers to work with large data sets stored in the Google cloud in a manner similar to dplyr, an R package used to prepare and analyze data.

The beta release of the tool known as SparkR is designed to run jobs on Google Cloud’s Dataproc, its managed service for running Spark and Hadoop clusters. The cloud vendor added that SparkR also supports distributed machine learning using MLlib.

The “integration [can be used] to process against large cloud storage datasets or perform computationally intensive work,” Google said in a blog post.

The integration is billed as allowing developers to build and scale models “to analyze datasets of sizes that previously would have required huge upfront investments in high-performance computing infrastructures.”

The second tool released this week is the latest version of a Python runtime geared to the Google App Engine. Google said Python 3.7 reflects the evolution of the programming language and its own app development infrastructure over the past decade. For example, the upgraded version allows developers to use Python dependencies to write apps and microservices that will operate on existing Python runtimes.

Meanwhile, Google continues to add new cloud services aimed at specific big data applications as a way to differentiate its platform from larger public cloud rivals. The latest examples are enhancements to its Cloud Spanner relational database service, including query “introspection” improvements along with greater availability of the service and expanded configuration choices across multiple cloud regions.

The “introspection” upgrades include a new query statistics feature designed to monitor and debug the SQL database queries consuming the most Cloud Spanner resources. The “capability gives users better visibility into frequent and expensive queries running on the system,” Google noted in a separate post.

“This information is useful both during schema and query design, as well as for production debugging [because] users can see which queries need to be optimized to improve performance and resource consumption,” the cloud vendor added.

Streamlining queries that would otherwise consume large amounts of database resources “is a way to reduce operational costs and improve general system latencies,” Google said.
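
As a rough illustration of how such statistics might be used, the Python sketch below ranks queries by their total CPU cost (execution count multiplied by average CPU time per execution). The record fields, the helper name and the sample rows are all hypothetical, chosen only for this example. They are not taken from Google's announcement and are not Cloud Spanner's actual schema.

```python
from dataclasses import dataclass


@dataclass
class QueryStat:
    # Hypothetical per-query statistics record (illustrative fields only).
    text: str
    execution_count: int
    avg_cpu_seconds: float


def rank_by_total_cpu(stats, top_n=3):
    """Return the top_n queries ordered by total CPU time, highest first."""
    return sorted(
        stats,
        key=lambda s: s.execution_count * s.avg_cpu_seconds,
        reverse=True,
    )[:top_n]


# Made-up sample rows, for illustration only.
sample = [
    QueryStat("SELECT * FROM Orders WHERE CustomerId = @id", 1200, 0.004),
    QueryStat("SELECT COUNT(*) FROM Events", 15, 2.5),
    QueryStat("SELECT Name FROM Users WHERE UserId = @id", 50000, 0.0005),
]

top = rank_by_total_cpu(sample, top_n=2)
for stat in top:
    print(stat.text, stat.execution_count * stat.avg_cpu_seconds)
```

In this made-up sample, a rarely run but expensive query and a cheap but very frequent one both outrank a moderately used lookup, which is the kind of “frequent and expensive” distinction the statistics are meant to surface.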

The company also announced expanded availability of the database service in Hong Kong as part of a cloud region launch. Cloud Spanner is now offered in 14 of 18 Google cloud regions. The first new multi-region configuration of the database service extends from Oregon to South Carolina as a way to reduce latency and provide backup in case of cloud outages.
