Google Unleashes TPUs on Cloud ML Engine
As the amount of machine learning training data soars, so too does demand for new tools that accelerate model training. With that in mind, Google Cloud announced the beta release of a new feature that allows users to speed training by running Tensor Processing Units (TPUs) on its Cloud Machine Learning Engine.
The company also announced general availability of the latest release of Google Kubernetes Engine, its platform for managing application containers at scale. Among the goals is expanding enterprise adoption of the popular cluster orchestrator.
The TPU capability released to Google Cloud Platform customers this week is the latest in a steady stream of cloud offerings designed to differentiate its platform from those of its public cloud rivals. The cloud TPU was released this time last year as part of Google’s “AI-first” strategy.
The search giant (NASDAQ: GOOGL) has also released cloud-based services designed to automate the training of machine learning models.
Google ML Engine, which was released in March 2017, provides a managed service for accessing the TensorFlow open source computational library. The service is designed to enable training and deployment of machine learning models using a variety of data sets. ML Engine handles computing resources and other IT infrastructure while developers focus on speeding model training.
In a blog post on Monday (May 21), Google said support for cloud TPUs could be used to train a range of reference models or to accelerate training of existing models written with TensorFlow APIs.
The service specifically handles provisioning and management of cloud TPU nodes, Google said, meaning they can be used “as easily as CPUs and GPUs,” added Nikhil Kothari, Google’s lead engineer for its Cloud ML Engine. (Google last month announced support for Nvidia [NASDAQ: NVDA] Tesla V100 GPUs on its Compute and Kubernetes cluster orchestrator engines.)
Kothari noted that an ML Engine tuning feature is designed to optimize “hyperparameters,” that is, values set in advance of model training. Tuning them, he said, is another way of improving machine learning models, alongside the required performance, scale and algorithms. The resulting models can then be deployed via ML Engine to serve prediction requests or to run batch prediction jobs, Google added.
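For readers unfamiliar with the term, the idea of tuning values that are fixed before training starts can be illustrated with a small, self-contained sketch. It is not Google's implementation and does not use the ML Engine API. The toy objective, the function names train_and_score and grid_search, and the candidate values are all assumptions made for illustration, and an exhaustive grid search stands in for whatever search strategy a managed service might apply.

```python
import itertools


def train_and_score(learning_rate, steps):
    """Hypothetical stand-in for a training job.

    Minimizes f(w) = (w - 3)^2 with plain gradient descent and
    returns the final loss (lower is better).
    """
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)      # derivative of (w - 3)^2
        w -= learning_rate * grad   # gradient descent update
    return (w - 3.0) ** 2


def grid_search(learning_rates, step_counts):
    """Try every hyperparameter combination; keep the lowest loss.

    Ties are resolved in favor of the combination tried first.
    """
    best = None
    for lr, steps in itertools.product(learning_rates, step_counts):
        loss = train_and_score(lr, steps)
        if best is None or loss < best[0]:
            best = (loss, {"learning_rate": lr, "steps": steps})
    return best


# Hyperparameters are chosen before training; the search picks among them.
best_loss, best_params = grid_search([0.01, 0.1, 0.5], [10, 50])
print(best_params, best_loss)
```

Each call to train_and_score plays the role of one training run with a fixed configuration. A managed tuning service would typically launch such runs in parallel and choose new candidates more selectively than a full grid.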
Separately, Google said the latest version of its Kubernetes Engine released this week adds support for shared virtual private clouds. The new feature is intended to improve control of enterprise networks as well as regional clusters and persistent storage.
Google promotes the Kubernetes Engine as a way to accelerate the deployment and updating of applications and services by provisioning cloud resources based on a user’s computing, memory and storage requirements. Along with enterprise applications, the company notes that the Kubernetes update supports databases running on a cluster.
The upgrade also supports increasingly popular hardware accelerators used to run machine learning and other computing-intensive workloads, Google said.
-Editor’s note: This story has been corrected to reflect the actual release date of the Google machine learning engine as well as the new TPU feature.