December 13, 2018

Uber’s Training Tool Shares Ride for Deep Learning

George Leopold

A Linux Foundation project focused on AI development is expanding with the addition of a deep learning training tool based on an Uber-sponsored project.

Launched by the ride-sharing specialist, the Horovod project is a distributed training framework for Keras, PyTorch and TensorFlow. It is designed to handle resource allocation and to scale machine learning training efforts.

Horovod is also intended, for example, to accelerate the training of a TensorFlow program that runs on a single graphics processor by extending that training to multiple GPUs. Its resource allocation and scaling features are based on new algorithms and tap into high-performance networks to scale deep learning models.
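The article does not name the algorithms involved. Horovod's public documentation describes ring-allreduce as the scheme it uses to average gradients across GPUs, so the sketch below simulates that pattern in a single Python process. It is an illustration under that assumption, not Horovod code: the function name `ring_allreduce_mean` is hypothetical, each "worker" is just a list, and real implementations move the chunks over the network.

```python
from typing import List


def ring_allreduce_mean(worker_grads: List[List[float]]) -> List[List[float]]:
    """Simulate ring-allreduce averaging of gradients across workers.

    worker_grads[r] is the gradient vector held by worker r. Every worker
    ends up with the element-wise mean of all vectors.
    """
    n = len(worker_grads)
    size = len(worker_grads[0])
    if n == 0 or size % n != 0:
        raise ValueError("vector length must be a multiple of the worker count")
    chunk = size // n
    bufs = [list(g) for g in worker_grads]  # private buffer per worker

    def span(c: int) -> slice:
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1 (scatter-reduce): in step s, worker r passes chunk (r - s) % n
    # to its right-hand neighbour, which adds it to its own copy.
    for step in range(n - 1):
        # Snapshot outgoing chunks so all sends in a step happen "at once".
        sends = [(r, (r - step) % n, bufs[r][span((r - step) % n)]) for r in range(n)]
        for r, c, data in sends:
            dst = (r + 1) % n
            for i, value in enumerate(data):
                bufs[dst][c * chunk + i] += value

    # Worker r now holds the complete sum for chunk (r + 1) % n.
    # Phase 2 (allgather): the finished chunks travel around the ring and
    # overwrite the partial sums held by the other workers.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, bufs[r][span((r + 1 - step) % n)]) for r in range(n)]
        for r, c, data in sends:
            bufs[(r + 1) % n][span(c)] = data

    # Divide the sums by the worker count to get the averaged gradient.
    return [[value / n for value in buf] for buf in bufs]
```

Each worker sends and receives only one chunk per step, which is why this pattern is a good fit for the high-bandwidth networks the article mentions: the traffic per worker stays roughly constant as more GPUs join the ring.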

Uber has reported that Horovod scaled about twice as well as a standard distribution of TensorFlow in benchmark testing, the Linux Foundation said Thursday (Dec. 13).
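The announcement does not spell out how scaling was measured. One common yardstick is scaling efficiency: the throughput achieved on several GPUs divided by the throughput that perfect linear scaling from one GPU would give. The sketch below shows that calculation; the function name is hypothetical, and the figures in the usage lines are invented for illustration and are not Uber's benchmark results.

```python
def scaling_efficiency(single_gpu_rate: float, multi_gpu_rate: float, num_gpus: int) -> float:
    """Return measured throughput as a fraction of ideal linear scaling.

    Rates can be in any consistent unit, such as images per second.
    """
    if num_gpus < 1 or single_gpu_rate <= 0:
        raise ValueError("need at least one GPU and a positive single-GPU rate")
    ideal_rate = single_gpu_rate * num_gpus  # perfect linear scaling
    return multi_gpu_rate / ideal_rate


# Hypothetical figures: one GPU processes 100 images per second.
baseline = scaling_efficiency(100.0, 400.0, 8)  # 8 GPUs reach 400 images/s
improved = scaling_efficiency(100.0, 800.0, 8)  # 8 GPUs reach 800 images/s
```

Under these made-up numbers, the second configuration is twice as efficient as the first, which is one way a "doubling" in scaling could be expressed.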

“This project has proven highly effective in training machine learning models quickly and efficiently,” said Ibrahim Haddad, the Linux Foundation’s research director.

“Uber built Horovod to make deep learning model training faster and more intuitive for AI researchers across industries,” said Alex Sergeev, the Horovod project leader.

Uber announced in March it was extending its work on distributed deep learning while scaling Horovod on large clusters and supercomputers using IBM’s Power9 architecture.

Along with Uber and IBM (NYSE: IBM), contributors to the Horovod project include Amazon Web Services (NASDAQ: AMZN), Intel Corp. (NASDAQ: INTC) and Nvidia (NASDAQ: NVDA). Uber is using the project to develop self-driving vehicle and trip forecasting applications.

Uber joined the Linux Foundation as a “Gold” member in November. Horovod will be managed as part of the Linux Foundation’s deep learning community.

The Uber project is among a number of efforts aimed at accelerating the GPU-based training of deep learning models. For example, Fast.ai, an organization offering free courses on deep learning, claimed a new speed record in August for training a model on a popular image database using Nvidia GPUs running on public cloud infrastructure.

A pair of researchers trained a model on the ImageNet database to 93 percent accuracy in 18 minutes using 16 AWS cloud instances, each with eight Nvidia Tesla V100 Tensor Core GPUs. Running the Fast.ai and PyTorch libraries, the researchers claimed a 40-percent boost in speed and accuracy for ImageNet training on public infrastructure.

The previous record was held by Google (NASDAQ: GOOGL) on its Tensor Processing Unit Pod cluster.

Applications: Artificial Intelligence

Technologies: Cloud, Frameworks

Sectors: Financial Services, Manufacturing, Other, Retail

Vendors: AWS, Google, IBM, NVIDIA, Uber

Tags: AI, deep learning, gpus, Horovod, Keras, Linux Foundation, machine learning, model training, PyTorch, scaling, TensorFlow, training
