August 14, 2018

Training Time Slashed for Deep Learning


Fast.ai, an organization offering free courses on deep learning, claimed a new speed record for training a model on a popular image database using Nvidia GPUs running on public cloud infrastructure.

A pair of researchers trained a model on the ImageNet database to 93 percent accuracy in 18 minutes using 16 Amazon Web Services (NASDAQ: AMZN) cloud instances, each with eight Nvidia (NASDAQ: NVDA) Tesla V100 Tensor Core GPUs. Running the Fast.ai and PyTorch libraries, the researchers claimed a roughly 40 percent improvement in training speed at comparable accuracy for ImageNet on public infrastructure. The previous record was held by Google (NASDAQ: GOOGL) on its Tensor Processing Unit Pod cluster.

“Our approach uses the same number of processing units as Google’s benchmark (128) and costs around $40 to run,” Fast.ai reported. The researchers said they would release their software for training and monitoring distributed models running in the AWS cloud.

The researchers included a Fast.ai alumnus and a deep learning expert with the Defense Innovation Unit Experimental (DIUx), a Pentagon startup working to transfer commercial technologies to the military.

Fast.ai developed a set of tools for cropping database images, while DIUx supplied a framework called nexus-scheduler that was used to orchestrate training runs and track the results. The scheduler was tuned for multi-machine training.

The researchers said they were encouraged by a recent report that AWS was able to reduce training time on the image database to 47 minutes with comparable accuracy.

The Fast.ai effort employed what the researchers called a “new training trick.”

“A lot of people mistakenly believe that convolutional neural networks can only work with one fixed image size, and that that must be rectangular,” Fast.ai’s Jeremy Howard explained in a blog post. “However, most libraries support ‘adaptive’ or ‘global’ pooling layers, which entirely avoid this limitation.”

Howard continued: “…unless users of these libraries replace those layers, they are stuck with just one image size and shape (generally 224 by 224 pixels). The Fast.ai library automatically converts fixed-size models to dynamically sized models.”
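In PyTorch terms, an adaptive pooling layer always reduces its input to a fixed spatial size, so the classifier that follows never sees a size mismatch. The snippet below is a minimal illustrative sketch of that general technique, not Fast.ai's code; the toy network and sizes are assumptions for the example.

    import torch
    import torch.nn as nn

    # Toy network for illustration only (not Fast.ai's code). The adaptive
    # average-pooling layer squeezes any feature-map size down to 1x1, so the
    # final linear layer works no matter how large the input image is.
    model = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),   # output is 64 x 1 x 1 for any input size
        nn.Flatten(),
        nn.Linear(64, 1000),       # 1,000 ImageNet classes
    )

    # The same weights accept 128x128 and 224x224 images without modification.
    for size in (128, 224):
        out = model(torch.randn(1, 3, size, size))
        print(size, tuple(out.shape))   # (1, 1000) in both cases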

The researchers said training started with small images whose size was gradually increased as training progressed. Early in training, when the model is still inaccurate, small images allow it to learn from many examples quickly; larger images later in training let it pick up finer detail and distinctions. To accelerate training, they also used larger batch sizes during intermediate training steps, making better use of GPU memory and reducing the impact of network latency.
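A hedged sketch of that progressive-resizing schedule, using a standard torchvision data pipeline: the sizes, batch sizes, epoch counts, and the "imagenet/train" path below are illustrative assumptions, not the settings the researchers actually used.

    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # Illustrative schedule only: small images and large batches early,
    # larger images and smaller batches later. (image_size, batch_size, epochs)
    schedule = [
        (128, 512, 10),
        (224, 256, 10),
        (288, 128, 5),
    ]

    def make_loader(image_size, batch_size, data_dir="imagenet/train"):
        # Rebuild the data pipeline whenever the image size changes.
        tfm = transforms.Compose([
            transforms.RandomResizedCrop(image_size),
            transforms.ToTensor(),
        ])
        dataset = datasets.ImageFolder(data_dir, transform=tfm)
        return DataLoader(dataset, batch_size=batch_size,
                          shuffle=True, num_workers=8)

    # Training-loop outline: swap in a new loader for each phase.
    # for image_size, batch_size, epochs in schedule:
    #     loader = make_loader(image_size, batch_size)
    #     for _ in range(epochs):
    #         for images, labels in loader:
    #             ...  # forward pass, loss, backward pass, optimizer step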

Among the lessons drawn from the Fast.ai experiments is the assertion that deep learning researchers do not necessarily require massive processing power to accelerate training. The researchers argued that a combination of new training techniques, such as dynamically sized models, along with on-demand public cloud access to GPU infrastructure can help democratize deep learning and other AI development tasks.

“There’s certainly plenty of room to go faster still,” Fast.ai’s Howard said.

Recent items:

Google to Automate Machine Learning with AutoML Service

‘Lifelong’ Neural Net Aims to Slash Training Time

 
