August 14, 2018

Training Time Slashed for Deep Learning

Fast.ai, an organization offering free courses on deep learning, claimed a new speed record for training on a popular image database using Nvidia GPUs running on public cloud infrastructure.

A pair of researchers trained a model on the ImageNet database to 93 percent accuracy in 18 minutes using 16 Amazon Web Services (NASDAQ: AMZN) cloud instances, each with eight Nvidia (NASDAQ: NVDA) Tesla V100 Tensor Core GPUs. Running the Fast.ai and PyTorch libraries, the researchers claimed a 40-percent boost in speed and accuracy for training ImageNet on public infrastructure. The previous record was held by Google (NASDAQ: GOOGL) on its Tensor Processing Unit Pod cluster.

“Our approach uses the same number of processing units as Google’s benchmark (128) and costs around $40 to run,” Fast.ai reported. The researchers said they would release their software for training and monitoring distributed models running in the AWS cloud.

The researchers included a Fast.ai alumnus and a deep learning expert with the Defense Innovation Unit Experimental (DIUx), a Pentagon startup working to transfer commercial technologies to the military. Fast.ai developed a set of tools for cropping database images, while DIUx supplied a framework called nexus-scheduler, used to orchestrate training runs and track the results. The scheduler was tuned for multi-machine training.

The researchers said they were encouraged by a recent report that AWS was able to reduce training time on the image database to 47 minutes with comparable accuracy.

The effort employed what they called a “new training trick.”

“A lot of people mistakenly believe that convolutional neural networks can only work with one fixed image size, and that that must be rectangular,” Fast.ai’s Jeremy Howard explained in a blog post. “However, most libraries support ‘adaptive’ or ‘global’ pooling layers, which entirely avoid this limitation.”

Howard continued: “…unless users of these libraries replace those layers, they are stuck with just one image size and shape (generally 224 by 224 pixels). The Fast.ai library automatically converts fixed-size models to dynamically sized models.”
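To make the pooling point concrete, here is a minimal sketch of global average pooling written with plain Python lists. It is an illustration only, not the researchers' code: the helper `make_feature_map` and the toy dimensions are assumptions made for this example. In PyTorch the comparable built-in layer is `nn.AdaptiveAvgPool2d`.

```python
# Global average pooling with plain Python lists (illustrative sketch).
# A feature map is a list of channels; each channel is a 2-D list of floats.

def global_avg_pool(feature_map):
    """Collapse each channel to a single number, whatever its height and width."""
    pooled = []
    for channel in feature_map:
        total = sum(sum(row) for row in channel)
        count = sum(len(row) for row in channel)
        pooled.append(total / count)
    return pooled


def make_feature_map(channels, height, width):
    """Hypothetical helper: a toy feature map where channel c is filled with c."""
    return [[[float(c)] * width for _ in range(height)] for c in range(channels)]


small = global_avg_pool(make_feature_map(4, 4, 4))  # square input
large = global_avg_pool(make_feature_map(4, 9, 7))  # larger, non-square input

# Both results hold one value per channel, so the same final layer fits both.
print(len(small), len(large))
```

Because the pooled output length depends only on the number of channels, the layers that follow never see the spatial size of the input image.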

The researchers said training started with small images that were gradually increased in size as training progressed. Early in training, when the model was still inaccurate, it improved quickly by processing many small images; later, larger images let it pick up finer detail and distinctions. To accelerate training, they also used larger batch sizes during intermediate training steps to make better use of GPU memory and avoid network latency.
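The idea of growing the image size during training can be sketched as a simple lookup schedule. This is a hypothetical illustration: the epoch boundaries, image sizes and batch sizes below are placeholders chosen for the example, not the settings the researchers used, and `settings_for_epoch` is a name invented here.

```python
# Hypothetical progressive-resizing schedule (all numbers are placeholders).
# Each phase is (first_epoch, image_size_in_pixels, batch_size).
SCHEDULE = [
    (0, 128, 256),   # start with small images
    (10, 224, 512),  # intermediate phase with a larger batch
    (20, 288, 256),  # finish on the largest images
]


def settings_for_epoch(epoch, schedule=SCHEDULE):
    """Return (image_size, batch_size) for the latest phase that has started."""
    size, batch = schedule[0][1], schedule[0][2]
    for first_epoch, phase_size, phase_batch in schedule:
        if epoch >= first_epoch:
            size, batch = phase_size, phase_batch
    return size, batch


for epoch in (0, 9, 10, 25):
    print(epoch, settings_for_epoch(epoch))
```

A training loop would call `settings_for_epoch` at the start of each epoch and rebuild its data loader whenever the returned values change. This only works if the model accepts variable input sizes.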

Among the lessons drawn from the experiments is the assertion that deep learning researchers do not necessarily require massive processing power to accelerate training. The researchers argued that combining new training techniques, such as dynamically sized models, with on-demand public cloud access to GPU infrastructure can help democratize deep learning and other AI development tasks.

“There’s certainly plenty of room to go faster still,” Fast.ai’s Howard said.
