Accelerating Neural Network Design
The design of a neural network architecture remains a daunting problem, requiring human expertise and extensive computing resources. The soaring computational requirements of the neural architecture search (NAS) algorithms used to automate that design make it difficult to search for architectures directly on large-scale data sets such as ImageNet.
While so-called “differentiable” NAS can reduce GPU compute costs, that approach still consumes a great deal of GPU memory. Massachusetts Institute of Technology researchers have therefore proposed a scheme dubbed “proxyless” NAS, which avoids a common cost-cutting shortcut: initially training models on a smaller data set and then scaling up the result.
In a paper published last month, the MIT team presented a “ProxylessNAS” approach they said “can directly learn the architectures for large-scale target tasks and target hardware platforms.”
Their scheme tackles the heavy memory consumption of differentiable NAS, aiming to cut GPU memory use and GPU hours to roughly the level of regular model training while still allowing the search to run on large data sets.
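Why does differentiable NAS use so much GPU memory? A back-of-the-envelope sketch, assuming a DARTS-style “mixed operation” that runs all N candidate operations in a layer and keeps every output for backpropagation, compared with a variant that activates only one sampled candidate path per training step. The ProxylessNAS paper describes path sampling of this general kind, but the helper names and tensor sizes below are hypothetical and this is not the authors' code.

```python
import random

def feature_map_floats(batch, channels, height, width):
    """Number of floats in one activation tensor of shape (batch, channels, height, width)."""
    return batch * channels * height * width

def mixed_op_floats(num_candidates, fmap_floats):
    # Differentiable (DARTS-style) mixed op: every candidate op runs on the input
    # and each output is kept for backpropagation, so memory scales with N.
    return num_candidates * fmap_floats

def single_path_floats(fmap_floats):
    # Path-sampling variant: only the sampled candidate runs,
    # so one output tensor is stored regardless of N.
    return fmap_floats

def sample_active_path(arch_probs, rng):
    """Pick the index of the single candidate to activate, weighted by arch_probs."""
    return rng.choices(range(len(arch_probs)), weights=arch_probs, k=1)[0]

# Hypothetical layer: batch 32, 64 channels, 56x56 feature maps, 6 candidate ops.
fmap = feature_map_floats(32, 64, 56, 56)
ratio = mixed_op_floats(6, fmap) / single_path_floats(fmap)
print(f"mixed op stores {ratio:.0f}x the activations of a single path")
active = sample_active_path([0.1, 0.3, 0.2, 0.1, 0.2, 0.1], random.Random(0))
```

The ratio equals the number of candidates, which is why enlarging the candidate set quickly exhausts GPU memory in the all-paths approach.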
NAS is increasingly used to automate the design of neural network architectures for deep learning tasks such as image recognition and language modeling. The problem, the MIT researchers noted, is that conventional NAS algorithms are computing and memory hogs: thousands of models must be trained to accomplish a specific task.
To cut those costs, conventional NAS searches for building blocks on proxy tasks, for example by training on smaller data sets or learning with fewer blocks. The best blocks are then “stacked” and transferred to the larger target task.
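The proxy-based workflow can be sketched in a few lines. This is a toy illustration, not any particular NAS system: the block names, the `proxy_accuracy` numbers and the helper functions are hypothetical, and a real search would train candidate networks to obtain each score.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Block:
    name: str  # illustrative label for a candidate building block

def search_on_proxy(candidates: List[Block], proxy_score: Callable[[Block], float]) -> Block:
    # Evaluate each candidate block on the cheap proxy task
    # (smaller data set, fewer blocks) and keep the best one.
    return max(candidates, key=proxy_score)

def stack_for_target(block: Block, depth: int) -> List[Block]:
    # Transfer: repeat the proxy-selected block to build the deeper target network.
    # Every layer reuses the same block (the "repeating blocks" restriction).
    return [block] * depth

# Hypothetical proxy accuracies; a real search would obtain these by training.
proxy_accuracy: Dict[str, float] = {"conv3x3": 0.71, "conv5x5": 0.74, "sep_conv3x3": 0.72}
candidates = [Block(n) for n in proxy_accuracy]
best = search_on_proxy(candidates, lambda b: proxy_accuracy[b.name])
target_net = stack_for_target(best, depth=12)
print(best.name, len(target_net))
```

Nothing in this pipeline checks how the stacked network behaves on the full data set or the target hardware; the proxy score is the only signal.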
The catch with this approach is that blocks optimized for proxy tasks are not guaranteed to be optimal for the target task, especially when hardware metrics such as latency are taken into account, the researchers note.
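One way a search can account for latency on the target hardware is to add a latency term to the training loss. The ProxylessNAS paper describes a differentiable version that weights each candidate operation's latency by its architecture probability and sums over layers; the sketch below assumes that form, with a hypothetical per-operation latency table and trade-off weight `lam`, and is not the authors' implementation.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    # Convert architecture parameters (logits) into probabilities over candidate ops.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_latency(arch_logits: Sequence[Sequence[float]],
                     latency_ms: Sequence[Sequence[float]]) -> float:
    # For each layer, weight each candidate's latency by its probability, then sum
    # over layers. As a weighted sum, it is differentiable in the architecture logits.
    total = 0.0
    for logits, lats in zip(arch_logits, latency_ms):
        probs = softmax(logits)
        total += sum(p * lat for p, lat in zip(probs, lats))
    return total

def search_loss(cross_entropy: float, exp_latency: float, lam: float) -> float:
    # Accuracy term plus a latency penalty; lam sets the trade-off.
    return cross_entropy + lam * exp_latency

# Hypothetical 2-layer search space, 3 candidate ops per layer, with a latency
# lookup table (milliseconds) assumed to be measured on the target device.
latency_table = [[1.0, 2.0, 4.0], [0.5, 1.5, 3.0]]
uniform = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
lat = expected_latency(uniform, latency_table)
print(round(lat, 4), round(search_loss(1.2, lat, lam=0.1), 4))
```

Because the latency table is specific to one device, the same search run against a different table can favor different operations, which a proxy-trained, hardware-agnostic block cannot do.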
ProxylessNAS addresses those limitations by directly learning a neural network architecture for the target task and the target hardware platform. On ImageNet, the search took 200 GPU hours, which the researchers said amounted to a 200-fold reduction compared with earlier NAS approaches.
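Taking the 200-fold figure at face value, the implied cost of the baseline search works out as follows (simple arithmetic on the two numbers quoted above, not a figure reported separately):

```latex
T_{\text{baseline}} \approx 200 \times T_{\text{ProxylessNAS}}
  = 200 \times 200\ \text{GPU hours}
  = 40{,}000\ \text{GPU hours}
```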
“ProxylessNAS is the first NAS algorithm that directly learns architectures on the large-scale dataset [such as ImageNet] without any proxy while still allowing a large candidate set and removing the restriction of repeating blocks,” the MIT researchers reported.
“It effectively enlarged the search space and achieved better performance,” they added.