What’s Keeping Deep Learning In Academia From Reaching Its Full Potential?
Deep learning is gaining a foothold in the enterprise as a way to improve the development and performance of critical business applications. It first gained traction at companies that optimize advertising and recommendation systems, like Google, Yelp, and Baidu. But the space has seen a surge of innovation over the past few years as open-source deep learning frameworks such as TensorFlow, MXNet, and Caffe2 have democratized access to powerful deep learning techniques for companies of all sizes. Additionally, the rise of GPU-enabled cloud infrastructure on platforms like AWS and Azure has made it faster and cheaper than ever for firms to build and scale these pipelines.
Now, its use is extending to fields like financial services, oil and gas, and many other industries. Tractica, a market intelligence firm, predicts that deep learning enterprise software spending will surpass $40 billion worldwide by 2024. Companies that handle large amounts of data are tapping into deep learning to strengthen areas like machine perception, big data analytics, and the Internet of Things.
In the academic world outside of computer science, though, from physics to public policy, deep learning is being adopted rapidly and could be hugely beneficial, yet it is often used in a way that leaves performance on the table.
Where academia falls short
Getting the most out of machine learning or deep learning frameworks requires optimizing the configuration parameters that govern these systems: the tunable parameters that must be set before any learning actually takes place. Finding the right configuration can yield orders-of-magnitude improvements in accuracy, performance, or efficiency. Yet most professors and students who use deep learning outside of computer science, where these techniques are developed, rely on one of three traditional, suboptimal methods to tune these parameters. Manual search tries to optimize high-dimensional problems by hand, through intuition and trial and error. Grid search builds an exhaustive set of possible parameter combinations and tests each one individually, at great cost. Randomized search is the most effective of the three in practice, but it is the equivalent of trying to climb a mountain by jumping out of an airplane and hoping you eventually land on the peak.
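To make the contrast concrete, here is a minimal sketch of grid search versus randomized search on a toy two-parameter tuning problem. The objective function, parameter names, and ranges are all hypothetical stand-ins for a real training run; both strategies get the same budget of 16 evaluations.

```python
import random

# Hypothetical stand-in for a real training run: returns a "validation
# accuracy" for a given learning rate and dropout. The quadratic form is
# invented purely for illustration (true optimum: lr=0.01, dropout=0.3).
def validation_accuracy(learning_rate, dropout):
    return 1.0 - 1e4 * (learning_rate - 0.01) ** 2 - (dropout - 0.3) ** 2

def grid_search(n_per_axis=4):
    """Exhaustively evaluate every combination on a fixed grid (16 runs)."""
    lrs = [10 ** -(1 + i) for i in range(n_per_axis)]                 # 0.1 .. 1e-4
    drops = [i / (n_per_axis - 1) * 0.5 for i in range(n_per_axis)]   # 0.0 .. 0.5
    trials = [((lr, d), validation_accuracy(lr, d)) for lr in lrs for d in drops]
    return max(trials, key=lambda t: t[1])

def random_search(n_trials=16, seed=0):
    """Sample configurations at random from the same ranges (16 runs)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-4, -1)   # log-uniform over the learning rate
        d = rng.uniform(0.0, 0.5)
        trials.append(((lr, d), validation_accuracy(lr, d)))
    return max(trials, key=lambda t: t[1])

best_grid = grid_search()
best_rand = random_search()
print("grid  :", best_grid)
print("random:", best_rand)
```

Neither method uses what it has already observed: every evaluation is chosen in advance (grid) or blindly (random), which is exactly the waste described above.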
While these methods are easy to implement, they often fall short of the best possible solution and waste precious computational resources that are scarce in academic settings. Experts often avoid more advanced techniques because they are orthogonal to their core research: finding, administering, and tuning a more sophisticated optimization method eats up expert time. This challenge can also push experts toward less powerful but easier-to-tune methods, or discourage them from attempting deep learning at all. Researchers have relied on these traditional approaches for years, but they are rarely the most effective way to conduct research.
The need for Bayesian optimization
Bayesian optimization automatically fine-tunes the parameters of these algorithms and machine learning models without needing access to the underlying data or model itself. The process probes the underlying system to observe various outputs, then uses how previous configurations performed to determine the most intelligent configuration to try next. This helps researchers and domain experts arrive at the best possible model and frees up time to focus on more pressing parts of their research.
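The loop described above can be sketched end to end: fit a probabilistic model (here a simple Gaussian process) to the configurations tried so far, then pick the next configuration by maximizing expected improvement. Everything here is a hypothetical, simplified one-dimensional example, not a production implementation; the objective function and its peak at 0.7 are invented for illustration.

```python
import math
import numpy as np

# Hypothetical black-box objective, e.g. accuracy as a function of one
# tunable parameter. Its true peak (0.9 at x = 0.7) is unknown to the loop.
def objective(x):
    return -(x - 0.7) ** 2 + 0.9

def rbf_kernel(a, b, length=0.2):
    # Squared-exponential kernel between two sets of 1-D points.
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_seen, y_seen, x_query, noise=1e-6):
    # Standard zero-mean Gaussian-process regression equations.
    K = rbf_kernel(x_seen, x_seen) + noise * np.eye(len(x_seen))
    Ks = rbf_kernel(x_query, x_seen)
    mu = Ks @ np.linalg.solve(K, y_seen)
    v = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.sum(Ks * v.T, axis=1)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best_y):
    # EI balances exploiting high predicted means against exploring
    # regions of high uncertainty.
    z = (mu - best_y) / sigma
    cdf = 0.5 * (1 + np.vectorize(math.erf)(z / math.sqrt(2)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    return (mu - best_y) * cdf + sigma * pdf

# The optimization loop: observe a few configurations, then repeatedly
# evaluate whichever candidate has the highest expected improvement.
x_seen = np.array([0.1, 0.5, 0.9])
y_seen = np.array([objective(x) for x in x_seen])
candidates = np.linspace(0.0, 1.0, 201)

for _ in range(10):
    mu, sigma = gp_posterior(x_seen, y_seen, candidates)
    ei = expected_improvement(mu, sigma, y_seen.max())
    x_next = candidates[np.argmax(ei)]
    x_seen = np.append(x_seen, x_next)
    y_seen = np.append(y_seen, objective(x_next))

print("best x:", x_seen[np.argmax(y_seen)], "best y:", y_seen.max())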
Bayesian optimization has already been applied outside of deep learning to other problems in academia, from gravitational lensing to polymer synthesis to materials design and beyond. A number of professors and students at universities like MIT, the University of Waterloo, and Carnegie Mellon are already using the method to optimize their deep learning models and conduct life-changing research. George Chen, assistant professor at Carnegie Mellon's Heinz College of Information Systems and Public Policy, uses Bayesian optimization to fine-tune the machine learning models in his experiments. His research in medical imaging analysis automates the process of locating a specific organ in the human body, and could help prevent unnecessary procedures in patients with congenital heart defects, among others. Before applying Bayesian optimization to his research, Chen had to guess and check the best parameters for his models. Now he can automate the process and receive updates on his mobile phone, freeing him to complete other necessary parts of the research process.
Unfortunately, the vast majority of researchers leveraging deep learning outside of computer science are not using these powerful techniques. This costs them time and resources, or even prevents them from achieving their research goals via deep learning at all. When experts are forced to do multidimensional guess-and-check in their heads, they typically spend valuable computational resources on modeling and settle for suboptimal results. Deploying Bayesian optimization accelerates the research process, frees up time to focus on other important tasks, and unlocks better outcomes.
Scott Clark is the co-founder and CEO of SigOpt, which provides its services for free to academics around the world. He has been applying optimal learning techniques in industry and academia for years, from bioinformatics to production advertising systems. Before SigOpt, Scott worked on the Ad Targeting team at Yelp, leading the charge on academic research and outreach with projects like the Yelp Dataset Challenge and the open sourcing of MOE. Scott holds a PhD in Applied Mathematics and an MS in Computer Science from Cornell University, and BS degrees in Mathematics, Physics, and Computational Physics from Oregon State University. He was chosen as one of Forbes' 30 Under 30 in 2016.