Deep Learning: The Confluence of Big Data, Big Models, Big Compute
Fueled by enterprises seeking greater insight from their analytics, deep learning is now seeing widespread adoption. Although this artificial intelligence (AI) discipline was first conceived in the late 1950s, the current surge in deep learning and other AI methods is driven by increased hardware power, the explosion of big data and a desire for greater insight in several key industries.
Deep learning – and AI in general – has taken off because organizations of all sizes and industries are capturing a greater variety of data and can mine bigger data, including unstructured data such as text, speech and images. The global deep learning market is expected to grow 41 percent from 2017 to 2023, reaching $18 billion, according to a Market Research Future report.
And it’s not just large companies like Amazon, Facebook and Google that have big data. It’s everywhere. Deep learning needs big data, and now we have it.
Contrary to popular belief, more data does not always mean better results. However, deep learning models absolutely thrive on big data. Through progressive learning, they grind away and find nonlinear relationships in the data without requiring users to do feature engineering. Deep learning models also can overfit the training data, so it is good to have lots of data to validate how well the model generalizes.
So what is a deep learning model? It’s essentially a neural network with many layers. And these models can be enormous in size – often with more than 50 million parameters. The algorithm is not new, but we now have bigger data and more computing power to apply to it. This enables next-generation deep learning applications such as computer vision or speech to text.
There are some considerations for those adopting deep learning. Consider the following “big” issues:
- Big data is expensive to collect, label and store.
- Big models are hard to optimize.
- Big computations are often expensive.
Let’s dig into each of these considerations.
Deep learning is not a silver bullet, but it damn sure is a Swiss army knife. It can be used for all kinds of applications. Let’s look at some example applications matched to the data and algorithmic type.
Natural language processing. Deep learning innovator and scholar Andrew Ng has long predicted that as speech recognition goes from 95 percent to 99 percent accurate, it will become a primary way to interact with computers. Deep learning models currently have about a 4 percent error rate for speech to text. So unstructured spoken or typed text requires lots of data to produce the best results possible. Different accents are also needed, as is a representative sample of various speech speeds and patterns.
IoT applications. An Internet of Things application is a treasure trove of big streaming transitional data. An oil company might have several offshore wells with sensors taking near-constant measurements of critical factors in each pipeline or well. This rich sequential data stream is like high-octane fuel for deep learning.
Computer vision. Deep learning is extremely well suited for computer vision tasks. Computer vision applications require lots of images. One might ask, for example, why are there all these deep learning examples for classifying cats? One reason is that, in 2012, Google used a deep learning application to recognize cats in YouTube videos. Since then, the task has become a routine test bed and something of a meme for deep learning geeks.
But to classify a manufacturing part as defective or not, users must label their own data! A good representation of defective and non-defective parts in this example is needed for training data. Big data is needed, but – and I can’t stress this enough – it must be carefully labeled for supervised learning.
Sometimes it is necessary to generate a data set to have enough for successful deep learning. A common hack to increase the size of image training data is to augment existing images by rotating, randomly shifting or randomly cropping images, and making other slight changes.
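As a rough illustration of the augmentation idea above, here is a minimal sketch in plain Python. It assumes an image is a small grid of pixel values stored as a list of rows; the function names (rotate90, shift_right, random_crop, augment) are hypothetical and chosen for this example only. Real projects would typically use an image library instead.

```python
import random

def rotate90(image):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def shift_right(image, pixels, fill=0):
    """Shift every row right by `pixels`, padding the left edge with `fill`."""
    width = len(image[0])
    pixels = max(0, min(pixels, width))
    return [[fill] * pixels + row[:width - pixels] for row in image]

def random_crop(image, height, width, rng):
    """Cut a randomly placed height x width window out of the image."""
    top = rng.randint(0, len(image) - height)
    left = rng.randint(0, len(image[0]) - width)
    return [row[left:left + width] for row in image[top:top + height]]

def augment(image, rng, copies=4):
    """Return `copies` slightly altered variants of one labeled image."""
    variants = []
    for _ in range(copies):
        variant = image
        if rng.random() < 0.5:                      # rotate about half the time
            variant = rotate90(variant)
        variant = shift_right(variant, rng.randint(0, 1))
        variant = random_crop(variant, len(variant) - 1, len(variant[0]) - 1, rng)
        variants.append(variant)
    return variants

# Toy 3x3 "image"; each variant keeps the label of the original.
rng = random.Random(0)
extra_images = augment([[1, 2, 3], [4, 5, 6], [7, 8, 9]], rng)
```

Each variant inherits the label of the image it came from, which is why augmentation stretches a carefully labeled data set without additional labeling work.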
Training deep learning models that generalize well is a difficult data science task that also requires some art. If there is insufficient labeled data, transfer learning is another option. Transfer learning approaches a new task by transferring knowledge from an existing and related task. It is likely to yield a better-fitting model.
AI researchers and academics Lisa Torrey and Jude Shavlik state that transfer learning typically provides better starting values. They also say the rate of improvement is steeper and the final convergence will tend to improve. A model can be trained in less time, they say.
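One common way to apply transfer learning is to keep layers trained on the related task frozen and fit only a new output layer on the small labeled set. The sketch below shows that pattern in plain Python. The values in PRETRAINED_W are hypothetical stand-ins for weights learned elsewhere, and the data set and function names are invented for illustration.

```python
import math

# Hypothetical weights standing in for a layer already trained on a related task.
PRETRAINED_W = [[1.0, -1.0], [-1.0, 1.0]]

def frozen_features(x):
    """Apply the pretrained layer with ReLU; these weights are never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_W]

def predict(head, x):
    """Probability from the new output layer (logistic regression on frozen features)."""
    feats = frozen_features(x)
    z = head["b"] + sum(w * f for w, f in zip(head["w"], feats))
    return 1.0 / (1.0 + math.exp(-z))

def train_head(data, epochs=200, lr=0.5):
    """Fit only the new output layer on a small labeled set for the new task."""
    head = {"w": [0.0, 0.0], "b": 0.0}
    for _ in range(epochs):
        for x, y in data:
            feats = frozen_features(x)
            error = predict(head, x) - y  # gradient of log loss w.r.t. the logit
            head["w"] = [w - lr * error * f for w, f in zip(head["w"], feats)]
            head["b"] -= lr * error
    return head

# Only four labeled examples are available for the new task.
small_labeled_set = [([2.0, 0.0], 1), ([0.0, 2.0], 0), ([3.0, 1.0], 1), ([1.0, 3.0], 0)]
head = train_head(small_labeled_set)
```

Because the frozen layer already produces useful features, the new layer starts from a better place and has far fewer parameters to learn, which is consistent with the faster training Torrey and Shavlik describe.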
Models can also decay through a process known as concept drift. A best practice is to save the model state and do incremental learning as more data is collected. Get more data and continuously improve the model.
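The save-and-continue practice can be sketched with a deliberately tiny model. The example below assumes a one-variable linear model trained by stochastic gradient descent and uses JSON as the checkpoint format; both are choices made for illustration, and the function names are hypothetical.

```python
import json

def sgd_update(state, batch, lr=0.05):
    """Run one pass of SGD over a batch for the linear model y = w*x + b."""
    w, b = state["w"], state["b"]
    for x, y in batch:
        error = (w * x + b) - y
        w -= lr * error * x
        b -= lr * error
    return {"w": w, "b": b}

def save_state(state):
    """Serialize the model parameters so training can resume later."""
    return json.dumps(state)

def load_state(blob):
    """Restore model parameters from a saved checkpoint."""
    return json.loads(blob)

# Initial training on the data available today (here y = 2x).
state = {"w": 0.0, "b": 0.0}
old_batch = [(x, 2.0 * x) for x in (0.0, 1.0, 2.0)]
for _ in range(200):
    state = sgd_update(state, old_batch)
checkpoint = save_state(state)

# Later the relationship drifts (here to y = 3x). Resume from the checkpoint
# and keep learning on the newly collected data instead of starting over.
new_batch = [(x, 3.0 * x) for x in (0.0, 1.0, 2.0)]
state = load_state(checkpoint)
for _ in range(200):
    state = sgd_update(state, new_batch)
```

The checkpoint preserves what the model learned before the drift, so the second round of training only has to correct for what changed.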
Another best practice is to prune big models to help improve training speed while still maintaining good model fit quality. The basic idea is to remove some of the neurons that may be redundant. There are many more algorithmic tuning tips you should investigate.
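Pruning can be done in several ways. The sketch below uses one simple heuristic, which is an assumption of this example rather than a prescribed method: treat a neuron as likely redundant when the total absolute value of its weights is small, and keep only the highest-scoring neurons. The function name prune_neurons is hypothetical.

```python
def prune_neurons(weights, keep_fraction):
    """Keep the neurons (rows) with the largest total absolute weight.

    Returns the pruned weight matrix and the indices of the neurons kept.
    """
    n_keep = max(1, round(len(weights) * keep_fraction))
    scores = [sum(abs(w) for w in row) for row in weights]
    ranked = sorted(range(len(weights)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # preserve the original neuron order
    return [weights[i] for i in kept], kept

# One layer with four neurons, two of which have near-zero weights.
layer = [[0.9, -1.2], [0.01, 0.02], [0.5, 0.4], [-0.03, 0.0]]
pruned_layer, kept_indices = prune_neurons(layer, keep_fraction=0.5)
```

In a full network, the inputs that the next layer expects from the removed neurons would have to be removed as well, and the pruned model is usually retrained briefly to recover any lost accuracy.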
The continuing uptake of deep learning has been possible only in recent years because computing costs have declined while hardware power has grown exponentially. Another technology – the graphics processing unit (GPU) – offers a further big boost to server power for deep learning.
Deep learning has two fundamental operations: forward and backward propagation passes. Both operations are essentially matrix multiplications. GPUs are particularly adept at processing matrices, and that’s why GPUs have become the default hardware for training deep learning models.
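To make the matrix-multiplication point concrete, here is a minimal sketch of the forward and backward pass for a single linear layer, written in plain Python. Activation functions and the loss are left out for brevity, the numbers are arbitrary, and the incoming gradient dY is simply assumed to have been computed by the layers that follow.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap the rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]

# Forward pass for one linear layer: Y = X W
X = [[1.0, 2.0], [3.0, 4.0]]              # batch of 2 inputs, 2 features each
W = [[0.5, -1.0, 0.0], [1.0, 0.0, 2.0]]   # weights mapping 2 inputs to 3 outputs
Y = matmul(X, W)

# Backward pass, given the gradient of the loss with respect to Y
dY = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]
dW = matmul(transpose(X), dY)   # gradient used to update the weights
dX = matmul(dY, transpose(W))   # gradient passed back to the previous layer
```

Every output cell in these products can be computed independently of the others, which is the kind of work that maps well onto hardware built to run many simple operations at once.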
GPUs use a parallel architecture. While a central processing unit (CPU) is excellent at sequentially handling one set of very complex instructions, a GPU is very good at simultaneously handling many sets of very simple instructions. With a good, solid GPU, data scientists can iterate over designs and parameters of deep networks. They can run experiments in days instead of months, hours instead of days, minutes instead of hours. GPUs provide an opportunity for improved accuracy, faster results, smaller footprints and less energy used.
In the last few years, the three “big trends” detailed above have converged to take deep learning beyond the AI hype of a decade or so ago. As storage became cheaper and businesses started saving more and more data, big data became a phenomenon. The rise of the data scientist role gave AI and analytics newfound desirability. And computing platforms continued to become more capable as the cost declined.
Deep learning can help make better consumer credit decisions faster. Deep learning-powered health care IoT, including wearable devices, can save lives. Retailers can provide more accurate recommendations to drive sales. There’s a powerful deep learning application to benefit most industries. If you haven’t already, it’s time to see what deep learning can do.
About the author: Wayne Thompson, Chief Data Scientist at SAS, is a globally renowned presenter, teacher, practitioner and innovator in the fields of data mining and machine learning. He has worked alongside the world’s biggest and most challenging organizations to help them harness analytics to improve performance. Over the course of his 26-year tenure at SAS, Thompson has been credited with bringing to market landmark SAS Analytics technologies. His current focus initiatives include easy-to-use self-service data mining tools along with deep learning and cognitive computing tool kits. He received his PhD from the University of Tennessee, Knoxville.