Three Tricks to Amplify Small Data for Deep Learning
It’s no secret that deep learning lets data science practitioners reach new levels of accuracy with predictive models. However, one of the drawbacks of deep learning is that it typically requires huge data sets (not to mention big clusters). But with a little skill, practitioners with smaller data sets can still partake of deep learning riches.
Deep learning has exploded in popularity, with good reason: Deep learning approaches, such as convolutional neural networks (used primarily for image data) and recurrent neural networks (used primarily for language and textual data) can deliver higher accuracy and precision compared to “classical” machine learning approaches, like regression algorithms, gradient-boosted trees, and support vector machines.
But that higher accuracy comes at a cost. Deep learning models are much more complex and typically require much more data to deliver better predictions. And of course, running all that data requires more computer horsepower, typically in the form of GPU-equipped clusters. It’s no wonder that the world’s leading practitioners of deep learning are companies with names like Google, Facebook, and Microsoft, which have a ton of data and compute capacity on which to develop and train advanced predictive models.
Size does matter, but that doesn’t leave mere data mortals out in the cold. With the right techniques, data scientists and machine learning engineers can get in on the deep learning action, even without a huge corpus of training data up front.
One of the technologists with a lot of experience making the most of smaller data is Vaibhav Nivargi, the CTO and co-founder of Moveworks, which develops IT ticket automation software. The company is just three-and-a-half years old and is backed by some of the top venture capitalists in Silicon Valley. But it doesn’t have a ton of IT tickets on which to train its predictive models, which posed a challenge to Nivargi.
“IT tickets are not really the most voluminous types of data,” Nivargi says. “Even if you have a customer with several thousand employees, it’s a relatively infrequent activity to file an IT ticket. The data sets are relatively sparse and small. To be able to leverage more sophisticated techniques, running at very high levels of accuracy and precision, is highly non-trivial.”
Under Nivargi’s watch, Moveworks has punched above its weight and developed deep learning models that are much more accurate and precise than traditional machine learning approaches. These three techniques detail how he did it.
Transfer learning is arguably the most basic approach to leveraging powerful deep learning approaches when you don’t have the data to develop a more custom solution. At its most basic level, it’s a copy and paste approach: You copy a deep learning model that’s already been developed, but paste your custom code into the last layer that develops the final prediction.
Moveworks adopted BERT, a Transformer-based language model developed and open sourced by Google for natural language understanding, as the basis for one of the deep learning models it uses to understand the words that its customers use as they interact with an IT help desk.
“Think of transfer learning as fine tuning,” Nivargi tells Datanami. “You already have something that’s reasonably working, but you want to make it work for your use case. You take most of what it has learned, and you just change the last few steps.”
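The "keep most of what it has learned, change the last few steps" idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the frozen base is a toy stand-in with made-up weights (not BERT), the data set is invented, and the new last layer is a plain logistic regression trained with gradient descent. It is not Moveworks' implementation.

```python
import math

# Hypothetical stand-in for a pretrained model's frozen layers: fixed weights
# that map a 2-number input to 3 features. In practice these would come from a
# real pretrained model; the values here are made up for illustration.
FROZEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]


def frozen_features(x):
    """Apply the frozen base (linear layer + tanh). These weights are never updated."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(examples, epochs=500, lr=0.5):
    """Train only the new last layer (logistic regression) on top of frozen features."""
    head = [0.0] * len(FROZEN_WEIGHTS)
    bias = 0.0
    for _ in range(epochs):
        for x, y in examples:
            feats = frozen_features(x)
            pred = sigmoid(sum(h * f for h, f in zip(head, feats)) + bias)
            error = pred - y  # gradient of the log loss with respect to the logit
            head = [h - lr * error * f for h, f in zip(head, feats)]
            bias -= lr * error
    return head, bias


def predict(head, bias, x):
    """Probability of label 1 for input x, using the frozen base plus the trained head."""
    feats = frozen_features(x)
    return sigmoid(sum(h * f for h, f in zip(head, feats)) + bias)


# A tiny, made-up labelled data set: label 1 when the first number is larger.
small_data = [([2.0, 0.0], 1), ([1.5, 0.5], 1), ([0.0, 2.0], 0), ([0.5, 1.5], 0)]
head, bias = train_head(small_data)
```

Only `head` and `bias` change during training, which is why a handful of examples can be enough: the small data set only has to fit the last layer, not the whole model.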
Collective learning is a technique that can be used to amplify your existing sparse data to generate new data that’s very close to the distribution of real world data. More data, of course, equals better results in a deep learning model.
Using collective learning techniques to boost the available corpus of training data in the computer vision domain is fairly straightforward. “You can take the image and tweak the contrast,” Nivargi says. “Or you can rotate the image or chop off the sides, and now you have 3x to 4x that data.”
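As a rough sketch of the image tweaks Nivargi describes (the approach commonly called data augmentation), the functions below operate on a grayscale image stored as a nested list of 0 to 255 pixel values. The function names, the contrast factor, and the sample image are all hypothetical; a real pipeline would typically use an image library instead.

```python
def adjust_contrast(image, factor):
    """Scale each pixel's distance from mid-gray (128), clamped to the 0..255 range."""
    return [
        [max(0, min(255, round(128 + factor * (p - 128)))) for p in row]
        for row in image
    ]


def rotate_90(image):
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]


def crop_sides(image, margin):
    """Chop `margin` columns off the left and right edges."""
    return [row[margin:len(row) - margin] for row in image]


def augment(image):
    """Return the original plus three tweaked copies, i.e. 4x the data."""
    return [
        image,
        adjust_contrast(image, 1.5),
        rotate_90(image),
        crop_sides(image, 1),
    ]


# A made-up 3x4 grayscale image.
image = [
    [0, 64, 128, 255],
    [10, 20, 30, 40],
    [200, 150, 100, 50],
]
variants = augment(image)
```

Note that the rotated and cropped copies have different dimensions from the original, so in practice they would usually be resized before being fed to a model.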
It’s trickier to do in the language domain, because you can’t just chop off the end of words or add random words and punctuation. But with enough care in the hands of a knowledgeable practitioner, the collective learning approach can also be used to boost an existing data set and provide more collections of words to feed into the recurrent neural network.
“We have language like ‘I’m trying to set up a meeting with X’ or ‘I’m getting error Y. Can you help me?’” Nivargi says. “You can substitute a meeting with Zoom or WebEx or GoToMeeting, and mix and match the preamble and the problem, and still get data that is very realistic.”
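The mix-and-match idea in that quote can be sketched as simple template expansion. The preambles, problem templates, and `{tool}` placeholder below are hypothetical examples in the spirit of the quote, not Moveworks' actual data or method.

```python
from itertools import product

# Hypothetical building blocks, loosely modelled on the phrases quoted above.
PREAMBLES = ["I'm trying to", "Can you help me", "I need to"]
PROBLEMS = ["set up a meeting with {tool}", "join a call on {tool}"]
TOOLS = ["Zoom", "WebEx", "GoToMeeting"]


def augment_tickets(preambles, problems, tools):
    """Combine every preamble, problem template, and tool name into a sentence."""
    return [
        f"{preamble} {problem.format(tool=tool)}"
        for preamble, problem, tool in product(preambles, problems, tools)
    ]


tickets = augment_tickets(PREAMBLES, PROBLEMS, TOOLS)
```

Three preambles, two problem templates, and three tool names yield 18 distinct sentences. The care Nivargi mentions comes in when choosing pieces that stay grammatical and realistic in every combination.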
Meta learning, which is sometimes called n-shot or multi-shot learning, is another powerful data-boosting technique. According to Nivargi, Moveworks uses meta learning to create additional dimensions of the data, which allows the deep learning models to better understand the shape of the data.
“Let’s say I see an IT ticket that says ‘I need access to the dashboard so I can publish a metric,’” Nivargi says. “It’s perfectly valid, syntactical English. It’s meaningful. But it’s completely incomplete. We don’t know what kind of dashboard.”
To fill in the blanks and drive toward a greater understanding of what that employee meant, Moveworks employs a meta learning approach that applies context to the statement. The approach uses available metadata to make a guess about what the employee meant.
“You can use an example of what department the employee works in, or the time of day,” Nivargi says. “If the employee is in marketing, there’s a good chance they’re talking about the Salesforce dashboard. If they’re in engineering, there’s a good chance they’re talking about the Jira dashboard.”
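The sketch below illustrates only the metadata-as-context idea from the quote, not a meta learning algorithm and not Moveworks' system. It assumes a made-up history of past tickets where the intended dashboard was already known, and guesses the most frequent dashboard for the employee's department. All names and data are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up history of (department, dashboard the employee actually meant).
HISTORY = [
    ("marketing", "Salesforce"),
    ("marketing", "Salesforce"),
    ("marketing", "Jira"),
    ("engineering", "Jira"),
    ("engineering", "Jira"),
    ("engineering", "Salesforce"),
]


def build_priors(history):
    """Count how often each dashboard was meant, per department."""
    counts = defaultdict(Counter)
    for department, dashboard in history:
        counts[department][dashboard] += 1
    return counts


def guess_dashboard(ticket_text, department, priors):
    """Return (dashboard, share of past tickets) for an ambiguous ticket, or None."""
    if "dashboard" not in ticket_text.lower():
        return None  # nothing to disambiguate
    department_counts = priors.get(department)
    if not department_counts:
        return None  # no metadata history for this department
    dashboard, count = department_counts.most_common(1)[0]
    return dashboard, count / sum(department_counts.values())


priors = build_priors(HISTORY)
ticket = "I need access to the dashboard so I can publish a metric"
marketing_guess = guess_dashboard(ticket, "marketing", priors)
engineering_guess = guess_dashboard(ticket, "engineering", priors)
```

A production system would more plausibly feed the department, time of day, and other metadata into the model as additional input features rather than use a lookup table; the table just makes the effect of the extra context easy to see.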
DL for the Masses
Collectively, these three techniques let Moveworks amplify its data to a very high volume, Nivargi says.
“Transfer learning has a bootstrap phenomenon, where you can start from a strong base and then adapt to your domain,” he says. “Collective learning has a network effect. And meta learning can take data from N dimensions and add several hundred dimensions more. So all of these are cumulatively very, very powerful.”
Moveworks employs a wide variety of supervised and unsupervised machine learning and deep learning models to help understand the meaning of IT help desk tickets. Some models are universal, some are customer-specific, and some are ensembles. Like most companies active in data science, it’s found that some techniques work well in some domains, while other domains require completely different approaches.
In the early days of the company, Moveworks hit a wall with traditional machine learning models. One of them was “saturated” at around 82% accuracy. “It would simply not learn above that,” Nivargi says. “The model would plateau.”
After employing some of the transfer learning, collective learning, and meta learning tricks discussed here, Moveworks had enough data to keep the bigger deep learning models flush with data. As a result, the models are more accurate.
“Some of our models now can be deployed in production in the high 90s, 90% to 96% precision, and we can have very high coverage as well,” he says. “So now we can use extremely powerful and sophisticated deep learning techniques without requiring the same kind of inherent data that a Google or Facebook or Microsoft has.”