Practical Tips for Success with Machine Learning
In the last year, the hype around AI has been deafening. And for good reason. Despite a long hiatus in the AI research community without any major wins, we’ve made some amazing progress lately. From headlines about AI protecting our digital identities to driving our cars to even diagnosing our maladies, it seems like AI has been everywhere.
Unless you have been living under a rock, chances are your feed has been littered with references to deep learning, convolutional neural networks (CNNs), recurrent neural nets (RNNs), or TensorFlow, each accompanied by a bold proclamation that the technology is about to solve everything from world hunger to health care.
Ok, if you’re not breaking out the champagne, I understand. For many years, AI advances often seemed too far removed from practical applications to be germane to most businesses. However, I’m putting a stake in the ground and saying the hype cycle has reached its peak, and we’re entering a phase of exponential progress in real world AI applications.
Some of the noteworthy breakthroughs we’ve seen lately include Google demonstrating how its speech recognition system now understands speech more accurately than humans can. Computer scientists have also discovered that with sufficient learning parameters and deep enough models, computer algorithms can now mimic human brain operations that were once too difficult for ML. We’ve also seen a slew of AI solutions that figure prominently in our everyday lives, such as systems that can accurately detect spam and differentiate it from important emails, AI-driven solutions for cybersecurity and fraud prevention, and smart home assistants like Alexa.
I do understand why business leaders may be hesitant to buy into the promise of AI. After all, it’s often hard to predict the future. Will AI be truly revolutionary, like the internet? Or is it a bubble, like Hadoop, where the enthusiasm exceeded real benefits? Will businesses be eclipsed if they don’t jump on the bandwagon, or will they be regretting expensive mistakes that could have been prevented if they’d had more patience?
I co-founded ThoughtSpot about five years ago. Prior to that, I led a team of ML engineers at Google responsible for building the ML infrastructure and models used to predict when a user was going to click on an AdSense ad. Each quarter we trained hundreds of models to find better machine learning structures. Those models had two orders of magnitude more data and features than any others that I know of. Oh, and most quarters we also added more to Google’s revenue than the total income of most pre-IPO companies.
So I can say with certainty that ML isn’t a bubble or hype, and in fact it can make a real impact in the business world. The hundred-million-dollar question remains though: how can organizations leverage machine learning when new AI-tech is introduced almost daily?
Perfect Is the Enemy of Good
There’s not a huge delta between a solution that is “good enough” and one that’s the best. In fact, even the simplest tools can deliver significant returns if you employ the right data and problem modeling. While I was at Google, I worked with a team of very smart engineers who lacked any real knowledge about ML. However, with some smarts – and a bit of luck – the group was able to increase yield about 30 percent by using machine learning to solve an important problem. Then a collection of machine learning PhDs worked for years to increase the yield a mere 5 percent.
I’ve seen this scenario time and again. A serviceable solution will often net you 70-80 percent of the gain you’re seeking. For that reason, I think it’s pointless to invest months selecting the perfect tool. Instead, just be certain to build solid data models that will make it simple to switch tools in the future. If you do want to progress from a sufficient to an ideal solution, bear in mind that it’s a long, hard road. If you choose that path, be certain the next 5-10 percent of improvement warrants the investment. For example:
If you are producing business insights that save hundreds of millions of dollars, the choice is clear.
If you are working on a cure for cancer, the choice is clear.
In other instances, you should give careful consideration to the value of going the extra mile.
Humans Are an Essential Part of the Equation
You still need human insights to develop a good AI/ML solution to any problem. So it’s important to remember that the right team is often the most important ingredient in a successful project.
Intelligent feature engineering experts are one of the best investments you can make in any AI/ML project. It’s true that you can automate portions of the feature selection process, but much of it is an art form. For example, the most important feature often isn’t even in place at the start of a project, so if you’re relying exclusively on automated tools to select features, you’re likely missing out.
The same goes for the data you’re using. Without a human looking at the issue, you might not even realize you need to establish new data pipelines to capture the right data for your desired outcome. And the people who can see the connections needed to take that step are most often those who understand both the problem domain and ML techniques. Take, for example, sales data. While a sophisticated AI/ML approach can help optimize specific stages of a sales cycle, a human is able to see that piping in a completely new dataset, like the market caps of all the companies the sales team is targeting, would bring exponential value to the whole business.
If you’re lacking a team with the right experience, you may also run the risk of overfitting. There is a popular saying in data science (attributed to economist Ronald Coase), “If you torture the data long enough, it will confess.” You can reduce that risk by choosing tools that speed up validation, so that every promising result can be checked quickly before you trust it.
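To make the "torture the data" point concrete, here is a minimal sketch in Python. It is an illustration, not a description of any system mentioned in this article. All names and numbers in it (the function names, 200 candidate features, 100 search rows, 1,000 held-out rows) are assumptions chosen for the example. Both the labels and the features are pure noise, so any feature that looks predictive was found by searching, not by signal.

```python
import random

def accuracy(feature, labels):
    """Fraction of rows where a binary feature equals the binary label."""
    matches = sum(1 for f, y in zip(feature, labels) if f == y)
    return matches / len(labels)

def best_feature_on_search_set(n_features=200, n_search=100, n_holdout=1000, seed=0):
    """Pick the best-looking noise feature, then score it on held-out rows."""
    rng = random.Random(seed)
    n_rows = n_search + n_holdout

    # Labels are coin flips: there is no real signal to find.
    labels = [rng.randint(0, 1) for _ in range(n_rows)]
    # Every candidate feature is also a column of coin flips.
    features = [[rng.randint(0, 1) for _ in range(n_rows)]
                for _ in range(n_features)]

    search_labels = labels[:n_search]
    holdout_labels = labels[n_search:]

    # "Torture the data": keep whichever feature scores best on the search rows.
    best = max(features, key=lambda f: accuracy(f[:n_search], search_labels))

    search_acc = accuracy(best[:n_search], search_labels)
    holdout_acc = accuracy(best[n_search:], holdout_labels)
    return search_acc, holdout_acc

search_acc, holdout_acc = best_feature_on_search_set()
print(f"accuracy on rows used for selection: {search_acc:.2f}")
print(f"accuracy on held-out rows:           {holdout_acc:.2f}")
```

The selected feature should score well above chance on the rows used to pick it and close to 50 percent on the held-out rows. The gap between those two numbers is the "confession," and a quick held-out check like this is one simple way to catch it.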
Like all important work, machine learning is 5 percent inspiration, 95 percent perspiration. It takes a lot of bad hypotheses to get to a good one. Most organizations invested in their machine learning models’ accuracy are determined to decrease the window between developing a theory and empirically proving or disproving it. In fact, generally speaking, innovation within an organization can be reliably gauged by the amount of time it needs to validate an idea.
Measure Tiny Successes
Over the years, I’ve learned that just 10 or 20 percent of ideas yield any improvement in results, and often those improvements are so small that they’re lost in the measurement process.
However, if you’re able to create a measurement system that can consistently find small improvements, the combined gains from 50 to 100 of those improvements are usually unbeatable. It’s equally important to have a fast iteration machine that can quickly test a thousand ideas, and a system that recognizes when something moves the needle a tiny bit.
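A rough back-of-the-envelope sketch in Python shows why both halves of this matter. The figures are illustrative assumptions, not numbers from my experience: a 0.5 percent relative lift per idea and a 2 percent baseline rate (for example, a click-through rate). The sample-size estimate uses a simple normal approximation for the difference between two proportions and ignores statistical power, so treat it as an order-of-magnitude guide only.

```python
import math

def compound_gain(per_idea_lift, n_ideas):
    """Total relative gain when each idea multiplies the metric by (1 + lift)."""
    return (1 + per_idea_lift) ** n_ideas - 1

def samples_per_arm(baseline_rate, relative_lift, z=2.0):
    """Rough trials needed in each arm for the lift to be about z standard errors.

    Normal approximation for the difference of two proportions, assuming
    both arms have variance p * (1 - p). Power is not accounted for.
    """
    p = baseline_rate
    delta = p * relative_lift                 # absolute difference to detect
    variance_of_difference = 2 * p * (1 - p)  # per paired sample
    return math.ceil(z ** 2 * variance_of_difference / delta ** 2)

# Hypothetical inputs chosen for illustration.
total = compound_gain(per_idea_lift=0.005, n_ideas=50)
needed = samples_per_arm(baseline_rate=0.02, relative_lift=0.005)
print(f"combined gain from 50 small wins: {total:.1%}")
print(f"trials per arm to see one small win: {needed:,}")
```

Under these assumptions, fifty 0.5 percent wins compound to roughly a 28 percent gain, yet seeing just one of them takes on the order of 15 million trials per arm. Halving the lift roughly quadruples the data required, which is why small wins disappear without a careful measurement system.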
I’ve spent most of my career in machine learning, and firmly believe we’re at a very exciting juncture in the technology’s ability to solve real world problems. If you avoid the tyranny of perfection, maintain a human touch, and combine your small successes, you’ll be able to leverage ML to more quickly, efficiently and accurately solve your business challenges today.
About the author: Amit Prakash is co-founder and CTO at ThoughtSpot and has deep experience in building large-scale analytics systems. Prior to ThoughtSpot, Amit led multiple analytics engineering teams in the Google AdSense business, contributing $50M+ quarter-on-quarter growth to the business through improving analytical algorithms for AdSense. Prior to that, Amit was a founding engineer on the Bing team at Microsoft, where he implemented the PageRank algorithms for search from scratch. Amit received his PhD in Computer Engineering from the University of Texas at Austin and a Bachelor of Technology in Electrical Engineering from the Indian Institute of Technology, Kanpur.