Five Reasons Machine Learning Is Moving to the Cloud
Amazon Web Services turned a lot of heads recently when it launched a machine learning platform aimed at making predictive analytics applications easy to build and run, joining cloud juggernauts Microsoft and Google with similar ML offerings. It turns out the cloud is very well-suited for this critical type of big data workload. Here are five reasons why.
1. Machine Learning Is Everywhere
If predictive analytics is the killer app for big data, then machine learning is the technological heart powering that killer app. Whether you’re aiming to leverage your big data to stop fraudulent transactions, reduce customer churn, fight cybercriminals, or make product recommendations, machine learning algorithms are the keys to creating models of what happened in the past, so you can use new data to predict what happens next.
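The train-on-historical-data, predict-on-new-data loop described above can be sketched in a few lines with scikit-learn. This is a minimal illustration, not anything from the vendors mentioned here: the churn-style features and labels are synthetic stand-ins.

```python
# Minimal sketch of the "model the past, predict what happens next" loop.
# Assumes scikit-learn; the features and churn labels are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Historical data: usage features per customer, plus whether they churned.
X = rng.normal(size=(1000, 4))           # e.g. tenure, spend, support calls
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic "churned" label

X_train, X_new, y_train, y_new = train_test_split(X, y, random_state=0)

# 1. Learn a model of what happened in the past...
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# 2. ...then use new data to predict what happens next.
predictions = model.predict(X_new)
```

The same fit-then-predict pattern underlies fraud detection, churn reduction, and recommendations; only the features and labels change.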
Machine learning is nothing new; the field has been around for decades. But thanks to a confluence of events—including the ever-increasing amount of processing power, the growing sophistication of analytic software, and most important of all, the huge amounts of data available to train and feed predictive models—the need for, and the benefits of, machine learning have never been greater.
2. The Cloud’s Super Gravity
The cloud is like the Death Star: The more workloads it sucks in, the cheaper it gets for all cloud customers, and the harder it is to ever get away. Consider that Amazon Web Services (AWS) has between 2.4 million and 5.6 million servers installed in about 90 data centers across the world, according to a 2014 EnterpriseTech story, and is adding enough server capacity every day to support Amazon.com’s entire ecommerce operation circa 2004.
Cloud services like AWS’ S3 and Microsoft’s Azure make it very cheap to store all kinds of data—including log data, mobile data, and data generated by cloud-based apps like Salesforce and Workday. When it comes time to run analytics on that data, the economics make it difficult to justify landing it back down on earth.
3. Statistics Is Really Hard
When cloud-based machine learning company BigML launched in 2011, the only way to do advanced analytics was to buy an expensive stats package like SAS or IBM's SPSS, or to use emerging open source tools like R.
“Machine learning and predictive analytics aren’t new,” BigML vice president of business development Andrew Shikiar tells Datanami. “But the only alternative in the past was to buy some SAS for your quants and have them do machine learning. Instead of buying SAS or putting R on your desktop, users can just log into BigML…and use an array of algorithms that we’ve introduced to the platform.”
BigML has attracted more than 17,000 users over the past four years and has more than 200 paying clients, making it one of the biggest providers of cloud-based machine learning software whose name isn't Amazon, Google, Microsoft, or IBM.
4. ML Workloads Are Highly Variable
The underlying computational requirements for machine learning vary depending on where you are in the machine learning lifecycle. When you're training (or retraining) your models, you may need a large amount of processing power, whereas actually running the models may not consume many resources at all. That variability makes the cloud a perfect place to park machine learning workloads, especially if the training data already lives in the cloud. Cloud providers like Amazon can quickly spin up virtual partitions to handle massive training sets, then turn them off when they're no longer needed.
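The asymmetry described above is easy to demonstrate: fitting a model is the expensive, bursty step, while scoring a single new record afterward is nearly free. A hedged sketch, assuming scikit-learn and synthetic data:

```python
# Sketch of the train/serve asymmetry: training is a burst of compute,
# scoring one record in production is cheap. Data here is synthetic.
import time
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 20))
y = (X.sum(axis=1) > 0).astype(int)

t0 = time.perf_counter()
model = GradientBoostingClassifier().fit(X, y)   # the training burst
train_secs = time.perf_counter() - t0

t0 = time.perf_counter()
model.predict(X[:1])                             # steady-state serving
predict_secs = time.perf_counter() - t0

print(f"train: {train_secs:.2f}s, one prediction: {predict_secs:.5f}s")
```

In an elastic environment, that `fit` step maps to capacity you rent only for the duration of the training burst and release afterward, which is exactly the pricing model cloud providers sell.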
Consider the experience of Cisco. The networking equipment maker maintains an extensive collection of 60,000 “propensity to buy” (P2B) models, which it uses to predict sales of its products every quarter (we profiled Cisco in a January feature in Datanami).
Getting the necessary compute time was a challenge for Cisco’s data scientists, and as a result, it often took several weeks to retrain the models each quarter. For a big company like Cisco, this type of delay between training and deploying ML models could mean millions in lost sales opportunities. While Cisco doesn’t run on the cloud (it adopted H2O.ai to speed up its in-house ML environment), the company’s experience shows the importance of scalability in machine learning.
5. Data Scientists Are Still Unicorns
The shortage of data scientists has been well documented, in this publication and others. In response, universities have ramped up data science programs, and software companies have shifted into overdrive to abstract away the need for data scientists in the first place. While it’s debatable whether software can completely eliminate the need for data scientists, it’s undeniable that many data science activities previously done by highly trained PhDs will eventually be automated. We’re seeing many of these software offerings move to the cloud.
The combination of advanced analytics software and the availability of cheap processing power makes the cloud a perfect place to play with algorithms—as well as a great place for startups to ramp up their business models.
One of those startups, a Silicon Valley outfit called ForecastThis, yesterday announced that its MLSolver technology is now available via the cloud. “We’ve created a means by which data owners or experienced data scientists can now cut straight to the very best methods for their data,” says the company’s CTO and co-founder Justin Washtell. “There’s no longer an imperative to be an algorithm expert or to spend valuable time testing and comparing different algorithms.”
BigML’s Shikiar says being in the cloud gives the company certain advantages over software vendors developing on-prem solutions. “Working in the cloud is the easiest way to evolve the platform and service customers,” he says. “With the advent of cloud-based machine learning platforms…the need to roll your own algorithms may go by the wayside.”