Inside Cisco’s Machine Learning Model Factory
Like most companies of its size, Cisco Systems uses forecasts to help it direct resources. The accuracy and timeliness of those forecasts are critical, but keeping the predictive models that drive them up to date was a time-consuming task. So when the $47-billion company found new machine learning technology that could significantly speed up the training and scoring of those predictive models, it jumped at the chance.
Cisco maintains a collection of 60,000 propensity-to-buy (P2B) models that it uses to forecast demand for its products, everything from routers and IP phones to blade servers and cable TV boxes. Every potential customer of every Cisco product in every country around the world is represented in those models, which the company’s sales and marketing teams rely on heavily to decide which prospects to pursue with which products. Needless to say, the models are critically important to Cisco and its go-to-market strategy in the fast-changing IT industry.
The 20-person marketing analytics team led by Cisco’s principal data scientist, Lou Carvalheira, is charged with keeping the P2B models current. Time is of the essence for Carvalheira and his team: the sooner the machine learning algorithms that power the models can be retrained and scored each quarter, the more time Cisco’s sales and marketing professionals have to use the results before they grow stale.
The team has an extensive arsenal of analytics tools at its disposal, including a big investment in SAS, access to a large Teradata warehouse, a Hadoop environment based on MapR’s distribution, MATLAB, Mathematica, and Maple, and products from SAP BusinessObjects and Tableau. “Cisco has a very intense analytical culture,” Carvalheira says. “We have a little bit of everything.”
Despite that culture and the presence of the best tools money can buy, the team struggled to update and score the models in a timely manner. Speed was required to maximize the freshness of the data, but it was an uphill battle, especially when it came to running resource-intensive machine learning algorithms in Cisco’s shared analytics environment.
“Our environment was quite heavy and hard to work with,” Carvalheira tells Datanami. “When it became time for us to renew our models, everybody was trying to do something else as well. The bottom line was that recreating those models would take quite a while, sometimes many, many weeks.”
Simple and Scalable ML
Motivated to find a faster modeling solution, Carvalheira started researching alternative machine learning environments. When he checked out an in-memory machine learning solution from a company called H2O.ai (formerly 0xdata), he immediately liked the simplicity of its approach.
H2O.ai was founded four years ago by SriSatish Ambati to provide a straightforward yet powerful distributed machine learning environment for data scientists. Frustrated with limitations in R, Ambati built H2O using a combination of Java, R, and Python; the software is open source.
Carvalheira liked the fact that he could control H2O workflows from R, and he was particularly impressed with the software’s deployment process, which boils down to installing Java Archive (JAR) files on x86 computers.
“All you need is the H2O JAR file,” he says. “You just copy a JAR file and you’re able to run stuff. It was mind-boggling. They spent a lot of time simplifying it.” Scaling an H2O cluster is nearly as easy. “If you don’t have enough memory to accommodate your data when you want to train your models, well, just add another machine. It scales horizontally,” he says.
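The article doesn’t show the exact commands Cisco used, but the deployment model Carvalheira describes can be sketched roughly as follows. The hostnames, cloud name, and memory setting here are purely illustrative: the idea is that every node runs the same jar with a shared cloud name, and a “flatfile” lists the peers so the nodes can find each other.

```python
# Illustrative sketch (not Cisco's actual setup) of H2O's JAR-based
# deployment: every node runs the same h2o.jar with a shared cloud
# name, and a "flatfile" lists the peers so the JVMs form one cluster.
nodes = ["node1", "node2", "node3", "node4"]  # hypothetical hostnames
port = 54321                                  # H2O's default port

# Contents of the flatfile copied to every node:
flatfile = "\n".join(f"{host}:{port}" for host in nodes)

def launch_command(max_mem_gb=28):
    # The same command runs on each node; -name groups the JVMs into a
    # single cloud whose memory is pooled for training.
    return (f"java -Xmx{max_mem_gb}g -jar h2o.jar "
            f"-name p2b-cloud -flatfile flatfile.txt -port {port}")

print(flatfile)
print(launch_command())
```

Scaling out then really is as simple as Carvalheira suggests: add a hostname to the list and run the same command on the new machine.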
The software doesn’t do everything. It doesn’t do data preparation. It doesn’t give you visualizations. Compared to Cisco’s extensive SAS environment, H2O’s functionality is rather paltry. But what H2O does, it does very well, Carvalheira says. “It’s just fantastically optimized,” he says.
Running on H2O
Carvalheira acquired the H2O software and deployed it on a four-node CentOS cluster with 24 cores and 128GB of memory. All of a sudden, Carvalheira had the capability to train and score all 60,000 of Cisco’s P2B models in a matter of hours.
The speed of the in-memory system stood out right away. “I think some of the algorithms are the fastest in the market,” he says. “The GLM [generalized linear model] they have–truth be told, you don’t solve all problems with GLM–but it’s just fantastic how quick that is.”
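The article doesn’t show H2O’s GLM API, but the kind of model being described, a propensity score from a generalized linear model, can be illustrated with a tiny plain-Python logistic regression (a GLM with a logit link). The features and data below are made up for the sketch.

```python
import math

# Toy logistic-regression propensity model (a GLM with a logit link).
# Plain-Python illustration of the technique, NOT H2O's API; the
# features and data are invented for the example.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_glm(X, y, lr=0.5, epochs=2000):
    # Stochastic gradient descent on the log loss.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of log loss w.r.t. the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def score(w, b, x):
    # Predicted probability that this prospect buys.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Tiny separable example: higher feature values -> more likely to buy.
X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
y = [0, 0, 1, 1]
w, b = train_glm(X, y)
print(round(score(w, b, [0.85, 0.8]), 2))  # high propensity
```

At Cisco’s scale the same idea runs distributed and in memory, which is where the speed Carvalheira praises comes from.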
The breadth of available algorithms also impressed Carvalheira. In the old analytics environment, the company only had enough time and resources to run decision trees, a type of machine learning algorithm. But with the headroom provided by H2O running on his own cluster, he now has the luxury of exploring other machine learning techniques.
“Today I don’t have to rush, so I can not only do one decision tree, but I can do a random forest of decision trees,” he says. “I can do many decision trees, or I can do a gradient boosting machine type of model, which is a model that exercises different models that complement each other. So I can test many different variations and techniques in the same amount of time, or even a shorter period of time, than I would have in the past.”
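A “model that exercises different models that complement each other” is a fair description of boosting: each new weak learner is fit to the errors the ensemble has made so far. A minimal sketch with one-feature regression stumps (illustrative data, not H2O’s implementation):

```python
# Minimal gradient-boosting sketch with one-feature "stumps": each new
# stump is fit to the residuals of the current ensemble, so the weak
# models complement one another. Data and parameters are illustrative.

def fit_stump(xs, residuals):
    # Try every split point; predict the mean residual on each side.
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def boost(xs, ys, rounds=20, lr=0.3):
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]
        s = fit_stump(xs, residuals)
        stumps.append(s)
        pred = [p + lr * s(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]  # step in the target around x = 3.5
model = boost(xs, ys)
print(model(2), model(5))
```

A random forest, by contrast, trains its trees independently on resampled data and averages them; boosting is sequential, which is why each round can correct the last.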
Impact on the Factory
The business requirements for Carvalheira’s team remain the same: calculate the chances that 160 million businesses will purchase a given Cisco product during the quarter. Armed with the H2O software running on an isolated cluster, the marketing analytics team is able to retrain and score the predictive models in a much shorter timeframe than before.
“The results are fantastic,” Carvalheira says. “We see anywhere from three to seven times better results with the models that we have.” For the modeling and scoring alone, the H2O environment is upwards of 10 to 15 times faster, he adds.
“Before, I used to have one little hammer to hit my nail. Now I have different hammers with H2O,” he says. “And H2O gives us breathing room to make mistakes. If you make a mistake in a set of models that takes a couple of minutes to run, it’s not so bad. It’s experimentation and simulation, so we have the freedom to experiment, to run more models, to err.”
Cisco maintains its existing analytics environment for certain tasks. But for the particular problem of retraining and scoring the P2B models as quickly as possible, the little H2O cluster is generating results much more quickly than the big server that powered the old modeling system.
“Before it was taking so long to create these models that the shelf time of the models would be really small, so you would have to quickly act on them,” he says. “Now we have much more time to develop, so we don’t have to rush.”
Carvalheira is a big fan of H2O, and he has been to several events hosted by the Mountain View, California-based startup. While the software is free and open source, Carvalheira has opted to pay for technical support. He’s hoping to expand Cisco’s use of the software, and he is exploring how it might be integrated with Hadoop or Spark (H2O.ai unveiled its Spark integration, called Sparkling Water, last fall, around the time it changed its name from 0xdata).
“If you think about what I managed to do with my group… a little cluster that I managed to put together with a couple of engineering workstations, that’s incredible,” he says.