Algorithmia Laser-Focused On ML Deployment and Management
The market for enterprise data science tools is diverse, with new tools appearing all the time. That market may eventually determine that users are best served by full-featured suites covering the entire data science lifecycle, from model development to deployment and everything in between. But in the meantime, Algorithmia CEO Diego Oppenheimer is happy to build and sell tools that automate the last mile.
Oppenheimer says the genesis of Algorithmia was the difficulty that he and his co-founder, Kenny Daniel, experienced while trying to deploy predictive applications into production. Daniel was studying AI for his PhD at USC when he commented to Oppenheimer on how much data science effort was being wasted developing sophisticated machine learning models.
“Nobody was able to run them,” says Oppenheimer, who cut his teeth in the analytics business developing BI tools at Microsoft. “If the future is going to be in intelligent applications, then that kind of application is going to need a whole bunch of tools.”
In Oppenheimer’s view, a new software development lifecycle is emerging for machine learning applications. It’s similar in some ways to the old software development lifecycle, in that applications are built, tested, deployed, and managed.
“Except now, we’re dealing with more cases with probabilistic code versus deterministic code, so it varies,” he tells Datanami in a recent interview at Strata Data Conference. “The data varies. The models vary. And so a new set of tools was necessary to build that, and nobody was doing it.”
“If you think about the four areas in machine learning workflow — data prep, model training, deployment, and model management — we do the last two,” Oppenheimer says. “We help companies facilitate a way of getting models into production quicker, which makes it so that data scientists can actually work in a better way.”
Algorithmia’s solution involves a mix of Git, Kubernetes, and Docker. The process starts when a data scientist has finalized her predictive work, developed in any one of a number of supported frameworks and languages. The data scientist uses Git to push the predictive piece (an individual algorithm, a model, or a predictive function) into the Algorithmia environment, dubbed the “AI Layer,” and out pops an API on the other end that she can use to call that algorithm, model, or function.
The secret sauce, of course, is what lies behind that API. The company encapsulates the algorithm, model, or function in a Docker container, deploys it to the hardware of the user’s choice, and manages it with Kubernetes. The company points out that this is the same basic approach that companies like Google and Uber use, except that it’s available as pre-built capabilities for companies without those firms’ extreme IT skill levels.
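The pattern Oppenheimer describes, wrapping a trained model behind a web API so that callers never touch the model code directly, can be sketched in miniature. The sketch below is a generic, stdlib-only illustration of that serving pattern; the endpoint path, payload shape, and `predict` stand-in are invented for this example and are not Algorithmia's actual API:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Stand-in for a trained model's inference call; a real deployment
    # would load serialized weights (TensorFlow, PyTorch, etc.) here.
    return sum(features)

class ModelHandler(BaseHTTPRequestHandler):
    """Serves the model behind a JSON-over-HTTP endpoint."""

    def do_POST(self):
        # Hypothetical route for illustration only.
        if self.path != "/v1/models/demo":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"result": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve for real:
#   HTTPServer(("0.0.0.0", 8080), ModelHandler).serve_forever()
```

A container platform can then treat a server like this as an opaque unit: Docker packages it with its dependencies, and Kubernetes handles scheduling and replication, which is the operational work Algorithmia automates on the customer's behalf.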
Algorithmia supports popular data science languages, including R, Python, Java, Scala, Ruby, and Rust, and major machine learning frameworks like Spark, PyTorch, TensorFlow, MXNet, H2O, and others. The company’s value lies in knowing the nitty-gritty details of what it takes to bring a particular model built in TensorFlow, say, into “Reddit-scale” production on customers’ infrastructure.
“We’re a team dedicated to doing that. That is our value proposition. This is what we do,” Oppenheimer says. “We’ve architected ourselves to be able to do it very quickly. This is the typical argument of build versus buy. Could I build this? Absolutely. Do you want to maintain it? Are you going to work at the speed that we work? TensorFlow 2.0 is about to come out. Are you going to be ready for that? When the new PyTorch comes out, are you ready for that?”
Algorithmia is starting to gain traction. The company has signed 10 Fortune 100 accounts, Oppenheimer says, and has about 40 employees on staff, with numerous open positions listed on its website.
Oppenheimer understands that customers have lots of options when it comes to data science tools today. The offerings on the cloud are enticing, particularly for those data scientists who want that end-to-end, “one throat to choke” comfort.
There are also automated machine learning tools that purport to handle the full lifecycle. Algorithmia competes with those tools to some extent, but the Seattle, Washington-based firm is laser-focused on the challenges of deploying and managing predictive algorithms, models, and functions.
“We pick up where those products leave off, and we make a point to be very good citizens,” Oppenheimer says. “We don’t work with 100% of them. There’s caveats with each one. But our goal is to work with all of them, unless you’re a platform that doesn’t allow extracting models.”
The goal, ultimately, is to make data scientists more productive. Data scientists continue to spend far too much of their valuable time doing non-data science tasks, like cleaning and prepping data and building data pipelines.
An Algorithmia-commissioned study of 500 data scientists, conducted last fall, indicated that data scientists spend up to 75% of their time on these non-data science tasks, which is in line with other surveys over the years. The same survey found that 60% of machine learning projects were failing at the deployment stage. Both of those statistics point to a looming need for better data science tools, Oppenheimer says.
“If you think about the investment that goes into a data science effort, first of all you spend money on your data lake and data storage and Hadoop. Then you spend money on the data scientists,” he says. “They got to a model that was accurate. They say, ‘Hey, this works. If we get this into production, we are going to get business results.’ But then they can’t get it into production. Their entire value chain does not light up until you solve that last-mile problem.”