‘Octomize’ Your ML Code
If you’re spending months hand-tuning your machine learning model to run well on a particular type of processor, you might be interested in a startup called OctoML, which recently raised $28 million to bring its innovative “Octomizer” to market.
Octomizer is the commercial version of Apache TVM, an open source compiler that was created in Professor Luis Ceze's research group in the Computer Science Department at the University of Washington. Datanami recently caught up with the professor, who is also the CEO of OctoML, to learn about the state of machine learning model compilation in a rapidly changing hardware world.
According to Ceze, there is a big gap in the MLOps workflow between the completion of the machine learning model by the data scientist or machine learning engineer and the deployment of that model into the real world.
Quite often, the services of a software engineer are required to convert the ML model, which is often written in Python using one of the popular frameworks like TensorFlow or PyTorch, into highly optimized C or C++ that can run on a particular processor. However, the process of getting the code to run optimally is not easy, Ceze says.
“There’s really billions of ways in which you can compile a machine learning model into a specific hardware target. Picking the fastest one is a search process that today is done by human intuition,” he says.
“The way you lay out the data structures in memory matters a lot. And which instructions are you going to pick? Are you going to pick vector instructions? Are you going to run this on a CPU or a GPU?” he continues. “All of these choices lead to an exponential blowup of the ways in which we can run. Picking the right one is really hard. It’s done by hand tuning most of the time.”
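To make the size of that search space concrete, here is a toy sketch of how a few tuning choices multiply together. The knob names and values are purely illustrative (they are not TVM's actual configuration space), but the combinatorics are the point:

```python
from itertools import product

# Illustrative tuning knobs a compiler might choose among for a single
# matrix-multiply operator; real search spaces are far larger.
tile_sizes = [4, 8, 16, 32, 64]                  # loop tiling factors
unroll_factors = [1, 2, 4, 8]                    # inner-loop unrolling
vectorize = [False, True]                        # use SIMD vector instructions?
layouts = ["row-major", "col-major", "blocked"]  # data layout in memory

configs = list(product(tile_sizes, unroll_factors, vectorize, layouts))
print(len(configs))  # 5 * 4 * 2 * 3 = 120 combinations for ONE operator
```

A model with dozens of operators multiplies these choices together, which is the "exponential blowup" Ceze describes, and why exhaustive benchmarking of every combination is impractical.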
Apache TVM simplifies the process of turning a machine learning model, written in Python in a framework like TensorFlow, into an executable by using (wait for it) more machine learning. As Ceze explains, Apache TVM uses machine learning to search among all the possible configurations for the optimal way in which to run the model on a given piece of hardware.
“Think of it as a middle layer between frameworks and hardware,” he says. “We offer a clean abstraction across a wide variety of hardware.”
On the software side, Apache TVM supports popular deep learning frameworks like TensorFlow, PyTorch, Keras, MXNet, Core ML, and ONNX. On the hardware side, Apache TVM supports Intel x86 and AMD CPUs; Nvidia GPUs; Arm CPUs, GPUs, and MMUs; Qualcomm SoCs; and FPGAs. It supports phones, IoT sensors, embedded devices, and microcontrollers. It’s primarily used for inference workloads, not training.
“The way it actually works,” Ceze explains, “is when you set up a new hardware target, the TVM engine runs a bunch of little experiments on that target hardware to learn how the new target hardware behaves in the presence of different optimizations. By building that set of training data for how the hardware behaves, you can learn the personality of the hardware target, and it uses that to guide TVM’s optimization for that specific target.”
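The loop Ceze describes — run a few experiments, fit a cost model, then let the model rank the untried configurations — can be sketched roughly as follows. Everything here is illustrative: `measure_latency` stands in for timing a real compiled kernel on the target, and the cost model is a deliberately simple nearest-neighbor predictor rather than the learned models TVM actually uses:

```python
import random

random.seed(0)

def measure_latency(tile: int) -> float:
    """Stand-in for benchmarking a compiled kernel on the target hardware.
    This fake 'hardware' secretly favors a tile size near 16."""
    return (tile - 16) ** 2 / 64 + random.uniform(0.0, 0.1)

candidates = [1, 2, 4, 8, 16, 32, 64, 128]

# 1. Run a few "little experiments" on the target to gather training data.
tried = random.sample(candidates, 4)
data = [(t, measure_latency(t)) for t in tried]

# 2. Fit a trivial cost model: predict the latency of an untried config as
#    the latency of its nearest already-measured neighbor (1-NN regression).
def predict(tile: int) -> float:
    nearest = min(data, key=lambda d: abs(d[0] - tile))
    return nearest[1]

# 3. Let the model rank the configs we have NOT measured yet, so real
#    benchmarking time is spent only on the most promising candidate.
untried = [c for c in candidates if c not in tried]
best_guess = min(untried, key=predict)
print("trying next:", best_guess)
```

The design point is that measuring is expensive (it requires compiling and running on real hardware) while predicting is cheap, so a learned model lets the search cover a huge space with relatively few experiments.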
The main advantage to running an ML model through Apache TVM is time to market. It can take months of hand-tuning for a software engineer to optimize a given model for a given processor type. But Apache TVM can get that same level of performance automatically, within hours or days, Ceze says.
“It’s a compiler, so we’re not going to change the accuracy of your model,” he says. “But compared to the default stacks that exist, we offer anywhere from 2-3x all the way to 30x better performance on the hardware target.”
Ceze acknowledges that a software engineer, using traditional approaches, can probably get that 30x advantage over plain vanilla deployments. “But that’s after a significant amount of hand-tuning and hand-engineering by pretty expensive and hard-to-find people,” he adds.
The Apache TVM project has become quite popular over the past couple of years. The project just passed 500 contributors, and it’s been adopted by Amazon, Facebook, and Microsoft, among others.
That’s where OctoML comes into the picture. The Seattle-based company was co-founded by Ceze and several of his University of Washington graduate students who developed Apache TVM, including CTO Tianqi Chen; Jared Roesch, chief architect of the platform team; and Thierry Moreau, vice president of technology partnerships. Chief Product Officer Jason Knight, who was a staff algorithms engineer at Nervana when it was acquired by Intel, is also a co-founder.
In addition to leading the development of the open source Apache TVM project, the company is developing a commercial version of the product called the Octomizer that is easier to use than the open source software.
“Think of it as TVM as a service,” Ceze says. “You don’t have to set up anything. You don’t have to download the code from GitHub. You don’t have to set up the benchmarking harness. All of that is ready for you as a service.”
The Octomizer (which has to be one of the best product names to come out of the ML community in some time) also brings other advantages, Ceze says.
For starters, it will provide the user with a dashboard that lets them quickly compare their ML model running against a variety of hardware types. The offering also helps manage the data that the Apache TVM engine requires to optimize its compilations.
It’s worth noting here that Apache TVM (and thus the Octomizer) is designed to work primarily with deep learning systems. However, it can also be used with traditional machine learning models, like decision trees and XGBoost, by expressing them as vectors, Ceze says.
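The idea of expressing a tree model as vector operations can be illustrated with a toy example. This is a hand-rolled sketch of the general technique, not OctoML's or TVM's actual implementation:

```python
import numpy as np

# A tiny ensemble of decision stumps, written first as ordinary branching code...
def trees_as_code(x):
    total = 0.0
    total += 1.0 if x[0] > 0.5 else -1.0   # tree 1 splits on feature 0
    total += 2.0 if x[1] > 0.2 else 0.5    # tree 2 splits on feature 1
    return total

# ...and then re-expressed as vector operations: compare all features against
# all thresholds at once, then select leaf values with arithmetic instead of
# branches. This branch-free form maps naturally onto GPUs and vector units.
feature_idx = np.array([0, 1])
thresholds = np.array([0.5, 0.2])
left_vals = np.array([-1.0, 0.5])   # leaf value when the test is false
right_vals = np.array([1.0, 2.0])   # leaf value when the test is true

def trees_as_vectors(x):
    x = np.asarray(x)
    went_right = x[feature_idx] > thresholds           # one vectorized compare
    return float(np.where(went_right, right_vals, left_vals).sum())

sample = [0.7, 0.1]
print(trees_as_code(sample), trees_as_vectors(sample))  # both 1.5
```

Both versions compute the same prediction; the vectorized one just trades data-dependent branches for array arithmetic, which is the kind of representation a tensor compiler can then optimize.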
The target market for the Octomizer is any organization that’s developing and deploying its own machine learning model. There are currently about 1,000 companies on the waitlist for Octomizer, which is expected to become available by the end of the year. The list has a mix of big tech companies, financial companies, and computer vision and genomics startups, Ceze says.
OctoML is also working on establishing OEM-type deals with platform providers that want to incorporate Octomizer functionality into their offerings. The company has already established partnerships with Qualcomm and Arm, Ceze says.
The $28 million Series B funding round will give OctoML the funds it needs to get the Octomizer to market and propel the company’s growth. Among the company’s advisors is Carlos Guestrin, who holds the title of Amazon Professor of Machine Learning in the computer science department at the University of Washington. Guestrin, who also advised the Apache TVM creators, was the founder of a machine learning startup called Turi that was eventually acquired by Apple. He remains at Apple.