In Search of a Common Deep Learning Stack
Web serving had the LAMP stack, and big data had its SMACK stack. But when it comes to deep learning, the technology gods have yet to give us a standard suite of tools and technologies that are universally accepted. Despite the lack of a unifying acronym (or perhaps because of it), early adopters soldier on.
The idea of a common “stack” upon which developers build applications, and on which administrators run them, has become popular in recent years. Faced with a multitude of competing options, developers can be fearful of picking the “wrong” tools and technologies and being left on the dark side of a forked project. Administrators tasked with keeping the creations of developers running are similarly afraid of inheriting a technological albatross that weighs them down.
As early adopters put new technologies through their paces, they expose the good and the bad of each approach. Exposed to the real world, stronger technologies thrive, while weaker technologies fall by the wayside. Through a natural selection-like process, technologies that are well-adapted to their environments beget new projects that share their inherent qualities. Gradually over time – or sometimes quite quickly through a punctuated process – through a million and one A/B tests, a champion emerges.
We’re in the middle of those million and one A/B tests for deep learning technology at the moment. It’s human nature to seek safety in numbers, but if you venture out into the deep learning world, you must realize there’s no herd to protect you. While some technologies have emerged as front-runners, there’s no guarantee they’ll be around in five or 10 years, which means there’s substantial risk.
Dillon Erb is an early adopter who jumped eagerly into the deep learning pool. He co-founded a company called Paperspace that’s building an abstraction layer to shield developers from the vagaries of early deep learning technology.
“There’s certainly some leaders in the space,” Erb says of deep learning technologies. “I would say that [a stack] hasn’t yet emerged in the traditional sense, in the sense of the LAMP stack, or the go-to tooling.”
Among deep learning frameworks, Google’s TensorFlow jumped out to an early lead, but recently PyTorch has come out of nowhere and is now neck and neck with TensorFlow, Erb says. Similarly, Nvidia GPUs and the associated CUDA tooling are in high demand at the moment, but other processor types are currently in the mix, or at least on the drawing board.
Kubernetes is also an early leader at the cluster orchestration level, while the integrated development environment (IDE) level is pointing toward one project in particular. “The Jupyter notebook has become the de facto IDE for a data science or a machine learning person,” he says.
A general pattern is also emerging around the basic architecture that deep learning applications will use. “A task-running architecture … as a pattern has emerged, and it’s something that most people are doing,” Erb says.
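To make the idea concrete, here is a minimal sketch of what a task-running pattern can look like, assuming nothing about how Paperspace, Kubeflow, or any other product implements it. The task names (`prepare`, `train`, `evaluate`) and the `run_tasks` helper are hypothetical and exist only for illustration: each step is declared as a task, its dependencies are stated explicitly, and a runner executes the steps in dependency order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_tasks(tasks, deps):
    """Run every task once, after all of the tasks it depends on.

    tasks: dict mapping a task name to a callable
    deps:  dict mapping a task name to the list of task names it needs
    Returns the results by task name and the order the tasks ran in.
    """
    # Include tasks with no dependencies so they are scheduled too.
    graph = {name: deps.get(name, ()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())

    results = {}
    for name in order:
        # Feed each task the outputs of its dependencies, in declared order.
        inputs = [results[dep] for dep in graph[name]]
        results[name] = tasks[name](*inputs)
    return results, order


# Hypothetical three-step pipeline; the "model" is just a mean value.
tasks = {
    "prepare": lambda: [1.0, 2.0, 3.0],
    "train": lambda data: sum(data) / len(data),
    "evaluate": lambda model: abs(model - 2.0),
}
deps = {"train": ["prepare"], "evaluate": ["train"]}

results, order = run_tasks(tasks, deps)
print(order)
print(results["evaluate"])
```

In a real system each task would more likely be a container scheduled onto a cluster than a function call, but the shape is the same: declared steps, declared dependencies, and a runner that decides what executes when.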
Google’s Kubeflow is the model for that task-running architecture on Kubernetes. Kubeflow has been adopted somewhat widely by early deep learning adopters, but it can be difficult to operationalize. Yesterday, Intel unveiled its own deep learning framework, called Nauta, that is based on Kubeflow.
“But in terms of which particular tools to use, that’s still being defined, which is one of the big challenges in our space,” Erb continues. “PyTorch was not even on the radar when we first got into the machine learning universe and then out of nowhere it kind of blew up. It’s all Nvidia GPUs today, but I would imagine that newer architectures will come out, and maybe they’re from Nvidia, Intel, or Graphcore.
“It’s definitely a Cambrian explosion of tools right now.”
Deep learning is the key technological breakthrough that’s carrying the ball for AI at the moment, and it’s arguably responsible for much of the promise (and the hype) that surrounds AI. However, deep learning is still cloaked in complexity, and it’s only practiced by experts.
In fact, there are only 22,000 Ph.D.-carrying data science researchers and engineers worldwide who have the technical skills to deploy deep learning methodologies in a commercial setting, according to an analysis of LinkedIn profiles conducted by the AI company Element AI last year.
“Modern AI or deep learning is extremely powerful, but really is only usable by experts. A team has to have a networking expert, a DevOps expert, a statistician and AI expert,” says Erb, the CEO of Paperspace. “This technology is amazing and transformative, but it’s just too hard to use right now, even for well-intentioned, smart people.”
The Brooklyn, New York-based startup aims to “abstract away” those dependencies by modernizing the tooling, and in the process automate as much of the deep learning stack as possible. The approach it’s taking with its offering, called Gradient, is not dissimilar to what Amazon is doing with some of its machine learning offerings, like SageMaker. But Paperspace is developing a higher-order system that can co-exist with the machine learning and deep learning offerings from Amazon, Google, Microsoft, and others.
Paperspace, which has raised $21.5 million, gives developers the tools to build a deterministic, reproducible machine learning pipeline that can support many different processing architectures, according to Erb. “We call our scheduling software GRAI,” he says. “It supports CPUs, TPUs, GPUs, and pre-emptible instance types.” Customers can specify the exact machine they want to run on in AWS or GCP, or they can leave the scheduling to Paperspace.
The Web giants have built their own deterministic, reproducible machine learning pipelines, but they have billions of dollars to throw at the problem. There are certain classes of problems that deep learning is very good at solving – computer vision and natural language processing being the leading two – and Erb thinks that even midsize insurance companies will eventually use the technology.
“The ambition over time is to lower the barrier of entry and I think that’s an essential step for this technology to be used not just by Google and Geoff Hinton and the smart guys in Toronto,” Erb says. “I do think that it will become less exotic over the next few years, and a more important tool in the toolbox of development teams, to say, hey we don’t have to be experts, or researchers in the deep learning space to leverage the insights that the technology can provide.”
Alas, we’re not there yet, as the aforementioned stack has yet to emerge. But thanks to the early work of pioneers like Paperspace, a general form is perhaps beginning to take shape.