The Perfect Storm: How the Chip Shortage Will Impact AI Development
The chip shortage has brought to light how dependent high-tech economies, and the everyday lives of consumers, are on hardware. Today, chips can be found in everything from gaming consoles like the Xbox Series X and PlayStation 5 to household appliances like washing machines and refrigerators.
It’s important to note that this chip shortage is not an isolated supply chain issue; it’s a systemic one that will impact every industry, and there’s no quick fix in sight. According to Forrester Research, the shortage is expected to last through 2022 and into 2023.
While the macro implications of the shortage are vast, there’s one aspect that’s been on my mind: Will this chip shortage negatively impact progress in the world of AI?
Whenever a new chip is introduced, developers need new ways to connect it to applications; in AI, models serve this purpose. But what if the creation and evolution of AI models far outpaces the design and introduction of new hardware because of the shortage? The answer: we could see a major decline in breakthrough AI performance across every industry, because models and specialized hardware depend on each other, especially today, given the Cambrian explosion of AI hardware.
But how did we get here? The key reason for the current silicon shortage is simple: demand outpacing supply. This is largely due to the pandemic and an expected recession that never quite materialized. Silicon fabs ramped down capacity in anticipation of a downturn, only to be surprised by fast-growing demand fueled by people living their lives through electronic devices.
Hardware Innovation Can’t Keep Up With Model Evolution
Specialized AI chips are attractive because they can offer much higher performance, and more importantly, higher performance per watt, by tailoring their architecture and circuits to AI/ML workloads. This specialization, however, targets classes of models that evolve very quickly. By the time a shiny new AI chip is ready to hit the market, the popular model architectures have often already evolved beyond it. The transition from model research to deployment can happen in a matter of months, so the match between model and hardware becomes obsolete at a fast pace.
Consider a complex model like GPT-3. A model of this caliber has 175 billion parameters and takes roughly 3×10^23 compute operations to train; training it on a single modern GPU would take centuries. Designing an efficient specialized chip for it requires understanding the specific data types, compute operations and sparsity distribution of the data, and designing hardware that supports them well. But by the time such a chip is ready to launch, GPT-3 will have been replaced by a more advanced version, or another model altogether, with potentially trillions of parameters and a very different mix of compute operations and data sparsity.
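To see where the "centuries" figure comes from, here is a quick back-of-envelope calculation. The 3×10^23 total comes from the paragraph above; the ~30 TFLOPS sustained throughput is an assumed, illustrative number, not a benchmark of any specific GPU.

```python
# Back-of-envelope estimate of single-GPU training time for a GPT-3-scale
# model. TOTAL_TRAINING_FLOPS comes from the text; the sustained GPU
# throughput is an illustrative assumption.
TOTAL_TRAINING_FLOPS = 3e23      # total compute operations to train the model
SUSTAINED_GPU_FLOPS = 30e12      # assumed sustained ops/second on one GPU
SECONDS_PER_YEAR = 365 * 24 * 3600

def training_years(total_flops: float, gpu_flops: float) -> float:
    """Years a single GPU would need at the given sustained rate."""
    return total_flops / gpu_flops / SECONDS_PER_YEAR

years = training_years(TOTAL_TRAINING_FLOPS, SUSTAINED_GPU_FLOPS)
print(f"~{years:.0f} years on a single GPU")  # ~317 years
```

Even if the sustained throughput assumption is off by an order of magnitude, the result stays in the decades-to-centuries range, which is why training at this scale requires thousands of accelerators working in parallel.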
One could argue that manufacturers should make chips more general, but that would undermine the very innovation that has accelerated model performance via specialized hardware, so that’s simply not an option.
But It’s Not All Doom & Gloom: We Can Make Do With What We Have
So how do we keep benefiting from advances in AI? We make the most of the hardware we have now. That involves developing new techniques to create models better suited to a chosen chip (e.g., hardware-aware model architecture search), or using techniques that automatically optimize and tune ML model code for specific hardware architectures. The Apache TVM open source ML optimization and deployment stack, for instance, employs ML-powered optimization techniques to generate highly efficient code, often delivering 2x-30x speedups.
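To make the idea of hardware-aware architecture search concrete, here is a minimal, hypothetical sketch: given a latency budget measured on the target chip, pick the candidate architecture with the best estimated accuracy that still fits. The candidate names and numbers are made up for illustration; real systems use far more sophisticated cost models and search strategies.

```python
# Toy sketch of hardware-aware model architecture search: filter candidate
# architectures by a latency budget on the target hardware, then choose the
# most accurate survivor. All names and numbers are hypothetical.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    est_accuracy: float   # proxy metric, e.g. from a predictor or short run
    latency_ms: float     # measured (or modeled) latency on the target chip

def search(candidates: list[Candidate], latency_budget_ms: float) -> Candidate:
    """Return the most accurate candidate that meets the latency budget."""
    feasible = [c for c in candidates if c.latency_ms <= latency_budget_ms]
    if not feasible:
        raise ValueError("no architecture fits the latency budget")
    return max(feasible, key=lambda c: c.est_accuracy)

candidates = [
    Candidate("wide", est_accuracy=0.81, latency_ms=42.0),
    Candidate("deep", est_accuracy=0.83, latency_ms=55.0),
    Candidate("slim", est_accuracy=0.78, latency_ms=18.0),
]
best = search(candidates, latency_budget_ms=45.0)
print(best.name)  # "wide": most accurate option under 45 ms
```

The key design point is that the hardware constraint is part of the search objective itself, rather than an afterthought applied once the model is already trained.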
Making models use hardware more effectively offers a better end-user experience, lowers cloud costs and enables new applications. More fundamentally, it also reduces environmental impact through lower energy usage and better utilization of already deployed hardware (less hardware churn). This should not be ignored, given the environmental cost of global-scale AI/ML systems.
About the author: Luis Ceze is the CEO and co-founder of OctoML, which develops a commercial version of the open source Apache TVM compiler. He is also a professor in the Paul G. Allen School of Computer Science and Engineering at the University of Washington.