The Perfect Storm: How the Chip Shortage Will Impact AI Development
The chip shortage has brought to light how dependent high-tech economies, and the everyday lives of consumers, are on hardware. Today, chips can be found in everything from gaming consoles like the Xbox Series X and PlayStation 5 to household appliances like washing machines and refrigerators.
It’s important to note that this chip shortage is not an isolated supply chain issue; it’s a systemic one that will impact every industry, and there’s no quick fix in sight. According to Forrester Research, the shortage is expected to last through 2022 and into 2023.
While the macro implications of the shortage are vast, there’s one aspect that’s been on my mind: Will this chip shortage negatively impact progress in the world of AI?
Whenever a new chip is introduced, developers need new ways to connect it to applications; in AI, models serve this purpose. But what if the creation and evolution of AI models far outpaces the design and introduction of new hardware because of the shortage? The answer: we could see a major decline in breakthrough AI performance across every industry, because models and specialized hardware depend on each other, especially today, given the Cambrian explosion of AI hardware.
But how did we get here? The key reason for the current silicon shortage is simple: demand outpacing supply. This is largely due to the pandemic and an expected recession that never quite materialized. Silicon fabs ramped down capacity in anticipation of a downturn, only to be surprised by fast-growing demand fueled by people living their lives through electronic devices.
Hardware Innovation Can’t Keep Up With Model Evolution
Specialized AI chips are attractive because they can offer much higher performance, and more importantly, higher performance per watt, by tailoring their architecture and circuits to AI/ML workloads. This specialization, however, targets classes of models that evolve very quickly. By the time a shiny new AI chip is ready to hit the market, the popular model architectures have often already evolved beyond it. The transition from model research to deployment can happen in a matter of months, so the match between model and hardware becomes obsolete at a fast pace.
Consider a complex model like GPT-3. A model of this caliber has 175 billion parameters and takes roughly 3×10^23 compute operations to train; training it on a single modern GPU would take centuries. Designing an efficient specialized chip for it requires understanding the specific data types, compute operations and sparsity distribution of the data, and designing hardware that supports them well. But by the time such a chip is ready to launch, GPT-3 will have been replaced by a more advanced version, or another model altogether, with potentially trillions of parameters and a very different mix of compute operations and data sparsity.
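To see where the "centuries" figure comes from, here is a quick back-of-envelope calculation. The 3×10^23 total comes from the paragraph above; the ~30 TFLOPS sustained throughput is an assumed, illustrative number, not a benchmark of any specific GPU.

```python
# Back-of-envelope estimate of single-GPU training time for a GPT-3-scale
# model. TOTAL_TRAINING_FLOPS comes from the text; the sustained GPU
# throughput is an illustrative assumption.
TOTAL_TRAINING_FLOPS = 3e23      # total compute operations to train the model
SUSTAINED_GPU_FLOPS = 30e12      # assumed sustained ops/second on one GPU
SECONDS_PER_YEAR = 365 * 24 * 3600

def training_years(total_flops: float, gpu_flops: float) -> float:
    """Years a single GPU would need at the given sustained rate."""
    return total_flops / gpu_flops / SECONDS_PER_YEAR

years = training_years(TOTAL_TRAINING_FLOPS, SUSTAINED_GPU_FLOPS)
print(f"~{years:.0f} years on a single GPU")  # ~317 years
```

Even if the sustained throughput assumption is off by an order of magnitude, the result stays in the decades-to-centuries range, which is why training at this scale requires thousands of accelerators working in parallel.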
One could argue that manufacturers should make chips more general, but that would undermine the very innovation that has accelerated model performance via specialized hardware, so that’s simply not an option.
But It’s Not All Doom & Gloom: We Can Make Do With What We Have
So how do we keep benefiting from advances in AI? We make the most of the hardware we have now. That involves developing new techniques to create models better suited to a chosen chip (e.g., hardware-aware model architecture search), or using techniques that automatically optimize and tune ML model code for specific hardware architectures. The Apache TVM open source ML optimization and deployment stack, for instance, employs ML-powered optimization techniques to generate highly efficient code, often delivering 2x-30x speedups.
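To make the idea of hardware-aware architecture search concrete, here is a minimal, hypothetical sketch: given a latency budget measured on the target chip, pick the candidate architecture with the best estimated accuracy that still fits. The candidate names and numbers are made up for illustration; real systems use far more sophisticated cost models and search strategies.

```python
# Toy sketch of hardware-aware model architecture search: filter candidate
# architectures by a latency budget on the target hardware, then choose the
# most accurate survivor. All names and numbers are hypothetical.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    est_accuracy: float   # proxy metric, e.g. from a predictor or short run
    latency_ms: float     # measured (or modeled) latency on the target chip

def search(candidates: list[Candidate], latency_budget_ms: float) -> Candidate:
    """Return the most accurate candidate that meets the latency budget."""
    feasible = [c for c in candidates if c.latency_ms <= latency_budget_ms]
    if not feasible:
        raise ValueError("no architecture fits the latency budget")
    return max(feasible, key=lambda c: c.est_accuracy)

candidates = [
    Candidate("wide", est_accuracy=0.81, latency_ms=42.0),
    Candidate("deep", est_accuracy=0.83, latency_ms=55.0),
    Candidate("slim", est_accuracy=0.78, latency_ms=18.0),
]
best = search(candidates, latency_budget_ms=45.0)
print(best.name)  # "wide": most accurate option under 45 ms
```

The key design point is that the hardware constraint is part of the search objective itself, rather than an afterthought applied once the model is already trained.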
Making models use hardware more effectively offers a better end-user experience, lowers cloud costs and enables new applications. More fundamentally, it also reduces environmental impact through lower energy usage and better utilization of already deployed hardware (less hardware churn). This should not be ignored, given the environmental cost of global-scale AI/ML systems.
About the author: Luis Ceze is the CEO and co-founder of OctoML, which develops a commercial version of the open source Apache TVM compiler. He is also a professor in the Paul G. Allen School of Computer Science and Engineering at the University of Washington.