March 21, 2023

Nvidia Unveils GPUs for Generative Inference Workloads like ChatGPT

Alex Woodie

The Grace Hopper "superchip"

Today at its GPU Technology Conference, Nvidia took the wraps off three new GPUs designed to accelerate inference workloads for generative AI applications, including generating text, images, and videos. It also launched a new GPU for recommendation models, vector databases, and graph neural nets.

Generative AI has surged in popularity since November, when OpenAI released ChatGPT to the world. Companies are now looking to use conversational AI systems (sometimes called chatbots) to service customer needs. That is great news for Nvidia, which makes the GPUs that are typically used to train large language models (LLMs) such as ChatGPT, GPT-4, BERT, or Google’s PaLM.

But in addition to training LLMs and generative computer vision models such as OpenAI’s DALL-E, GPUs can also be used to accelerate the inference side of the AI workload. To that end, Nvidia today unveiled three new GPUs designed to accelerate inference workloads.

The first is the Nvidia H100 NVL for Large Language Model Deployment. Nvidia says this new offering is “ideal for deploying massive LLMs like ChatGPT at scale.” It sports 188GB of memory and features a “transformer engine” that the company claims can deliver up to 12x faster inference performance for GPT-3 compared to the prior-generation A100, at data center scale.

The H100 NVL for LLM Deployment is composed of two previously announced H100 GPUs built on the PCIe form factor and connected via an NVLink bridge, and “will supercharge” LLM inferencing, says Ian Buck, Nvidia’s vice president of hyperscale and HPC.

The new Nvidia L4 GPU (Image courtesy Nvidia)

“These two GPUs work as one to deploy large language models and GPT models from anywhere from 5 billion parameters all the way up to 200 [billion parameters],” Buck said during a press briefing Monday. “It has 188 gigabytes of memory and is 12x faster, this one GPU, than the throughput of a DGX A100 system that’s being used today everywhere. I’m really excited about the Nvidia H100 NVL. It’s going to help democratize the ChatGPT use cases and bring that capability to every server in every cloud.”
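
As a rough sanity check on the memory figure Buck cites, the Python sketch below estimates how much memory a model's weights alone would occupy at a given numeric precision and compares that to the 188GB on the H100 NVL. The bytes-per-parameter values (2 for FP16, 1 for FP8) and the function names are assumptions made for illustration. The article does not say which precision Nvidia has in mind, and the estimate ignores activations, caches, and any runtime overhead.

```python
# Weights-only memory estimate. Illustrative sketch, not Nvidia's methodology.
H100_NVL_MEMORY_GB = 188  # figure quoted in the article

# Assumed storage cost per parameter; the article does not specify precision.
BYTES_PER_PARAM = {"FP16": 2, "FP8": 1}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits(num_params: float, bytes_per_param: float,
         capacity_gb: float = H100_NVL_MEMORY_GB) -> bool:
    """True if the weights alone fit within the given memory capacity."""
    return weight_memory_gb(num_params, bytes_per_param) <= capacity_gb


if __name__ == "__main__":
    # 5B and 200B come from Buck's quote; 175B is GPT-3's published size.
    for params in (5e9, 175e9, 200e9):
        for name, nbytes in BYTES_PER_PARAM.items():
            gb = weight_memory_gb(params, nbytes)
            print(f"{params / 1e9:.0f}B params at {name}: "
                  f"{gb:.0f} GB, fits in 188 GB: {fits(params, nbytes)}")
```

Under these assumptions, a 175-billion-parameter model such as GPT-3 needs about 175 GB for its weights at one byte per parameter, which fits in 188GB, and about 350 GB at two bytes per parameter, which does not.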

The Santa Clara, California company also revealed more about the L40 GPU for Image Generation that it introduced last September. The new GPU SKU is optimized for graphics and AI-enabled 2D, video, and 3D image generation. Compared to the previous-generation chip, the L40 for Image Generation delivers 7x the inference performance for Stable Diffusion (an AI image generator) and 12x the performance for powering Omniverse workloads.

Nvidia also revealed the L4 for AI Video. This GPU, which can serve as a general GPU for any workload, can deliver 120 times faster video inference than CPU servers, the company claims.

Finally, the company talked up its Grace Hopper processor as being ideal for graph recommendation models, vector databases, and graph neural nets. Sporting a 900 GB/s NVLink-C2C connection between CPU and GPU, the Grace Hopper “superchip” will be able to deliver 7x faster data transfers and queries compared to PCIe Gen 5, Nvidia says.
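
The 7x figure is consistent with simple bandwidth arithmetic, sketched below in Python. The 900 GB/s number is from Nvidia's claim above. The 128 GB/s baseline is an assumption: it is the approximate bidirectional bandwidth of a 16-lane PCIe Gen 5 link, and the article does not say which PCIe configuration Nvidia compared against. The 100 GB table size is a hypothetical value chosen only to make the example concrete.

```python
# Bandwidth comparison. Illustrative sketch under the assumptions stated above.
NVLINK_C2C_GB_PER_S = 900.0     # quoted in the article
PCIE_GEN5_X16_GB_PER_S = 128.0  # assumption: approx. bidirectional x16 bandwidth


def speedup(fast_gb_per_s: float, slow_gb_per_s: float) -> float:
    """Ratio of two link bandwidths."""
    return fast_gb_per_s / slow_gb_per_s


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized time to move size_gb over a link, ignoring latency and overhead."""
    return size_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    ratio = speedup(NVLINK_C2C_GB_PER_S, PCIE_GEN5_X16_GB_PER_S)
    print(f"NVLink-C2C vs PCIe Gen 5 x16: {ratio:.2f}x")

    table_gb = 100.0  # hypothetical embedding table size
    print(f"NVLink-C2C: {transfer_seconds(table_gb, NVLINK_C2C_GB_PER_S):.3f} s")
    print(f"PCIe Gen 5: {transfer_seconds(table_gb, PCIE_GEN5_X16_GB_PER_S):.3f} s")
```

With these inputs the ratio works out to roughly 7, matching the claim. Real-world transfer times also depend on latency, protocol overhead, and access patterns, none of which this idealized model captures.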

The new Nvidia L40 GPU (Image courtesy Nvidia)

“The Grace CPU and the Hopper GPU combined really excel at those very large memory AI tasks for inference, for workloads like large recommender systems, where they have huge embedding tables to help predict what customers need, want, and want to buy,” Buck says. “We see Grace Hopper superchip [bringing] amazing value in the areas of large recommender systems and vector databases.”

All of the new inference GPUs ship with Nvidia software, such as its AI Enterprise suite. This suite includes Nvidia’s TensorRT software development kit (SDK) for high-performance deep learning inference and the Triton Inference Server, which is open-source inference-serving software that helps standardize model deployment.

Some of Nvidia’s partners have already adopted some of these new products. Google Cloud, for instance, is using L4 in its Vertex AI cloud service. A company called Descript is using the L4 GPU in Google Cloud to power its generative AI service, which caters to video and podcast creators. Another startup called WOMBO is using L4 on Google Cloud to power its text-to-art generation service. A company called Kuaishou is also using L4 on Google Cloud to power its short video service.

The L4 GPU is available as a private preview on Google Cloud as well as through 30 server makers, including ASUS, Dell Technologies, HPE, Lenovo, and Supermicro. The L40 is available from a select number of system builders, while Grace Hopper and H100 NVL are expected to be available in the second half of the year.

