No, ChatGPT is not going to cause another GPU shortage

ChatGPT is exploding, and the backbone of its AI model relies on Nvidia graphics cards. One analyst said about 10,000 Nvidia GPUs were used to train ChatGPT, and that as the service continues to expand, so will the need for GPUs. Anyone who lived through the rise of crypto in 2021 can sense a GPU shortage on the horizon.

I have seen some journalists making that exact connection, but it is misguided. The days of crypto-driven GPU shortages are behind us. Although we are likely to see an increase in demand for graphics cards as the AI boom continues, that demand is not directed at the best graphics cards installed in gaming rigs.

Why Nvidia GPUs are made for AI

A render of Nvidia's RTX A6000 GPU.

First, we’ll explore why Nvidia graphics cards are so good for AI. Nvidia has bet on AI for the past several years, and it has paid off with a surge in the company’s share price following the rise of ChatGPT. There are two reasons Nvidia sits at the center of AI training: tensor cores and CUDA.

CUDA is Nvidia’s application programming interface (API), used in everything from its most expensive data center GPUs to its cheapest gaming GPUs. CUDA acceleration is supported in machine learning libraries such as TensorFlow, which vastly speeds up training and inference. CUDA is the driving force behind AMD being so far behind Nvidia in AI.

However, don’t confuse CUDA with Nvidia’s CUDA cores. CUDA is the platform on which a lot of AI apps run, while CUDA cores are just the cores inside Nvidia GPUs. They share a name, and CUDA cores are better optimized for running CUDA applications. Nvidia’s gaming GPUs have CUDA cores and they support CUDA apps.

Tensor cores are basically dedicated AI cores. They handle matrix multiplication, which is the secret sauce that speeds up AI training. The idea here is simple: multiply multiple sets of data at once, and train AI models faster by generating possible outcomes. Most processors handle tasks in a linear manner, whereas tensor cores can rapidly generate scenarios in a single clock cycle.
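To make "matrix multiplication" concrete, here is a minimal sketch in plain Python of the operation that tensor cores accelerate in hardware. The `matmul` helper, the toy `inputs`, and the `weights` values are all hypothetical and chosen only for illustration. Real training code would hand this work to a library such as TensorFlow rather than loop in Python.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions must match")
    cols = len(b[0])
    # Each output cell is a sum of element-wise products (a dot product).
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


# Hypothetical toy layer: 2 input samples with 3 features each,
# and a weight matrix mapping 3 features to 2 outputs.
inputs = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
weights = [[0.5, -1.0],
           [0.0, 2.0],
           [1.0, 0.5]]

outputs = matmul(inputs, weights)
print(outputs)  # [[3.5, 4.5], [8.0, 9.0]]
```

Every output cell is an independent multiply-and-add chain, which is why hardware that computes many of them at once speeds up training.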

Then again, Nvidia’s gaming GPUs like the RTX 4080 have tensor cores (and sometimes even more than expensive data center GPUs). However, for all the specs Nvidia cards have to accelerate AI models, none are as important as memory. And Nvidia’s gaming GPUs don’t have a lot of memory.

It all comes down to memory

A stack of HBM memory.

According to Jeffrey Heaton, author of several books on artificial intelligence and professor at Washington University in St. Louis, “the size of the memory is what matters. If you don’t have enough GPU RAM, your model fitting/estimation just stops.”

Heaton, who has a YouTube channel devoted to how well AI models run on certain GPUs, noted that CUDA cores are also important, but memory capacity is the dominant factor when it comes to how a GPU works for AI. The RTX 4090 has a lot of memory by gaming standards (24GB of GDDR6X) but far less than a data center-class GPU. For example, Nvidia’s latest H100 GPU has 80GB of HBM3 memory, as well as a massive 5,120-bit memory bus.

You can get by with less, but you still need a lot of memory. Heaton recommends that beginners have no less than 12GB, while a typical machine learning engineer will have one or two 48GB professional Nvidia GPUs. According to Heaton, “most workloads will fall more into the single A100 to eight A100 range.” Nvidia’s A100 GPU has 40GB of memory.

You can also see this scaling in action. Puget Systems shows a single A100 with 40GB of memory performing almost twice as fast as a single RTX 3090 with its 24GB of memory. And this is despite the fact that the RTX 3090 has almost twice as many CUDA cores and almost as many tensor cores.

Memory is the bottleneck, not raw processing power. This is because training AI models depends on large datasets, and the more data you can store in memory, the faster (and more accurately) you can train a model.

Different needs, different dies

Hopper H100 graphics card.

Nvidia’s gaming GPUs are generally not well suited for AI because of how little video memory they have compared to enterprise-grade hardware, but there’s a separate issue here as well. Nvidia’s workstation GPUs typically don’t share a GPU die with their gaming cards.

For example, the A100 that Heaton referred to uses the GA100 GPU, a die from Nvidia’s Ampere range that was never used on gaming-focused cards (including the high-end RTX 3090 Ti). Similarly, Nvidia’s latest H100 uses a completely different architecture than the RTX 40-series, which means it uses a different die as well.

There are exceptions. Nvidia’s AD102 GPU, which is inside the RTX 4090, is also used in a small range of Ada Lovelace enterprise GPUs (the L40 and RTX 6000). In most cases, though, Nvidia can’t simply repurpose a gaming GPU die for a data center card. They are worlds apart.

There are some fundamental differences between the GPU shortage we saw due to crypto-mining and the rise in popularity of AI models. According to Heaton, the GPT-3 model required more than 1,000 A100 Nvidia GPUs to train and about eight to run. These GPUs also have access to the high-bandwidth NVLink interconnect, whereas Nvidia’s RTX 40-series GPUs do not. You’re comparing a maximum of 24GB of memory on Nvidia’s gaming cards to several hundred gigabytes on GPUs like the A100 with NVLink.
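A rough back-of-the-envelope calculation shows why the memory gap matters. The sketch below counts how many cards it would take just to hold GPT-3’s weights. It assumes GPT-3’s published size of 175 billion parameters and 16-bit (2-byte) weights, and it ignores everything else that needs memory at run time, so treat the numbers as a floor rather than a real sizing guide. The `gpus_needed` helper is hypothetical.

```python
import math


def gpus_needed(num_params, bytes_per_param, gpu_memory_gb):
    """Minimum number of GPUs whose combined memory holds the weights alone."""
    weights_gb = num_params * bytes_per_param / 1e9
    return math.ceil(weights_gb / gpu_memory_gb)


GPT3_PARAMS = 175_000_000_000  # GPT-3's published parameter count
BYTES_PER_PARAM = 2            # assumption: weights stored as 16-bit values

weights_gb = GPT3_PARAMS * BYTES_PER_PARAM / 1e9
print(f"Weights alone: {weights_gb:.0f} GB")

# Memory capacities as quoted in this article.
cards = {"RTX 4090 (24GB)": 24, "A100 (40GB)": 40, "H100 (80GB)": 80}
counts = {
    name: gpus_needed(GPT3_PARAMS, BYTES_PER_PARAM, mem)
    for name, mem in cards.items()
}
for name, count in counts.items():
    print(f"{name}: at least {count} cards")
```

Under these assumptions the weights come to about 350GB, which lands in the same ballpark as Heaton’s figure of about eight A100s. The gaming-card row is hypothetical in a second sense too: without NVLink, RTX 40-series cards cannot pool their memory the way data center GPUs can.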

There are other concerns too, such as memory dies being allocated to professional GPUs over gaming ones, but the days of rushing to your local Micro Center or Best Buy for a chance to find a GPU in stock are gone. Heaton summed up that point well: “A large language model like ChatGPT requires at least eight GPUs to run. Such estimates assume a high-end A100 GPU. My guess is that this could cause a shortage of high-end GPUs, but may not affect gamer-class GPUs with less RAM.”
