How Nvidia’s DLSS 3 Works, and Why FSR Isn’t Catching Up Yet

Nvidia’s RTX 40-series graphics cards are arriving in just a few weeks, but among all of the hardware improvements sits what may be Nvidia’s golden egg: DLSS 3. It’s much more than just an update to Nvidia’s popular DLSS (Deep Learning Super Sampling) feature, and it could end up defining Nvidia’s next generation far more than the graphics cards themselves.

AMD has been working hard to bring its FidelityFX Super Resolution (FSR) on par with DLSS, and for the past several months, it has been successful. DLSS 3 looks like it will change that dynamic — and this time around, FSR may not be able to catch up anytime soon.

How DLSS 3 Works (and How It Doesn’t)

A chart showing how Nvidia's DLSS 3 technology works.

You would be forgiven for thinking that DLSS 3 is a brand new version of DLSS, but it is not. Or at least, it’s not completely new. The backbone of DLSS 3 is the same super-resolution technology that’s available in DLSS titles today, and Nvidia will likely continue to improve on this with new versions. Nvidia says you’ll now see the super-resolution portion of DLSS 3 as a separate option in graphics settings.

The new part is frame generation. DLSS 3 will generate a completely unique frame every other frame, essentially generating seven out of every eight pixels you see. You can see an example of this in the flow chart below. In the case of 4K, your GPU only renders pixels for 1080p and uses that information for not only the current frame but the next frame as well.

A chart showing how DLSS 3 reconstructs frames.
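The seven-out-of-eight figure is easy to check with a little arithmetic. The sketch below assumes the 4K example from above (a 1080p internal render, a 3840 x 2160 output, and one generated frame for every rendered one); the function name and parameters are just illustrative, not anything from Nvidia.

```python
from fractions import Fraction


def generated_pixel_fraction(render_res, output_res, frames_per_rendered_frame=2):
    """Share of displayed pixels that were NOT rendered by the GPU.

    render_res / output_res are (width, height) tuples.
    frames_per_rendered_frame=2 assumes every other displayed frame is generated.
    """
    rendered_pixels = render_res[0] * render_res[1]
    displayed_pixels = output_res[0] * output_res[1] * frames_per_rendered_frame
    return 1 - Fraction(rendered_pixels, displayed_pixels)


# 1080p render, 4K output, one generated frame per rendered frame.
fraction = generated_pixel_fraction((1920, 1080), (3840, 2160))
print(fraction)  # 7/8
```

Upscaling from 1080p to 4K already means only one in four pixels is rendered, and generating every other frame halves that again to one in eight.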

According to Nvidia, frame generation will be a separate toggle from super resolution. That’s because, for now, frame generation only works on RTX 40-series GPUs, while super resolution will continue to work on all RTX graphics cards, even in games updated to DLSS 3. It should go without saying, but if half of your frames are fully generated, that will increase your performance by a lot.

However, frame generation isn’t just some AI secret sauce. In tools like DLSS 2 and FSR, motion vectors are an important input for upscaling. They describe where objects are moving from frame to frame, but motion vectors only apply to geometry in a scene. Elements that do not have 3D geometry, such as shadows, reflections, and particles, have traditionally been masked out of the upscaling process to avoid visual artifacts.

A chart showing motion through Nvidia’s DLSS 3.

Masking isn’t an option when an AI is generating completely unique frames, which is where the Optical Flow Accelerator in RTX 40-series GPUs comes into play. Optical flow is like a motion vector, except that the graphics card is tracking the movement of individual pixels from frame to frame. The resulting optical flow field, along with motion vectors, depth, and color, feeds into the AI-generated frame.
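To make the idea of per-pixel motion tracking a bit more concrete, here is a toy sketch of optical flow using brute-force block matching. To be clear, this is not how Nvidia’s Optical Flow Accelerator or its AI model works; it is a classic textbook approach, and the function names, the tiny 8 x 8 grayscale frames, and the search parameters are all assumptions for illustration.

```python
def estimate_flow(prev, curr, radius=1, search=2):
    """For each pixel in `prev`, return the (dx, dy) offset whose surrounding
    patch in `curr` matches best (lowest sum of absolute differences)."""
    height, width = len(prev), len(prev[0])

    def pixel(img, x, y):
        # Clamp coordinates so patches near the border stay valid.
        return img[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    def patch_cost(x, y, dx, dy):
        return sum(
            abs(pixel(prev, x + i, y + j) - pixel(curr, x + dx + i, y + dy + j))
            for j in range(-radius, radius + 1)
            for i in range(-radius, radius + 1)
        )

    # Candidate offsets, smallest motion first so ties resolve to "no movement".
    offsets = sorted(
        ((dx, dy) for dy in range(-search, search + 1)
         for dx in range(-search, search + 1)),
        key=lambda o: abs(o[0]) + abs(o[1]),
    )
    return [
        [min(offsets, key=lambda o: patch_cost(x, y, *o)) for x in range(width)]
        for y in range(height)
    ]


def frame_with_dot(x, y, size=8):
    """Hypothetical test frame: black image with one bright pixel."""
    img = [[0] * size for _ in range(size)]
    img[y][x] = 255
    return img


prev_frame = frame_with_dot(2, 3)
curr_frame = frame_with_dot(4, 3)  # the dot moved two pixels to the right
flow = estimate_flow(prev_frame, curr_frame)
print(flow[3][2])  # (2, 0)
```

Even on this tiny example, every pixel is compared against 25 candidate offsets, which hints at why doing this for every pixel of a real frame gets expensive quickly.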

It all sounds great, but there’s a big problem with AI-generated frames: they increase latency. The AI-generated frame never passes through your PC. It’s a “fake” frame, so you won’t see it on a traditional FPS readout in games or in tools like FRAPS. Latency therefore doesn’t go down despite all the extra frames, and because of the computational overhead of optical flow, it actually goes up. For that reason, DLSS 3 requires Nvidia Reflex to offset the higher latency.

Normally, your CPU stores up a render queue for your graphics card to ensure that your GPU is never waiting for work (which would lead to stutter and frame rate drops). Reflex removes the render queue and syncs your GPU and CPU so that as soon as your CPU can send instructions, the GPU starts processing them. Nvidia says that when Reflex is applied on top of DLSS 3, it can sometimes even result in lower latency overall.
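A rough back-of-the-envelope model shows how removing the render queue can outweigh the cost of generating frames. This is a deliberately simplified sketch, not a measurement: it assumes each queued frame adds about one rendered-frame time of delay, and the 5 ms generation overhead is a made-up number purely for illustration.

```python
def input_latency_ms(rendered_fps, queued_frames, generation_overhead_ms=0.0):
    """Very rough latency estimate.

    Latency follows the *rendered* frame rate, since generated frames
    never sample new input. Each queued frame is assumed to add roughly
    one rendered-frame time of delay.
    """
    frame_time_ms = 1000.0 / rendered_fps
    return (queued_frames + 1) * frame_time_ms + generation_overhead_ms


# Hypothetical game rendering at 50 fps with two frames sitting in the queue.
with_queue = input_latency_ms(50, queued_frames=2)

# Same game with frame generation on (assumed 5 ms overhead) and no queue.
with_reflex_and_fg = input_latency_ms(50, queued_frames=0, generation_overhead_ms=5.0)

print(with_queue, with_reflex_and_fg)  # 60.0 25.0
```

Under these assumptions, dropping the queue saves more time than frame generation adds, which lines up with Nvidia’s claim that latency can sometimes go down.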

Where AI Makes a Difference

Microsoft Flight Simulator | NVIDIA DLSS 3 – Exclusive First-Look

AMD’s FSR 2.0 doesn’t use AI, and as I wrote about a while back, it proves that you can get the same quality as DLSS with algorithms rather than machine learning. DLSS 3 changes that with its unique frame generation capabilities, as well as the introduction of optical flow.

Optical flow isn’t a new idea. It has been around for decades and is used in everything from video editing software to self-driving cars. Computing optical flow with machine learning is relatively new, however, thanks to the growth in datasets available for training AI models. The reason you would want to use AI is simple: it produces fewer visual errors when given enough training, and it doesn’t carry much overhead at runtime.

DLSS executes at runtime. It is possible to develop an algorithm, free of machine learning, that estimates how each pixel moves from one frame to the next, but it is computationally expensive, which runs counter to the whole point of supersampling in the first place. With an AI model that doesn’t require a lot of horsepower and enough training data (and rest assured, Nvidia has plenty of training data to work with), you can get optical flow that is high quality and can be executed at runtime.

This leads to a frame rate improvement even in CPU-limited games. Supersampling only applies to your resolution, which depends almost exclusively on your GPU. Because the generated frame bypasses CPU processing, DLSS 3 can double the frame rate in a game even if you are completely CPU-bottlenecked. That is impressive, and it is currently only possible with AI.
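A simple way to picture the CPU-bottleneck point is to treat the rendered frame rate as whichever of the CPU or GPU is slower. The sketch below is an idealized model with hypothetical numbers; it ignores the cost of generating the extra frame and assumes exactly one generated frame per rendered frame.

```python
def displayed_fps(cpu_fps, gpu_fps, frame_generation=False):
    """Idealized frame rate: the slower of CPU and GPU sets the rendered rate,
    and frame generation adds one generated frame per rendered frame."""
    rendered = min(cpu_fps, gpu_fps)
    return rendered * 2 if frame_generation else rendered


cpu_limit = 60  # hypothetical: the CPU can only prepare 60 frames per second

native = displayed_fps(cpu_limit, gpu_fps=90)
upscaled = displayed_fps(cpu_limit, gpu_fps=150)  # upscaling only speeds up the GPU
generated = displayed_fps(cpu_limit, gpu_fps=150, frame_generation=True)

print(native, upscaled, generated)  # 60 60 120
```

In this model, upscaling alone does nothing once the CPU is the limit, while frame generation still doubles what reaches the screen.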

Why FSR 2.0 Isn’t Catching Up (for Now)

Comparison of FSR and DLSS image quality in God of War.

AMD has really accomplished the impossible with FSR 2.0. It looks great, and the fact that it’s brand-agnostic is even better. I’ve been ready to give up DLSS for FSR 2.0 ever since I first saw it in Deathloop. But as much as I enjoy FSR 2.0 and think it’s a great piece of kit from AMD, it’s not going to catch up to DLSS 3 anytime soon.

For starters, developing an algorithm that can track every pixel between frames without artifacts is quite difficult, especially in 3D environments with dense fine detail (Cyberpunk 2077 is a prime example). It is possible, but difficult. The bigger issue, however, is how heavy that algorithm would need to be. Tracking each pixel through 3D space, performing the optical flow calculation, generating a frame, and cleaning up any mishaps along the way is a lot to ask.

Running all of that while a game is executing, and still delivering a frame rate improvement on the level of FSR 2.0 or DLSS, is asking even more. Nvidia, even with dedicated processors and a trained model, still has to use Reflex to offset the higher latency imposed by optical flow. Without that hardware or software, FSR would probably trade off a lot of latency to generate frames.

I have no doubt that AMD and other developers will eventually get there, or find another way around the problem, but that could be a few years down the road. It’s hard to say right now.

Coming Soon – GeForce RTX 4090 DLSS 3 First Look Teaser Trailer

It’s easy to say that DLSS 3 sounds pretty exciting. Of course, we’ll have to wait until it’s here to validate Nvidia’s performance claims and see how the image quality holds up. For now, we have a short video from Digital Foundry showing off DLSS 3 footage (above), which I highly recommend watching until we see further third-party testing. From our current vantage point, though, DLSS 3 certainly looks promising.

This article is part of ReSpec, an ongoing biweekly column featuring discussion, advice, and in-depth reporting on the technology behind PC gaming.
