bigdict · 5 years ago
Nvidia recently demoed a similar technique running in real time: https://www.nvidia.com/en-us/geforce/news/nvidia-dlss-your-q...
jsheard · 5 years ago
More than demoed, they've shipped DLSS in quite a few games now. The 1.0 version was underwhelming but the 2.0 version works extremely well in practice.

However Nvidia are treating DLSS as their secret sauce and not publishing any details, so Facebook's more open research is interesting even if it's not as refined yet.

cma · 5 years ago
> However Nvidia are treating DLSS as their secret sauce and not publishing any details

https://developer.nvidia.com/gtc/2020/video/s22698

Scaevolus · 5 years ago
This appears to be more accurate than DLSS according to Fig. 11 in the paper, and has the 16x mode as well as 4x.
daenz · 5 years ago
Relevant performance details:

A Titan V GPU, using the 4x4 upsampling at a target resolution of 1080p, takes 24.42ms, or 18.25ms in "fast" mode. This blows out the 11ms budget you have to render at 90Hz (6.9ms for 144Hz), and it doesn't appear to include rendering costs at all...that time is purely in upsampling.

Cool tech but a ways to go in order to make it useful for VR.
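
For reference, the budget arithmetic (a quick sketch; the 24.42/18.25ms figures are the paper's Titan V numbers, the rest is just 1000/Hz):

    # frame-time budget vs. reported upsampling cost (Titan V, 4x4 -> 1080p)
    for hz in (90, 120, 144):
        budget_ms = 1000.0 / hz
        for cost_ms in (24.42, 18.25):  # quality mode, "fast" mode
            print(f"{hz}Hz: {budget_ms:.1f}ms budget, {cost_ms}ms upsample, "
                  f"over by {cost_ms - budget_ms:.1f}ms")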

onion2k · 5 years ago
> it doesn't appear to include rendering costs at all...that time is purely in upsampling

That part wouldn't be an issue if the plan is to render low resolution images in the cloud and stream them to a device that can upsample them locally. There wouldn't be any local rendering costs.

reitzensteinm · 5 years ago
I'd be very surprised if that's what this ends up being used for. The technique requires color, depth, and motion vectors. That's three separate video channels, and two of them contain data that isn't usually stuffed into videos.

Any compression artifacts are going to stick out like a sore thumb, so you'll need to stream very high quality, and you're going to have weird interactions between different layers being compressed differently.
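
Back-of-envelope on the raw payload (my numbers: 480x270 assumes the 4x4 mode targeting 1080p, and the bit depths are guesses):

    # raw, uncompressed bitrate for the three channels together
    w, h, fps = 480, 270, 90            # 1080p / 4x4 at 90Hz (assumptions)
    bits_per_px = 24 + 16 + 2 * 16      # RGB color + depth + 2D motion vectors
    raw_mbps = w * h * bits_per_px * fps / 1e6
    print(f"~{raw_mbps:.0f} Mbit/s uncompressed")   # roughly 840 Mbit/s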

esperent · 5 years ago
That might be the plan, but it seems redundant to require users to have high-end gaming hardware just to stream games.
alkonaut · 5 years ago
How close are we to working foveated rendering with eye tracking, so that upsampling only needs to be done for a small area of the screen?

Spending precious milliseconds perfecting the corners of the image for VR seems like a complete waste.

RWSen · 5 years ago
Foveated rendering is in a weird spot. The software seems to be there, but mostly only in academia. The hardware is almost nowhere to be found, because eye tracking is another expense. People prefer spending that extra money on a better computer, since that improves every VR experience, not just (part of) some possible future ones.

FVR needs a hook: what can it do that "dumb" VR headsets don't?

ethanwillis · 5 years ago
Do you need every single frame to be perfectly upsampled? Maybe there's a proportion of frames that could be rendered faster but with a less accurate method?
cheschire · 5 years ago
Oh interesting. I wonder if you could combine temporal antialiasing techniques with this to get pseudo-upsampling by only upsampling portions of the screen. Maybe focus on edges every other frame, and do different flat surfaces every few frames in between the edge passes. Then use TAA concepts to blend the pixels across frames.
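
For reference, the accumulation step this would build on is roughly the standard TAA exponential blend (a minimal sketch; reprojection via motion vectors and neighborhood clamping are left out):

    # history = reprojected previous output, current = this frame's sample
    def taa_blend(history, current, alpha=0.1):
        # lower alpha = more temporal smoothing, more ghosting risk
        return alpha * current + (1.0 - alpha) * history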
garmaine · 5 years ago
You need every frame to take less than 11ms (for 90Hz) or 8.3ms (120Hz) to render, or else you will get stuttering.
Vel0cityX · 5 years ago
My thoughts exactly; I didn't see anything in the paper addressing that.

I suppose the assumption is that future work can build on this paper's contributions to make it faster.

rjeli · 5 years ago
Yeah, I wonder if they plan to fab an ASIC (FAIR already has its own accelerator ASIC) to run this on the next Oculus.
carrolldunham · 5 years ago
Scanning through it for the clause that gives away what sleight of hand was used to correctly get BERLIN from nothing. Suspects:

  > and combines the additional auxiliary information
  > multiple frames
In other words, the label "Low Resolution Input" on the blurry images is misleading. The image should be labelled "some of the input".

Vel0cityX · 5 years ago
No idea what "some of the input" means, or why you thought "Low Resolution Input" is disingenuous?

It uses color, depth, and subpixel motion vectors from the 1-4 previous frames. All things that modern game engines can easily calculate. You didn't even need to read the paper to get this info; it's literally in a picture on the blog post.
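
Per frame, the network input looks something like this (an illustrative sketch; the channel counts are from the blog post's figure, the 480x270 resolution and the exact layout are my assumptions):

    import numpy as np

    h, w = 270, 480                    # low-res input for 4x4 -> 1080p
    color  = np.zeros((h, w, 3))       # RGB
    depth  = np.zeros((h, w, 1))       # scene depth
    motion = np.zeros((h, w, 2))       # subpixel motion vectors
    frame  = np.concatenate([color, depth, motion], axis=-1)  # 6 channels
    # plus the same stack for each of the 1-4 previous frames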

carrolldunham · 5 years ago
Right - so a single low-res image shouldn't be paired with the high-res one and labelled as input and output, because that implies the algorithm turned one into the other, which it did not do.
tanilama · 5 years ago
Isn't this literally Nvidia's DLSS? And it has already been productized.
Vel0cityX · 5 years ago
Except Nvidia has published pretty much nothing about their method.
pixelhorse · 5 years ago
They did publish a video on how it works, but I can't find it right now.

The inputs are similar:

https://www.nvidia.com/content/dam/en-zz/Solutions/geforce/n...

In contrast to DLSS 1.0, the output of the NN is not color values but sampling locations and weights, which are used to look up color values from the previous low-resolution frames.
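
Conceptually something like this (a loose sketch of the kernel-prediction idea, not Nvidia's actual method; bilinear_sample is a hypothetical helper):

    # the NN predicts per-pixel sample offsets and blend weights; the resolve
    # step just gathers from previous low-res frames and sums
    def resolve_pixel(prev_frames, offsets, weights, x, y):
        out = 0.0
        for frame, (dx, dy), wgt in zip(prev_frames, offsets, weights):
            out += wgt * bilinear_sample(frame, x + dx, y + dy)  # assumed helper
        return out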

warvstar · 5 years ago
Would love to add this to my game engine.
Vel0cityX · 5 years ago
Did you read the paper? Or the benchmarks at least? In its fastest mode, the upsampling alone takes ~18ms, which leaves almost no frame budget for actual rendering even if you target 30fps.

Great start but definitely needs additional work to be usable in games.

pseudosavant · 5 years ago
There is a big difference between latency and throughput. FPS is throughput. If you assume the entire system only ever works on the current frame, then the two numbers are directly correlated. But most systems, especially game engines and hardware, always have multiple things running in parallel.

The H.264 encoder on my CPU introduces >16.7ms of latency into a video stream, but it can encode hundreds of frames per second of SD video all day. Adding ~1 more frame of latency may be worth a quadrupling in image quality/resolution in most circumstances.
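
A toy illustration of that split (numbers made up):

    # three pipelined stages overlap across frames: latency is their sum,
    # throughput is set by the slowest stage once the pipeline is full
    stage_ms = 16.7
    stages = ["render", "upsample", "encode"]
    latency_ms = stage_ms * len(stages)       # ~50ms input-to-display
    throughput_fps = 1000.0 / stage_ms        # still ~60fps
    print(f"latency {latency_ms:.0f}ms, throughput {throughput_fps:.0f}fps")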