LiveSplat is a system for turning RGBD camera streams into Gaussian splat scenes in real-time. The system works by passing all the RGBD frames into a feed-forward neural net that outputs the current scene as Gaussian splats. These splats are then rendered in real-time. I've put together a demo video at the link above.
However, I have not baked the size or orientation into the system. Those are "chosen" by the neural net based on the input RGBD frames. The view dependent effects are also "chosen" by the neural net, but not through an explicit radiance field. If you run the application and zoom in, you will be able to see the splats of different sizes pointing in different directions. The system has limited ability to re-adjust the positions and sizes due to the compute budget, which leads to the pixelated effect.
[1] https://imgur.com/a/QXxCakM
This is getting unreal. They're becoming fast and high fidelity. Once we get better editing capabilities and can shape the Gaussian fields, this will become the prevailing means of creating and distributing media.
Turning any source into something 4D volumetric that you can easily mold as clay, relight, reshape. A fully interactable and playable 4D canvas.
Imagine if the work being done with diffusion models could read and write from Gaussian fields instead of just pixels. It could look like anything: real life, Ghibli, Pixar, whatever.
I can't imagine where this tech will be in five years.
100%. And style-transfer it into steam punk or H.R. Giger or cartoons or anime. Or dream up new fantasy worlds instantaneously. Explore them, play them, shape them like Minecraft-becomes-holodeck. With physics and tactile responses.
I'm so excited for everything happening in graphics right now.
Keep it up! You're at the forefront!
Could you or someone else wise in the ways of graphics give me a layperson's rundown of how this works, why it's considered so important, and what the technical challenges are given that an RGB+D(epth?) stream is the input?
Usually creating a Gaussian splat representation takes a long time and uses an iterative gradient-based optimization procedure. Using RGBD helps me sidestep this optimization, as much of the geometry is already present in the depth channel, and that is what enables the real-time aspect of my technique.
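To give a rough sense of why the depth channel sidesteps so much work: each depth pixel can be back-projected straight into a 3D point via the camera intrinsics, giving candidate splat positions for free. This is only an illustrative sketch (the function name, intrinsics, and the idea of seeding splat means this way are my assumptions, not LiveSplat's actual pipeline):

```python
import numpy as np

def depth_to_splat_centers(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-space 3D points with a
    pinhole model. In a feed-forward splatting pipeline, points like
    these could seed the Gaussian means directly, skipping the usual
    per-scene gradient optimization. Returns an (H*W, 3) array."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx   # horizontal offset scaled by depth
    y = (v - cy) * depth / fy   # vertical offset scaled by depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy example: a flat surface 2 m in front of a 4x4-pixel camera.
pts = depth_to_splat_centers(np.full((4, 4), 2.0),
                             fx=2.0, fy=2.0, cx=1.5, cy=1.5)
```

Recovering the same geometry from RGB alone would require multi-view matching or a learned depth estimate, which is where the long optimization times of classic splatting pipelines come from.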
When you say "big deal", I imagine you are also asking about business or societal implications. I can't really speak on those, but I'm open to licensing this IP to any companies which know about big business applications :)
I actually started with pointclouds for my VR teleoperation system but I hated how ugly it looked. You end up seeing through objects, and objects become unparseable if you get too close. Textures present in the RGB frame also become very hard to make out because everything becomes "pointilized". In the linked video you can make out the wood grain direction in the splat rendering, but not in the pointcloud rendering.
[1] https://youtu.be/-u-e8YTt8R8?si=qBjYlvdOsUwAl5_r&t=14
The depth is helpful to properly handle the parallaxing of the scene as the view angle changes. The system should then ideally "in-paint" the areas that are occluded from the input.
You can either guess the input depth from matching multiple RGB inputs or just use depth inputs along with RGB inputs if you have them. It's not fundamental to the process of building the splats either way.
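A toy numerical example of the parallax point above (a hypothetical 1D pinhole camera, not anything from the actual system): two points that project to the same pixel in the source view shift by different amounts when the camera moves, and the size of that shift depends entirely on depth. Without depth, the renderer has no way to know how much each surface should slide.

```python
import numpy as np

def project(p, fx=100.0, cx=50.0):
    # 1D pinhole projection: pixel = focal * (x / z) + principal point.
    return fx * p[0] / p[2] + cx

# Two points on the same camera ray, at different depths.
near = np.array([0.0, 0.0, 1.0])
far = np.array([0.0, 0.0, 4.0])
assert project(near) == project(far)  # identical pixel in the source view

# Translate the camera 0.1 m to the right (subtract t from the points).
t = np.array([0.1, 0.0, 0.0])
shift_near = project(near - t) - project(near)  # -10.0 px
shift_far = project(far - t) - project(far)     # -2.5 px
```

The near point moves four times as far across the image as the far one, which is exactly the parallax cue that depth input resolves, and why areas revealed by that motion need to be in-painted.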
What I think I'm seeing is like one of those social media posts where someone has physically printed out a tweet, taken a photo of themselves holding the printout, and then posted the photo as a new social media post.
Is the video showing me a different camera perspective than what was originally captured, or is this taking a video feed, doing technical magic to convert to gaussian splats, and then converting it back into a (lower quality) video of the same view?
Again, congratulations, this is amazing from a technical perspective, I'm just trying to understand some of the potential applications it might have.
I took a screen recording of this system as it was running and cut it into clips to make the demo video.
I hope that makes sense?
I wonder if one can go the opposite route and use gaussian splatting or (more likely) some other method to generate 3D/4D scenes from cartoons. Cartoons are famously hard to emulate in 3D even entirely manually; like with traditional realistic renders (polygons, shaders, lighting, post-processing) vs gaussian splats, maybe we need a fundamentally different approach.
That being said, afaict OP's method is 1000x faster, at 33ms.
I'm also following this work https://guanjunwu.github.io/4dgs/ which produces temporal Gaussian splats but takes at least half an hour to learn the scene.