LiveSplat is a system for turning RGBD camera streams into Gaussian splat scenes in real-time. The system works by passing all the RGBD frames into a feed-forward neural net that outputs the current scene as Gaussian splats. These splats are then rendered in real-time. I've put together a demo video at the link above.
However, I have not baked the size or orientation into the system. Those are "chosen" by the neural net based on the input RGBD frames. The view dependent effects are also "chosen" by the neural net, but not through an explicit radiance field. If you run the application and zoom in, you will be able to see the splats of different sizes pointing in different directions. The system has limited ability to re-adjust the positions and sizes due to the compute budget, which leads to the pixelated effect.
[1] https://imgur.com/a/QXxCakM
This is getting unreal. They're becoming fast and high fidelity. Once we get better editing capabilities and can shape the Gaussian fields, this will become the prevailing means of creating and distributing media.
Turning any source into something 4D volumetric that you can easily mold as clay, relight, reshape. A fully interactable and playable 4D canvas.
Imagine if the work being done with diffusion models could read and write from Gaussian fields instead of just pixels. It could look like anything: real life, Ghibli, Pixar, whatever.
I can't imagine where this tech will be in five years.
100%. And style-transfer it into steam punk or H.R. Giger or cartoons or anime. Or dream up new fantasy worlds instantaneously. Explore them, play them, shape them like Minecraft-becomes-holodeck. With physics and tactile responses.
I'm so excited for everything happening in graphics right now.
Keep it up! You're at the forefront!
Could you or someone else wise in the ways of graphics give me a layperson's rundown of how this works, why it's considered so important, and what the technical challenges are given that an RGB+D(epth?) stream is the input?
Usually creating a Gaussian splat representation takes a long time and uses an iterative gradient-based optimization procedure. Using RGBD helps me sidestep this optimization, as much of the geometry is already present in the depth channel, and that is what enables the real-time aspect of my technique.
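To give a rough sense of why the depth channel sidesteps so much work: each depth pixel can be back-projected straight into a 3D point via the camera intrinsics, giving candidate splat positions for free. This is only an illustrative sketch (the function name, intrinsics, and the idea of seeding splat means this way are my assumptions, not LiveSplat's actual pipeline):

```python
import numpy as np

def depth_to_splat_centers(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-space 3D points with a
    pinhole model. In a feed-forward splatting pipeline, points like
    these could seed the Gaussian means directly, skipping the usual
    per-scene gradient optimization. Returns an (H*W, 3) array."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx   # horizontal offset scaled by depth
    y = (v - cy) * depth / fy   # vertical offset scaled by depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy example: a flat surface 2 m in front of a 4x4-pixel camera.
pts = depth_to_splat_centers(np.full((4, 4), 2.0),
                             fx=2.0, fy=2.0, cx=1.5, cy=1.5)
```

Recovering the same geometry from RGB alone would require multi-view matching or a learned depth estimate, which is where the long optimization times of classic splatting pipelines come from.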
When you say "big deal", I imagine you are also asking about business or societal implications. I can't really speak on those, but I'm open to licensing this IP to any companies which know about big business applications :)
I actually started with pointclouds for my VR teleoperation system but I hated how ugly it looked. You end up seeing through objects, and objects become unparseable if you get too close. Textures present in the RGB frame also become very hard to make out because everything becomes "pointilized". In the linked video you can make out the wood grain direction in the splat rendering, but not in the pointcloud rendering.
[1] https://youtu.be/-u-e8YTt8R8?si=qBjYlvdOsUwAl5_r&t=14
The depth is helpful to properly handle the parallaxing of the scene as the view angle changes. The system should then ideally "in-paint" the areas that are occluded from the input.
You can either guess the input depth from matching multiple RGB inputs or just use depth inputs along with RGB inputs if you have them. It's not fundamental to the process of building the splats either way.
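A toy numerical example of the parallax point above (a hypothetical 1D pinhole camera, not anything from the actual system): two points that project to the same pixel in the source view shift by different amounts when the camera moves, and the size of that shift depends entirely on depth. Without depth, the renderer has no way to know how much each surface should slide.

```python
import numpy as np

def project(p, fx=100.0, cx=50.0):
    # 1D pinhole projection: pixel = focal * (x / z) + principal point.
    return fx * p[0] / p[2] + cx

# Two points on the same camera ray, at different depths.
near = np.array([0.0, 0.0, 1.0])
far = np.array([0.0, 0.0, 4.0])
assert project(near) == project(far)  # identical pixel in the source view

# Translate the camera 0.1 m to the right (subtract t from the points).
t = np.array([0.1, 0.0, 0.0])
shift_near = project(near - t) - project(near)  # -10.0 px
shift_far = project(far - t) - project(far)     # -2.5 px
```

The near point moves four times as far across the image as the far one, which is exactly the parallax cue that depth input resolves, and why areas revealed by that motion need to be in-painted.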
What I think I'm seeing is like one of those social media posts where someone has physically printed out a tweet, taken a photo of themselves holding the printout, and then posted the photo as a new social media post.
Is the video showing me a different camera perspective than what was originally captured, or is this taking a video feed, doing technical magic to convert to gaussian splats, and then converting it back into a (lower quality) video of the same view?
Again, congratulations, this is amazing from a technical perspective, I'm just trying to understand some of the potential applications it might have.
I took a screen recording of this system as it was running and cut it into clips to make the demo video.
I hope that makes sense?
I wonder if one can go the opposite route and use gaussian splatting or (more likely) some other method to generate 3D/4D scenes from cartoons. Cartoons are famously hard to emulate in 3D even entirely manually; like with traditional realistic renders (polygons, shaders, lighting, post-processing) vs gaussian splats, maybe we need a fundamentally different approach.
That being said, afaict OP's method is 1000x faster, at 33ms.
I'm also following this work https://guanjunwu.github.io/4dgs/ which produces temporal Gaussian splats but takes at least half an hour to learn the scene.