Readit News
stormfather · 10 months ago
It's a time capsule, among other things. I want to take many, many videos of my grandpa's farm, and be able to walk around in it in VR using something like this in the future.
foxglacier · 10 months ago
You can do it using the more classic technique of photogrammetry. There are commercial products used by real estate salesmen to produce high quality "games" where you walk around inside a house, but they're more like Google Streetview where you swoosh between points where a 360 degree photo was taken. All those things will be more faithful than neurally generating next frames based on previous frames and control input.
das_keyboard · 10 months ago
> So, if traditional game worlds are paintings, neural worlds are photographs. Information flows from sensor to screen without passing through human hands.

I don't get this analogy at all. Instead of a human, information flows through a neural network, which alters the information.

> Every lifelike detail in the final world is only there because my phone recorded it.

I might be wrong here but I don't think this is true. It might also be there because the network inferred that it is there based on previous data.

Imo this just takes the human out of an artistic process (creating video game worlds), and I'm not sure if this is worth archiving.

ajb · 10 months ago
>I don't get this analogy at all. Instead of a human, information flows through a neural network, which alters the information.

These days most photos are also stored using lossy compression which alters the information.

You can think of this as a form of highly lossy compression of an image of this forest in time and space.

Most lossy compression is 'subtractive' in that detail is subtracted from the image in order to compress it, so the kinds of alteration are limited. However, there have been previous non-subtractive forms of compression (e.g., fractal compression) that have been criticised on the basis of making up details, which is certainly something that a neural network will do. However, if the network is only trained on this forest data, rather than being also trained on other data and then fine-tuned, then in some sense it does only represent this forest rather than giving an 'informed impression' like a human artist would.
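
To make the 'subtractive' idea concrete, here is a toy sketch in Python that uses uniform quantization as a stand-in for lossy compression. It is only an illustration: real codecs such as JPEG quantize transform coefficients rather than raw pixels, and the sample row and step size are made up.

```python
# Toy "subtractive" lossy compression: uniform quantization of pixel values.
# Illustration only; the pixel row and step size below are made-up examples.

def quantize(pixels, step):
    """Snap each 0-255 value to the nearest multiple of `step`, capped at 255."""
    return [min(255, round(p / step) * step) for p in pixels]

row = [12, 13, 15, 14, 199, 203, 198, 201]
compressed = quantize(row, step=16)

print(compressed)
print(len(set(row)), "distinct values before,", len(set(compressed)), "after")
```

Small variations between neighbouring pixels collapse onto the same level, so the output holds less detail than the input, and no structure appears that was not already in the row.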

andai · 10 months ago
>These days most photos are also stored using lossy compression which alters the information.

I noticed this in some photos I see online starting maybe 5-10 years ago.

I'd click through to a high res version of the photo, and instead of sensor noise or jpeg artefacts, I'd see these bizarre snakelike formations, as though the thing had been put through style transfer.

Legend2440 · 10 months ago
>It might also be there because the network inferred that it is there based on previous data.

There is no previous data. This network is exclusively trained on the data he collected from the scene.

Valk3_ · 10 months ago
This might be a vague question, but what kind of intuition or knowledge do you need to work with these kinds of things, say if you want to make your own model? Is it just having experience with image generation and trying to incorporate relevant inputs that you would expect in a 3D world, like the control information you added for instance?
ollin · 10 months ago
I think https://diamond-wm.github.io is a reasonable place to start (they have public world-model training code, and people have successfully adapted their codebase to other games e.g. https://derewah.dev/projects/ai-mariokart). Most modern world models are essentially image generators with additional inputs (past-frames + controls) added on, so understanding how Diffusion/IADB/Flow Matching work would definitely help.
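
As a rough illustration of "image generator plus extra inputs", here is a minimal pure-Python sketch of how one flow-matching training example could be assembled. The 4-pixel frames, the control encoding, and the helper names `flow_matching_pair` and `build_model_input` are all hypothetical; this is not the DIAMOND codebase or the method used in the post, and conventions for the direction of `t` vary between papers.

```python
import random

def flow_matching_pair(noise, target, t):
    """Linear path between noise (t=0) and data (t=1).

    x_t = (1 - t) * noise + t * target
    The network would be trained to predict the velocity (target - noise).
    """
    x_t = [(1 - t) * n + t * x for n, x in zip(noise, target)]
    velocity = [x - n for n, x in zip(noise, target)]
    return x_t, velocity

def build_model_input(x_t, t, past_frames, controls):
    """Hypothetical conditioning: flatten everything into one input vector."""
    flat_past = [v for frame in past_frames for v in frame]
    return x_t + [t] + flat_past + controls

random.seed(0)
FRAME_SIZE = 4  # toy "image" with 4 pixels
past_frames = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.3, 0.4, 0.5]]
next_frame = [0.3, 0.4, 0.5, 0.6]   # the frame the model should generate
controls = [1.0, 0.0]               # made-up encoding, e.g. (forward, turn)

noise = [random.gauss(0.0, 1.0) for _ in range(FRAME_SIZE)]
t = 0.25
x_t, velocity = flow_matching_pair(noise, next_frame, t)
model_input = build_model_input(x_t, t, past_frames, controls)

print(len(model_input))  # noisy frame + t + past frames + controls
```

A real model would feed the past frames and controls through convolutional or attention layers instead of flattening them, but the training signal has the same shape: given the noisy frame plus its conditioning, regress the velocity.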
Valk3_ · 10 months ago
Thanks!
udia · 10 months ago
Very nice work. Seems very similar to the Oasis Minecraft simulator.

https://oasis.decart.ai/

ollin · 10 months ago
Yup, definitely similar! There are a lot of video-game-emulation World Models floating around now, https://worldarcade.gg had a list. In the self-driving & robotics literature there have also been many WMs created for policy training and evaluation. I don't remember a prior WM built on first-person cell-phone video, but it's a simple enough concept that someone has probably done it for a student project or something :)
AndrewKemendo · 10 months ago
I think this is very interesting because you seem to have reinvented NeRF, if I’m understanding it correctly. I only did one pass through but it looks at first glance like a different approach entirely.

More interesting is that you made an easy to use environment authoring tool that (I haven’t tried it yet) seems really slick.

Both of those are impressive alone but together that’s very exciting.

bjornsing · 10 months ago
NeRF is a more complex and constrained approach, based on a kind of ray tracing. But results are obviously similar.
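
For readers unfamiliar with the "kind of ray tracing" involved, here is a minimal sketch of NeRF-style volume rendering along a single ray. In an actual NeRF the densities and colours come from a neural network queried at each sample position; here they are hard-coded, made-up values, and `render_ray` is a hypothetical helper name.

```python
import math

def render_ray(densities, colors, deltas):
    """Composite samples along one ray, front to back.

    alpha_i = 1 - exp(-sigma_i * delta_i)
    weight_i = T_i * alpha_i, where T_i is the light not yet absorbed.
    """
    transmittance = 1.0
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel, transmittance

# Made-up samples: empty space, then a dense green surface, then red behind it.
densities = [0.0, 50.0, 50.0]
colors = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
deltas = [0.5, 0.5, 0.5]

pixel, remaining = render_ray(densities, colors, deltas)
print(pixel, remaining)
```

The empty sample contributes nothing and the dense green sample absorbs nearly all the light, so the red sample behind it is almost entirely occluded. This explicit geometric constraint is what a next-frame world model does not have.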
AndrewKemendo · 10 months ago
Right, which is why I said it’s an entirely different approach but results in almost the same kind of output.
throwaway314155 · 10 months ago
Really cool. How much compute did you require to successfully train these models? Is it in the ballpark of something you could do with a single gaming GPU? Or did you spin up something fancier?

edit: I see now that you mention a price point of 100 GPU-hours (roughly $100). My mistake.

bjornsing · 10 months ago
What used to be cutting edge research not so long ago is now a fun hobby project. I love it.
Jotalea · 10 months ago
It's a really interesting project, reminds me of the 360° videos I used to watch on my phone, back in 2015.

But there's one thing that I'm a little bit worried about: I was getting like 8 stable FPS on my 3-year-old flagship phone. My concern is that these models are not optimized to run on this type of hardware, which may or may not lead to hardware obsolescence quicker than planned. And it's not like these aren't powerful, they really are.

ollin · 10 months ago
Curious, which device/OS/browser? I did all my testing on 4-year-old hardware (iPhone 13 Pro, M1 Pro MBP), and the model itself is extremely tiny (~1GFLOP) so I'm optimistic that performance issues would be solvable with a better software stack (e.g. native app).
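
A back-of-the-envelope sketch of what that figure implies, assuming "~1GFLOP" means the cost of generating one frame (the comment does not say so explicitly). The 30 and 60 FPS targets are hypothetical; only the 8 FPS figure comes from the thread.

```python
GFLOP_PER_FRAME = 1.0  # assumption: "~1GFLOP" read as the per-frame cost

def required_gflops(fps, gflop_per_frame=GFLOP_PER_FRAME):
    """Sustained throughput (GFLOP/s) needed to hit a given frame rate."""
    return fps * gflop_per_frame

def frame_budget_ms(fps):
    """Wall-clock time available per frame, in milliseconds."""
    return 1000.0 / fps

for fps in (8, 30, 60):
    print(fps, "FPS:", required_gflops(fps), "GFLOP/s,",
          round(frame_budget_ms(fps), 1), "ms per frame")
```

Under that assumption, 8 FPS corresponds to only about 8 GFLOP/s of useful work, which would be consistent with the bottleneck sitting in the software stack rather than in raw compute.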
Jotalea · 10 months ago
I was on my Samsung Galaxy S21FE (Snapdragon 888), on the latest version of the Firefox browser for Android (138.0), on One UI 6.1 (Android 14). It is possibly the most powerful device I own, that's why I was concerned.