That has been VR for a while, but I think VR experiences are now truly worth pursuing in 2024 and beyond. In fact, I think we've been there for a few years now.
It does depend a lot on what you prefer to do with your free time in the first place. But if you happen to be killing time on YouTube or something anyway, watching a VR travel walk for example is actually really worthwhile.
I also find that because it is so immersive, I will consume less content because I am actually satisfied earlier than if I had been killing time on 2D content.
The problem with existing VR tech is that it's still too cumbersome to use. Nobody wants to wear a hefty computer strapped to their face for casual use, let alone for prolonged periods of time. The form factor needs to be much more lightweight and unobtrusive for the product to have a chance at mass adoption.
There are AR devices in the form of sunglasses now, but the displays and experience pale in comparison to VR headsets.
We're still a few more generations away from VR headsets shrinking and AR glasses improving for this experience to be universally enjoyable.
For truly immersive VR video it seems we’ll need to support moving your head in 6 degrees of freedom, so you can peek around objects etc. We’re a long way off from that being realtime, but I can imagine generative AI being used to interpolate frames from an array of cameras spaced fairly widely apart.
There was a lot of work on big light-field capture rigs in the past, but honestly, with current advances in neural radiance fields, Gaussian splatting, meshing, and other scene reconstruction techniques, even monocular capture can produce pretty amazing results. I'm sure ultra-high-res binocular video could be converted into pretty high-quality free-look output (that can at least be played back in real time on high-powered mobile devices) today, with just some additional optimization/engineering work.
Yes, this is the closest to what I would consider a true VR photo. Stereoscopic images are gimmicky and feel utterly restrictive compared to this.
From the same lab 2 years later, there's "Immersive Light Field Video with a Layered Mesh Representation", which is the same, but instead of a photo it's video.
But even with that, it would allow only limited movement, because if you move too much in the reconstructed space it would have to show you occluded areas that have never been recorded. For that case generative AI could be used to fill in the gaps, which means it wouldn't be a true-to-life reconstruction anymore, but it could still be useful where that doesn't matter much.
In the Vision Pro demo appointment reel (i.e. anybody can demo the headset for free), not everything is 6DOF; most of it is just stereo. Immersive VR is about interactivity, which is more about good UX / gamification and less about using a dual-8K stereo capture device.
Not really, that image is stereo, but it isn’t 360-degree and it can’t support arbitrary viewing angles and positions after the fact. It works perfectly for passthrough because the headset cameras are physically locked to your head.
This is great but… isn’t there something fundamentally eye-straining about capturing media at a given 'interocular' distance and then displaying it to a person at their own individual and different 'interocular' distance?
Interocular, meaning distance between a pair of eyes/lenses
From playing games that get that distance wrong, it doesn't give me a headache or anything, it just makes the whole world feel bigger or smaller than reality. Really frustrating in games that try to replicate a real feeling. In a driving sim, for example, it can make you feel like you're driving a toy car rather than a full-sized one, or that you're a kid in the seat of your parents' car. Obviously not a feeling you want to invoke unintentionally, but perhaps there are ways in software to adjust?
No. While not everyone's eyes may be spaced at the "normal" 64mm distance, delivering images shot with that spacing won't look significantly different to people with differing eye spacing. And even if they did, the distortion would be one of scale.
For example, if you wanted to shoot something from a cat's POV, you'd put the lenses closer together and shoot objects closer to the camera. That would make it easier to fuse images of a mousehole right in front of your face. Things farther away would simply look "less 3-D" and therefore abnormally far away to a human viewer.
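As a back-of-the-envelope sketch (a simple small-angle stereo model, not anything a headset actually implements): the disparity angle of a point scales with baseline over distance, and the brain triangulates using its own eye spacing, so the whole scene reads as rescaled by viewer IPD over capture IPD.

```python
def perceived_distance(true_dist_m, capture_ipd_m, viewer_ipd_m):
    """Small-angle stereo model: disparity angle ~ baseline / distance.
    A viewer triangulating with their own eye spacing perceives the
    scene uniformly rescaled by viewer_ipd / capture_ipd."""
    return true_dist_m * viewer_ipd_m / capture_ipd_m

# Cat's-eye capture (assumed 30 mm baseline) viewed with 64 mm human eyes:
mousehole = perceived_distance(0.5, 0.030, 0.064)  # ~1.07 m, not 0.5 m
```

Under this model a narrow capture baseline makes everything appear farther and flatter, matching the "less 3-D" effect described above.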
I don't think they're just taking the images and projecting them onto each lens of the vision pro. Forgetting about inter-ocular distance for a second - that just wouldn't work for head tracking and letting you move around the image.
Instead, I assume they're using the pair of lenses to approximate a 3D model, which allows for display from a variety of angles (all reasonably close to where the camera was), and then rendering from that model based on the angle the viewer is actually looking from. That solves the interocular distance problem, because you render for each eye based on where the person's eye is, resulting in a distance between the two displayed images that doesn't have to match the camera's.
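A minimal sketch of that idea (hypothetical names; this is not Apple's actual pipeline): once you have a reconstructed scene, each eye simply gets its own virtual camera, offset by half the viewer's own IPD along the head's "right" axis, so the stereo baseline at display time is decoupled from the capture baseline.

```python
import numpy as np

def per_eye_cameras(head_pos, head_rot, viewer_ipd=0.064):
    """Virtual camera origins for rendering a reconstructed 3D scene.
    head_pos: (3,) head position in meters.
    head_rot: 3x3 rotation matrix whose first column is the head's
              "right" direction.
    viewer_ipd: the viewer's own interocular distance in meters."""
    right = head_rot[:, 0]
    offset = (viewer_ipd / 2.0) * right
    left_eye = head_pos - offset
    right_eye = head_pos + offset
    return left_eye, right_eye

# Head at the origin, looking straight ahead (identity rotation):
left, right = per_eye_cameras(np.zeros(3), np.eye(3))
```

Each returned origin would then feed an ordinary per-eye render of the reconstructed model.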
Not exactly the same, but I like making stereo photos. I just use my iphone to take one photo left, move it over by a bit, then take another photo. I don't think I would notice if I was off by a centimeter or so. But maybe in VR it's different.
I never liked the 3D era of TVs that were so popular at CES some years ago; I found it gimmicky and not that immersive, and the 3D effect required an almost perfect scenario to work. But then I got to play with a Meta Quest, and my hype for 3D came back.
I wonder if we'll see more productions like this thanks to these cameras being more common.
Yeah, it’s pretty remarkable that as soon as the bandwidth and storage requirements for video catch up, the industry quickly pushes past to the next level. I’m sure a way will be found to distribute the output, but we’re still talking veritable supercomputers needed to edit the raw files this camera pumps out.
Watching a 3D movie vs. a homemade insta360 video on my Meta Quest 3, I wonder about the definition of an immersive experience. 3D is not 360, and not that immersive, may I say. Obviously for cinema there's perhaps a limit to how wide the scene can go, as you need to direct the audience's eyes. And ears.
Hence there is a limit, and that camera is very camera-like. Anyway, I'll wait for a cheaper version to use with my cheaper Meta Quest 3. Both 3D and 360 have their role.
https://www.roadtovr.com/exclusive-lytro-reveals-immerge-2-0...
Recent research examples:
* https://vidu4d-dgs.github.io/
* https://fudan-zvg.github.io/4d-gaussian-splatting/
* https://guanjunwu.github.io/4dgs/
* https://aoliao12138.github.io/VideoRF/
[0] https://store.steampowered.com/app/771310/Welcome_to_Light_F...
https://xkcd.com/941/
But are we going to see better true 360° 3d cameras than the Insta360 Titan?
https://variety.com/2024/film/news/apple-vision-pro-original...
Current VR porn is typically 8K 60 fps at the max, which already creates HUGE files, like 500 MB/minute.
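For a rough sense of scale (assuming "8K" means 7680x4320 per eye at 3 bytes/pixel, which may not match any particular camera): 500 MB/minute is only about 67 Mbit/s after compression, while the uncompressed stereo stream runs to multiple gigabytes per second, which is why editing the raw files is so demanding.

```python
# Delivery bitrate implied by 500 MB/minute of compressed video:
compressed_mbps = 500 * 8 / 60            # ~66.7 Mbit/s

# Uncompressed 8K stereo at 60 fps (assumed 7680x4320 per eye, 24 bpp):
w, h, eyes, fps, bytes_px = 7680, 4320, 2, 60, 3
raw_bytes_per_sec = w * h * eyes * fps * bytes_px
raw_gb_per_sec = raw_bytes_per_sec / 1e9  # ~11.9 GB/s before compression
```

Even granting the assumed frame layout, the gap between the raw and delivered streams is a factor of well over a thousand.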