Holy sh*t, can you imagine a year from now if they start using something like this for concerts or basketball games? Like imagine rewatching a basketball game but being able to move the camera around on the court? Might not be possible yet, but this shows the tech's possible. Let alone maybe someday being able to scale it to real time, lol
A thought experiment I like to employ when imagining the impact of a new piece of tech: reverse the timeline. If the tech were the status quo, how would the current status quo be marketed?
If moving the camera were the norm, the current status quo would probably be marketed along the lines of "No more micro-managing camera views, arguing over playback speed or fiddling with the timeline – this next leap in technology introduces pre-edited video, where each game is preprocessed by a team of highly-skilled, professional producers, selecting the best viewing angle and playback speed so you and your friends can just sit back, relax and enjoy the game."
If the fictional press release sounds good enough, the tech probably won't hit.
I remember, 30 years ago during the switch to digital TV broadcasting, proponents of the tech trying to sell a future where viewers of sports events would be able to select which camera to watch. Then again, imagine watching a game with 10 friends and trying to agree over cameras...
Great point. Are you familiar with McLuhan's media tetrad?
The hypothetical status quo is a wonderful example of retrieval. Every new technology should be expected to have the effect of reemphasizing something previously obsolete.
Wait, you have 10 friends?
I think for routine game-watching you're 100% right. But for replaying amazing/interesting/controversial plays, this tech would be an enormous improvement over being captive to the broadcast team. Everyone would love to be able to grab control and fly around and zoom in on that one power dunk, critical fumble, bad foul call, etc. on demand.
You want a killer app for VR/AR goggle-style things? You're right, this would be amazing.
Apple demoed some kind of volumetric video to the press with the Vision Pro. There was a short clip of a concert and an NBA game (Nuggets?) among other things. I heard a number of people said it was like being there.
This is a step past that. Apple’s was recorded from some kind of special camera rig (I assume), but I seriously doubt it was full volumetric video from a large number of angles. It sounded more like volumetric video if you were stuck in a (very good) seat in the venue.
I’d be curious to know just how much horsepower it takes to play these back.
I thought they recorded their special videos using the Vision Pro itself, which has enough sensors to build depth maps of the scene and provide novel views within a small range from the original position.
But I am half speculating and I don't really remember. That's just the impression I remember having.
From the paper, it seems a 3060 is enough for 60 fps on the DNA-Rendering dataset. On the full-screen datasets it manages 25 fps; a 4090 might be needed to stay above 60 fps.
Still pretty heavy, I'd say, but it has certainly come a long way and shows that real volumetric video is doable.
It's definitely the future!
I started working on something like that a couple decades ago. I figured, with all of the camera feeds at a football game, there would be plenty of views to generate 3D models from, even with relatively naive approaches. Then the NFL did it shortly after (2008) [1], and it didn't catch on.
[1] 2008, https://www.cnet.com/tech/services-and-software/nfl-demos-li...
[2] Some renewed interest, in 2016: https://bleacherreport.com/articles/2659861-future-of-the-nf...
This is what they use in the latest FIFA / FC games; they call it Volumetric Data Capture: basically using video footage from real sports events to capture model and animation data for the players, allowing their unique mannerisms and movements to translate into the game. In previous iterations they would put football players in motion-capture suits, but they wouldn't get all the players, and even when they did, it was under stifled studio conditions, not in their natural environment.
Anyway, not quite the same as turning a match into 3D, but definitely related.
https://gamermatters.com/ea-sports-fc-24-makes-use-of-volume...
We did this (as a side effect) for the Premier League ~2009-2012 (we were liquidated JUST before VR appeared, where the content worked fantastically, and then ~2014 with the Moverio glasses, even better in AR).
We did live player tracking (~33 cameras) on-site at every game, and for fun rendered players FIFA-style with a free camera. We even did some renders (captures of the real-time engine) for Canal+ highlights as an experiment.
edit: my own GPGPU-only (frag shaders :), sub-100ms, uncalibrated-cameras (footage taken directly from Sky / Match of the Day) R&D a few years later also works really well on a LookingGlass: https://twitter.com/HoloSports/status/1327375694884646913
(I took this to Sky Sports but they said it was a bit too in-the-future)
Actually, a company has been working on this for a few years now, and I believe they are currently in production; their focus is football/soccer. I was going to do a research internship with them before I dropped it for a different one. Here it is:
https://www.beyondsports.nl/
Looking at it, they now heavily focus on tracking the movements of players to replay in AR.
This is what we were building at my previous start-up, though we had a focus on outdoor sports. We had built a 3D virtual world and used GPS tracks to follow athletes (ultra-marathons, paragliding, etc.).
We theorized that two GoPro cameras on the athlete would let us completely re-create the entire scene from all angles, and inform an AI of how to re-paint our virtual world with real-world weather, environments, etc.
Unfortunately, 5 years ago, everyone said I was crazy to think any of this was possible.
There is a video capture of our 3D scenes from 2017 on our old website (we were a full 3D world, not video): https://ayvri.com - the tech was acquired just over a year ago.
They don't even stream NBA games in 4K because TV networks only support 1080p. I doubt they'd buy into such an expensive technology for such a niche audience.
It will be very interesting to watch how tech like this affects mainstream society.
I imagine pornography will use it at some point soon. Maybe something like chaturbate where your interactions with the cam performer are more customized?
Could it be used with CCTV to reconstruct crime scenes or accidents?
Wedding videos might be a popular use; being able to watch from new angles could be a killer app.
Or a reworking of the first Avengers movie, viewing all the action from multiple viewpoints.
And all this will probably be built into the Pixel 18 Pro or something.
Here is Jack Black demoing similar NSA technology in 1998 https://www.youtube.com/watch?v=3EwZQddc3kY
Also the Star Trek Next Generation forensics episode with the holodeck…
"Light Field" photography has existed for a few years now, yet there is still no porn using it that I am aware of.
I tried a demo a while back that was very impressive, despite being relatively low-res stock footage. Simply being able to move your head a few inches in any direction without taking the world with you is a much better experience than contemporary VR video.
This seems unprecedented. Imagine if you had this but could update the scene programmatically. Ask your AI to change the location or the actors. Now you have a very convincing artificial scene with anything you can imagine in it.
I imagine this would be helpful when making movies, if you could basically play around with the scenes without having to refilm them several times to get the best one.
When it comes to perspective and the like, they already do this: multiple camera angles, CGI, and the odd reshoot. Like having Henry Cavill come back for a reshoot, then CGI-ing out the mustache he had for his next role.
Between this and LLMs, we're halfway to building a holodeck. What's missing at this point is just hard light - i.e. being able to feel the physical substance of simulated objects, and being able to experience it all without a wearable/personal device.
Yeah, "just". Even though we have no idea how to even approach that.
I suppose you could build up what would essentially be a 4D sprite sheet or animation set of a character and use that to support natural looking arbitrary movement. I'm not sure that isn't just a mo-cap character with extra steps, though.
Even the most skilled animators, with years of budget, still can't escape the uncanny valley, which is why CG animation has converged on a style of blob-humans as the current standard.
I have very little hope of AI driven animation looking ok in the next many decades. Don’t underestimate how hardwired your senses are at finding artifacts in movement. Static images are much easier to “fake”.
> we precompute the physical properties on the point clouds for real-time rendering. Although large in size (30 GiB for 0013_01), these precomputed caches only reside in the main memory and are not explicitly stored on disk,
Does the cache size scale linearly with the length of the video? 0013_01 is only 150 frames. And how long does the cache take to generate?
Looks like it; I suspect the authors precomputed everything they could to reach the highest frame rate. Like predecoding all frames of a movie into raw pixels?
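For a rough sense of scale, assuming the cache really does grow linearly with frame count (the quote above doesn't spell that out, so treat this as a back-of-envelope guess, including the 30 fps figure): 0013_01's 150 frames work out to roughly 0.2 GiB per frame, which balloons quickly for longer clips.

    # Back-of-envelope only: assumes purely linear cache growth with frame count
    # and a 30 fps capture rate -- both assumptions, not stated by the paper.
    cache_gib, frames = 30.0, 150          # reported cache size and clip length
    per_frame_gib = cache_gib / frames     # ~0.2 GiB per frame

    for seconds in (5, 60, 600):           # clip lengths at an assumed 30 fps
        n_frames = seconds * 30
        print(f"{seconds:>4d} s ({n_frames:>5d} frames): ~{per_frame_gib * n_frames:,.0f} GiB")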
I think volumetric video should be thought of as a regular video, where the decoding and playback happen at the same time. A few papers down the line this could be easily implemented.
How many cameras does this method require? As far as I can tell from the paper it still generates from multi-view source data. I can't say for sure but it seems like a large number from what I can parse as a layman.
So, something we could easily see done at, say... an NBA event or a football field... hell, I imagine I can think of some... adult use cases that would probably make a bundle off of tech like this if it can be optimized down... as my favorite YouTuber would say... WHAT A TIME TO BE ALIVE!
Very cool renderings, but ironically my browser is having a heck of a time rendering their website. The short videos keep jumping around, starting and stopping randomly... which I guess is very VR.
Add volumetric sound, integrate VR and you almost have recreated braindance from the Cyberpunk 2077 game. Doesn't seem that far off in the distance.
The missing component from complete braindance would be integrating physical senses. AFAIK we are pretty far away from having anything revolutionary in that domain. Would love to be proven wrong, however.
If I’m understanding the paper correctly then the four dimensions are the position, density, radius, and color of the spheres in their volumetric model. So for any given viewing position and point in time, their model produces a 4D scene that is then rasterized to 2D.
don't forget to add the 3 color dimensions. (this may seem pedantic, but when doing feature extraction, these extra dimensions really are significant)
So's a video game, and we call that "real-time 3D". Time is mentioned, but it isn't counted again as a dimension, perhaps because any given momentary view is a time slice, not a time range like it is an XYZ range.
It's not a 3D model that is animated using a skeleton and keyframes like traditional 3D. It's many consecutive 3D models that create the illusion of continuous motion (aka video).
4D is the name that has come to describe the jump from static 3D models (photogrammetry) to 3D "video" models.
Time is the fourth dimension. The input data is a video, so the model learns the colors and the positions of the elements (basically points). You can render the scene from any angle at any time once the model is trained.
Downvoted at the time I see it, but actually correct. It's based on K-planes (https://arxiv.org/pdf/2301.10241.pdf), which effectively splits each space-time relationship off from the spatial relationship. It's just mathematics, guys. The original NeRF paper talked about a 5D coordinate. You know, like a k-dimensional vector?
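For anyone curious what "splitting space-time relationships off" looks like in practice, here is a minimal, self-contained sketch of the K-planes idea: a 4D point (x, y, z, t) is projected onto six 2D feature planes (xy, xz, yz, xt, yt, zt), and the bilinearly interpolated features are combined by element-wise product. The resolution, feature width, and random weights below are placeholders; in the real model the planes are learned and a small MLP decodes the feature into color and density.

    import numpy as np

    RES, FDIM = 64, 8                                         # toy resolution / feature width
    PAIRS = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]  # xy, xz, yz, xt, yt, zt
    planes = [np.random.rand(RES, RES, FDIM) for _ in PAIRS]  # learned parameters in the real model

    def bilinear(plane, u, v):
        # bilinearly interpolate a feature vector at continuous coords (u, v) in [0, 1]
        u, v = u * (RES - 1), v * (RES - 1)
        u0, v0 = int(u), int(v)
        u1, v1 = min(u0 + 1, RES - 1), min(v0 + 1, RES - 1)
        du, dv = u - u0, v - v0
        return ((1 - du) * (1 - dv) * plane[u0, v0] + du * (1 - dv) * plane[u1, v0]
                + (1 - du) * dv * plane[u0, v1] + du * dv * plane[u1, v1])

    def kplanes_feature(x, y, z, t):
        # feature for a normalized space-time point; three planes involve only space,
        # three involve time, which is how the space/time relationships get split apart
        p = (x, y, z, t)
        feat = np.ones(FDIM)
        for (a, b), plane in zip(PAIRS, planes):
            feat *= bilinear(plane, p[a], p[b])               # Hadamard product over the six planes
        return feat

    print(kplanes_feature(0.5, 0.2, 0.7, 0.1))                # 8-dim feature, decoded by an MLP in practice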
Yea it's probably to have a catchy name and get some attention. Although it's technically accurate to call it 4D since it includes time, I think 3D video recording would probably get the point across to more people in a less sensationalist way.
Is it technically accurate? Seems like it's actually 6DoF view angles + time. The paper mentions 4D view, 4D point cloud, dynamic 3D scene, and 4D feature grid.
Yeah, "just". Even though we have no idea how to even approach that.
A 2D volume (similar to an image) has pixels stored in 2D coordinates.
A 3D volume has points stored in 3D coordinates. Imagine an image for every vertical slice of a brain scan.
A 4D volume has points stored in 4D coordinates, where the newest dimension is time. Imagine a 3D volumetric capture for each frame in time.
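A toy illustration of that framing (the shapes are arbitrary, and a real capture would be a point cloud or feature grid per frame rather than a dense voxel array): indexing one time step of a 4D array gives back an ordinary 3D volume, and indexing a slice of that gives a 2D image.

    import numpy as np

    video_4d = np.random.rand(150, 64, 128, 128)   # (time, z, y, x) - toy dense 4D volume

    frame_3d = video_4d[42]    # one time step -> a 3D volume, shape (64, 128, 128)
    slice_2d = frame_3d[32]    # one z-slice   -> a 2D image,  shape (128, 128)

    print(video_4d.shape, frame_3d.shape, slice_2d.shape)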