superfish · 2 months ago
"Unsplash > Gen3C > The fly video" is nightmare fuel. View at your own risk: https://apple.github.io/ml-sharp/video_selections/Unsplash/g...
uwela · 2 months ago
Goading companies into improving image and video generation by showing them how terrible they are is only going to make them go faster. Personally, I'd like to enjoy the few moments I have left thinking that maybe something I watch is real.

It will evolve into people hooked into entertainment suits most of the day, where no one has actual relationships or does anything of consequence, like some sad mashup of WALL-E and Ready Player One.

If we’re lucky, some will want to meatspace with augmented reality.

Maybe we’ll get really nice holovisions, where we can chat with virtual celebrities.

Who needs that?

We're already having to shoot up weight-loss drugs because we binge-watch streaming all the time. We've all given up, assuming AI will do everything. What good will come from having better technology when technology is already doing harm?

camgunz · 2 months ago
It turns out the Great Filter is that any species with the technology to colonize space also has the technology to soma itself into annihilation.

https://en.wikipedia.org/wiki/Great_Filter

Traubenfuchs · 2 months ago
Early AI "everything turns into dog heads" vibes. Beautiful.
drcongo · 2 months ago
I miss those. Anyone know if it's still possible to get the models etc. needed to generate them?
schneehertz · 2 months ago
san check, 1d10
ghurtado · 2 months ago
Seth Brundle has entered the chat.
Leptonmaniac · 2 months ago
Can someone ELI5 what this does? I read the abstract and tried to find differences in the provided examples, but I don't understand (and don't see) what the "photorealistic" part is.
emsign · 2 months ago
Imagine history documentaries where they take an old photo, free objects from the background, and move them around, giving the illusion of parallax movement. This software does that in less than a second, creating a 3D model that can be accurately moved (or the camera, for that matter) in your video editor. It's not new, but this one is fast and "sharp".

Gaussian splatting is pretty awesome.

crazygringo · 2 months ago
Oh man. I never thought about how Ken Burns might use that.

You already sometimes see cases where they manually cut out a foreground person from the background, enlarge them a little bit, and create a multi-layer 3D effect, but it's super-primitive and I find it gimmicky.

Bringing actual 3D to old photographs as the camera slowly pans or rotates slightly feels like it could be done really tastefully and well.

kurtis_reed · 2 months ago
What are free objects?
ares623 · 2 months ago
Takes a 2D image and lets you simulate moving the camera angle, with a correct-ish parallax effect and proper subject isolation (it seems able to handle multiple subjects in the same scene as well).

I guess this is what they use for the portrait mode effects.

derleyici · 2 months ago
It turns a single photo into a rough 3D scene so you can slightly move the camera and see new, realistic views. "Photorealistic" means it preserves real textures and lighting instead of a flat depth effect. Similar behavior can be seen with Apple's Spatial Scene feature in the Photos app: https://files.catbox.moe/93w7rw.mov
carabiner · 2 months ago
Black Mirror episode portraying what this could do: https://youtu.be/XJIq_Dy--VA?t=14. If Apple ran SHARP on this photo and compared it to the show, that would be incredible.

Or if you prefer Blade Runner: https://youtu.be/qHepKd38pr0?t=107

diimdeep · 2 months ago
One more example from Star Trek Into Darkness https://youtu.be/p7Y4nXTANRQ?t=61
zipy124 · 2 months ago
Basically depth estimation to split the scene into various planes, then inpainting to fill in the obscured parts of those planes, and then free movement of the planes to allow for parallax. Think of 2D side-scrolling games that use several background layers at different depths to give the illusion of motion and depth.
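
A minimal NumPy sketch of that layered idea (a toy, not SHARP's actual pipeline: the layer count and the wrap-around shift are arbitrary choices, and the holes a real pipeline would inpaint are left empty here):

    import numpy as np

    def layered_parallax(image, depth, shift_px, n_layers=4):
        # Slice the image into depth planes and translate near planes
        # more than far ones, like a 2D side-scroller. A real pipeline
        # would inpaint the regions this disoccludes behind each plane.
        h, w, _ = image.shape
        out = np.zeros_like(image)
        bounds = np.quantile(depth, np.linspace(0, 1, n_layers + 1))
        for i in range(n_layers - 1, -1, -1):      # paint far-to-near
            mask = (depth >= bounds[i]) & (depth <= bounds[i + 1])
            # Nearer layers (small i) move more under a camera pan.
            dx = int(round(shift_px * (n_layers - i) / n_layers))
            layer = np.roll(image * mask[..., None], dx, axis=1)
            moved = np.roll(mask, dx, axis=1)
            out[moved] = layer[moved]
        return out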
eloisius · 2 months ago
From a single picture it infers a hidden 3D representation, from which you can produce photorealistic images from slightly different vantage points (novel views).
avaer · 2 months ago
There's nothing "hidden" about the 3D representation. It's a point cloud (in meters) with colors, and a guess at the "camera" that produced it.

(I am oversimplifying).
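
To make the oversimplification concrete (made-up points and intrinsics, not SHARP's actual output format): with colored points in meters and a guessed pinhole camera, a "novel view" is just the same points reprojected after nudging the camera.

    import numpy as np

    # Made-up colored points (meters) and guessed pinhole intrinsics.
    points = np.array([[0.0, 0.0, 2.0], [0.5, -0.2, 3.0]])  # xyz
    colors = np.array([[255, 0, 0], [0, 255, 0]])           # per-point RGB
    f, cx, cy = 1000.0, 320.0, 240.0

    def project(points, cam_offset):
        # Translate the camera, then do a standard pinhole projection.
        p = points - cam_offset
        return np.stack([f * p[:, 0] / p[:, 2] + cx,
                         f * p[:, 1] / p[:, 2] + cy], axis=1)

    print(project(points, np.zeros(3)))                  # original view
    print(project(points, np.array([0.05, 0.0, 0.0])))   # camera 5 cm right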

skygazer · 2 months ago
Apple does something similar right now in their Photos app, generating spatial views from 2D photos, where parallax is visible by moving your phone. This paper's technique seems to produce them faster. They also use this same tech in their Vision Pro headset to generate unique views per eye, likewise on spatialized images from Photos.
p-e-w · 2 months ago
Agreed, this is a terrible presentation. The paper abstract is bordering on word salad, the demo images are meaningless and don’t show any clear difference to the previous SotA, the introduction talks about “nearby” views while the images appear to show zooming in, etc.
avaer · 2 months ago
It makes your picture 3D. The "photorealistic" part is "it's better than these other ways".
rcarmo · 2 months ago
Well, I got _something_ to work on Apple Silicon:

https://github.com/rcarmo/ml-sharp (has a little demo GIF)

I am looking at ways to approximate Gaussian splats without having to reinvent the wheel, but I'm a bit out of my depth since I haven't been paying a lot of attention to those in general.
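
A crude CPU stand-in for splat rendering, as a sketch (nothing like a real renderer: isotropic footprints and simple back-to-front "over" compositing, where real 3DGS uses anisotropic per-splat covariances and a tiled rasterizer):

    import numpy as np

    def naive_splat(points, colors, f, cx, cy, h, w, sigma_px=2.0):
        # Project each point with a pinhole camera, treat its Gaussian
        # footprint as opacity, and "over"-composite back to front.
        img = np.zeros((h, w, 3))
        ys, xs = np.mgrid[0:h, 0:w]
        for i in np.argsort(-points[:, 2]):        # far-to-near order
            x, y, z = points[i]
            u, v = f * x / z + cx, f * y / z + cy  # pinhole projection
            g = np.exp(-((xs - u) ** 2 + (ys - v) ** 2)
                       / (2 * sigma_px ** 2))
            alpha = g[..., None]                   # footprint as opacity
            img = alpha * colors[i] + (1 - alpha) * img
        return img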

esperent · 2 months ago
I'm quite delighted that the GIF banding artefacts make it look like the photo of a fire is flickering, and also highly impressed that the AI was able to recognize the fire as a photo within a photo and keep it in 2D.
7moritz7 · 2 months ago
The example doesn't look particularly impressive, to say the least. Look at the bottom 20%.
rcarmo · 2 months ago
I just refactored the rendering and resampling approach. It took me a few tries to figure out how to remove the banding masks from the layers, but with more stacked layers and a bit of GPT-fu to figure out the API it sort of works now (updated the GIF).

Keep in mind that this is not Gaussian splat rendering, just a hacked approximation; the real thing on my NVIDIA machine looks way smoother.

supermatt · 2 months ago
I note the lack of human portraits in the example cases.

My experience with all these solutions to date (including whatever Apple is currently using) is that when viewed stereoscopically the people end up looking like 2D cutouts against the background.

I haven't seen this particular model in use stereoscopically so I can't comment as to its effectiveness, but the lack of a human face in the example set is likely a bit of a tell.

Granted, they do call it "Monocular View Synthesis", but I'm unclear as to what its accuracy or real-world use would be if you can't combine two views to form a convincing stereo pair.

sorenjan · 2 months ago
They're using their Depth Pro model for depth estimation, and that seems to do faces really well.

https://github.com/apple/ml-depth-pro

https://learnopencv.com/depth-pro-monocular-metric-depth/
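
The usage pattern from the ml-depth-pro README looks roughly like this (paraphrased, so check the repo for the exact current API):

    import depth_pro

    # Load the model and its preprocessing transform.
    model, transform = depth_pro.create_model_and_transforms()
    model.eval()

    # Load an image; f_px is the focal length if EXIF provides one.
    image, _, f_px = depth_pro.load_rgb("portrait.jpg")

    prediction = model.infer(transform(image), f_px=f_px)
    depth = prediction["depth"]            # metric depth, in meters
    focal = prediction["focallength_px"]   # estimated focal length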

supermatt · 2 months ago
I'm not sure how depth estimation alone translates into view synthesis, but the current on-device implementation is definitely not convincing for literally any portrait photograph I have seen.

True stereoscopic captures are convincing statically, but don't provide the parallax.

moondev · 2 months ago
diimdeep · 2 months ago
No, the model works without CUDA; then you have a .ply that you can drop into a Gaussian splat viewer like https://sparkjs.dev/examples/#editor

CUDA is only needed to render the side-scrolling video, but there are many ways to do other things with the result.
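
For example, to sanity-check the .ply before dropping it into a viewer (assuming the plyfile package; splat exporters differ on the extra per-vertex properties, so just list whatever is there):

    from plyfile import PlyData  # pip install plyfile

    ply = PlyData.read("output.ply")
    verts = ply["vertex"]
    print(verts.count, "points")
    # Splat .ply files usually carry extras (opacity, scales, rotations)
    # alongside x/y/z; property names vary by exporter.
    print([p.name for p in verts.properties])
    print(verts["x"][0], verts["y"][0], verts["z"][0])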

delis-thumbs-7e · 2 months ago
Interestingly, Apple's own models don't work on MPS. Well, I guess you just have to wait a few years...
rcarmo · 2 months ago
matthewmacleod · 2 months ago
This is specifically only for video rendering. The model itself works across GPU, CPU, and MPS.
gs17 · 2 months ago
The gaussian splat output can be generated with CPU (this was honestly one of the easiest AI repos to get running).
avaer · 2 months ago
Is there a link with some sample Gaussian splat files coming from this model? I couldn't find one.

Without that, it's hard to tell how cherry-picked the NVS video samples are.

EDIT: I did it myself, if anyone wants to check out the result (caveat, n=1): https://github.com/avaer/ml-sharp-example

yodon · 2 months ago
> photorealistic 3D representation from a single photograph in less than a second
derleyici · 2 months ago
Apple's Spatial Scene in the Photos app shows similar behavior, turning a single photo into a small 3D scene that you can view by tilting the phone. Demo here: https://files.catbox.moe/93w7rw.mov
Traubenfuchs · 2 months ago
It's awful and often creates a blurry mess in the imagined space behind the object.

Photoshop's content-aware fill could do as well or better many years ago.