There's definitely value in providing this functionality for photographs taken in the present.
But I think the real value -- and this is definitely in Google's favor -- is providing this functionality for photos you have taken in the past.
I have probably 30K+ photos in Google Photos that capture moments from the past 15 years. There are quite a lot of them where I've taken multiple shots of the same scene in quick succession, and it would be fairly straightforward for Google to detect such groupings and apply the technique to produce synthesized pictures that are better than the originals. It already does something similar for photo collages and "best in a series of rapid shots." They surface without my having to do anything.
The tiled/stacked approach others mention is good, and probably the best option. You could also try an uncompressed format (even just uncompressed .png) or something simple like RLE, then 7-Zip the files together, since 7z is the only archive format I'm aware of that does inter-file (as opposed to intra-file) compression, via its solid mode.
Unfortunately, lossless video compression won't help here, since most codecs compress frames individually when operating losslessly.
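To make the inter-file redundancy point concrete, here's a minimal synthetic sketch using Python's stdlib `zlib` as a stand-in for a solid archive (a real solid 7z archive applies the same idea with a much larger window; the frame data here is made up for illustration):

```python
import random
import zlib

# Two synthetic "burst shots": 4 KiB of pseudo-random bytes (a stand-in
# for already-noisy sensor data), where the second frame differs from
# the first only in a small patch of perturbed pixels.
rng = random.Random(0)
frame_a = bytes(rng.randrange(256) for _ in range(4096))
frame_b = bytearray(frame_a)
for i in range(0, 256, 7):               # perturb a few leading bytes
    frame_b[i] = (frame_b[i] + 40) % 256
frame_b = bytes(frame_b)

# Intra-file only: each frame compressed on its own.
separate = len(zlib.compress(frame_a, 9)) + len(zlib.compress(frame_b, 9))

# Inter-file: both frames in one stream, so the second can back-reference
# the first -- the effect a solid archive exploits across files.
together = len(zlib.compress(frame_a + frame_b, 9))

print(separate, together)
```

The random frames are individually incompressible, so `separate` comes out near the raw size, while the combined stream shrinks to roughly half of that because the second frame deduplicates against the first.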
Philosophically, yes. But some photo-editing techniques rely on data that is not backfillable and must be recorded at capture time. And even in cases where there is no functional impediment to applying it against historical photos, sometimes there is product gatekeeping to contend with.
> ...fairly straightforward for Google to detect such groupings and apply the technique to produce synthesized pictures that are better than the originals.
Wouldn't an operation like this require some kind of fine-tuning? Or do diffusion models have a way of using images as context, the way one would provide context to an LLM?
I think simpler algorithms (e.g. image histograms) can get you a long way. Regardless of the mechanism, Google Photos already has the capability to detect similar images, which is used to generate animated gifs.
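A minimal, stdlib-only sketch of the histogram idea (the pixel data is synthetic and the 16-bin resolution is an arbitrary choice): near-duplicate shots land in nearly the same bins, so a simple histogram intersection separates them from an unrelated scene.

```python
from collections import Counter

def histogram(pixels, bins=16):
    """Normalized histogram of 8-bit grayscale pixel values."""
    counts = Counter(p * bins // 256 for p in pixels)
    n = len(pixels)
    return [counts.get(b, 0) / n for b in range(bins)]

def intersection(h1, h2):
    """Histogram intersection: 1.0 = identical distributions, 0.0 = disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Synthetic stand-ins for photos: a scene, a near-duplicate "second shot"
# with slight exposure drift, and an unrelated darker scene.
shot1 = [(i * 37) % 200 + 20 for i in range(4096)]
shot2 = [min(255, p + 5) for p in shot1]      # same scene, +5 exposure
other = [(i * 13) % 60 for i in range(4096)]  # different, darker scene

same = intersection(histogram(shot1), histogram(shot2))
diff = intersection(histogram(shot1), histogram(other))
print(round(same, 3), round(diff, 3))
```

Thresholding the intersection score is then enough to propose candidate groupings; a production system would obviously combine this with timestamps and stronger features.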
When you think about it, the only thing that's weird about this hypothetical conversation is the context of it being about (purported) photographs.
We expect images that look like photographs — at least when taken by amateurs — to be the result of a documentary process, rather than an artistic one. They might be slightly filtered or airbrushed, but they won't be put together from whole cloth.
But amateur photography is actually the outlier, in the history of "capturing memories"!
If you imagine yourself before the invention of photography, describing your vacation to an illustrator you're commissioning to create some woodblock-print artwork for a set of Christmas cards you're having made up, the conversation you've laid out here is exactly how things would go. They'd ask you to recount what you saw, do a sketch, and then you'd give feedback and iterate together with them to get a final visual down that reflects things the way you remember them, rather than the way they were, per se.
This is an interesting point. Usually people claim technology goes inexorably forward, yet here we are, merrily destroying trust in the most objective method we have to record the past!
FB AI, make a series of posts about me climbing Mount Everest, meeting the Dalai Lama, curing cancer, bringing peace to Ukraine, changing my name to Melon Tusk, announcing a run for president, and adopting a dog named Molly
But see, that's the sort of thing that would give it away.
You've got to shoot for something just attainable enough to sound credible, while still being at the "enviable" end of the spectrum.
"FB AI, make a series of pictures of my first 3 months at Goldman Sachs in 2021. Include me shaking hands with the VP of software as I receive a productivity award for making them $1m in a week. Include a group photo of me and 12 other people (all C execs and my VP must be there). Crosspost all to LinkedIn, with notifications muted."
"Ok done"
"ChatGPT, take my existing CV and replace entries from 2021 onwards with a job as Head of Performance Monitoring at Goldman Sachs, reporting to the VP of software. Include several projects with direct CEO and CFO involvement. Crosspost changes to LinkedIn."

"Ok done"

... and now I can go job-hunting.
This actually feels like it could be an incredibly valuable post-production tool in film and TV, once they get it working consistently across multiple frames.
Not only for more flexibility in "uncropping" after shooting (there was a tree/wall in the way), but this could basically be the holy grail solution for converting 4:3 to widescreen without cutting off content on the top and bottom.
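Back-of-the-envelope numbers on why outpainting is attractive here (the 1440x1080 master resolution is just illustrative): cropping a 4:3 frame to 16:9 discards a quarter of the original, while extending the sides means a quarter of the new frame is synthetic.

```python
from fractions import Fraction

# Converting a 4:3 master (e.g. 1440x1080) to 16:9.
w, h = 1440, 1080
target = Fraction(16, 9)

# Option A: crop top/bottom to reach 16:9 -- content is lost.
crop_h = int(w / target)        # new height after cropping
lost = 1 - crop_h / h           # fraction of the frame discarded

# Option B: outpaint the sides to reach 16:9 -- content is generated.
new_w = int(h * target)         # new width after extending
generated = 1 - w / new_w       # fraction of the new frame that is synthetic

print(crop_h, new_w)
print(f"crop discards {lost:.0%}, outpaint synthesizes {generated:.0%}")
```

Either way a quarter of the picture is at stake, which is why the shot-by-shot judgment discussed below matters so much.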
I already use Photoshop Generative Fill for uncropping videos, but it only works for fixed-camera shots. Photoshop just added a feature where you can drag the video file in and do the uncrop in one step.
The problem I'm solving is converting videos from widescreen to vertical and sometimes you need some extra height.
Mind if I ask why you'd need to do that? That's a huge amount of the frame being generated artificially, especially if you're talking cinema-aspect-ratio widescreen.
I can see it working great for some stuff, but with more artistic work, wouldn't you ultimately face the issue that the framing might not be very good if you're just artificially extending it?
It definitely needs to be applied judiciously on a shot-by-shot basis.
There have been quite a few 4:3-to-widescreen conversions that were done using the original film that was actually shot in widescreen and cropped for TV.
Sometimes, the wider shot makes perfect sense. Sometimes, they keep the original cropped one but cut off top/bottom. Sometimes it's a combination of the two. It all depends on what's being framed -- two people in a car usually benefits from cropping (nobody needs the bottom third of the frame occupied by the car's hood), while a close-up on someone's face usually benefits from extending the sides (otherwise it's an uncomfortable mega-close up that cuts off their mouth).
But having the flexibility to extend horizontally opens up artistic possibilities.
Also getting everyone smiling with their eyes open at the same time. Phone cameras could record a group photo for five or ten seconds and use the best expression from different times for each person.
Or you take a single picture of a group in front of a monument, but cut it off. As I understand it you could find pictures of the monument online, run the model, and have a picture with the group and the entire monument.
Google can probably even do this automatically -- I would not be surprised if I get suggestions to fix images with cut-off buildings in Google Photos in the future! Would be so cool.
You don’t even need to take the photo: with enough images of each family member and of a tourist destination, you can just automatically construct a photo of everyone together at the location, saving the cost and carbon footprint of actually getting everyone together.
And then why demand "photos" of family excursions at all, when it is just an AI imagining how things probably were happening at the time, or would have happened? We should just stick to our own imperfect memory.
I have been working on a holographic camera, but the ultra-cheap pinhole cameras I chose for the array have two issues: the exposure can't be controlled and the lenses are poorly aligned. I can calibrate away most lens aberrations with OpenCV, but some of the outliers have so much cropping that I am discarding 75% of my good pixels to get a coherent result. I was considering using NeRFs to reproject the ideal camera angles, but COLMAP is not very tolerant of brightness fluctuations and NeRF training is relatively slow (considering my goal is video). This would be a nice solution to my problem, because I have a comprehensive set of angles to pull context from.
So is the weather just hallucinated then? We're just making up memories and calling them real? And advertising this blatantly, calling rainy days sunny and sunny days rainy? My god, I hate this so much.
Not even a discussion about if this might be harmful or what the risks are or anything, just plain old "THIS FAKE MOMENT WAS REAL AND YOU'LL BELIEVE IT"?!
I really have a hard time with this. Wow I'm upset, more than I expected. The tech is fine yeah but the marketing is just deeply upsetting.
Seems like the real utility of this technique will be as a way to vastly improve temporal stability in a variety of generative video techniques. For example, if you are trying to use a video as the base for a new generative video: take the first frame of your video and run it through Stable Diffusion with the ControlNet of your choice. Then take that initial image and run it through this process to produce a new base model, and use that to generate your second frame. Now feed that second frame back into the model and rinse and repeat, always using the past few frames to inform the latest.
an interesting use case for this once the compute is there is to basically allow for ai powered digital zoom-out. it could work by instructing the user to take several pictures around the target, and then you take regular pictures of your subject.
then, as you like, you can do an "ai zoom out" to get zoomed out pictures, no longer constrained by your lens or distance.
I imagine this will be included relatively soon, just like how panoramas were once a niche thing that became much easier with some good ui/ux. pretty much any modern phone can do them without having to struggle with lining up photos and whatnot.
one thing that does greatly concern me about the demo/site is that they use "authentic" and "recover" as terms. the result here is not authentic, nor has anything been "recovered." it's an illusion at best. I personally don't like how they portray the new image as if the lens had framed it in the original picture. it's not, as they show themselves near the end with the text sign. seriously irresponsible framing (pun intended) for what's otherwise very cool tech.
> I have probably 30K+ photos in Google Photos that capture moments from the past 15 years.
They do take up a lot of space, and just today I asked in photo.stackexchange for backup compression techniques that can exploit inter-image similarities: https://photo.stackexchange.com/questions/132609/backup-comp...
Facebook: Great. I'd be happy to. Any more detail you'd like to add?
Me: Make us look attractive. Show that we're having a great time. Also, we went to see the Chatham Lighthouse.
Facebook: OK, done!
...
Facebook: You've received 48 likes. Your mother would like to know if you had any salt water taffy.
Me: Yes, and please create a picture of my oldest daughter having trouble chewing it.
Facebook: Done.
Facebook: I'd be happy to. Are there any more details you'd like to include?
me: Please show how he didn't understand me at first, but then he looks at me and starts crying with love and regret.
Facebook: Done. Your relationship with your father must have been deeply fulfilling.
...that, and other thoughts I have while baked.
> The problem I'm solving is converting videos from widescreen to vertical and sometimes you need some extra height.
You’re a monster.
So then you just feed RealFill the 20 pictures you took and your uncle is magically painted in.
A box that takes your gps location, weather, etc and autogenerates a photo from your PoV.
This has always been the case, you just don't remember it, and the (human) hallucinated details are usually just not important enough to care about.
https://www.reddit.com/r/StableDiffusion/comments/16uqqrh/ho...
Give that a couple generations. “You were at location X and didn’t take a pic. We generated you some selfies, choose one that you like.”
I don't know if that's more or less creepy than the AI stuff...