minimaxir · 5 years ago
The first two demo videos are interesting examples of using StyleCLIP's global directions, as noted in the paper, to guide an image toward a "smiling face" with smooth interpolation: https://github.com/orpatashnik/StyleCLIP

I ran a few chaotic experiments with StyleCLIP a few months ago that would work very well with smooth interpolation: https://minimaxir.com/2021/04/styleclip/
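
Concretely, the "smooth interpolation" here is just a walk along a fixed direction in latent space. A minimal sketch of the idea, where G, w, and d are hypothetical stand-ins (a pretrained StyleGAN-style generator, a latent code, and a precomputed direction such as StyleCLIP's "smiling face" direction), not the actual StyleCLIP API:

    import torch

    # Sketch of a smooth latent walk along a fixed "global direction".
    # G, w, and d are assumed inputs, not real StyleCLIP names.
    @torch.no_grad()
    def latent_walk(G, w, d, steps=60, strength=3.0):
        frames = []
        for alpha in torch.linspace(0.0, strength, steps):
            frames.append(G(w + alpha * d))  # one video frame per step
        return frames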

Chilinot · 5 years ago
That first picture of Mark Zuckerberg smiling is just straight up cursed. Interesting write-up, though.
Doxin · 5 years ago
I audibly went "GAH!" when that scrolled into view. Impressive work.
Lichtso · 5 years ago
The previous approaches learned screen-space textures for different features and a feature mask to compose them.

Now it seems to actually learn the topology lines of the human face [0], as 3D artists would learn them [1] when they study anatomy. It also uses quad grids and even places the edge loops and poles in similar places.

[0] https://nvlabs-fi-cdn.nvidia.com/_web/alias-free-gan/img/ali... [1] https://i.pinimg.com/originals/6b/9a/0c/6b9a0c2d108b2be75bf7...

eru · 5 years ago
Yes. It's interesting that imposing what are essentially 2D invariance constraints leads the network to learn what we regard as 3D concepts.
pvillano · 5 years ago
There are some interesting 2D things our eyes do for 3D. If something is on the ground, half is above the horizon and half is below. Parallax is a 2D phenomenon.
goldemerald · 5 years ago
After StyleGAN2 came out, I couldn't imagine what improvements could be made over it. This work is truly impressive.

The comparisons are illuminating: StyleGAN2's mapping of texture to specific pixel locations looks very similar to poorly implemented video-game textures. Perhaps future GAN improvements could come from tricks used in non-AI graphics development.

tyingq · 5 years ago
>I couldn't imagine what improvements could be made over it

Still has the telltale mismatched ears and/or earrings. That seems like the most reliable way to recognize them. Well, that and the nondescript background.

sbierwagen · 5 years ago
Teeth too. Partially covered objects in 3D space have been hard for a GAN to figure out. (See also hands)

I wonder what dataset you could even use to tell a GAN about human internals. 3D renders of a skull with various layers removed?

mzs · 5 years ago
Mismatched reflections across the eyes are the dead giveaway for me.
isoprophlex · 5 years ago
If ReLU-introduced high-frequency components are indeed the culprit, won't using a "softened" ReLU (one without a discontinuity in the derivative at 0) everywhere solve the problem, too?
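
For reference, the usual "softened" ReLU is softplus, which approaches ReLU as its beta parameter grows but keeps a continuous derivative at 0. A quick illustrative sketch in PyTorch:

    import torch
    import torch.nn.functional as F

    # softplus(x; beta) = log(1 + exp(beta * x)) / beta -> ReLU as beta -> inf,
    # but it stays smooth, with no kink in the derivative at x = 0.
    x = torch.linspace(-1.0, 1.0, steps=5, requires_grad=True)
    hard = F.relu(x)                 # derivative jumps from 0 to 1 at x = 0
    soft = F.softplus(x, beta=10.0)  # C-infinity everywhere

The catch, if I'm reading the paper right, is that any pointwise nonlinearity, smooth or not, still generates frequency content beyond what the feature-map sampling rate can represent. That seems to be why they band-limit around the nonlinearity (upsample, apply it, low-pass filter, downsample) rather than just swapping activations.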
Imnimo · 5 years ago
I wonder if you could make the noise inputs work again by using the same process as for the latent code - generate the noise in the frequency domain, and apply the same shift and careful downsampling. If you apply the same shift to the noise as to the latent code, then maybe the whole thing will still be equivariant? In other words, it seems like the problem with the per-pixel noise inputs is that they stay stationary while the latent is shifted, so just shift them also!
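
The "apply the same shift" part is exactly what the Fourier shift theorem buys you, including sub-pixel offsets: a spatial shift of (dy, dx) becomes a per-frequency phase multiplication. A rough sketch of shifting a noise map that way (illustrative only, not from the paper):

    import torch

    # Shift a 2-D noise map by a possibly fractional offset (dy, dx) by
    # multiplying its spectrum by exp(-2*pi*i * (fy*dy + fx*dx)).
    def shift_noise(noise, dy, dx):
        h, w = noise.shape
        fy = torch.fft.fftfreq(h).reshape(-1, 1)  # cycles/pixel, vertical
        fx = torch.fft.fftfreq(w).reshape(1, -1)  # cycles/pixel, horizontal
        phase = torch.exp(-2j * torch.pi * (fy * dy + fx * dx))
        return torch.fft.ifft2(torch.fft.fft2(noise) * phase).real

    noise = torch.randn(64, 64)
    shifted = shift_noise(noise, dy=0.5, dx=1.25)  # sub-pixel shift

If the generator sees noise shifted in lockstep with the latent, equivariance might survive; whether training still benefits from the noise inputs is another question.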
evo · 5 years ago
I wonder if there are lessons from this that could be transposed to the 1-D domain for audio; as far as I know, aliasing is a frequent challenge when using deep learning methods for audio (e.g. simulating non-linear circuits for guitar amps).
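
The classical 1-D fix is the same idea the paper leans on: treat the signal as sampled and low-pass filter below the new Nyquist rate before any resampling. A small scipy sketch of why naive decimation aliases:

    import numpy as np
    from scipy import signal

    sr = 48_000
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 10_000 * t)  # 10 kHz tone at a 48 kHz sample rate

    aliased = x[::4]               # naive 4x decimation: at the new 12 kHz
                                   # rate, the 10 kHz tone folds down to 2 kHz
    clean = signal.decimate(x, 4)  # low-pass filters below 6 kHz first,
                                   # so the tone is removed, not folded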
fogof · 5 years ago
You can see what they're saying about the fixed-in-place features with the beards in the first video, but StyleGAN2 gets the teeth symmetry right whereas this work seems to have trouble with it. Why don't the teeth in StyleGAN2 slide around like the beard does?
minimaxir · 5 years ago
That's likely the GANSpace/SeFa part of the manipulation.

> In a further test we created two example cinemagraphs that mimic small-scale head movement and facial animation in FFHQ. The geometric head motion was generated as a random latent space walk along hand-picked directions from GANSpace [24] and SeFa [50]. The changes in expression were realized by applying the “global directions” method of StyleCLIP [45], using the prompts “angry face”, “laughing face”, “kissing face”, “sad face”, “singing face”, and “surprised face”. The differences between StyleGAN2 and Alias-Free GAN are again very prominent, with the former displaying jarring sticking of facial hair and skin texture, even under subtle movements.

Geee · 5 years ago
In video 9, the teeth are sliding.
jerf · 5 years ago
That's starting to be high enough quality that you could consider using it for some Hollywood-grade special effects. That beach morph stuff is pretty impressive. Faces are perhaps not quite there yet, because we're so biologically hyper-focused on them, but you could make one heck of a drug-trip scene or a Doctor Strange-esque scene with much less effort using some of those techniques; that effort may even get down to the range of YouTuber videos in the near future.
eru · 5 years ago
jerf · 5 years ago
First, that's not the same technique and it's not being used for the same purpose.

Second, Hollywood doesn't care about that problem. They will take the best application of the technique, and they don't care if they have to apply a few manual touchups on the result. As long as there is one way of using the system to do the sort of thing they showed in the sample, it won't matter to them that they can't embed a full video game into the neural network itself. They only care about the happy path of the tech.

Someone is probably already starting a company to use this in special effects, or an existing company is putting someone on the research.