Off topic: Always define your abbreviations. To find out what CNN stands for here, you either have to read a comment thread on HN, or go to the paper and read the introduction. The linked page doesn't even mention neural networks. And as some other commenter here has mentioned, CNN has other more well known meanings than Convolutional Neural Networks.
It was drilled into us in university (engineering) that you spell out abbreviations and acronyms on first use, no matter how well known you think it is.
Some cases I've seen lately seem to forgo this not out of ignorance but as a form of elitism/knowledge gatekeeping.
> Some cases I've seen lately seem to forgo this not out of ignorance but as a form of elitism/knowledge gatekeeping.
It's a natural tendency for in-groups. Nearly any video game forum, or anything else that's full of hobbyists, will ultimately contain posts that are absolutely full of acronyms. And they're impenetrable. Bear in mind, I'm not defending this behavior, and certainly not disagreeing with you.
Maybe this is specific to IT or Computer Science, where there are thousands of abbreviations and acronyms that are themselves often the names people use: SQL, DRAM, CPU, HTTP, SRAM, FPGA, URL, TCP/IP, UDP, NAT, DHCP, GPL, etc.
I mean, if you are discussing the technical details of neural networks, you expect your audience to at least know CPU, GPU, and FPGA. And if you are discussing software development, I hope I don't have to spell out GPL.
So I don't think it is a form of elitism/knowledge gatekeeping. In the age of the internet you can search for what those acronyms mean without knowing the full name, which isn't something that could easily be done 15 to 20 years ago.
In other industries, such as mobile wireless networking, acronyms are often clearly spelled out because there are comparatively few of them. FDD, TDD, MIMO, NR, or LTE are often spelled out in full on first use.
The same could be said for my biology education. But it never seems to stop biologists publishing papers from invoking obscure acronyms and phrases without definition.
Genuinely thought this was some reference to something on the topic of 'fake news'. Abbreviations are great if you're using the term multiple times, not as an intro.
I also thought it was a reference to a way of distinguishing genuine Cable News Network (CNN) screenshots from doctored ones circulated by “fake news” outlets.
This is a paper that was published in CVPR (Conference on Computer Vision and Pattern Recognition). In that context it is unambiguous that CNN means Convolutional Neural Networks.
Not really, I just thought that CNN (the news network) uses generated images in their articles. The topic of recognizing them, or even generating them, would make sense at CVPR.
If that were their justification, they wouldn't need to define the acronym in the paper. However, in the introduction section of the actual paper they do define CNN. But they use the acronym 9 times before defining it, which is what's kinda weird.
And the website isn't published in the CVPR, it's published on the internet.
I found CNN a bit confusing, even though I did guess it meant Convolutional Neural Net.
Perhaps my pre-caffeine morning brain is overly pedantic but Generative Nets use deconvolutions to generate images from latent codes, so using CNN rather than GAN (Generative Adversarial Network) is a bit confusing in this context.
CNNs are used elsewhere too: VAEs (Variational AutoEncoders, also generative) use convolutions to produce the latent codes, and the discriminator (adversarial) part of GAN training uses convolutions as well.
I think Generative Networks ( or GNNs ;-) ) would perhaps have been clearer.
The character limit is 80. Many abbreviations on the front page are almost certainly caused by the low character limit, which makes it difficult to express concepts that don't have a single word for them.
I think it depends upon the audience and the work being written. Anything on the level of a news article definitely should spell it out, but a forum post about some game can get away with the common acronyms used by the community. It is part of knowing your audience.
Yes, with an abbreviation like CNN it is remarkably presumptuous to not define it in this article. I followed the headline specifically because it was in the title and I assumed it referred to the news network.
It's a scientific paper from a computer vision conference; it would be absurd in that context to assume anyone reading it doesn't know that it stands for convolutional neural network. They didn't write this with Hacker News in mind.
Also, the convolution part is only a speedup thing. You can do very similar neural network operations without the convolution, except that everything will be much slower and you'd need a lot more memory.
It's not really "only a speedup thing" because the training process is different: as a CNN learns to (say) recognize dog-noses in the top left portion of the image, it's simultaneously learning to recognize dog-noses everywhere else too. A fully-connected MLP with the same layer structure doesn't have that property.
It's true that once you've trained your CNN you could make a non-convolutional NN that computes exactly the same things but less efficiently, but the point of an NN is not just what it can compute -- there are lots of systems that can, given enough parameters, approximate arbitrary functions well -- but how you train it.
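To put rough numbers on the weight-sharing point (a minimal PyTorch sketch; the layer sizes are just for illustration, not anything from the paper):

    import torch.nn as nn

    # A 3x3 conv from 3 channels to 16: the same 3*3*3*16 weights (+16 biases)
    # are reused at every spatial position.
    conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)

    # A fully connected layer mapping a 32x32 RGB image to the same 16x32x32
    # output volume needs a separate weight for every input/output pixel pair.
    fc = nn.Linear(3 * 32 * 32, 16 * 32 * 32)

    count = lambda m: sum(p.numel() for p in m.parameters())
    print(count(conv))  # 448
    print(count(fc))    # ~50 million

The dense layer can represent everything the conv layer computes, but it would have to relearn the same "dog-nose filter" separately at every location, which is exactly the training-time difference described above.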
I wonder if these results hold when the CNN-generated images are converted to an analog medium and back to digital (say scanning a printout or taking a screencap).
If not, this might indicate that the fingerprints or artifacts left by the generators are not of the "perceptible" variety.
Also a discriminator trained from this experiment might be useful to train a more powerful generator.
(1) We show that when the correct steps are taken, classifiers are indeed robust to common operations such as JPEG compression, blurring, and resizing.
(2) When using Photoshop-like methods, the detector performs at chance (i.e., it is useless).
Also, if the images are not recognizable as fakes by humans, then it's good enough. What would be the point of going further than that? I actually see it as a feature if, at the same time, it's possible to prove when images are fake.
Interesting. They train an image classifier to detect images that were generated by a GAN-trained CNN. I wonder if it could be possible to include this classifier in the training loss, such that the generated images fly under its radar as much as possible. If this makes sense, then I guess the cat-and-mouse game just gained another level. On the other hand, what the classifier is detecting could be a fingerprint of the CNN architecture itself.
(Full disclosure: I have only read the abstract so far.)
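If it helps to make that concrete, "include this classifier in the training loss" could look roughly like the sketch below (hand-wavy PyTorch; generator, discriminator, detector, and the weight lam are placeholders of mine, not anything from the paper):

    import torch
    import torch.nn.functional as F

    def generator_step(generator, discriminator, detector, opt_g, z, lam=0.1):
        # One generator update that also tries to fool a frozen fake-image detector.
        fake = generator(z)

        # Usual non-saturating GAN loss against the co-trained discriminator.
        logits = discriminator(fake)
        gan_loss = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

        # Extra penalty: the pretrained detector's "this looks CNN-generated" logit.
        # Driving this down makes samples evade *this* detector, though probably
        # not a re-trained one.
        evade_loss = detector(fake).mean()

        loss = gan_loss + lam * evade_loss
        opt_g.zero_grad()
        loss.backward()
        opt_g.step()
        return loss.item()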
> Due to the difficulties in achieving Nash equilibria, none of the current GAN-based architectures are optimized to convergence, i.e. the generator never wins against the discriminator.
If I understand the terms used, it sounds like you're suggesting adding this classifier to the discriminator, to avoid detection. Since they are already failing to pass their existing discriminators, it seems like they could try to not be detected, but they wouldn't actually succeed.
Absolutely possible, might even be a good idea, but my expectation is that the results won't be robust: the fakes will be uncovered by a slightly differently trained classifier. Maybe even the same classifier with a different random initialization.
Sounds like overfitting the defense against classification. Would existing solutions to overfitting possibly fix this (though make such a network even more expensive to train)?
I have read the paper and there are plenty of useful references and points: the related work, the 11 CNN-based image generator models, and the discussion section.
But sadly I could not obtain a clear picture of the difference between their detector and a baseline one. There are some minor points and references about upsampling, downsampling, resizing, cropping, and Fourier-spectrum comparisons across generators, but those seem to be just comments and comparisons, not crucial points in the construction of the detector. Furthermore, data augmentation doesn't play a big role; they say that it usually improves the detector a little (a rough sketch of that kind of augmentation is below).
As a math person I like to get some more meat from papers, but here it seems that little tricks allow them to win the game. Perhaps that is the way (little or no math involved) to make advances. Well, at least they say that shallow methods modify the fingerprint of the Fourier spectra, so that now you can't detect which generator produced the image.
Perhaps the "universal word" was what captured my attention.
If this "universal detector" is now used as a discriminator and the original models are fine-tuned/re-trained then it will stop being a universal detector no?
As long as the underlying CNN math stays the same it would not matter. Uber worked on CoordConv to make better image outputs from CNNs without the artefacts this method capitalises on.
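Presumably that refers to Uber's CoordConv work; the mechanism, roughly, is to concatenate normalized x/y coordinate channels to the input before convolving so the filters can be position-aware. A small PyTorch-style sketch (my own simplification, not their code):

    import torch
    import torch.nn as nn

    class CoordConv2d(nn.Module):
        # A conv layer that also sees two channels holding normalized x/y coordinates.
        def __init__(self, in_ch, out_ch, **kw):
            super().__init__()
            self.conv = nn.Conv2d(in_ch + 2, out_ch, **kw)

        def forward(self, x):
            n, _, h, w = x.shape
            ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(n, 1, h, w)
            xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(n, 1, h, w)
            return self.conv(torch.cat([x, ys, xs], dim=1))

    # e.g. CoordConv2d(3, 16, kernel_size=3, padding=1)(torch.randn(1, 3, 32, 32))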
I’ve seen a number of attempts to identify deepfakes and other forms of manipulated images using AI. This seems like a fool’s errand since it becomes a never ending adversarial AI arms race.
Instead, here's a proposal for a system I think could work well, though I haven't seen it made: camera and phone manufacturers could have their devices cryptographically sign each photo or video taken. And that's it. From that starting place, you can build a system on top of it to verify that the image on the site you're reading is authentic. What am I missing that makes this an invalid approach?
I do understand that this would require manufacturers to implement it, but it seems achievable to get them on board. I even think if you get one company like Apple to do this, it's enough traction for the rest of the industry to have to follow suit.
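For what it's worth, the signing half is straightforward; here is a minimal sketch using Ed25519 from Python's cryptography package (secure key storage inside the camera, key distribution, and handling of legitimate edits like resizing are the hard parts and aren't addressed here):

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric import ed25519

    # In reality the private key would live in the device's secure hardware and
    # the matching public key would be published by the manufacturer.
    device_key = ed25519.Ed25519PrivateKey.generate()
    public_key = device_key.public_key()

    photo_bytes = b"<raw JPEG bytes straight from the camera>"  # placeholder
    signature = device_key.sign(photo_bytes)  # attached to the file as metadata

    # Anyone (a browser, a news site, a court) can later check the signature.
    try:
        public_key.verify(signature, photo_bytes)
        print("valid: bytes unchanged since capture")
    except InvalidSignature:
        print("invalid: image was modified or did not come from this device")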
Does it matter that they are easy to spot when the damage they can do would be well underway before a trusted service invalidates the image?
I am coming at this from the angle of: who would use this type of service other than the courts? Certainly major news organizations could benefit, but we have numerous recent examples where they have either run with CNN imagery or purposefully run video and images of similar events to portray the view they wanted for a current event.
Of course, in the end, if the end game is to have news, image, and video validation, there will need to be more than one such service, located in separate enough areas of the world, to have some chance that they would not all be intimidated / infiltrated to the point that they are not trustworthy.
For example: images generated by convolutional neural networks (CNN) are easy to identify.
UK reports rampant student marijuana use before class
That headline has quite a different meaning if “UK” is abbreviating “United Kingdom” versus “University of Kentucky”.
Did they really hope that their paper would remain within a specific group of experts? I seriously doubt it.
Edit: Although I see it is spelled out on first use in the introduction, so maybe that's just conforming to someone's style guide.
"However, these methods represent only two instances of a broader set of techniques: image synthesis via convolutional neural networks (CNNs)."
[1] https://arxiv.org/pdf/1912.11035.pdf
Would this be receiving as much attention if they had used "Convolutional Neural Networks" instead of just CNN?
Could the classifier that they're using here be used as a discriminator in a GAN, to help train it to avoid this detection method?