Invisible watermarks are just steganography. Once the exact method of embedding is known, it is always possible to corrupt an existing watermark - however, in some cases it may not be possible to tell whether a watermark is present, such as when the extraction procedure always produces high-entropy information even from unwatermarked content.
Watermarking is not just steganography, and steganography is not just watermarking.
In June 1996, Ross Anderson organized the first workshop dedicated specifically to information hiding at Cambridge University. This event marked the beginning of a long series known as the Information Hiding Workshops, during which foundational terminology for the field was established. Information hiding, i.e., concealing a message within a host content, branches into two main applications: digital watermarking and steganography. In the case of watermarking, hiding means robustly embedding the message, permanently linking it to the content. In the case of steganography, hiding means concealing without leaving any statistically detectable traces.
References:
1. R. J. Anderson, editor. Proc. 1st Intl. Workshop on Inf. Hiding, volume 1174 of LNCS, 1996.
2. B. Pfitzmann. Information hiding terminology - results of an informal plenary meeting and additional proposals. In Anderson [1], pages 347–350.
Specifically, shuffling compression, bit rate, encryption, and barely human-perceivable signal around media (x-M) to obscure the entropic/random state of any medium, so as not to break the generally available plausible deniability from the standpoint of human perception.
Can't break Shannon's law, but it hides the intent of whoever is behind the knocks on all the doors. It obscures which house Shannon lives in, and with whom the knocker wishes to communicate.
There is some nice information in the appendix, like:
“One training with a schedule similar to the one reported in the paper represents ≈ 30 GPU-days. We also roughly estimate that the total GPU-days used for running all our experiments to 5000, or ≈ 120k GPU-hours. This amounts to total emissions in the order of 20 tons of CO2eq.”
I am not in AI at all, so I have no clue how bad this is. But it’s nice to have some idea of what the costs of such projects are.
So say I have a site with 3,000 images, 2M pixels each. How many GPU-months would it take to mark them? And how many gigabytes would I have to keep for the model?
I wonder what will come of all the creative technologists out there, trying to raise money to do "Watermarking" or "Human Authenticity Badge," when Meta will just do all the hard parts for free: both the technology of robust watermarking, and building an insurmountable social media network that can adopt it unilaterally.
Various previous attempts at invisible/imperceptible/mostly imperceptible watermarking have been trivially defeated; this attempt claims to be more robust to various kinds of edits. (From the paper: various geometric edits like rotations or crops, various valuemetric edits like blurs or brightness changes, and various splicing edits like cutting parts of the image into a new one or inpainting.) Invisible watermarking is useful for tracing the origins of content. That might be copyright information, or AI service information, or Photoshop information, or unique ID information to trace leakers of video game demos / films, or (until the local hardware key is extracted) a form of proof that an image came from a particular camera...
... Ideal for a repressive government or just a mildly corrupt government agency / corporate body to use to identify defectors, leakers, whistleblowers, or other dissidents. (Digital image sensors effectively already mark their output due to randomness of semiconductor manufacturing, and that has already been used by abovementioned actors for the abovementioned purposes. But that at least is difficult.) Tell me with a straight face that a culture that produced Chat Control or attempted to track forwarding chains of chat messages[1] won’t mandate device-unique watermarks kept on file by the communications regulator. And those are the more liberal governments by today’s standards.
I’m surprised how eager people are to build this kind of tech. It was quite a scandal (if ultimately a fruitless one) when it came out that colour printers marked their output with unique identifiers; and now that generative AI is a thing, stuff like TFA is seen as virtuous somehow. Can we maybe not forget about humans?..
[1] I don’t remember where I read about the latter or which country it was about—maybe India?
In my previous experience, "resize & rotate" always defeats all kinds of watermarks. For example, crop a 1000x1000 image to 999x999 and rotate it by 1°.
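For concreteness, here is a rough sketch of that kind of perturbation using Pillow; the file names are just placeholders, and the parameters are the ones from the example above.

```python
# Sketch of the crop-and-rotate perturbation described above, using Pillow.
# File names are placeholders; parameters match the example (999x999 crop, 1 degree).
from PIL import Image

img = Image.open("watermarked.png")            # e.g. a 1000x1000 watermarked image

# Crop one pixel off the right and bottom edges (1000x1000 -> 999x999),
# which shifts the pixel grid relative to any block-aligned watermark.
w, h = img.size
cropped = img.crop((0, 0, w - 1, h - 1))

# Rotate by 1 degree; bilinear resampling slightly alters every pixel value.
attacked = cropped.rotate(1, resample=Image.BILINEAR, expand=False)
attacked.save("attacked.png")
```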
Also, there's the "double watermark" attack: just run the result image through the watermarking process again; usually the original watermark will be lost.
And they'll say it's to combat disinformation, but it'll actually be to help themselves filter AI generated content out of new AI training datasets so their models don't get Habsburg'd.
What if the watermark becomes a latent variable that's indirectly learnt by a subsequent model trained on its generated data? They will have to constantly vary the mark to keep it up to date. Are we going to see a Merkle-tree watermark database like we see for certificate transparency? YC, here's your new startup idea.
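To make the certificate-transparency analogy concrete, here is a minimal, purely illustrative sketch of computing a Merkle root over hypothetical watermark payloads; none of this comes from the paper.

```python
# Minimal Merkle-root sketch over hypothetical watermark payloads,
# in the spirit of certificate-transparency logs. Purely illustrative.
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:        # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical payloads a provider might log for each generated image.
payloads = [b"model=v1;img=0001", b"model=v1;img=0002", b"model=v2;img=0003"]
print(merkle_root(payloads).hex())
```

Publishing periodic roots like this would let third parties audit which marks were in use over time without the provider having to reveal every payload up front.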
I can imagine some kind of public/private key encrypted watermark system to ensure the veracity / provenance of media created via LLMs and their associated user accounts.
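As a rough sketch of what that could look like (assuming Ed25519 from the `cryptography` package, and a hypothetical `embed_watermark` stand-in for whatever embedder is actually used):

```python
# Sketch: sign a provenance payload, then embed payload + signature as the watermark.
# embed_watermark() is hypothetical; the payload fields are made up.
from cryptography.hazmat.primitives.asymmetric import ed25519

signing_key = ed25519.Ed25519PrivateKey.generate()   # held by the generating service
verify_key = signing_key.public_key()                 # published for verifiers

payload = b"service=example;account=12345;ts=2024-01-01T00:00:00Z"
signature = signing_key.sign(payload)

# watermarked = embed_watermark(image, payload + signature)   # hypothetical embedder

# Anyone who extracts (payload, signature) can verify provenance;
# verify() raises InvalidSignature if either part was altered.
verify_key.verify(signature, payload)
```

One practical caveat: a payload plus a 64-byte signature is far more data than the few dozen bits typical invisible watermarks carry, so in practice the embedded mark would likely be a short ID pointing to a signed record stored elsewhere.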
There are many reasons why people are concerned about AI's training data becoming AI-generated. The usual one is that the training will diverge, but this is another good one.
Camera makers are all working on adding cryptographic signatures to captured images to prove their provenance. The current standard embeds this in metadata, but if they start watermarking the images themselves, then skipping watermarked images during training would quickly become an issue.
Stenography is just security by more obscurity.
Security-by-obscurity is when security hinges on keeping your algorithm itself (as opposed to some key) hidden from the adversary.
I don't see how it has any connection with what you're alluding to here.
https://en.wikipedia.org/wiki/Stenography
https://en.wikipedia.org/wiki/Steganography
That's about 33 economy class roundtrip flights from LAX to JFK.
https://www.icao.int/environmental-protection/Carbonoffset/P...
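(Rough arithmetic: if an economy LAX–JFK round trip is on the order of 0.6 t CO2eq per passenger, then 20 t / 0.6 t ≈ 33 round trips, matching the figure above.)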
1. Different energy sources produce varying amounts of CO2.
2. This likely does not include the CO2 from manufacturing the GPUs or machines.
3. The humans involved are not counted at all, nor is the impact they have on the environment.
4. There is no way to predict the future CO2 from others using this work.
Also, if it really matters, then why do it at all? If we're saying "hey, this is destroying the environment" and we care, then maybe don't do that work?
How was Copilot trained? GitHub.
Zoom and others would love to use your data to train their AI. It’s their proprietary advantage!
We did consider a similar FOSS project, but didn't like the idea of helping professional thieves abuse DMCA rules.
Have a nice day. =3