Proposal: AI Content Disclosure Header

nrmitchi · 5 days ago

This seems like a (potential) solution looking for a nail-shaped problem.

Yes, there is a huge problem with AI content flooding the field, and being able to identify/exclude it would be nice (for a variety of purposes).

However, the issue isn't that content was "AI generated"; as long as the content is correct, and is what the user was looking for, they don't really care.

The issue is content that was generated en masse, is largely not correct/trustworthy, and serves only to game SEO/clicks/screentime/etc.

A system where the content you are actually trying to avoid has to opt in is doomed to fail. Is the purpose/expectation here that search/CDN companies attempt to classify and identify "AI content"?

TylerE · 5 days ago

It's the evil bit, but unironically.

edoceo · 5 days ago

For today's lucky 10k:

https://www.ietf.org/rfc/rfc3514.txt

Note date published

yahoozoo · 5 days ago

It says in the first paragraph it’s for crawlers and bots. How many humans are inspecting the headers of every page they casually browse? An immediate problem that could potentially be addressed by this is the “AI training on AI content” loop.
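
A minimal sketch of that crawler-side use, assuming a hypothetical `AI-Disclosure` response header where a missing value or `none` means "no AI involved". The header name and its values are assumptions for illustration, not taken from the proposal text:

```python
from typing import Iterable, List, Mapping, Tuple

# Hypothetical header name and values; the actual draft may differ.
DISCLOSURE_HEADER = "ai-disclosure"
HUMAN_VALUES = {"", "none"}

Page = Tuple[str, Mapping[str, str]]


def is_declared_ai(headers: Mapping[str, str]) -> bool:
    """Return True if the response self-declares AI involvement."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}
    return normalized.get(DISCLOSURE_HEADER, "") not in HUMAN_VALUES


def filter_training_pages(pages: Iterable[Page]) -> List[Page]:
    """Keep only (url, headers) pairs that do not declare AI content."""
    return [(url, h) for url, h in pages if not is_declared_ai(h)]
```

This only drops pages that choose to declare themselves, so it helps with the training loop exactly as far as publishers are honest.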

TrueDuality · 5 days ago

How many of the makers of these trash SEO sites are going to voluntarily identify their content as AI generated?

nrmitchi · 5 days ago

The content producer (i.e., the content spam farm) would still be required to label their content as such.

The current approach is that the content served is the same for humans and agents (i.e., a site serves consistent content regardless of the client), so who a specific header is "meant for" is a moot point here.

nikolayasdf123 · 5 days ago

I believe this is why Google did SynthID https://deepmind.google/science/synthid/

throwaway13337 · 5 days ago

Can we have a disclosure for sponsored content header instead?

I'd love to browse without that.

It does not bother me that someone used a tool to help them write if the content is not meant to manipulate me.

Let's solve the actual problem.

handfuloflight · 5 days ago

We already have those legally mandated disclosures per the FTC.

AKSF_Ackermann · 5 days ago

It feels like a header is the wrong tool for this. Even if you hypothetically wanted to disclose that, would you expect a blog CMS to offer the feature? Or a web browser to surface it?

weddpros · 5 days ago

Maybe we should avoid training AI with AI-generated content: that's a use case I would defend.

Still, I believe MIME would be the right place to say something about the media, rather than the transport protocol.

On a lighter note: we should consider second-order consequences. The EU Commission will demand its own EU-AI-Disclosure header be sent to EU citizens, and will require consent from users before showing them AI-generated stuff. The UK will require age validation before showing AI stuff to protect the children's brains. France will use the header to compute a new tax on AI-generated content, due from all online platforms that want to show AI-generated content to French citizens.

That's a Pandora's box I wouldn't even talk about, much less open...

ronsor · 5 days ago

> The EU Commission will demand its own EU-AI-Disclosure header be sent to EU citizens, and will require consent from users before showing them AI-generated stuff. The UK will require age validation before showing AI stuff to protect the children's brains. France will use the header to compute a new tax on AI-generated content, due from all online platforms that want to show AI-generated content to French citizens.

I think the recent drama related to the UK's Online Safety Act has shown that people are getting sick of country-specific laws simply for serving content. The most likely outcome is sites either block those regions or ignore the laws, realizing there is no practical enforcement avenue.

blibble · 5 days ago

> Maybe we should avoid training AI with AI-generated content: that's a use case I would defend.

if this takes off I'll:

   - tag my actual content (so they won't train on it)
   - not tag my infinite spider web of automatically generated slop output (so it'll poison the models)

win win!
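
A toy sketch of how that plays out, assuming a hypothetical `AI-Disclosure` header and a crawler that trusts it blindly (the header name, values, and paths are all made up for illustration):

```python
def trusted_for_training(headers):
    # Naive crawler rule: train on anything that does not declare AI.
    return headers.get("AI-Disclosure", "none") == "none"


pages = {
    "/my-real-essay": {"AI-Disclosure": "ai-generated"},  # human-written, tagged anyway
    "/slop/00001": {},                                    # generated, left untagged
}

# Only the untagged slop survives the filter.
kept = [path for path, headers in pages.items() if trusted_for_training(headers)]
print(kept)  # ['/slop/00001']
```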

ronsor · 5 days ago

then they'll start ignoring the header and it'll be useless

(of course, it was never going to be useful)

paulddraper · 5 days ago

Content-Type/MIME type is for the format.

There are dedicated headers for other properties, e.g. language.
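
A small sketch of the split, using Python's standard `email.message.Message` to parse illustrative header values (the values themselves are made up): the charset rides along as a parameter of `Content-Type`, while language lives in its own `Content-Language` header.

```python
from email.message import Message

# Illustrative response headers (values are made up for this example).
raw = {
    "Content-Type": "text/html; charset=utf-8",
    "Content-Language": "fr",
}

msg = Message()
for name, value in raw.items():
    msg[name] = value

media_type = msg.get_content_type()   # format of the body
charset = msg.get_content_charset()   # parameter on Content-Type
language = msg["Content-Language"]    # separate, dedicated header

print(media_type, charset, language)
```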

weddpros · 5 days ago

Actually you're 100% correct.

Feels weird to me that encoding is part of MIME, but language isn't, although I understand why.

giancarlostoro · 5 days ago

It depends, but for example, if I wanted to train a LoRA that outputs a certain art style from a specific model, I have no issue with this being done. It's not like you are making a model from scratch.

vntok · 5 days ago

This feels like the Security Flag proposal (https://www.ietf.org/rfc/rfc3514.txt)

gruez · 5 days ago

or end up like California Prop 65 warnings: https://en.wikipedia.org/wiki/1986_California_Proposition_65

userbinator · 5 days ago

Approximately as useless as "do not track".

woah · 5 days ago

Seems like someone just trying to get their name on a published IETF standard for the bragging/resume rights

xgulfie · 5 days ago

This is like asking the fox to announce itself before entering the henhouse