I appreciate the effort, but without any legal backing these signals are just going to be ignored like robots.txt. Hell, even if they were legally binding, they'd probably still be ignored if scrapers thought they could obfuscate the paper trail enough to get away with it.
To add an anecdote: based on the logs of my portfolio site, all the major US players (OpenAI, Google, Anthropic, Meta, CommonCrawl) appeared to respect robots.txt as they claim to (can't say the same of Alibaba).
I do still sometimes get requests with their user agents, but generally from implausible IPs: residential IPs, "Google-Extended" coming from an AWS range, the same IP claiming to be several different bots, and so on. They never come from the bots' actual published IP addresses (which I did see before adding robots.txt), which makes me believe it's some third party either intentionally trolling or using the larger players as cover for their own bots.
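If anyone wants to run the same check on their own logs, here's a rough Python sketch of the idea: compare the claimed bot against the IP ranges the big crawlers publish. The JSON URLs and field names below are what Google and OpenAI publish for Googlebot and GPTBot as far as I know; treat them as assumptions and verify them yourself before relying on this.

```python
# Sketch only: check whether a request that claims to be a crawler actually
# comes from that crawler's published IP ranges.
import ipaddress
import json
import urllib.request

# Assumed published range files (verify these URLs yourself).
RANGE_SOURCES = {
    "Googlebot": "https://developers.google.com/search/apis/ipranges/googlebot.json",
    "GPTBot": "https://openai.com/gptbot.json",
}

def load_ranges(url):
    """Fetch a published JSON list of CIDR prefixes for a crawler."""
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    nets = []
    for entry in data.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            nets.append(ipaddress.ip_network(cidr))
    return nets

def is_genuine(claimed_bot, remote_ip, ranges_by_bot):
    """True if remote_ip falls inside one of the claimed bot's published prefixes."""
    ip = ipaddress.ip_address(remote_ip)
    return any(ip in net for net in ranges_by_bot.get(claimed_bot, []))

if __name__ == "__main__":
    ranges = {bot: url and load_ranges(url) for bot, url in RANGE_SOURCES.items()}
    # e.g. a log line claiming to be GPTBot from a suspicious-looking address
    print(is_genuine("GPTBot", "203.0.113.7", ranges))
```

In my case it's exactly this kind of check that separates the real crawlers (which stopped after robots.txt) from the impostors (which didn't).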
Asking people to read your content only for a specific purpose or with a specific intent has traditionally not been very successful or useful. I understand people are frustrated with the knowledge transfer, but if the goal was to increase the reach of your ideas, that goal is being accomplished.
AI being involved changes the scale and scope, but it doesn't change the fundamentals. China and India were already imitating and cloning everything for their markets and for ours.
We have had virtually zero success enforcing patents and copyright, and we barely clear even the lowest bar of trademark enforcement. There may not be any effective framework for this kind of enforcement that I would actually want to see, but I am open to ideas that don't involve government overreach, etc.
The IETF should be concerned with user concerns. If they make standards about AI preferences, those should be around memory and language and stuff like that, not meddling in legal matters that are outside their scope and expertise.
Does anyone know whether there are any licences or licence derivatives - like the various flavors of Creative Commons - that currently restrict usage by AI LLMs?
The legal machinery is already in place; what we need now is precisely that: a standard for machine-readable reservations.
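To make that concrete, here is a purely hypothetical sketch of what such a machine-readable reservation could look like, either as a robots.txt-style directive or as an HTTP response header. The "Content-Usage" name and the key=value syntax are illustrative assumptions of mine, not an existing or proposed standard.

```
# Hypothetical, for illustration only -- no standardized syntax exists yet.

# robots.txt-style directive reserving text-and-data-mining / AI-training rights:
User-agent: *
Content-Usage: tdm=n, train-ai=n

# Equivalent per-response HTTP header:
Content-Usage: tdm=n, train-ai=n
```

The point is simply that the reservation has to live somewhere a crawler can parse it automatically, whatever the final syntax ends up being.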