zombot · 6 months ago
Right, stealing training data from others is OK, having it stolen from you is not. What else is new?
keyle · 6 months ago
New logo every couple of years and Bob's your uncle.
ivape · 6 months ago
X/Twitter has become extremely prohibitive about just about everything since Elon took over. Their API pricing was antagonistic even toward indie developers. Elon is not a generous guy.

newsbinator · 6 months ago
> Elon is not a generous guy

Why would he be?

threeseed · 6 months ago
Almost certainly the easter egg found in the Trump "Big Beautiful Bill" which prevents states from enacting AI regulations also came from Musk.

That way he can continue to steal from others and lock competitors out whilst being comfortable knowing that no laws will be enacted to prevent it.

api · 6 months ago
We really need a one-bill, one-topic amendment. We're heading toward one bill a year that nobody reads, with everything else done by executive order, at which point Congress is just for show.
labster · 6 months ago
Yep, Musk saying he’s going to fund primary campaigns against congressmembers who vote for the Big Beautiful Bill is all just a brilliant bit of reverse psychology.

Or more likely, Congress is super worried about Roko’s Basilisk.

NekkoDroid · 6 months ago
> Almost certainly the easter egg found in the Trump "Big Beautiful Bill" which prevents states from enacting AI regulations also came from Musk.

My guess is on Peter Thiel

mgoetzke · 6 months ago
Why do you think he is so evil but all others are benign?
lesuorac · 6 months ago
Who's training an AI on the "Tweet" button text?

Or are they trying to forgo section 230 protection and claim ownership of content uploaded to the site?

GuB-42 · 6 months ago
These are just terms of service, not copyright.

It means that, assuming training AI models is fair use (if it weren't, AI companies including xAI would be in trouble), they can't really stop you.

But now, essentially, they are telling you that they can block your account or IP address if you do, which I believe they can do for basically any reason anyway.

grugagag · 6 months ago
How would they know you’re training some LLM though?
lambertsimnel · 6 months ago
Perhaps they want the prohibition on using the site's content for AI training to rest on something other than their ownership of it, like bandwidth usage or users' rights.
HenryBemis · 6 months ago
They will get paid to share our (your) data and they will use the money for infra and new yachts.
cameldrv · 6 months ago
Naturally I'm sure Grok reads the terms of service on every website it scrapes and doesn't use content from sites that prohibit it.
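For what it's worth, the closest machine-readable analogue to a site's terms is robots.txt. Here is a minimal sketch of the check a well-behaved crawler would perform, using only Python's standard library; the robots.txt content and bot names are invented for illustration, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# An AI crawler identifying as GPTBot is disallowed everywhere...
print(parser.can_fetch("GPTBot", "https://example.com/some/page"))    # False
# ...while a generic crawler is allowed.
print(parser.can_fetch("OtherBot", "https://example.com/some/page"))  # True
```

Of course, nothing forces a scraper to run this check; robots.txt is purely advisory, which is the point of the parent comment.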

thih9 · 6 months ago
I think the rules should be stricter.

I’d prefer that an explicit opt-in from the content author be required before anyone may train any model on any given data.

Alternatively, require all weights, prompts and chat logs to have the same visibility as the original datasets.

None of this is going to happen, and current decisions about uncopyrightable AI[1] are already good; but still, it feels like there is room for abuse.

[1]: https://en.m.wikipedia.org/wiki/Th%C3%A9%C3%A2tre_D%27op%C3%...

eru · 6 months ago
Well, you explicitly opt in to Twitter's ToS whenever you post anything there.
thih9 · 6 months ago
This is not opt-in as I understand it. When there is no alternative, or the alternative is not using the service at all, I'd call it a hard requirement instead.

I like how opt-in is handled by GDPR; e.g.: "Consent must be a specific, freely given, plainly worded, and unambiguous affirmation given by the data subject (...) A data controller may not refuse service to users who decline consent to processing that is not strictly necessary in order to use the service.", source: https://en.wikipedia.org/wiki/General_Data_Protection_Regula...

echelon · 6 months ago
If an artist or author can't do this, social media shouldn't be able to do it either.

If xAI wants to train on a public corpus, it shouldn't be allowed to prevent its own corpus from being used.

We need regulations to limit the power grabs. Train all you like, but don't dare try to constrain to your walled gardens.

We should also probably nip the "foundation model company / also a social media company" conglomeration in the bud.

mgraczyk · 6 months ago
Artists can do this, and they do
loudmax · 6 months ago
Yes, but do artists have the ability to actually monitor and enforce this? You have to have the capacity and wherewithal to test these models just to know that your data is being ingested into AI.

Big companies like the New York Times and Twitter/X have the funds to pay for this. Miscellaneous artists probably don't.

teeray · 6 months ago
> If an artist or author can't do this, social media shouldn't be able to do it either.

Even if this is done, the case of starving artist v. megacorp will probably go to whoever wields the most money and lawyers. To add insult to injury, the artist’s opponent is fueled by their ill-gotten gains.

yndoendo · 6 months ago
This depends on the country. In the USA, yes, with its draconian methods. In countries like the UK, the loser of the suit pays all the costs, so UK lawyers have no problem taking low-wealth clients' cases they know will win. The UK allows for David vs. Goliath, and for David to win; the US uplifts Goliath as a god.
jimbokun · 6 months ago
If social media can do this, an artist or author should be able to do it, too.
vouaobrasil · 6 months ago
Social media should do it to set a legal precedent.

> We need regulations to limit the power grabs. Train all you like, but don't dare try to constrain to your walled gardens.

No, no one should train, period.

echelon · 6 months ago
> No, no one should train, period.

I get that you have your own opinion, but I'm personally tired of living in the butter-churning era and would prefer that this all went a bit faster.

I want my real time super high fidelity holo sim, all of my chores to be automatically done, protein folding, drug discovery. The life extension, P = NP future. No more incrementalism.

If the universe only happens once, and we're only awake for a geological blink of an eye, I'd rather we have an exciting time than just be some paper-pushing animals that pay taxes and vanish in a blip.

I'd be really excited if we found intelligent aliens, had advanced cloning for organ transplants and longevity, developed a colony on Mars, and invented our robotic successor species. Xbox and whatever most normal people look forward to on a day to day basis are boring.

Animats · 6 months ago
It would be interesting to have a "classical AI model", trained on the contents of the Harvard libraries before 1926 and now out of copyright.
gausswho · 6 months ago
It does surprise me that we haven't seen nations revise their copyright window back to something sensible in a play to seed their own nascent AI industry. The American founding fathers thought 20 years was enough. I'm sure there'd be repercussions in the banking system, but at some point it might be worth the trade.
blibble · 6 months ago
they can't

a life-plus-50-years minimum is part of the Berne Convention, which itself is about as close to a universal law as humanity has

(even North Korea is a signatory)

MattGaiser · 6 months ago
Why would it matter? Copyright has been irrelevant so far.
eru · 6 months ago
What's the connection with the banking system?
nickpsecurity · 6 months ago
I wish someone would update and use PG19 for a 7-30B+ model:

https://github.com/google-deepmind/pg19

That gives us a model that's 100% open and reproducible, with low legal risk. It would also be a nice test of how much AIs generalize from, or merely repeat, behavior in their pretraining data.

Then, a new model using that, The Stack, and FreeLaw's stuff (by paying them to open-source it). No GitHub Issues or anything with questionable licenses or terms-of-service violations. That could be the next baseline for lawful models with coding ability, too. Research on coding AIs might use it.
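The filtering step such a corpus needs is easy to sketch. Assuming each record carries a publication year (PG19's real metadata lives in a separate CSV, so the record layout below is hypothetical), a conservative public-domain cutoff filter looks like:

```python
# Sketch: filter a book corpus down to works safely in the public domain.
# Record layout and titles here are illustrative, not PG19's actual format.

CUTOFF_YEAR = 1919  # PG19 only includes books published before 1919

books = [
    {"title": "Pride and Prejudice", "year": 1813, "text": "..."},
    {"title": "The Great Gatsby", "year": 1925, "text": "..."},
    {"title": "Moby-Dick", "year": 1851, "text": "..."},
]

def public_domain_subset(records, cutoff=CUTOFF_YEAR):
    """Keep only works published strictly before the cutoff year."""
    return [r for r in records if r["year"] < cutoff]

corpus = public_domain_subset(books)
print([b["title"] for b in corpus])  # ['Pride and Prejudice', 'Moby-Dick']
```

The real legal question is not the filter but the provenance metadata: the cutoff is only as trustworthy as the publication dates attached to each work.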

kibwen · 6 months ago
Careful, you might create an artificial superintelligence that way. Safer to just train on the Twitter dataset.
Shadowmist · 6 months ago
That’s how you end up with an Artificial Idiot.
mbg721 · 6 months ago
If you thought AI now had out-of-control racism...
murph-almighty · 6 months ago
I've similarly wondered if I could get a pre-2024 Wikipedia dump, if just for a "fact-based" flavor of LLM.
landl0rd · 6 months ago
Do you think Wikipedia starting in '24 was polluted by AI slop? This is certainly possible, I'm just not aware of it happening.

Wikipedia periodically publishes database dumps and the Internet Archive stores old versions: https://archive.org/search?query=subject%3A%22enwiki%22%20AN...

Plus, you could also grab the latest dump and just read the revisions as of 12/31/23.
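That last step, reconstructing a pre-2024 snapshot from a dump, amounts to keeping each page's newest revision at or before a cutoff date. A toy sketch against a heavily simplified, hypothetical dump fragment (real dumps are namespaced and orders of magnitude larger):

```python
# Sketch: pick each page's last revision on or before a cutoff date from a
# simplified Wikipedia-style XML fragment, approximating a "pre-2024 snapshot".
import xml.etree.ElementTree as ET

CUTOFF = "2023-12-31T23:59:59Z"  # ISO 8601 timestamps compare correctly as strings

dump = """<mediawiki>
  <page>
    <title>Example</title>
    <revision><timestamp>2023-05-01T00:00:00Z</timestamp><text>old text</text></revision>
    <revision><timestamp>2024-03-01T00:00:00Z</timestamp><text>new text</text></revision>
  </page>
</mediawiki>"""

def snapshot(xml_text, cutoff=CUTOFF):
    """Map page title -> text of its latest revision at or before the cutoff."""
    pages = {}
    for page in ET.fromstring(xml_text).iter("page"):
        title = page.findtext("title")
        revs = [(r.findtext("timestamp"), r.findtext("text"))
                for r in page.iter("revision")
                if r.findtext("timestamp") <= cutoff]
        if revs:
            pages[title] = max(revs)[1]  # newest surviving revision's text
    return pages

print(snapshot(dump))  # {'Example': 'old text'}
```

For a full dump you would stream with `ET.iterparse` instead of loading the XML into memory, but the revision-selection logic is the same.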

malinens · 6 months ago
What happened to wikipedia in 2024?
michaelcampbell · 6 months ago
"its content" indeed.
matwood · 6 months ago
Weird that this just happened. I assumed all sites with any sort of content changed their terms soon after ChatGPT hit the scene.
nailer · 6 months ago
Yep, from https://the-decoder.com/reddit-ends-its-role-as-a-free-ai-tr... :

You must not, and must not allow those acting on your behalf to:

...use the Data APIs to encourage or promote illegal activity or violation of third party rights (including using User Content to train a machine learning or AI model without the express permission of rightsholders in the applicable User Content);

soulofmischief · 6 months ago
In my eyes that is considered fair use, and I think the courts will come to agree unless they are financially incentivized to look the other way and thus create a moat for existing players at the expense of newcomers.