Readit News
jawns · 5 months ago
I'm an author, and I've confirmed that 3 of my books are in the 500K dataset.

Thus, I stand to receive about $9,000 as a result of this settlement.

I think that's fair, considering that two of those books received advances under $20K and never earned out. Also, while I'm sure that Anthropic has benefited from training its models on this dataset, that doesn't necessarily mean that those models are a lasting asset.

shermozle · 5 months ago
It's far from fair given that if _I_ breach copyright and get caught, I go to jail, not just pay a fine.
dragonwriter · 5 months ago
> It's far from fair given that if _I_ breach copyright and get caught, I go to jail, not just pay a fine.

This settlement has nothing to do with any criminal liability Anthropic might have, only tort liability (and it involves damages, not fines).

mcv · 5 months ago
Yeah, but this is a corporation. They don't go to jail. They're only people when it's beneficial to them.
YetAnotherNick · 5 months ago
You won't be put in jail for breaching copyright in almost any country, at least not just for downloading content from LibGen or torrents. If you are talking about Swartz, he was facing jail for wire fraud and hacking, not for breaching copyright.
weird-eye-issue · 5 months ago
No you wouldn't
illiac786 · 5 months ago
Wow, where do you live?

Where I live, I wouldn't get fined 7,000 USD for illegally downloading 3 books, for example; it would be much less. Although if I'm a repeat offender, it can go up to prison, I think.

singpolyma3 · 5 months ago
You don't though
stevage · 5 months ago
What? Who goes to jail over copyright infringement?
DyslexicAtheist · 5 months ago
Hasn't the US been trying to extradite Kim Dotcom for years now? (Or at least it was in the past.)

jonplackett · 5 months ago
Will you actually get the money, or will your publisher finally earn out the advances?
gpm · 5 months ago
Just an FYI that it's closer to $6,750 (Anthropic pays $9,000, but 25% is likely to go to the attorneys; the exact number here is up to the court).

Can't help but feel the reporting about $3,000/work is going to leave a lot of authors disappointed when they receive ~$2,250, even if they'd have been perfectly happy had that been the number they initially saw.
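The arithmetic behind those figures, as a quick sketch (the 25% attorney-fee share is the commenter's estimate, not a final court number):

```python
# Back-of-envelope math for the per-work payout discussed above.
# The 25% attorney-fee figure is the commenter's estimate, not a court ruling.
GROSS_PER_WORK = 3000   # widely reported settlement amount per work
ATTORNEY_FEE = 0.25     # share expected to go to class counsel

def net_payout(works: int) -> float:
    """Net amount an author with `works` books in the dataset would receive."""
    return works * GROSS_PER_WORK * (1 - ATTORNEY_FEE)

print(net_payout(1))  # 2250.0 per work after fees
print(net_payout(3))  # 6750.0 for the three-book case above
```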

tartoran · 5 months ago
> I think that's fair, considering that two of those books received advances under $20K and never earned out.

It may be fair to you but how about other authors? Maybe it's not fair at all to them.

terminalshort · 5 months ago
Do they sell their books for more than $3000 per copy? In that case it isn't fair. Otherwise they are getting a windfall because of Anthropic's stupidity in not buying the books.
jawns · 5 months ago
Then they can opt out of the class.
eschaton · 5 months ago
In my opinion, as a class member you should push for three things:

1. Getting the maximum statutory damages for copyright infringement, which would be something like $250,000 per instance of infringement; you can be generous and call their training and reproduction of your works a single instance, though it's probably many more than that.

2. An admission of wrongdoing, plus withdrawal from the market and permanent deletion of all models trained on infringed works.

3. A perpetual agreement to train new models only on content licensed for such training going forward, with safeguards to prevent wholesale reproduction of works.

It’s no less than what they would do if they thought you were infringing their copyrights. It’s only fair that they be subject to the same kind of serious penalties, instead of something they can write off as a slap on the wrist.

sh1mmer · 5 months ago
I’m curious about how you confirmed some things you wrote were in the dataset.
fsckboy · 5 months ago
>Also, while I'm sure that Anthropic has benefited from training its models on this dataset

I thought that they didn't use this data for training, the "crime" here was making the copies.

>I think that's fair, considering that two of those books received advances under $20K and never earned out.

I don't understand your logic here. If they never earned out, that means you were already "overpaid" compared to what they were worth in the market. Shouldn't fairness mean this extra bonus first goes to cover the unmet earnout?

thayne · 5 months ago
How much of that $9000 will go to your publisher?
jawns · 5 months ago
Remains to be seen, but generally the holder of copyright is the author, not the publisher.
Unai · 5 months ago
As I understand, this case is not about training but about illegitimately sourcing the books, so unless you sell your books at $3k per copy, I don't see how it is fair.

midnitewarrior · 5 months ago
What would be more fair is for Anthropic to put 5% of their preferred shares, at their most recent valuation, into a pool that the authors of these books can make claims against. For 18 months, any author in this cache of books could claim their ownership and rights to a proportional amount of the shares, split among all claimants.

Perhaps tokenize all of the books and assign shares proportionally to each publication's token count.
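A minimal sketch of that pro-rata idea: each claimed book gets a slice of the share pool proportional to its token count. All the book names, token counts, and the pool size below are made-up illustrative numbers, not real figures.

```python
# Hypothetical pro-rata allocation: split a pool of shares among claimed
# books in proportion to each book's token count.
def allocate_pool(pool_shares: float, token_counts: dict) -> dict:
    """Return each book's slice of `pool_shares`, weighted by token count."""
    total = sum(token_counts.values())
    return {book: pool_shares * n / total for book, n in token_counts.items()}

# Illustrative claims: 400,000 tokens total, of which book_c holds half.
claims = {"book_a": 120_000, "book_b": 80_000, "book_c": 200_000}
shares = allocate_pool(1_000_000, claims)
print(shares["book_c"])  # 500000.0, i.e. half the tokens, half the pool
```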

xvector · 5 months ago
What a ridiculous assertion. They're already getting 100-1000x the value of their books. Truly bloodlust knows no bounds.
suyash · 5 months ago
For you it might be okay, but there are others who are probably losing far more money because of what happened. Anthropic needs to pay 5x to 10x more; it needs to set a deterrent.
Suppafly · 5 months ago
>I think that's fair, considering that two of those books received advances under $20K and never earned out.

Doesn't that mean the money should go to your publisher instead of you?

echelon · 5 months ago
> that doesn't necessarily mean that those models are a lasting asset.

It remains to be seen, but typically this forms a moat. Other companies can't bring together the investment resources to duplicate the effort and they die.

The only reasons why this wouldn't be a moat:

1. Too many investment dollars and companies chasing the same goal, and none of them consolidate. (Non-consolidation feels impractical.)

2. Open source / commoditize-my-complement offerings that devalue foundation models. We have a few of these, but the best still require H100s and they're not building product.

I think there's a moat. I think Anthropic is well positioned to capitalize from this.

franze · 5 months ago
where can i check if my book was in it?
simonw · 5 months ago
One of the sources is LibGen, you can search that with this tool: https://www.theatlantic.com/technology/archive/2025/03/searc...
nextworddev · 5 months ago
Fair for you maybe
k__ · 5 months ago
Cool.

Where can I check if I'm eligible?

motbus3 · 5 months ago
Who am I to say anything.

It is just another opinion.

It is not about $9K for the knowledge in that book. It's $9K for taking you out. The faster they can grab and process data, the less chance you have to make money from your work.

The money is irrelevant if we allow them to break the law. They might even pay you $9K for those books, but you might never earn anything again, because they would have made copyright useless.

hsaliak · 5 months ago
Might be fair for you, is it fair to JK Rowling?
stubish · 5 months ago
Yes. JK Rowling can still sue about her work being used for training. This lawsuit is about the illegal downloading of her works.
iamsaitam · 5 months ago
Does JK Rowling really deserve any fairness? She doesn't seem to think that everyone deserves it

visarga · 5 months ago
How is it fair? Do you expect $9,000 from Google, Meta, OpenAI, and everyone else? Were your books imitated by AI?

Infringement was supposed to imply substantial similarity. Now it is supposed to mean statistical similarity?

jawns · 5 months ago
You've misunderstood the case.

The suit isn't about Anthropic training its models using copyrighted materials. Courts have generally found that to be legal.

The suit is about Anthropic procuring those materials from a pirated dataset.

The infringement, in other words, happened at the time of procurement, not at the time of training.

If it had procured them from a legitimate source (e.g. licensed them from publishers) then the suit wouldn't be happening.

wingspar · 5 months ago
My understanding is this settlement is about the MANNER in which Anthropic acquired the text of the books. They downloaded illegal copies of the books.

There were no issues with the physical copies of books they purchased and scanned.

I believe the issue of USING these texts for AI training is a separate issue/case(s)

Retric · 5 months ago
Penalties can be several times actual damages, and substantial similarity includes MP3 files and other lossy forms of compression which don’t directly look like the originals.

The entire point of deep learning is to copy aspects from training materials, which is why it’s unsurprising when you can reproduce substantial material from a copyrighted work given the right prompts. Proving damages for individual works in court is more expensive than the payout but that’s what class action lawsuits are for.

gruez · 5 months ago
>Were your books imitated by AI?

Given that books can be imitated by humans with no compensation, this isn't as strong an argument as you think. Moreover, AFAIK the training itself has been ruled legal, so Anthropic could theoretically have bought the book for $20 (or whatever) and been in the clear, which would obviously bring less revenue than the $9K settlement.

SilasX · 5 months ago
Be careful what you wish for.

While I'm sure it feels good and validating to have this called copyright infringement, and be compensated, it's a mixed blessing at best. Remember, this also means that your works will owe compensation to anyone you "trained" off of. Once we accept that simply "learning from previous copyrighted works to make new ones" is "infringement", then the onus is on you to establish a clean creation chain, because you'll be vulnerable to the exact same argument, and you will owe compensation to anyone whose work you looked at in learning your craft.

This point was made earlier in this blog post:

https://blog.giovanh.com/blog/2025/04/03/why-training-ai-can...

HN discussion of the post: https://news.ycombinator.com/item?id=43663941

simonw · 5 months ago
This settlement isn't about an LLM being trained on your work, it's about Anthropic downloading a pirated ebook of your work. https://simonwillison.net/2025/Sep/6/anthropic-settlement/
brendoelfrendo · 5 months ago
It's a good thing that laws can be different for AI training and human consumption. And I think the blog post you linked makes that argument, too, so I'm not sure why you'd contort it into the idea that humans will be compelled to attribute/license information that has inspired them when creating art.
marcus_holmes · 5 months ago
LLMs cannot create copyrightable works. Only humans can do that [0]. So LLMs are not making new copyrightable works.

[0] not because we're so amazingly more creative. But because copyright is a legal invention, not something derived from first principles, and has been defined to only apply to human creations. It could be changed to apply to LLM output in the future.

_DeadFred_ · 5 months ago
An infinitely scaling commercial for profit product designed to replace every creative by applying software processing to previous works is treated very differently than a sentient human being and their process of creativity.

The fact AI proponents can't see that is insane. Reminds me of the quote:

"It is difficult to get a man to understand something, when his salary depends upon his not understanding it."

abtinf · 5 months ago
This is basically the socialist/communist argument for mass expropriation.
rideontime · 5 months ago
Direct link to Judge Alsup's order: https://www.bloomberglaw.com/public/desktop/document/Bartzet...

Name should sound familiar to those who follow tech law; he presided over Oracle v Google, along with Anthony Levandowski's criminal case for stealing Waymo tech for Uber.

wrsh07 · 5 months ago
As someone who has had a passing interest in most of these cases, I've actually come to like Alsup and am impressed by his technical understanding.

His orders and opinions are, imo, a success story of the US judicial system. I think this is true even if you disagree with them

darkwizard42 · 5 months ago
He actually does understand most of what he is ruling on which is a welcome surprise. Not just legal jargon but also the technical spirit of what is at stake.
bsimpson · 5 months ago
He's also the one who called bullshit when Oracle tried to claim that Java's function signatures were so novel they should be eligible for copyright. (Generally, arts are copyrightable and engineering is not - there's a creativity requirement.)

They tried to say `rangeCheck(length, start, end)` was novel. He spat back that he'd written equivalent utility functions as a hobbyist hundreds of times!

kemitchell · 5 months ago
Art versus engineering is a very dangerous generalization of the law. There is a creativity requirement for copyrightability, but it's an explicitly low bar. Search query "minimal degree of creativity".

The Supreme Court decision in Oracle v Google skipped over copyrightability and addressed fair use. Fair use is a legal defense, applying only in response to finding infringement, which can only be found if material's copyrightable. So the way the Supreme Court made its decision was weird, but it wasn't about the creativity requirement.

anp · 5 months ago
Comments so far seem to be focusing on the rejection without considering the stated reasons for rejection. AFAICT Alsup is saying that the problems are procedural (how do payouts happen, does the agreement indemnify Anthropic from civil “double jeopardy”, etc), not that he’s rejecting the negotiated payout. Definitely not a lawyer but it seems to me like the negotiators could address the rejection without changing any dollar numbers.
yladiz · 5 months ago
Yes, exactly. The article is pretty clear that it’s rejected without prejudice and that a few points need to be ironed out before he gives a preliminary approval. I suspect a lot of folks didn’t read much/any of TFA.

I do wonder if all of the kinks will be smoothed out in time. I'm not a lawyer either, but the timeline to create the longer list is a bit tight, and it generally feels like we could see an actual rejection, or at least a stretched-out process that goes on for at least a few more months before approval.

qingcharles · 5 months ago
Exactly. The judge is doing exactly what he's designed to do in a civil case -- help forge an agreement between the parties that doesn't come back to bite anyone in the future. The last thing a judge wants is a case getting reopened and relitigated a year from now because there was a "bug" in the settlement.
lxe · 5 months ago
Good. Approving this would have set a concerning precedent.

Edit: My stance on information freedom and copyright hasn't changed since Aaron Swartz's death in 2013. Intellectual property laws, patents, copyright, and similar protections feel outdated and serve mainly to protect established interests. Despite widespread piracy making virtually all media available immediately upon release, content creators and media companies continue to grow and profit. Why should publishers rely on century-old laws to restrict access?

tene80i · 5 months ago
Because whenever anyone argues that all creative and knowledge works should be freely available, accessible without compensating the creators, they conveniently leave out software and the people who make it.

Moreover, IP law protects plenty of people who aren’t “established interests”. You just, perhaps, don’t know them.

lxe · 5 months ago
I make the software. I use free software and I contribute to free software. I wish all the software were free from all sorts of restrictions.
gabriel666smith · 5 months ago
Would it actually set any kind of legal precedent, or just establish a sort of cultural vibe baseline? I know Anthropic doesn't have to admit fault, and I don't know if that establishes anything in either direction. But I'm not from the US, so I wouldn't want to pretend to have intimate knowledge of its system.

The number of bizarre, contradictory inferences this settlement asks you to make - no matter your stance on the wider question - is wild.

gpm · 5 months ago
The settlement doesn't set any kind of precedent at all.

The existing rulings in the case establish "persuasive" precedent (i.e. future cases are entirely free to disagree and rule to the contrary), notably including the part about training on legally acquired copies of books (e.g. from a book store) being fair use.

Only appeals courts establish binding precedent in the US (and only for the courts under them). A result of this case settling is that it won't be appealed, and thus won't establish any binding precedent one way or another.

> The number of bizarre, contradictory inferences this settlement asks you to make - no matter your stance on the wider question - is wild.

What contradictions do you see? I don't see any.

stingraycharles · 5 months ago
A settlement means that no legal precedent is set, so I can only assume a cultural precedent.

Sometimes these companies specifically seek out a settlement to avoid setting a legal precedent in case they feel like they will lose.

jokoon · 5 months ago
That title is weird. What is an "Anthropic judge"?
rideontime · 5 months ago
The judge for the Anthropic lawsuit, obviously.

giveita · 5 months ago
A human judge. Make the most of it, times are changing.
m463 · 5 months ago
not to be confused with an anthropomorphic judge.
phaedryx · 5 months ago
It sounds like the judge works for Anthropic
alok-g · 5 months ago
Indeed. While I could sense what was implied, I also thought of some newly-launched 'AI Judge' by Anthropic making the said claim. :-)
puppycodes · 5 months ago
I have no empathy for multi-billion-dollar companies, but intellectual property and copyright do nothing positive for humanity.
program_whiz · 5 months ago
In an economy where ideas have value, it seems logical that we should have property protection, much like we do for physical goods. It's easy to argue "ideas should be freely shared", but if an idea takes 20 years and $100M to develop, and there are no protections for ideas, then no one will take the time to develop it. Most modern technology we have is due to copyright/patents (drugs, electronics, entertainment, etc.), because without those protections, no one would have invested the time and energy to develop it in the first place.

I believe you are probably only looking at the current state of the world and seeing how it "stifles competition" or "hampers innovation". Those allegations are probably true to some extent, especially in specific cases, but they miss the fact that without those protections, the tech likely wouldn't have been created in the first place (and so you still wouldn't be able to freely use the idea, since the person who invented it wouldn't have).

8note · 5 months ago
> drugs

This is a kind of strange example, since the discovery tends to come from government-funded research, with the safety demonstrated using private money.

The USSR went to space without those protections. It's not like property protections are the only thing that has driven invention.

MIT licenses are also pretty popular, as are Creative Commons licenses.

People also do things that don't make a lot of money, like teaching elementary school. It costs a ton of money to build and run all those schools, but without any intellectual property being created that can be sold or rented out.

I don't believe that nobody would want to build much of what we have now if there weren't IP around it. Making and inventing things is fun.

Permit · 5 months ago
> but if an idea takes 20 years and $100M to develop, and there are no protections for ideas, then no one will take the time to develop it

This sounds trivially true but I have some trouble reconciling it with reality. For example the Llama models probably cost more than this to develop but are made freely available on GitHub. So while it’s true that some things won’t be built, I think it’s also the case that many things would still be built.

tolerance · 5 months ago
I appreciate you giving the parent comment a fair chance.

As a society we’re having trouble defining abstract components of the self (consciousness, intelligence, identity) as is. What makes the legislative notion of an idea and its reification (what’s actually protected under copyright laws) secure from this same scrutiny? Then patent rights. And what do you think may happen if the viability of said economy comes into question afterwards?

netbsdusers · 5 months ago
It's just a fiction to allow something freely copiable - pure information - to be pretended to be a commodity. If the AI firms have only a single redeeming feature, then it is that in them the copyright mafia finally has to face someone their own size, rather than driving little people to suicide, as they did to Aaron Swartz.
jonathanstrange · 5 months ago
Only people who don't create anything say that. Every musician and every author I know (including myself) thinks they should have some rights concerning the distribution and sale of the products of their work. Why should a successful book author be forced to live on charity?
BrawnyBadger53 · 5 months ago
Weird framing, I don't think this is what they were suggesting
arduanika · 5 months ago
What do you do for work, and do you believe it should be given away for free? Or are you just talking about other people's work?
nextworddev · 5 months ago
Are we even sure some of these posters aren't LLMs?
2OEH8eoCRo0 · 5 months ago
I think the term has gotten way too long (70+ years at least) and we can thank Disney for that.

cleandreams · 5 months ago
The judge IIRC found that training models using copyrighted materials was fair use. I disagree. Furthermore this will be a problem for anyone who generates text for a living. Eventually LLMs will undercut web, news, and book publishing because LLMs capture the value and don't pay for it. The ecosystem will be harmed.

The only problem the judge found here was training on pirated texts.

xvector · 5 months ago
The ecosystem is irrelevant, the development of AI is a far higher priority than the ecosystem.
nicce · 5 months ago
Said by every for-profit company ever.
firesteelrain · 5 months ago
How do any of these AI companies protect authors from users uploading full PDFs or even the plain text of anything? Aren't the same piracy concerns real even if they train on what users provide?
jahbrewski · 5 months ago
If you’re vacuuming, shouldn’t you be responsible for what you’re vacuuming?
gowld · 5 months ago
If this is detected as leading to copyright violation, then that can be the subject of a lawsuit.

Since the violation is detected via model output, it doesn't matter what the input method is.

robryan · 5 months ago
Training aside, an llm reading a pdf as part of a prompt feels similar to say Dropbox storing a pdf for you.
pessimizer · 5 months ago
I bet you could get a court to say it was legally identical.

I think the Aereo case, and Scalia's dissent, are super relevant here. It's when the court decided to go with vibes, instead of facts. The inevitable result of that (which Scalia didn't predict) was selective enforcement.

edit: so what I really mean is that I bet you could get a court to say whatever you wanted about it if you were far wealthier and more influential than your opponents.

terminalshort · 5 months ago
It's not similar at all because you can't get the book back out of the LLM like you can out of Dropbox. Copyright law is concerned with outputs, not with inputs. If you could make a machine that could create full exact copies of books without ever training on or copying those books, that would still be infringement.