I found this article frustratingly vague on how prosecraft.io actually worked. As far as I can tell, the author scraped the web for books, including in-copyright books. Then he analyzed them with "classical" natural language processing techniques, rather than transformers or deep learning. He appears to have retained the books he scraped for future analysis. The site itself seems to use only snippets.
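(For concreteness: here is a toy sketch of the kind of "classical" statistics involved, i.e. simple counting rather than any learned model. This illustrates the general approach only; it is not prosecraft's actual code, which was never published.)

```python
import re
from collections import Counter

def prose_stats(text: str) -> dict:
    """Toy word-level statistics, roughly the kind prosecraft displayed."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Crude "-ly adverb" heuristic; a real tool would use a POS tagger.
    ly_adverbs = sum(n for w, n in counts.items() if w.endswith("ly") and len(w) > 3)
    return {
        "word_count": len(words),
        "vocabulary_size": len(counts),
        "ly_adverb_share": ly_adverbs / len(words) if words else 0.0,
    }

stats = prose_stats("She ran quickly. He spoke softly, then suddenly stopped.")
```

Note that nothing here can regenerate the input text; the output is a handful of aggregate numbers.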
However, the apology [0] says that the creator did not "intend" to participate in AI that can "create zero-effort impersonations of artists." I'm not sure if the wording is unintentionally vague, or if there is some way his project could be used in that way.
For what it's worth, the Computational Story Lab's hedonometer [1] seems to draw largely on out-of-copyright books from Project Gutenberg, plus the Harry Potter series.
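(The hedonometer's core idea is simply averaging crowd-rated per-word "happiness" scores over a text. A minimal sketch of that averaging idea; the four-word lexicon below is invented for illustration, whereas the real project uses crowd-sourced ratings for roughly 10,000 words:)

```python
# Invented toy lexicon; the real hedonometer uses thousands of crowd-rated
# words scored from 1 (sad) to 9 (happy).
HAPPINESS = {"love": 8.4, "laughter": 8.5, "war": 1.8, "the": 4.9}

def happiness_score(text: str) -> float:
    """Average the happiness ratings of the words we have scores for."""
    rated = [HAPPINESS[w] for w in text.lower().split() if w in HAPPINESS]
    # Fall back to the neutral midpoint of the 1-9 scale if nothing matches.
    return sum(rated) / len(rated) if rated else 5.0

score = happiness_score("the war")
```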
Edit: Apparently he was working on an LLM project. https://twitter.com/stealcase/status/1688721685585809408. It's unclear whether he was planning to use the books he scraped (although as @stealcase points out, GPT-Neox itself was trained on books that were pirated).
If he says he didn't do something, the pitchfork mob will simply tell each other that he is lying. They will do this in the most confused and twisted way possible, driven by a lack of understanding of what was happening, combined with a need to drive outrage and thereby advertise their own work.
If he says he didn't intend to do that thing, this is still compatible with a later update that he didn't do that thing, but immediately dampens the outrage machine. The reader who knows nothing about either side relaxes -- "No need for me to get worked up, because we won". Conveniently, saying he didn't intend to do the thing is also compatible with a later reveal that he was doing the thing (perhaps for later release, since he wasn't clearly doing the thing here).
Therefore, regardless of whether he was doing what he was accused of doing, this is the lowest energy response, and probably the default unless this was the hill he wanted to defend.
The best response, for us all collectively, is to always ignore everyone's opinion online. There is zero value in anything on reddit, twitter, facebook, the media these days.
Just ignore it. All of it. Outrage or not.
I see downvotes, but I mean it. You know who you listen to? Your friends. Your neighbours. Your local community. You listen to PEOPLE, not sockpuppets. You listen to legitimate human beings, not AI-generated blather, or curated news stories, or groups working together to generate hate and outrage, to stoke anger and upset.
You listen to actual, real PEOPLE.
You want to go to reddit? Twitter? Anything? Fine. But treat it as 100% fiction, pure entertainment, and never let it affect YOU.
> However, the apology [0] says that the creator did not "intend" to participate in AI that can "create zero-effort impersonations of artists." I'm not sure if the wording is unintentionally vague, or if there is some way his project could be used in that way.
This seems FUDdy. "Intend" isn't in the apology at all, and the wording that is there says clearly that generative AI came after prosecraft, so there's no way the tool could be used for it.
> It's unclear whether he was planning to use the books he scraped
This also seems unwarranted. The tweet about fine-tuning an LLM came 5-6 years after the guy made prosecraft; why suggest they might involve the same dataset?
I apologize for the quotes around intend. I wrote it without, then I forgot it was a paraphrase and added them back again. Unfortunately, I cannot edit my comment to fix that.
I do think “intend” is a reasonable paraphrase of “never wanted to.”
(Edited to add) I don’t think prosecraft was a finished project and he was definitely still working on his other tool for writers that incorporates some of the same tools.
> The tweet about fine-tuning an LLM came 5-6 years after the guy made prosecraft; why suggest they might involve the same dataset?
The reason being that he had mentioned he was planning to use the scraped books for future analysis.
I am a bit confused about what's so outrageous about this tool. Both the book authors and some of the people in the discussion here seem to conflate rudimentary statistics about a book (counts of words of a certain kind) with the latest wave of generative AI. They are very different in both the value they provide and the risk they pose to book authors.
The tool that book authors got outraged about only provides basic metrics, not dissimilar from other metrics such as "page count", and can't be used to produce new content that could deprive the book authors of revenue.
If you read through the angry Twitter thread it's clear that almost everyone thinks that either a) the site is a pirate site that lets you download books or b) that the site lets you generate works in the style of an author. Neither of which is true of course.
There are a handful (like < 3 people) who seem to understand what the site actually does who were still angry because the creator seems to have pirated the books. I actually don't know about the legality of something like that. Surely providing pirated books is illegal, but IDK if acquiring pirated books actually is.
I think it's clear though that most of the outrage would still be there even if the author had purchased each and every book.
If you want to do this kind of thing, let authors opt-in (or publishers).
Yes, it will take effort and probably go slow, but if the tool is really useful and amazing, it should be doable.
I suspect the authors are put off by a couple of things:
- the text of the works scanned seems like it may be from pirated sources. That poisons the project, no matter what it does with the scans, for many authors.
- the use of these scans in a commercial product
The article itself is clueless… it doesn’t engage authors’ concerns at all, and just portrays authors as stoopid AI-fearful luddites.
> If you want to do this kind of thing, let authors opt-in (or publishers).
If it's fair use, why should you have to do that? The same copyright law that protects authors' ownership rights over their art also provides "fair use" rights to other people. Someone may disagree with current fair use law (and I suspect many of the outraged here do not), but that's a broader issue not related to this particular tool. It just 100% seems like misdirected AI outrage.
> the text of the works scanned seems like it may be from pirated sources.
Do you have a source for this? I didn't see that mentioned in the article.
> Do you have a source for this? I didn't see that mentioned in the article.
The person who runs prosecraft says "I looked to the internet for more text that I could analyze, and I used web crawlers to find more books." [0]
I'm just inferring, but if they had, say, purchased each of these books, or borrowed them from the library, or only sourced from sites that ensure the copyright is satisfied, then they might have mentioned it.
(FWIW, the blog post says the other source for the 25K works was their personal library, so I'm assuming the bulk of the 25K come from the internet, though I know some people have prodigious personal libraries.)
"How much of someone else's work can I use without getting permission?
Under the fair use doctrine of the U.S. copyright statute, it is permissible to use limited portions of a work including quotes, for purposes such as commentary, criticism, news reporting, and scholarly reports."
> The article itself is clueless… it doesn’t engage authors’ concerns at all, and just portrays authors as stoopid AI-fearful luddites.
Going off some of the tweets that initially whipped up the outrage…it's not like the authors were making a nuanced case about their concerns; they were basically just stomping their feet and shouting.
> If you want to do this kind of thing, let authors opt-in (or publishers).
"This kind of thing" is factual information about the book: page count, word count, ly-adverb count, etc. What was displayed were small snippets, something permissible under copyright law today, that were heavily editorialized and commented on.
To suggest that counting words and pages is something that should not be allowed is silly.
> The article itself is clueless…
Says the person making stuff up to force a narrative.
The person doing this had the rights to do this, and was very clearly within his rights to do this under copyright law. Counting words is not a crime.
> The article itself is clueless… it doesn’t engage authors’ concerns at all, and just portrays authors as stoopid AI-fearful luddites.
The authors' quotes speak for themselves. They very clearly and ignorantly claimed that this was an "AI training project" when it was nothing of the sort.
Statistical analysis is only useful if you have enough data to analyze, so there is in fact a threshold number of books to cross before the tool can even really exist. If you read his post, the initial goal was to get stats about typical word count, typical amount of passive voice, etc. Requiring opt-in for these broad statistics (through outrage only, since this project is CLEARLY legal in the United States) means that tools like this will never exist. Which seems net bad to me.
If you are saying it should be opt-in only for the pages analyzing specific books, like the instigator of this outrage screen-shotted, well that seems to fall squarely into the critical analysis bucket, so that is also quite ridiculous.
I understand some folks being unhappy that a portion of the works were pirated, but it seems like most of the outraged would be outraged even if he personally purchased each and every ebook.
Also, if you read through the Twitter thread a lot of the authors (not 100%, but a LOT) are doing a really great job portraying themselves as "stoopid AI-fearful luddites". Many of them think the site is somehow like ChatGPT and they don't bother to dig any deeper, or really at all.
Yeah, the article represents the voice of the authors in two tweets, from authors apparently not notable enough to have a Wikipedia page. One I couldn't even find on Goodreads. It's obvious there's more to this than just the tweets presented. The article is unhelpful in this regard.
While I would agree in theory that a project like this would be best with opt-in, in reality that would just not work. Publishers would never opt-in to it, if they even respond to your requests at all.
Or, if you do it, do it privately and don't share it on the internet?
I'm not sure why this is a difficult idea. If asking for something and getting permission to do it is so difficult that it "would just not work. Publishers would never opt-in to it"...
...then, even if you want to do it, technically can do it, and could maybe make a legal argument that doing it doesn't violate any laws...
...why would you do it? Why would you post about doing it?
Come on, that's literally being a selfish dick; spitting in people's faces and waving a 'too bad, you can't sue me' flag.
There are so many things, so many mannnny things that you could work on, why would you choose to pick something that you knew would upset people and you knew you wouldn't get permission to do if you asked?
Authors are not demigods, they don’t have a right to control the use of their works, only the reproduction.
When you publish a book you “consent” to the fact that people are going to take it apart, talk about it, review it, quote from it, and yes run statistics on it. If an author doesn’t want that to happen then they shouldn’t publish a book. Just keep it private, only distribute it to people you trust after they sign an NDA.
As far as anyone knows, no piracy has occurred. In the US you are allowed to scan books, index them, and post excerpts - it’s called Google Books and there was a big case that affirmed that it is legal. Downloading a book from a pirate website for the purpose of indexing by a computer program is not piracy, you have simply outsourced the scanning stage to someone else. It is only an issue if you download from some p2p protocol (such as a torrent) that also uploads and shares the book.
Because the authors were AI-fearful luddites. From "Book" to "Program that judges books" lies well beyond any argument that the use of the derivative work could supersede the original. It's such clear cut transformative use that the authors come across as grossly misinformed about copyright law as a whole.
Perhaps there is an argument for generative AI possibly superseding the original, in that people might start asking an AI to generate them stories "in the style of x" instead of buying the author's books, but this wasn't that. It was just some fun data analysis of books.
Summary: prosecraft.io counted word occurrences and presented statistics about them. I don't think you even need fair use for this, because this is something you obviously are allowed to do, without any permissions. This is not generative AI, this is old school statistics.
And then it sometimes presented a page worth of quoted text from a book. Which should fall under fair use.
> counted word occurrences and presented statistics about them. I don't think you even need fair use for this, because this is something you obviously are allowed to do, without any permissions
You're pretty much describing exactly what an LLM "learns" about text. I agree that it should obviously fall under fair use, but as the author of this article found out, there are quite a few who (very vocally) disagree.
I think there is a big difference in terms of data recovery though. You can't take a compression algorithm, for example, and claim that it's "just some statistical analysis" when it can reproduce the original perfectly. Heck, even if it can reproduce it approximately, that's a lot different from what we see in this particular example, where the data could not be used to reproduce a text at all.
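To make that difference concrete: bag-of-words statistics are lossy in a way compression is not. Two different sentences can produce identical word counts, so the counts alone cannot reproduce either original, while a compressor round-trips exactly:

```python
import zlib
from collections import Counter

a = "the dog bit the man"
b = "the man bit the dog"

# Identical word-occurrence statistics for two different texts:
# the counts cannot tell us which sentence they came from.
same_counts = Counter(a.split()) == Counter(b.split())

# A compressor, by contrast, is lossless: the original comes back exactly.
round_trip = zlib.decompress(zlib.compress(a.encode())).decode()
```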
Hrm. It seems like the authors are caught up in things like "vividness" score and the "sentiment analysis" of the text; I guess because it's loosely related to AI?
But it seems like a bulk of the stats collected are things that I would find really useful. I've probably asked myself, "how many words are in this book" on 10+ separate occasions, both as a reader and as a writer.
It also seems like there were also counts of things like adjectives, verbs, adverbs, passive verbs, etc -- stats that I might want to know about a novel.
The bulk of the service seems rather "boring" and non-AI. Unfortunate that the whole thing was taken down because of a few features. Hopefully it'll come back.
For this particular example, the tool doesn't seem like it's a big deal. It just analyzes works for data. I'm not sure how this would be any different from a literary critic doing the same thing manually.
In general, though, I think artists would be less hostile to technological innovations if the people imploring them to "figure out how to embrace the technology rather than fear it" weren't actively trying to destroy their livelihoods, almost always without the slightest interest in helping them figure out the new economic situation. The attitude is, "It's the reality now, deal with it," all while enjoying the job security and high salaries of tech jobs. You can see the same attitude displayed when it comes to piracy: "too bad, deal with it, I have a good job, I don't care if you don't anymore."
This stuff would be received far better by the creative community if AI companies were to say, establish an artist sponsorship program, push for UBI, or otherwise show that they care even a tiny bit about the people they're making redundant.
I agree with you. There’s a pattern that I see a lot, of having:
1. large powerful players doing something not entirely helpful;
2. victims of that change protesting vehemently, all in vain because the players are powerful and have sheltered themselves from criticism, usually via lobbying;
3. regulatory capture or protests go after a smaller player, which is widely advertised to accuse 2. of going too far — even when the problem in 1. is still entirely there, and now ignored.
It’s definitely the case with globalization (large conglomerates benefit, people protest, and a small artisan who started selling abroad is featured being victimized by tariffs), fossil fuels (the large oil extractor, the climate advocate, the farmer seeing fertilizer prices go up), immigration, American cultural hegemony, car dominance over cities, etc.
That pattern allows larger players still doing harm to wash their morals. I feel like we need better antibodies to say: No, this does not absolve them.
I am defining "push for UBI" as "actually do something to pressure the government" and not just state that a for profit business you've established is trying to accomplish that goal.
I will admit that I am mildly confused by this outrage, but it is X/Twitter, so the standards are different.
All that said, I remember doing basic text analysis in college and then sentiment analysis in my MBA class. Is the concern out there because of how the source material was acquired?
Not an artist myself, but this basic assumption in tech that you can just take somebody's shit without informing them, without permission, without compensation, without basic due diligence, and then go do whatever the hell you want with it needs to stop.
For the artists' sake but also for tech's sake. This model can't work, it's a complete dead-end that will wipe out livelihoods and culture.
But I can assure you artists can and will be equally hypocritical themselves. Surely they've pirated things themselves, removed paywalls from articles, blocked ads, borrowed the neighbor's Netflix account.
I think it applies to many technologies other than generative AI. How many devs actually think about ethics nowadays? I think it's all lost in the big companies they work for, behind the excuse that "it is not their job to figure out how their work is being used".
Interestingly, I think most devs would think twice before being paid for designing a missile. But somehow they don't really seem to think about the impact of work that is not obviously a weapon. Social network, Stable Diffusion, ChatGPT, SpaceX... everything disruptive has the potential to be very bad (I see a lot more harmful use-cases for ChatGPT than legit ones, but maybe that's just me). But somehow engineers seem to believe that it is not their problem.
Absolutely, and I think the recent Oppenheimer movie was an excellent take on this exact subject. At some point, you don't get to throw up your hands, say "technology is just neutral," and absolve yourself of any responsibility for what you've put into the world.
My summary of the case: Someone did statistical analysis of a bunch of texts and created a tool that evaluates your text according to the developed model. Writers accused him of plagiarizing/using the content of their works.
Something that we need to learn is that these brief outbreaks on social media burn themselves out pretty quickly. Everyone shouts for a bit and then moves on to the next bit of manufactured outrage.
[0]: https://blog.shaxpir.com/taking-down-prosecraft-io-37e189797...
[1]: https://hedonometer.org/books/v3/863/
Techdirt's analysis of the legality seems correct to me. TL;DR is that it seems legal.
You may not be legally required to do that, but it can be an excellent move that benefits you nonetheless.
Much like how Weird Al isn't legally required to get permission to make a parody of a popular song, but he does so anyway.
But in this case, I don't think you even need to invoke Fair Use. I think what he did simply isn't a copyright violation in the first place.
In reality, the legality of this was never the issue anyway. The issue was that doing this made the authors angry, and the dev didn't want that.
https://www.copyright.gov/help/faq/faq-fairuse.html
Limited portions, not the entire work.
If your engagement only reaches the level of twitter, you aren't really engaging at all.
https://twitter.com/scumbelievable/status/168915466478730444...
So the two authors who are gloating about "killed that stupid fuckin AI thing" - I'm supposed to be engaged with their concerns ? Please.
You shouldn't, at least for posting basic statistics. They're facts, not copyrightable.
I think that using an LLM to get insights on the text should be ok, it's the generation part that scares them. probably rightly so.
Sam Altman, for all his faults, is actually a massive proponent of UBI. I mean, that was one of the claimed objectives of Worldcoin (though he advocates for UBI in general: https://thewalrus.ca/will-universal-basic-income-save-us-fro... )
I wonder if similar language exists in other copyright systems, but I would imagine it is likely the opposite...
Because one thing about the generative models is that you could in theory get the model to recite copyrighted work, word by word.
Always feel bad for people who cave to the mob. Usually, if the mob is yelling at you, you're on the right track.