qup commented on Meta torrented & seeded 81.7 TB dataset containing copyrighted data   arstechnica.com/tech-poli... · Posted by u/gameshot911
Timon3 · a year ago
Then they'd already be smaller, so there's no reason to make them smaller. Or am I misunderstanding your question?
qup · a year ago
Okay, they would be smaller, but you said "big corporations should not be able to exist" and they would already be a big corporation with just search--they started this way.

Or, just to follow it through, let's say "WidgetBoss LLC" makes a new Widget that every single human has to have, they become the biggest company ever by making one widget. What will you do to make them smaller? Why?

I have a big problem with Google & Meta, and I can understand arguments about those companies. But not just "big companies" as a generality.

But that's how everyone speaks now. "Literally every billionaire is evil and exploiting blah blah blah"

rcMgD2BwE72F · a year ago
>What are we actually worried about happening?

Few companies can amass such quantities of knowledge and leverage it all for their own, very private profits. This is an unprecedented centralization of power in the hands of a select few. Do we actually want that? If not, why not block this until we're sure it is a net positive for most people?

qup · a year ago
Meta open-sourced it, my guy
volkk · a year ago
> Are AI-written books getting published?

actually i think they are. lots of e-book slop

> If they start out-competing humans, is that bad?

Not inherently, but it depends on what you mean by out-competing. Social media outcompeted books and now everyone's addicted and mental illness is more rampant than ever. IMO, a net negative for society. AI books may very well win out through sheer spam but is that good for us?

qup · a year ago
Nobody has responded to me with anything about how authors are harmed, so I don't really get who we're protecting here.

It feels more like we just want to punish people, particularly rich people, particularly if they get away with stuff we're afraid to try.

TeMPOraL · a year ago
The right to read and study you have by default. It's getting your hands on a book that has legal caveats attached.
qup · a year ago
Yes, but getting your hands on the material isn't a very interesting legal question IMO.

Whether you can train your LLM on it is a very interesting question.

I've personally never been in favor of punishing people for downloading (or seeding) things.

sanderjd · a year ago
I think the concern goes to the point of copyright to begin with, which is to incentivize people to create things. Will the inclusion of copyrighted works in LLM training (further) erode that incentive? Maybe, and I think that's a shame if so. But I also don't really think it's the primary threat to the incentive structure in publishing.
qup · a year ago
> the point of copyright to begin with, which is to incentivize people to create things

Is it?

(I don't agree)

jjmarr · a year ago
> Are AI-written books getting published?

Yes, online bookstores are full of them:

https://www.nytimes.com/2023/08/05/travel/amazon-guidebooks-...

The issue is that there's an information asymmetry between buyer and seller for books, because a buyer doesn't know the contents until they buy the book. Reviews can help, but not if the reviews are fake or AI-generated. In this case, these books are profitable even if only a few people buy them, since the marginal cost of creating such a book is close to zero.

qup · a year ago
This really has fuck-all to do with copyright though, correct?

If you can't tell what the content is like before you read it, it could be written by a monkey.

timeon · a year ago
Prosecutors sought 50 years of imprisonment and $1 million in fines against Swartz.

Can you calculate how many years that would be for Mark and his people?

qup · a year ago
I ran it; it came out to zero
nyoomboom · a year ago
Remembering Aaron Swartz in this moment
qup · a year ago
Would Aaron have preferred us to download the material and train the AI?
pseudalopex · a year ago
A model is not a backup.
qup · a year ago
Then why are we mad about the copyright stuff?
gameshot911 · a year ago
Critically, by torrenting they also directly distributed the copyrighted material itself. That is a standalone infringement, separate from any argument about trained LLMs.
qup · a year ago
And punishing them in the normal manner will be an incredibly small slap on the wrist, and do absolutely nothing to help us find out what will play out in court regarding a fair-use defense on training AI with copyrighted material.

u/qup

Karma: 2452 · Cake day: November 23, 2022