Is IA something different from AI? Or maybe just the French version ("intelligence artificielle" I imagine...)
My feeling is that a lot of Meta's AI/ML work actually ties into the AR/VR long-term dream. How do you make the so-called metaverse alive? By having people design it themselves. They're not going to do that in Maya, that's for sure. But if they could create virtual spaces and virtual people with Holodeck-style natural language instructions...
IMO it highlights just what a commodity the weights are. If all one needs is the weights to reproduce the work, then where is the value? I mean, there is very little moat here. Further, what does it say about consciousness and individuality if we are all simply the values of the weights in our wet neural networks? Or whatever the biological equivalent is?
There's nothing "simply" here. The weights in question are a particular configuration of several gigabytes worth of data. They're not random. Getting anything comparable by randomly generating a number this long is a "total atoms in the universe to the power of total atoms in the universe" kind of a deal.
In abstract terms, those weights are by far the most dense form of meaning we've ever dealt with.
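To put a rough number on "not random": a back-of-envelope sketch of how many distinct bit patterns a model of this size could take. The parameter count and precision below are illustrative assumptions (7B parameters at 16 bits each), not LLaMA's exact format.

```python
import math

# Total bits in an (assumed) 7B-parameter, 16-bit-per-weight model.
params = 7_000_000_000
bits_per_weight = 16
total_bits = params * bits_per_weight  # 112 billion bits

# The number of possible configurations is 2**total_bits; take log10
# so the magnitude stays printable.
log10_configs = total_bits * math.log10(2)
print(f"~10^{log10_configs:.3g} possible weight configurations")
```

For scale, the observable universe is usually estimated at around 10^80 atoms, so stumbling on any particular trained configuration by chance is not a meaningful possibility.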
That’s like saying that creating a song or a movie has no value because after they are created anyone can download a file with it.
However it does open up a point which is: should we allow people to make huge amounts of money by infinitely copying and distributing their own work? Should we be protecting this model?
Unless you believe that you think with your soul or something, what else could you be other than your quantum state, or some close-enough compressed equivalent?
They could do what Microsoft used to do in the 90s-00s: make pirated Windows/Office available (by turning a blind eye to private users) so that they capture and keep the mindshare.
Facebook, which ran over the law in order to be successful, now uses the law in the exact opposite sense. Ultimately, it's greed: "What is good for me to do to you is not good for you to do to me."
Someone should tell him (and all the other metaverse people) that VR is almost always dystopian. It's what everyone gets sucked into when civilization stagnates and there is no opportunity, no culture, and nowhere to go. It belongs in worlds of Malthusian collapse, after a nuclear war, or where inequality is so high the majority of people have reverted to high-tech medieval peasants.
VR has a legitimate niche in gaming but outside that it's just not appealing. It's dystopian and depressing. Nobody wants to spend time in a social network with a helmet on their head being served ads.
It seems like VR is less than half of the investment by Reality Labs. In Meta's 2022 annual report, they say "Many of our metaverse investments are directed toward long-term, cutting edge research and development for products that are not on the market today and may only be fully realized in the next decade. This includes exploring new technologies such as neural interfaces using electromyography, which lets people control their devices using neuromuscular signals, as well as innovations in artificial intelligence (AI) and hardware to help build next-generation interfaces. ... *in 2023, we expect to spend approximately 50% of our Reality Labs operating expenses on our augmented reality initiatives, approximately 40% on our virtual reality initiatives, and approximately 10% on social platforms and other initiatives.*"
I'm not sure if Horizon falls into "virtual reality" or "social platforms" but it seems to be the latter: "For example, we have launched Horizon Worlds, a social platform where people can interact with friends, ..."
This seems like a big misstep by Meta. I had assumed they were intentionally allowing that torrent to float around, and tacitly encouraging open source to build on their models. It seemed like a way to differentiate themselves from "Open"AI, and I was actually feeling some good will toward them for a second!
Isn't this just to protect them from liability? If someone claims that their LLM hurt them in some way, they can say that they had nothing to do with it and that they tried to prevent its spread.
Also, it could be a maneuver to prevent the genericization of the word LLaMa, which they may want to continue using.
Is there precedent on model weights being copyrightable in the first place? I suppose the recipients of the DMCA notices are unlikely to be willing to contest it in court, though.
It's an interesting legal question. US copyright is based around expressiveness and originality (which is why phone books and IBM's logo are not copyright-protected).
An argument might be made that the curation of data that goes into the training set qualifies, but it might depend on how much expressiveness and originality went into the curation.
For example, I could see a court ruling that the weights for a model trained on "all the good music from the 70s" is copyrightable, as someone had to express what they believed was "good" music, but a model trained on a large percentage of the internet without much curation would not.
Of course, nobody really knows until the courts weigh in on it.
If model weights become non-copyrightable, it'll lead to an incredible shift in the industry.
When model weights leak, anyone can pick them up and run with them. It's not like code, where you have to set up an entire bespoke infrastructure, microservices, data dependencies, etc. Models are crystallized, perfectly distilled functionality with a single interface.
You'll start to see more leaks, companies building off the work of other companies, etc. Part of me thinks this would lead to faster, more distributed innovation.
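A toy sketch of why leaked weights are so portable: a model is just arrays in a file plus one pure function. The two-layer shapes below are made up for the example; real LLM weights are the same idea at a vastly larger scale.

```python
import numpy as np

# "The model" is nothing but weights plus a forward function.
rng = np.random.default_rng(0)
weights = {
    "w1": rng.standard_normal((8, 16)),
    "w2": rng.standard_normal((16, 4)),
}

def forward(x, w):
    """The entire interface: inputs in, outputs out."""
    h = np.tanh(x @ w["w1"])
    return h @ w["w2"]

# Anyone who obtains the weight file gets the exact same functionality,
# with no surrounding infrastructure to rebuild:
np.savez("weights.npz", **weights)
loaded = dict(np.load("weights.npz"))
x = rng.standard_normal((1, 8))
assert np.allclose(forward(x, weights), forward(x, loaded))
```

Compare that to standing up a leaked codebase, which typically needs its original build system, services, and data pipelines before it does anything at all.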
"In regard to collections of facts, O'Connor wrote that copyright can apply only to the creative aspects of collection: the creative choice of what data to include or exclude, the order and style in which the information is presented, etc.—not to the information itself."
It's almost certainly true that the collection of "all the good music from the 70s" is copyrightable as a collection, but that doesn't make the weights the result of a creative process.
If anyone's willing to fund the legal battle, let me know. (You can DM me on Twitter: https://twitter.com/theshawwn)
I'd be willing to issue a DMCA counterclaim for llama-dl on the grounds that model weights are not copyrightable. If it's worth settling the question in court, then this seems like a good opportunity.
If Meta has registered the work with the copyright office, the statutory damages in this case, should llama-dl lose, might be quite large.
Check in with an attorney before launching a battle with an opponent who has unlimited resources. There are likely to be many similar test cases in the coming year, perhaps more-readily fought.
IANAL, so this is not legal advice. I consulted a legal expert a few years ago about the status of machine learning models, and they said it is really unclear. Apparently, if works are transformed enough that the original is not recognizable anymore, it may not violate copyright. It hinges a lot on whether the original work is reproduced, so if you could get an LLM to spit out copyrighted texts unmodified, then it would most likely be a copyright violation. But I think that doesn't really happen much in practice.
On the other hand, Meta can have copyright over the model through 'copyright in compilation', which protects compiled works, regardless of the copyright of the underlying material.
So, I fear that it may be possible to have it both ways. But realistically, I think we'll only know for sure when this is fought out in court.
Disclaimer: again I am not a lawyer, so take this with a grain of salt.
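The "spit out copyrighted texts unmodified" test above could be approximated mechanically: measure the longest run of characters a generation reproduces verbatim from a source text. A minimal sketch (the naive O(n·m) scan is my own illustrative choice, not an established legal test):

```python
def longest_verbatim_run(reference: str, generated: str) -> int:
    """Length of the longest substring of `generated` that also
    appears verbatim in `reference` (naive scan; fine for short texts)."""
    best = 0
    for i in range(len(generated)):
        for j in range(len(reference)):
            k = 0
            while (i + k < len(generated)
                   and j + k < len(reference)
                   and generated[i + k] == reference[j + k]):
                k += 1
            best = max(best, k)
    return best

# A long verbatim run suggests memorization rather than paraphrase;
# a short one suggests the original was transformed beyond recognition.
src = "It was the best of times, it was the worst of times"
print(longest_verbatim_run(src, "the model said it was the worst of times"))
```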
It seems intuitively kind of bonkers that they can ignore copyright when generating weights using stuff on the internet, and then turn around and claim copyright on the resulting artifacts.
Tech companies want it both ways. 1. They own the rights to any user content they store and transmit. 2. They are shielded by Section 230, and immune to liability for its misuse.
IDK, it's more like finding, on the sidewalk, the recipes of many great restaurant chains all mushed together by a 5th grader whose uncle stole them. Looks like a grey area to me legally, but IANAL.
A particularly thorny question if data in the training set is copyrighted. Which I presume much of it is, since I was playing with alpaca.cpp and it answered all kinds of pop culture questions I asked it (to varying degrees of accuracy).
They only own the arrangement of what is and isn't in the training set, inasmuch as that training set represents human creativity. The process of training model weights is itself purely mechanical.
The closest that they could get would be trade secrecy violations, but that only punishes the original leaker and anyone working in concert with them. I'm not sure if anyone's successfully managed to get an entire BitTorrent swarm to be considered misappropriating trade secrets. Presumably at some point, when the trade secret has been violated, you can obtain it without misappropriating - otherwise, how does that not just become Copyright 2.0?
Copyright-wise, why would the weights be any different from a video file? At the end of the day, LLaMa's weights and fast_and_furious_11.mp4 are both just strings of binary data that some company made to sell.
US copyright law cares about originality and expressiveness, not labour and cost. It can be very expensive to collect and print a list of every business and their phone number in a book, but the result is not copyrightable in the US.
Don't forget that transformers, the basic architecture of these large language models, were introduced by Google, and many other basic building blocks were introduced by universities and research centers all over the world. So no, the outcome is not very clear, at all.
I've got disappointing news for you. The copyright protecting your open source software never applied to its function, only its creative expression, and only insofar as the expression was separable from its function.
No, machines are not humans. And nobody stole your software, but they may have been in breach of the license. But because there's no financial damage, you probably don't have the financial standing to sue over it.
How is there no financial damage? If models start replacing existing business cases but are only able to do so because they were trained on copyrighted data relating to those business cases, then the financial damage should be very obvious.
Correct me if I'm wrong, but I don't think you require financial damages to sue for a violation of the license. Standing is conferred from having your (artificial legal) monopoly on the work as the author violated, not necessarily financially.
This is the whole principle that allows the GPL to work.
That aside, we should really stop misusing "steal". Not only is it legally inaccurate (Dowling v. United States), it's semantically inaccurate as well. The conflation of theft with mere copyright infringement is a campaign driven by bad-faith actors.
Funny how these bigcorps want to protect the copyright of their copyright infringement machines. Can't wait for someone to train a new model off of theirs and challenge them in court over it. Everything you can slurp off of the internet is fair game, right?
Of course, it’s no different than you browsing and reading the sum total of the Internet yourself which is a natural thing we are all capable of and then copying your brain and giving it to anyone who would like to query your headspace. Little known fact, that’s why the brain has a USB port. /s
> If they think they can catch OpenAI, and they want to charge for AI services, then what they're doing makes sense.
Then they wouldn't release anything to the researchers. Reason LLaMA weights are spreading in the first place is because they let a big group of people get access, someone is bound to upload it as a torrent.
Seems they gave people with .edu emails access relatively quickly too, researcher or not.
Fragmentation at first: you hinder the first mover (in this case OpenAI) from seizing the market completely, and later on you can acquire the necessary pieces to get back into the market, or buy time to close the gap.
Microsoft's current strategy seems to push in this direction: open-source certain technologies and acquire important pieces (e.g. GitHub, a stake in OpenAI) to build a bigger picture they can monetize later.
One great move FB could have made would be to ride the wave of positive PR and get all the investors hyped:
---> "Meta is a credible alternative to OpenAI, the company is switching from "Meta"-bullshit to an "IA"-first company",
and get the investors to pump the Meta stock,
and then dilute some of the shares to raise some cash (or issue new shares to newly specialized IA hires).
But no, FB is still going after the VR gimmicks and NFTs.
4 billion USD per quarter wasted on Oculus (!), while they could use this money to fund and support a whole ecosystem around LLaMA.
Yes, it's totally AI, and IA is the French version :)
Programmers in particular tend to overuse anglicisms, and I often end up doing Spanglish in my head:
"Necesito este value" "Este command deberia hacer esto" "Este iterator va a hacer $something a este object"
Not only do they have no street smarts whatsoever, but their book smarts start to disappear when you deviate from the training data.
That's one big reason the stock is up: $4B in the metaverse is nonsense. $4B in AI, though? Transformational.
Let 'em burn it all.
This could have been their “contribution to the society” - yielding much better PR than they could have ever hoped for.
(Also: great username!)
EDIT: DMCA here [0]. It does sound like they're asserting copyright on the "content," i.e. presumably the weights themselves.
[0] https://github.com/github/dmca/blob/master/2023/03/2023-03-2...
"In regard to collections of facts, O'Connor wrote that copyright can apply only to the creative aspects of collection: the creative choice of what data to include or exclude, the order and style in which the information is presented, etc.—not to the information itself."
Here, the weights aren't even facts in the first place.
Am I missing a joke...?
https://www.ibm.com/legal/copytrade
I see what you did there ;)
I wrote more about this further downthread: https://news.ycombinator.com/item?id=35288415
In the same way that a list of ingredients can/can’t be copyrighted.
It is however my understanding that downloading them can be considered a misappropriation of a trade secret.
Folks who contest the bogus DMCA takedown requests would be liable to a trade-secret suit.
That's what we were told when they stole our open source software.
That said, the inclusion of GPL-licensed code in training sets may yet force the release of those models under the GPL.
I'd really like to see this tested at a court.
Depends on what they're afraid of.
If they want to commoditize the space, then I think you're right.
If they think they can catch OpenAI, and they want to charge for AI services, then what they're doing makes sense.
OpenAI is a real threat to Google & Facebook