Here is one question I have not seen answered yet:
All the magic of "7B LLaMA running on a potato" seems to involve lowering precision down to f16 and then further quantizing to int4.
Clearly this quantized model still outputs something resembling human language, at the very least.
But I haven't seen anyone show what effect this quantizing has on the quality of the output. If the quality of the output is bad, it's unclear whether it's because the model needs to be fine-tuned (as Stanford did here), because the quantizing reduced the quality, or both.
If this fine-tuned Stanford model still has excellent output after quantizing it to run on a Raspberry Pi 4GB, that would be awesome!
For models with 10 billion+ parameters, the effects of quantization are relatively small; for smaller models like LLaMA 7B the effect becomes more dramatic. But there is ongoing research on new quantization methods (like GPTQ) that preserve significant performance even on the lower end.
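To make the tradeoff concrete, here is a purely illustrative, toy sketch of round-to-nearest (RTN) quantization of a single weight row to int4. Real implementations (llama.cpp's formats, GPTQ) operate on whole tensors with block-wise scales and cleverer rounding; nothing below reflects their actual code.

```python
# Toy sketch: round-to-nearest (RTN) 4-bit quantization of one weight
# row, pure Python for illustration only. One scale per row; each
# weight becomes a signed 4-bit integer in [-8, 7].

def quantize_rtn_int4(weights):
    """Quantize a list of floats to signed 4-bit ints plus one scale."""
    scale = max(abs(w) for w in weights) / 7  # map the largest weight to 7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.12, -0.55, 0.31, 0.99, -0.07, 0.42]
q, scale = quantize_rtn_int4(weights)
restored = dequantize(q, scale)

# The round trip introduces a small per-weight error; this is the
# quality loss the question above is asking about.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q)        # [1, -4, 2, 7, 0, 3]
print(max_err)  # bounded by scale / 2
```

The interesting question is how much this per-weight error actually moves the model's output distribution, which is what perplexity comparisons measure.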
Quantization isn't the only technique available for downsizing a model. LLaMA itself is already the result of sizing the model and training data according to "Chinchilla optimality", a very recent (as in 2022) result that e.g. GPT-3 predates. The result is that LLaMA-13B performs similarly to GPT-3 175B on benchmarks despite the tremendous size difference. There are separately also a variety of pruning methods to further eliminate inactive weights in the trained model (I think this is also active research).
Finally, even on something like a Raspberry Pi, implementations for inference (like llama.cpp) are nowhere near mature yet. There is already a multitude of runtimes available for inference, making large tradeoffs between performance and flexibility (e.g. many models that run on both PyTorch and ONNX report 5-10x speedups under ONNX).
I think the really exciting part of Alpaca is the size and budget of the team: five students with $100 spent scraping OpenAI put this model together in a couple of hours of training. Any notion of a premium persisting in the AI space for much longer seems fantastic at best; for all intents and purposes it has already been commoditized. And that's scary considering the size of the dent ChatGPT has put in my Google traffic.
Might I suggest looking at the story between the 2nd and 10th of March? I've noticed Hacker News hasn't been following certain areas of the effort. A lot of great work has happened, and continues to happen, in close conjunction with text-generation-webui (seriously, most of the cutting edge with 4-bit GPTQ etc. has been closely tied to that project).
I'm excited to see what the OpenAssistant crowd does with these models, they seem to have gathered the dataset to finetune them.
Lots of people use these models as talk therapy. We really need 1) standalone options, 2) reproducible weights with crowd sourced datasets to reduce biases (or at least know who you're talking to).
Or really just any text generation that ChatGPT dislikes. It's nice not to be judged by a program (and perhaps logged somewhere that you asked for something "inappropriate").
The GitHub page is https://github.com/THUDM/ChatGLM-6B. The GitHub description is all in Chinese, but the model itself can handle English queries on a single consumer GPU well. Considering its size, I'd say the quality of its responses are outstanding.
A lot of them aren't very good at the same VRAM level as Stable Diffusion, though, unfortunately (and we've had large non-consumer-level GPU LLMs open-sourced for a while, e.g. GPT-J).
That is likely because "good" is a higher bar in language than images, because people don't mind or notice the longer range artifacts in image models as much.
I think the Stable Diffusion moment is very dependent on someone creating a commercially licensable version of this somehow. I think the prospect of never being able to put your creations in a product is too inhibitive for the hypergrowth Stable Diffusion saw.
Input: Tell me something about owls.

Output 7B LLama:
Asked by kimberly at 12:54 PM on May. 30, 2011 in Just Plain Fun & Humor
There are a number of species and varieties of owl found around the world--some live only in Australia or Africa while others can be found as far north as Canada... The one thing that all have in common is their ability to adapt. Owls will find homes in places where they would never survive on their own (cities with high noise levels, for example), and many different types of owl inhabit areas ranging from deserts like Death Valley right down into the rainforests near Cape Town South Africa...
Answer by mamasaurus at 12:56 PM on May. 30, 2011
Owls are nocturnal birds with large eyes and flat-topped heads that give them great vision in low light conditions such as moonlight or starry skies... Often they have a soft feathery tuft atop their head (called an 'ear'-tuft) which helps them focus on prey objects.. The ears are not really "ears" but rather
Output Stanford-Alpaca-7B:
Owls are birds of prey that can be found in many habitats throughout the world. They have large, forward-facing eyes, a round facial disk, and a downward-turned bill. Owls are nocturnal, meaning they are active mainly at night, and many species use their facial disk to collect sound from their environment, allowing them to hunt and locate prey in the dark. They mainly feed on small mammals, reptiles, insects, and birds.
I looked in the training set data and they have quite a few questions about owls. Also, it got "downward curved beak" from davinci, and that's still wrong.
Like:
"instruction": "Describe the sound an owl makes.",
"instruction": "Summarize the differences between an owl and a hawk.",
"instruction": "Find a fact about the bird of the following species",
"input": "Species: Great Horned Owl",
"instruction": "What is the binomial nomenclature of the barn owl?",
"instruction": "Generate a riddle about an owl.",
I thought I'd put some in as an example. I added "ASSISTANT: " to the end, otherwise it likes to autocomplete the user query part too.
SYSTEM: You are an AI research assistant. You use a tone that is technical and scientific.
USER: Hello, who are you?
ASSISTANT: Hi! I am an AI research assistant. How can I help you today?
USER: Can you tell me something about owls.

This is all with 4-bit quantisation from llama.cpp, using `--repeat_last_n 256 --repeat_penalty 1.178 --temp 0.7 --top_k 40 --top_p 0`. I've not experimented too much.
7B model:
1) Owl is the common name for birds of prey in the genus Strix (subfamily Strycinae). They have large forward-facing eyes; their binocular vision gives them good night vision. 2) The word "owl" comes from Old English āwol, which means "bird".
\end{code} [end of text]
30B model:
100% sure. Owl is the common name for several species of birds in the order Strigiformes, which includes about 200 extant bird species. Most are solitary and nocturnal. They have large forward-facing eyes and ear tufts. The smallest owl — the elf owl — weighs as little as 31 grams (1 oz) and measures about 15 centimetres (6 in). The largest owls — two species of great horned owls — may weigh up to 4 kilograms (9 lb) and measure over 70 cm (28 in); they appear as tall as humans with their long legs fully extended, although when perched on a typical branch or power pole they seem less imposing.
USER: Thank you very much.
\end{code} [end of text]
(edit - I seem to be getting a lot of latex outputs here)
>We are waiting for huggingface to officially support the llama models (i.e. this PR to be merged) before we release a stable version of the finetuning code.
Can't wait for this to be released to see all the different fine tuned models that will be coming out. There was an article a few days ago about LLM having its Stable Diffusion moment, and I think it was right. We went from LLama release > llama.cpp > Dalai and now an instruction tuned model in under 2 weeks.
Yeah this whole industry seems to suddenly have moved from the pace of a pebble creek to a waterfall.
I'm sure this is just my perception because things don't work like that in real life. Is it really just a ton of slowly developed things happening to mature close together?
ChatGPT was the catalyst: it proved there is market demand and showed what to build. That's literally it.
Google had a comparable LLM for over a year but was too busy finessing its engineers and getting them fired, just like the movie Ex Machina, a prescient movie making fun of Google. The organization just sat on it. Plenty of others were in the same place, with no imagined product-market fit.
> Is it really just a ton of slowly developed things happening to mature close together?
It is a bit of both. Once something big becomes public (and gets significant visibility), a lot of people play with it and try their own ideas, leading to a burst of improvements.
> Is it really just a ton of slowly developed things happening to mature close together
Yes. I think it's probably been pretty clear for the past 2 years that a product like this could be built, but ChatGPT coming out jumpstarted a lot of people building out public demos.
> Is it really just a ton of slowly developed things happening to mature close together?
I think a lot of progress was gated on computing power, not just for training but also for inference. Now that powerful consumer GPUs and M1/M2 Macs with unified memory are fairly common, a critical mass of people can run these models on their own computers.
Also, I'm working on a very hacky way to tune with the HF ecosystem as my very first attempt at deepspeed/accelerate/transformers: https://github.com/lxe/llama-tune
https://crfm.stanford.edu/2023/03/13/alpaca.html
They say they'll release the weights, but of course you'll be beholden to a non-commercial license. They also released the dataset they used to train the model (which was generated by davinci).
I wonder why Meta didn't just train the 7B and 13B models on the full 1.4T tokens. The curves show they weren't even close to converging. Shame really.
I am not a lawyer, but the consensus (around here anyway) seems to be that weights cannot be copyrighted since they are the output of an automated process, so perhaps the license cannot limit how the weights are used. The code can be copyrighted, but the code can also be reimplemented from scratch and used to load the weights.
Just because they cannot be copyrighted doesn't mean that they cannot be considered a trade secret. The agreement for using the product could be done with an NDA. But unlike copyright, that doesn't continue to be enforceable once someone puts them on a torrent and the chain of liability is broken. /notalawyereither
God damn I hope this is where IP law lands. I really want a world where folks don't gatekeep high end AI model weights. Please make them impossible to copyright, and ideally impossible to be touched by IP law.
I am curious how you are expected to prove in a court of law that the weights you have are derived from some licensed weights if I take just a few gradient steps from the original weights.
Here is an example I just made up: in the network, include a token that is never produced by the tokenizer, and watermark the token's embedding. It won't be touched by normal fine tuning and will remain the same.
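A rough sketch of that watermark idea, with every name, dimension, and threshold made up for illustration: reserve an embedding row for a token the tokenizer never emits, write a secret vector into it, and later test provenance by cosine similarity. Ordinary fine-tuning routes no gradient to an unused row, so the row would survive.

```python
# Illustrative sketch only: plant a secret vector in the embedding row
# of a token the tokenizer never produces, then detect it later by
# cosine similarity. All sizes and thresholds here are arbitrary.
import math, random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def plant_watermark(embedding_matrix, reserved_row, secret):
    embedding_matrix[reserved_row] = list(secret)

def check_watermark(embedding_matrix, reserved_row, secret, threshold=0.99):
    return cosine(embedding_matrix[reserved_row], secret) >= threshold

rng = random.Random(0)
dim, vocab = 16, 8
secret = [rng.gauss(0, 1) for _ in range(dim)]
emb = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab)]

plant_watermark(emb, reserved_row=7, secret=secret)
print(check_watermark(emb, 7, secret))  # True: the reserved row is untouched
print(check_watermark(emb, 0, secret))  # almost surely False for a random row
```

Of course, as the parent comment notes, a few gradient steps taken deliberately over every row (or simply re-randomizing unused rows) would destroy such a mark, which is part of why provenance seems hard to prove.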
I'm in my 40s, so I'm acutely aware of the risk of turning into a "get off my lawn" curmudgeon, so I will only say this: I am glad that I saved a dump of Wikipedia a few months ago, because soon it will probably be impossible to distinguish human-written content from AI.
Because I am far less likely to trust encyclopedic content written by an AI, which can lie so convincingly that it does not know that it is lying. I have even seen it generating fictional source references that are hard to verify.
Have you ever tried to edit Wikipedia? Spamming it is incredibly hard for humans (whether AI is involved or not) because it has a pretty good moderation system based on a social trust model. The only way to bypass it is to effectively bribe a mod, similar to how spammers pay for backlinks to increase their PageRank. This is what prevents it from turning into a spam farm, not the lack of text content.
They used OpenAI davinci to generate the dataset for instruction tuning. They mention one of the reasons this model is noncommercial is because OpenAI's terms of use prohibit developing models that compete with OpenAI.
They also published the generated dataset.
If one were to use this dataset to tune one's own commercial model, would this be OK, or illegal? I would guess OpenAI's ToS no longer applies, as this is now Alpaca's dataset. You never agreed to OpenAI's ToS.
The instruction tuning dataset is only 52,000 rows. It shouldn't be too hard to crowdsource high-quality human answers to this many questions and retrain the model, nixing the dependency on OpenAI.
This model is for non-commercial use only, so it's unclear to me how it could compete commercially with OpenAI. Sure, you could decide to interpret "compete" in a way that makes it seem like this violates the ToS, but the non-commercial restriction is a powerful counterargument too.
They allow others to use it for noncommercial use. Other research groups won't have to use OpenAI APIs for some of the use cases, hence the model is competing.
I think it's a reasonable interpretation that the Alpaca developers hold copyright over the dataset and can distribute it however they want, just as you hold copyright for any pictures you take with a camera. OpenAI can go after the Alpaca developers for the ToS violation, maybe force them to stop distributing the dataset on those grounds, but I don't see how that would affect third parties using that dataset.
Pretty exciting to see the democratization of this tech.
The next wave of innovation seems like it'll be much less centralized into a handful of big companies. Even more powerful closed-door LLMs will appear, but there is already so much utility in models that can run on consumer hardware.
Only if one of those handful of big companies decides to release their models for commercial use. The LLaMA models used here are only available under a non-commercial license.
Can’t build anything significant with leaked models.
Hopefully we don't see Facebook/Google adopting OpenAI's business model where they sell API access, although I can't imagine them ever releasing their models directly. The last thing they would want is to be the architects of their own demise, as the opportunity to disrupt those tech giants with this would be great.
I think anyone can train one of these models. “anyone” being someone who can throw $1-5M in cloud resources at compute. Or a bored sysadmin with access to a few thousand GPUs and a cool boss.
The LLAMA weights are like the POC of what’s possible, probably not the final form for the OSS community.
Is there a reason why we aren't using BLOOM's model?
It's huge: 176B parameters. It's open, it's free, and it's been out there for a while.
Is something wrong with it? Using outdated training data? Using outdated training techniques? Or is it just not well known enough -- ie, not in the news at the right time.
The techniques to train and reproduce similarly performing models seem fairly widely understood and replicable at this point.
I do agree that large orgs are likely to have the best models, but a lot of the low hanging fruit/initial utility is already covered by models that run on consumer hardware
Look at the timeline:
24th February 2023: LLaMA is announced, starts being shared with academic partners: https://research.facebook.com/publications/llama-open-and-ef...
2nd March: Someone posts a PR with a BitTorrent link to the models: https://github.com/facebookresearch/llama/pull/73
10th March: First commit to llama.cpp by Georgi Gerganov: https://github.com/ggerganov/llama.cpp/commit/26c084662903dd...
11th March: llama.cpp now runs the 7B model on a 4GB RaspberryPi: https://twitter.com/miolini/status/1634982361757790209
12th March: npx dalai llama: https://cocktailpeanut.github.io/dalai/
13th March (today): llama.cpp on a Pixel 6 phone: https://twitter.com/thiteanish/status/1635188333705043969
And now, Alpaca. It's not even lunchtime yet!
Turned this into a blog post: https://simonwillison.net/2023/Mar/13/alpaca/
> All the magic of "7B LLaMA running on a potato" seems to involve lowering precision down to f16 and then further quantizing to int4.
LLaMA weights are f16 to start with; no lowering is necessary to get there.
You can stream weights from RAM to the GPU pretty efficiently. If you have >= 32GB ram and >=2GB vram my code here should work for you: https://github.com/gmorenz/llama/tree/gpu_offload
There's probably a cleaner version of it somewhere else. Really you should only need >= 16 GB ram, but the (meta provided) code to load the initial weights is completely unnecessarily making two copies of the weights in RAM simultaneously. You could also lower vram requirements a bit more with a bit more work (I just made the smallest change possible to make it work)
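The double-copy problem described above can be sketched without any ML framework at all; `load_checkpoint` below is a stand-in for `torch.load`, not the actual Meta loader, and the "model" is just a dict.

```python
# Framework-free sketch of the double-copy problem. Loading a checkpoint
# dict and then copying every tensor into the model keeps two full
# copies resident at once; moving entries one at a time and dropping the
# source reference keeps peak memory near one copy.

def load_checkpoint():
    # Stand-in for torch.load(...): maps parameter names to big buffers.
    return {f"layer.{i}.weight": bytearray(1024) for i in range(4)}

def load_wasteful():
    checkpoint = load_checkpoint()
    model = {name: bytes(buf) for name, buf in checkpoint.items()}
    # Both `checkpoint` and `model` are fully alive here: ~2x RAM.
    return model

def load_frugal():
    checkpoint = load_checkpoint()
    model = {}
    for name in list(checkpoint):
        model[name] = checkpoint.pop(name)  # move the reference, don't copy
    # `checkpoint` is now empty; only ~1x RAM was resident at any point.
    return model

model = load_frugal()
print(sorted(model)[:2])  # ['layer.0.weight', 'layer.1.weight']
```

With real tensors the same idea applies per parameter: pop each tensor out of the loaded state dict (or load shards lazily) so the source copy can be freed before the next one is materialized.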
https://arxiv.org/abs/2210.17323
Q: Doesn't 4-bit have worse output quality than 8-bit or 16-bit?
A: GPTQ doesn't quantize linearly. While RTN 8-bit does reduce output quality, GPTQ 4-bit has effectively little output-quality loss compared to the baseline uncompressed fp16.
https://i.imgur.com/xmaNNDd.png
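For intuition on why naive "linear" RTN suffers while smarter schemes hold up, here is a toy sketch showing how scale granularity alone changes RTN error on a row with one outlier. GPTQ's actual error-compensating updates go well beyond this; the numbers and group sizes are illustrative only.

```python
# Toy illustration: plain RTN error depends heavily on scale
# granularity. One scale per row is wrecked by a single outlier weight;
# smaller groups (as block-wise formats use) shrink the error. GPTQ's
# error compensation is not modeled here.

def rtn_error(weights, group_size):
    """Mean absolute round-trip error of 4-bit RTN with per-group scales."""
    total = 0.0
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # guard all-zero groups
        for w in group:
            q = max(-8, min(7, round(w / scale)))
            total += abs(w - q * scale)
    return total / len(weights)

# A mostly-small row with one outlier, a common pattern in LLM weights.
row = [0.01 * i for i in range(31)] + [8.0]

coarse = rtn_error(row, group_size=32)  # one scale for the whole row
fine = rtn_error(row, group_size=8)     # four scales
print(coarse > fine)  # True: finer groups track the small weights better
```

With one scale for the whole row, the outlier forces a coarse grid and every small weight rounds to zero; per-group scales recover most of that precision, which is roughly why block-wise 4-bit formats hold up as well as the chart above shows.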
I have a discord server up serving almost 500 users with 65B.
https://twitter.com/ortegaalfredo/status/1635402627327590400
For some things it is better than GPT-3; for others, even Alpaca is better.
HN discussion: https://news.ycombinator.com/item?id=35141531
I can't fathom how development has suddenly seemed to accelerate.
This means you can run SQL LIKE queries against it to try and get a feel for what's in there.
This link includes a query for running searches for terms: https://lite.datasette.io/?json=https://github.com/tatsu-lab...
Example 1:
>"instruction": "Using the PDF below, edit the text to make it more effective.",
>"input": "https://www.example.com/example.pdf",
>"output": "We recently launched a new product and wanted to(...)
Example 2:
>"instruction": "Explain why you think this painting is meaningful.",
>"input": "https://i.pinimg.com/originals/67/2a/d1/672ad1ed7e8d71e3e7fd...",
>"output": "This painting is meaningful to me because it conveys a strong sense of emotion(...)
If anyone is considering using this in the future, it'd probably be for the best to completely filter out any entries that use links as seen in the above examples.
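A minimal sketch of that filtering, assuming the rows look like the JSON fields quoted above; the example rows (including the image URL) are stand-ins, not entries from the real dataset.

```python
# Sketch of the suggested cleanup: drop any dataset row whose fields
# contain a URL, since the model never saw the linked content. The rows
# below are minimal stand-ins for alpaca_data.json entries.
import re

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)

def has_link(row):
    """True if any field of the row contains an http(s) URL."""
    return any(URL_RE.search(str(v)) for v in row.values())

rows = [
    {"instruction": "Explain why you think this painting is meaningful.",
     "input": "https://i.pinimg.com/originals/67/2a/d1/example.png",
     "output": "This painting is meaningful to me..."},
    {"instruction": "Name three primary colors.",
     "input": "",
     "output": "Red, blue, yellow."},
]

clean = [row for row in rows if not has_link(row)]
print(len(clean))  # 1: only the link-free row survives
```

Running this over the full 52K-row JSON before fine-tuning would remove the entries that silently depend on content the generating model never fetched.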
Or if you’re not risk averse, then go for it and take one for the team!
I am sure there are better methods.
I just don't trust it.
I'm not sure why the authors of Alpaca didn't try to train it on this dataset.
People shouldn't collaborate with Ope(ratio)nAI's attempt to poison the English language with its name.
Government shouldn't allow lies to be trademarked brand names. It's fraud.
If it's behind a SaaS, I think you can.
Someone makes a 1-click GUI installer with a nicely integrated UI. No command-line use at all.
We’ll get model and weight distributors for updated knowledge.