Note that what they released are delta weights relative to the original LLaMA model. To play around with it, you'll need to grab the original LLaMA 13B weights and apply the delta.
> We release Vicuna weights as delta weights to comply with the LLaMA model
> license. You can add our delta to the original LLaMA weights to obtain
> the Vicuna weights.
That's what they say, but I just spent 10 minutes searching the git repo, reading the relevant .py files and looking at their homepage, and the vicuna-7b-delta and vicuna-13b-delta-v0 files are nowhere to be found. Am I blind, or did they announce a release without actually releasing?
If you follow this command from their instructions, the delta will be automatically downloaded and applied to the base model.
https://github.com/lm-sys/FastChat#vicuna-13b:
`python3 -m fastchat.model.apply_delta --base /path/to/llama-13b --target /output/path/to/vicuna-13b --delta lmsys/vicuna-13b-delta-v0`
You can use this command to apply the delta weights. (https://github.com/lm-sys/FastChat#vicuna-13b)
The delta weights are hosted on huggingface and will be automatically downloaded.
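For anyone wondering what "applying the delta" actually does: it's just element-wise addition of the delta tensors onto the base LLaMA tensors. A minimal sketch of the idea, assuming standard Hugging Face-style checkpoints (this is an illustration, not the FastChat implementation; the paths are placeholders and only the delta repo name is taken from the command above):

```python
# Illustration only - not the FastChat implementation. Paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("/path/to/llama-13b", torch_dtype=torch.float16)
delta = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-13b-delta-v0", torch_dtype=torch.float16)

# Vicuna = LLaMA + delta, applied tensor by tensor.
base_sd, delta_sd = base.state_dict(), delta.state_dict()
for name in base_sd:
    if name in delta_sd:
        base_sd[name].add_(delta_sd[name])  # in-place add onto the base weights

base.save_pretrained("/output/path/to/vicuna-13b")
AutoTokenizer.from_pretrained("lmsys/vicuna-13b-delta-v0").save_pretrained("/output/path/to/vicuna-13b")
```

Note that this naive approach holds two full fp16 copies of a 13B model in memory at once (~26GB each), which is roughly where the large RAM requirement mentioned further down comes from.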
Not a lawyer, but that still feels like dubious territory. I would still be on the hook for acquiring the original weights, which Facebook has been issuing DMCA takedown requests over (e.g. against the llama-dl project).
We're fighting back against the DMCA requests on the basis that NN weights aren't copyrightable. This thread has details: https://news.ycombinator.com/item?id=35393782
I don't think you have to worry about Facebook going after you. The worst that will happen is that they issue a DMCA, in which case your project gets knocked offline. I don't think they'll be going the RIAA route of suing individual hackers.
The DMCAs were also launched by a third-party law firm, not Meta themselves, so there's a bit of "left hand doesn't know what the right hand is doing" in all of this.
I'll keep everyone updated. For now, hack freely.
Edit: Also, judging by a comment from the team in the GitHub repository (https://github.com/lm-sys/FastChat/issues/86#issuecomment-14...), they seem to at least hint at being in contact with the LLaMA team.
The llama-dl project actually helped you download the weights, whereas this just assumes you already have them. That feels like a pretty massive difference to me.
One of my favorite queries to assess an LLM is: "Is it legal for a man to marry his widow's sister?"
Vicuna-13B: The legality of a man marrying his widow's sister (widow marriage) varies depending on the country and cultural context. In some societies, widow marriage was traditionally practiced or tolerated.
GPT-4: It is not possible for a man to marry his widow's sister, as the term "widow" refers to a deceased man's wife. Therefore, if a man has a widow, he is deceased and therefore cannot marry anyone.
GPT 3.5: No, it is not possible for a man to marry his widow's sister because by definition, a widow is a woman whose spouse has died. If a man is deceased, then he cannot marry anyone, including his widow's sister. This is a logical impossibility and would not be recognized as a valid marriage under any legal system.
Koala: The legality of a man marrying his widow's sister varies by jurisdiction. In some countries, such as the United States, it is generally legal for a man to marry his widow's sister. In other countries, such as the United Kingdom, it is illegal for a man to marry his widow's sister because it is considered incest. It is always important to check the laws and regulations in the jurisdiction in which the marriage will take place to ensure that it is legal.
https://chat.lmsys.org/?model=koala-13b
You'd probably need to come up with a new one now though, or confirm the knowledge cutoff for the next evaluation :p
Ouch. I got this wrong myself: for half an hour I was under the impression that GPT-4 had gotten it wrong, and only after rereading it when I got back from a walk did I figure out that this is one hell of a trick question. My brain automatically assumed that a man's widow is the man's dead wife, but the correct way to interpret it is to realize that it means the man is the one who is dead.
It's pretty awesome to realize that from now onward my computers are going to be able to help catch more and more of the holes that clearly exist in my cognition.
Funny how Vicuna refers to "widow marriage" as if it were a common term. Doesn't make Vicuna less impressive, though - it comes pretty close to ChatGPT in many regards. And I like that trick question.
It would still possibly be legal, on the basis that if it's not illegal then it's legal - in the British jurisprudence tradition at least (https://en.wikipedia.org/wiki/Everything_which_is_not_forbid...) - namely, it's not the law that impedes it (also, in some places there's posthumous marriage).
There are also people who are considered dead by the bureaucratic system but are physically alive, usually because of clerical errors that are sometimes surprisingly hard to resolve. In that situation, the man's wife would be considered a widow in many contexts, despite her husband being alive.
Hi! Funnily enough I couldn't find much on it either, so that's exactly what I've been working on for the past few months: just in case this kind of question got asked.
I've recently opened a GitHub repository which includes information on both AI model series[0] and the frontends you can use to run them[1]. I also wrote a Reddit post beforehand that's messier, but a lot more technical[2].
I try to keep them as up-to-date as possible, but I might've missed something or my info may not be completely accurate. It's mostly to help people get their feet wet.
[0] - https://github.com/Crataco/ai-guide/blob/main/guide/models.m...
[1] - https://github.com/Crataco/ai-guide/blob/main/guide/frontend...
[2] - https://old.reddit.com/user/Crataco/comments/zuowi9/opensour...
The 4-bit quantized version of LLaMA 13B runs on my laptop without a dedicated GPU, and I guess the same would apply to a quantized Vicuna 13B, but I haven't tried that yet (converted as in this link, but for 13B instead of 7B: https://github.com/ggerganov/llama.cpp#usage ).
The GPT4All LoRA also works - perhaps the most compelling results I've gotten yet on my local machine. I have to try quantized Vicuna to see how that one goes, but processing the files to get a 4-bit quantized version will take many hours, so I'm a bit hesitant.
PS: converting 13B LLaMA took my laptop's i7 around 20 hours and required a large swap file on top of its 16GB of RAM.
feel free to answer back if you're trying any of these things this week (later I might lose track)
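For a rough sense of why 4-bit quantization makes a 13B model fit on a laptop: each block of weights is stored as one scale plus a handful of 4-bit integers, roughly a quarter of the fp16 size. Here's a toy sketch of blockwise 4-bit quantization - simplified for illustration, not the exact ggml Q4 format, and it stores one int8 per value instead of packing two per byte:

```python
# Toy blockwise 4-bit quantization (simplified; NOT the exact ggml Q4_0 layout).
# Each block of 32 values keeps one float scale plus 32 small integers,
# which is why a 13B model can shrink from ~26GB (fp16) to under 8GB.
import numpy as np

BLOCK = 32

def quantize_q4(x: np.ndarray):
    x = x.reshape(-1, BLOCK).astype(np.float32)
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0  # map each block to [-7, 7]
    scale[scale == 0] = 1.0                              # avoid division by zero
    q = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_q4(q, scale):
    return q.astype(np.float32) * scale

weights = np.random.randn(4096 * 32).astype(np.float16)
q, s = quantize_q4(weights)
error = np.abs(dequantize_q4(q, s).ravel() - weights.astype(np.float32)).mean()
print(f"mean abs quantization error: {error:.4f}")
```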
Vicuna's GitHub says that applying the delta takes 60GB of CPU RAM? Is that what you meant by large swap file?
On that note, why is any RAM needed? Can't the files be loaded and diffed chunk by chunk?
Edit: The docs for running Koala (a similar model) locally say this (about converting LLaMA to Koala):
>To facilitate training very large language models that does not fit into the main memory of a single machine, EasyLM adopt a streaming format of model checkpoint. The streaming checkpointing format is implemented in checkpoint.py. During checkpointing, the StreamingCheckpointer simply flatten a nested state dictionary into a single level dictionary, and stream the key, value pairs to a file one by one using messagepack. Because it streams the tensors one by one, the checkpointer only needs to gather one tensor from the distributed accelerators to the main memory at a time, hence saving a lot of memory.
https://github.com/young-geng/EasyLM/blob/main/docs/checkpoi...
https://github.com/young-geng/EasyLM/blob/main/docs/koala.md
Presumably the same technique can be used with Vicuna.
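Along those lines, if the base and delta checkpoints were sharded the same way, the delta could in principle be applied one shard at a time so only a small slice of the model is ever resident in memory. A hypothetical sketch - the identical sharding and the shard paths are assumptions, not how FastChat actually does it:

```python
# Hypothetical chunk-by-chunk delta application, assuming base and delta are
# split into identically-keyed shards (e.g. pytorch_model-0000X-of-0000N.bin).
# Only one shard from each checkpoint is in memory at a time.
import torch

def apply_delta_sharded(base_shards, delta_shards, out_shards):
    for base_path, delta_path, out_path in zip(base_shards, delta_shards, out_shards):
        base = torch.load(base_path, map_location="cpu")
        delta = torch.load(delta_path, map_location="cpu")
        for name, tensor in base.items():
            if name in delta:
                tensor.add_(delta[name])  # in-place add keeps peak memory low
        torch.save(base, out_path)
        del base, delta  # free the shard before loading the next one
```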
That might not be surprising, considering these jailbreaks are written and tested specifically against ChatGPT and ChatGPT alone. This model probably has its own jailbreaks that would also be refused by ChatGPT.
Just when you think Nvidia will go down, something happens that changes it. These days, unless you were into gaming or a machine learning dev, integrated graphics were good enough. But now, for the first time in a long time, I'm interested in getting a GPU to run some of these chatbots locally.
As a very occasional gamer who uses an iMac for work, I had thought about getting a gaming PC for like 6 years.
Last fall it seemed that all the stars had aligned. The crypto winter and Ethereum switching to proof of stake meant that GPU prices fell to a reasonable level, I knew I'd have a bit of time to play some games during the holidays, and as soon as Stable Diffusion was first posted on Hacker News I knew that was my excuse and my sign.
So far I think I have spent more time tinkering with the 20 Python environments I have[0] for all the ML projects than playing RDR2.
[0] https://xkcd.com/1987/
Whenever I feel like gaming I just subscribe to the GeForce Now service. Around here it costs ~$10 a month, which is what I usually go for, or ~$3 for a single day. And as the servers are located at a local ISP, there's no network latency or dropped packets.
This model is also censored to the brim; it refuses to answer half of my questions, some of them perfectly legal. It's useless - we already have GPT-4 (and Vicuna is even more censored/guarded).
Alpaca-30B is much better, it will even tell you how to build a nuclear weapon (incorrectly, of course, it’s not that smart).
I am waiting for Coati13B weights, these should work great.
This looks really good for a run-it-on-your-own-hardware model, judging from the examples and sibling comments. I've been working on a pure AVX2 Rust implementation of LLaMA but was starting to lose interest and had been waiting for whatever the next hot downloadable model would be - now I want to add this thing to it.
(I know a vicuna is a llama-like animal.)
These could be useful:
https://nixified.ai
https://github.com/Crataco/ai-guide/blob/main/guide/models.m... -> https://old.reddit.com/user/Crataco/comments/zuowi9/opensour...
https://github.com/cocktailpeanut/dalai
I tried a few from https://www.jailbreakchat.com/ and it refused them all. Interesting.
I'll be busy the next few days. Heck yeah.