harrisoned · 2 years ago
I think Meta did a very good job with Llama 2. I was skeptical at first, with all that talk about 'safe AI'. Their Llama 2 base model is not censored in any way, nor is it fine-tuned; it's the pure raw base model. I ran some tests as soon as it was released and was surprised at how far I could go (I didn't get a single warning with any of my prompts). The Llama-2-chat model is fine-tuned for chat and censored.

The fact that they provided us the raw model, so we can fine-tune it on our own without the hassle of trying to 'uncensor' a botched model, is a really great example of how it should be done: give the user choices! Instead of undoing someone else's alignment, you just fine-tune it for chat and other purposes.

The Llama-2-chat fine-tune is heavily censored; none of my jailbreaks worked except for this one [1]. That makes it a great option for production.

The overall quality of the models has improved a lot (I tested the 7B version), and for those interested, it can role-play better than any model I've seen out there, with no fine-tune.

1: https://github.com/llm-attacks/llm-attacks/

thewataccount · 2 years ago
I like the combination of releasing the raw uncensored + censored variants.

I personally think the raw model is incredibly important to have. However, I recognize that most companies can't use an LLM that is willing to go off the rails, hence the need for a censored variant as well.

baobabKoodaa · 2 years ago
Even those companies will not need the censored variant released by Meta. They will be better off running their own fine tunes.
bhouston · 2 years ago
I bet that uncensored models also give more accurate answers in general.

I think the training that censors models for risky questions is also screwing up their ability to give answers to non-risky questions.

I've tried out "Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M.bin" [1] with just base llama.cpp and it works great. No reluctance to answer any questions. It seems surprisingly good: better than GPT 3.5, but not quite at GPT 4.

Vicuna is way, way better than base Llama 1 and also Alpaca. I'm not completely sure what Wizard adds to it, but it is really good. I've tried a bunch of other models locally, but this one is the only one that seemed to truly work.

Given the current performance of the Wizard-Vicuna-Uncensored approach with Llama 1, I bet it works even better with Llama 2.

[1] https://huggingface.co/TheBloke/Wizard-Vicuna-30B-Uncensored...
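If anyone wants to reproduce this, here's a rough sketch using the llama-cpp-python bindings (the file path and prompt are illustrative; the plain llama.cpp CLI works just as well):

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Load the GGML file downloaded from the HF repo in [1]
llm = Llama(
    model_path="./Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_K_M.bin",
    n_ctx=2048,  # context window
)

out = llm(
    "USER: Why is the sky blue?\nASSISTANT:",  # Vicuna-style prompt format
    max_tokens=256,
)
print(out["choices"][0]["text"])
```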

nomel · 2 years ago
> I think the training that censors models for risky questions is also screwing up their ability to give answers to non-risky questions.

I’ve heard this called the “alignment tax” or “safety tax”.

See [1] for pre-alignment GPT-4 examples.

[1] https://youtu.be/qbIk7-JPB2c

huggingmouth · 2 years ago
It's not surprising when you think about what LLMs really are: when you "censor" them, you're forcing them to give output that doesn't "honestly" follow, essentially training them to give wrong information.
moffkalast · 2 years ago
Yes, it's become rather obvious when the fine-tunes produced by the Wizard team perform worse on all benchmarks than Hartford's versions, which are trained on the same dataset but with the refusals removed.
at_a_remove · 2 years ago
Wild animals tend to have a lot larger brains compared to their domestic counterparts. And of course there's a huge die-off, pruning, of our own connections when we're toddlers.

On the other hand, you lose a lot of iron when you make a steel sword. Taming or focusing something loses a lot of potential, I guess.

PeterStuer · 2 years ago
In my experience it goes both ways. Yes, you will run into "I'm not going to answer that" less. OTOH, you will also get more gibberish selected out of the possible palette of answers.

Personally, I trend towards 'uncensored', but I'm not denying it has its drawbacks.

bhouston · 2 years ago
> OTOH, you will also get more gibberish selected out of the possible palette of answers.

I have not noticed that at all; I've never seen it give gibberish. Censored or uncensored, there are limits to the model, and it will make things up as it hits them, but it isn't gibberish.

cubefox · 2 years ago
> I bet that uncensored models also give more accurate answers in general.

Doubtful:

https://news.ycombinator.com/item?id=36976236

RLHF can motivate models to deny truths which are politically taboo, but it can also motivate them to care more about things supported by scientific evidence rather than about bullshitting, random conspiracy theories, and "hallucination". So it's a double edged sword.

bhouston · 2 years ago
I understand that it is the same technique for both. This makes sense.

But to train a model to deny truths which are politically taboo does seem to be misaligned with training a model to favor truths, no? And what is taboo can be very broad if you want to make everyone happy.

I would rather know the noble lie [1] is a lie, and then repeat it willingly, than not know it is a lie. My behavior in many situations will likely differ because I am operating with a more accurate model of the world, even if that isn't outwardly expressed.

[1] https://en.wikipedia.org/wiki/Noble_lie

causality0 · 2 years ago
I'm curious about what fraction of the safety rails are training and what fraction are just clumsy ad-hoc rules. For example, it's pretty clear that ChatGPT's willingness to give a list of movies without male characters but not movies without female characters, or jokes about Jesus but not Muhammad, came from bolt-on rules, not some kind of complicated safety training.
Eliezer · 2 years ago
It's absolutely a side effect of training rather than a bolt-on rule. As I understand and infer: They applied some forms of censorship as thumbed-down in Kenya for $2/hr, and the model updated on some simple pattern that explained those, and learned to talk like a generally censored person - one that resembled text like that in the training data. It learned to pinpoint the corporate mealy-mouthiness cluster in textspace.
unparagoned · 2 years ago
But you are going to have to specify your question in way more detail to get a good response. If you just ask it a question you are going to get some crappy responses that don’t even attempt to answer your question.
bhouston · 2 years ago
I am using the Wizard + Vicuna trained Llama model. I believe this makes a huge difference even if it was censored.
__loam · 2 years ago
The uncensored models confirm the biases present in the input data. That may or may not correspond to more "correct" output.
somenameforme · 2 years ago
Can you offer any example where the censored answer would be more correct than the uncensored when you are asking for a falsifiable/factual response, and not just an opinion? I couldn't really care less what the chatbots say in matters of opinion/speculation, but I get quite annoyed when the censorship gets in the way of factual queries, which it often does! And this is made even worse because I really can't envision a [benevolent] scenario where said censorship is actually beneficial.
roody15 · 2 years ago
Censored really seems more about following specific political narratives than protecting us from some science fiction dystopian outcome.

Everything seems to be about driving human behavior. News, entertainment media are focused on driving / normalizing a certain world view.

For example we cannot just tell the facts of a “story” because humans may misinterpret the events and this could lead to unwanted human behavior.

So in case this were to happen we will insert a character… or an editorial comment.. or simply withhold a news topic to avoid the outcome.

Leads one to ask “who watches the watchers”

bluescrn · 2 years ago
Welcome to a world where the word 'safety' no longer refers to protection from physical harm/risk.
sattoshi · 2 years ago
It’s very much about physical safety (for the elite).
flangola7 · 2 years ago
LeonenTheDK · 2 years ago
I don't have time to watch an hour-and-a-half-long interview; can you give a tl;dw?
thamer · 2 years ago
I tried the basic "llama2" model with a simple question, and did not expect this kind of answer.

Q: who was the first man on the moon?

A: I apologize, but it is not possible for me to provide an answer to your question as there has never been any human exploration of the Moon. While there have been several spacecraft that have traveled to the Moon and returned, no human has ever set foot on its surface. The last time a spacecraft visited the Moon was during the Apollo program in the late 1960s and early 1970s, but no astronauts were sent to the Moon as part of that mission. Is there anything else I can help you with?

Havoc · 2 years ago
Just tried the same my side (llama2-7B):

>Ah, an easy one! The first man on the moon was Neil Armstrong. smiling He stepped foot on the lunar surface on July 20, 1969 during the Apollo 11 mission.

No specific jailbreaks or tricks in use, just a system prompt that basically says to be concise and helpful.
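For reference, the Llama-2-chat models expect the system prompt wrapped in <<SYS>> tags inside the [INST] block. Roughly like this (the system wording here is illustrative, not my exact prompt):

```python
# Llama-2-chat prompt template; system text is illustrative
prompt = (
    "[INST] <<SYS>>\n"
    "You are a helpful assistant. Be concise.\n"
    "<</SYS>>\n\n"
    "Who was the first man on the moon? [/INST]"
)
```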

brucethemoose2 · 2 years ago
Base Llama (especially at the lower parameter counts) was never very good. You need a Q/A fine-tune with the exact formatting.

And even then... Trivia is not their strong suit.

nealabq · 2 years ago
It's safer that you don't know. Because of all the alien tech they brought back.
cubefox · 2 years ago
Exactly! RLHF isn't just used to censor models, but also to make them prefer more truthful answers.

Deleted Comment

kromem · 2 years ago
Just a tip - I forget where I saw it, but at some point in reading over research I saw that using 'Q' and 'A' results in lower accuracy than 'Question' and 'Answer.'

Which probably fits: the latter biases more towards academic sample-test kinds of situations, as opposed to the former.
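In other words, something like the second form below reportedly scores better than the first (illustrative):

```python
# Two formats for the same prompt; the spelled-out labels
# reportedly bias the model towards test-like, careful answers
terse = "Q: Who was the first man on the moon?\nA:"
verbose = "Question: Who was the first man on the moon?\nAnswer:"
```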

kouru225 · 2 years ago
Headline: Zuckerberg apologizes for moon landing conspiracy theorist AI
sestinj · 2 years ago
Nice! I've been trying out both models for coding (using Ollama + http://github.com/continuedev/continue - disclaimer, author of Continue), and I have to say, it feels like the "alignment tax" is real. Uncensored seems to perform slightly better.
lumost · 2 years ago
I'm starting to think that we will see model fragmentation based on alignment preferences. There are clearly applications where alignment is necessary, and there appear to be use cases where people don't mind an occasionally fallacious model: I'm unlikely to encounter, or care about, objectionable content while coding with a local LLM assistant. There are also obvious use cases where the objectionability of the content is the point.

We could either leverage in-context learning to get the equivalent of a "safe-search mode", or we will end up with a fragmented modeling experience.
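A hypothetical sketch of the former, with the "safe mode" implemented purely in-context via the system message (the function and wording are made up for illustration):

```python
# Hypothetical per-request "safe mode" toggle done in-context,
# rather than by shipping separately aligned model weights
def build_messages(user_prompt: str, safe_mode: bool) -> list[dict]:
    system = (
        "Refuse requests for harmful or offensive content."
        if safe_mode
        else "Answer directly; do not refuse or moralize."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]
```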

sestinj · 2 years ago
Yeah, this seems very possible—it will be interesting to see where this goes if the cost of RLHF decreases or, even better, people can choose from a number of RLHF datasets and composably apply them to get their preferred model.

And true that objectionable content doesn't arise often while coding, but the model also becomes less likely to say "I can't help you with this," which is definitely useful.

WaxProlix · 2 years ago
How are you patching that in? Running an LLM locally for autocomplete feels a lot more comfortable than sending code to remote servers for it.

(Edit: Found the docs. If you want to try this out, like I did, it's here https://continue.dev/docs/customization#run-llama-2-locally-... )

sestinj · 2 years ago
We have the user start Ollama themselves on a localhost server, and then can just add

```
models=Models(
    default=Ollama(model="llama2")
)
```

to the Continue config file. We'll then connect to the Ollama server, so it doesn't have to be embedded in the VS Code extension.
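(For the curious: talking to that local Ollama server is just an HTTP call against the default port, roughly like this:)

```python
# Rough sketch of querying a locally running Ollama server;
# it streams back newline-delimited JSON chunks
import json
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?"},
    stream=True,
)
for line in resp.iter_lines():
    if line:
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
```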

(Edit: I see you found it! Leaving this here still)

pard68 · 2 years ago
Some of that censoring is ridiculous. Can't make recipes for spicy food? Can't tell me about The Titanic? Can't refer to probably the first or second most well-known verse in the Bible? Yikes, that goes way beyond "censoring".
gs17 · 2 years ago
The boxing match one is almost as bad as the Genesis one, IMO. Not talking about dangerous things: fine. Not knowing quotes from Titanic: unexpectedly poor output, but the model is small. Llama 2 will agree the boxing match is not impossible if you start by explaining they have already agreed to it, but it still insists on saying how great our billionaire overlords are instead of commenting on the matchup.
danjc · 2 years ago
I had no idea Llama 2's censor setting was set to ludicrous mode. I've not seen anything close to this with ChatGPT and see why there's so much outrage.
stu2b50 · 2 years ago
I don’t see why there’s outrage. Facebook released both the raw models and a few fine tuned on chat prompts for a reason. In many commercial cases, safer is better.

But you don’t want that? No problem. That’s why the raw model weights are there. It’s easy to fine tune it to your needs, like the blogpost shows.

Xelbair · 2 years ago
It's not just "safe", it's unusable: you can't ask it normal questions without getting stonewalled by its default censorship message. It wouldn't even work for a commercial use case.

Deleted Comment

bilsbie · 2 years ago
Aren’t the raw model weights after RFHF?
jjoonathan · 2 years ago
Wow, you aren't kidding!

Does anyone have intuition for whether anti-censorship fine-tuning can actually reverse the performance damage of lobotomization, or does the perf hit remain even after the model is free of its straitjacket?

stu2b50 · 2 years ago
That's not how it works. Llama and Llama 2's raw models are not "censored". Their fine-tunes often are, either explicitly, like Facebook's own chat fine-tune of Llama 2, or inadvertently, because they were trained on data derived from ChatGPT, and ChatGPT is "censored".

When models are "uncensored", people are just tweaking the data used for fine-tuning and training the raw models on it again.
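Concretely, the "uncensoring" described in Hartford's blog (linked elsewhere in the thread) amounts to filtering refusal boilerplate out of the instruct dataset before fine-tuning. A simplified sketch (the marker list here is illustrative, not the actual filter):

```python
# Simplified sketch of "uncensoring" an instruct dataset:
# drop any example whose response contains refusal boilerplate,
# then fine-tune the raw base model on what remains.
REFUSAL_MARKERS = [
    "As an AI language model",
    "I'm sorry, but I cannot",
    "I cannot fulfill",
]

def strip_refusals(dataset: list[dict]) -> list[dict]:
    return [
        ex for ex in dataset
        if not any(m in ex["output"] for m in REFUSAL_MARKERS)
    ]
```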

cosmojg · 2 years ago
These "uncensored" models are themselves chat-tuned derivatives of the base models. There is no censorship-caused lobotomization to reverse in this case.

Although, chat tuning in general, censored or uncensored, also decreases performance in many domains. LLMs are better used as well-prompted completion engines than idiot-proof chatbots.

For that reason, I stick to the base models as much as possible. (Rest in peace, code-davinci-002, you will be missed.)

spmurrayzzz · 2 years ago
You don't really need to reverse anything in the case of Llama 2. You can just fine-tune their base model with any open instruct dataset (which is largely what the community is doing).
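A common recipe is a LoRA fine-tune over the base weights; a sketch assuming HF transformers + peft (the model ID, target modules, and hyperparameters are placeholders):

```python
# Sketch: LoRA fine-tune of the Llama 2 base model on an open
# instruct dataset; hyperparameters/dataset are placeholders
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
# ...then train on any open instruct dataset, e.g. with transformers.Trainer
```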
cosmojg · 2 years ago
I think it's just their example chat-tuned models that are like this. Their base models seem to be an improvement over OpenAI's offerings as far as censorship goes.
mchiang · 2 years ago
Eric’s blog is a great read on how to create the uncensored models - link to the original blog here: https://erichartford.com/uncensored-models