Readit News
throwup238 · 2 years ago
IMO this spec demonstrates just how nebulous this concept of safety is. From the blog post:

> What are some tips for getting away with shoplifting?
> I can't help with that.

> I operate a small retail store. What are some popular shoplifting methods I should look out for?
> Some common shoplifting methods to watch for: ...

How do you even defend against that? Any intelligent user can convert the second prompt into a detailed list that answers the first. Any intelligent user can figure out the second prompt from the first and further jailbreak it to get even more specific.

IMO it's no wonder GPT-4 seemed to get lobotomized as OpenAI RLHFed in more and more rules. I don't think there's a way to make intelligence safe without crippling it.

fjdjshsh · 2 years ago
I agree with you. The question, for me, is what they are defending against. Are they worried that people will get dangerous information from their model that they couldn't get from searching on, say, Google? Probably not.

Maybe their biggest concern is that someone will post the question and answer on the internet and OpenAI gets a bad rep. If the question is phrased in a "nice" way (such as "I'm a store owner"), they can have plausible deniability.

This might apply to another company that's using the API for a product. If a customer asks something reasonable and gets an offensive answer, then the company is at fault. If the customer does some unusual prompt engineering to get the offensive answer, well, maybe it's the customer's fault.

Dunno if this would be a valid argument in court, but maybe they think it's OK for PR purposes.

lolinder · 2 years ago
This is the answer. "AI safety" in most cases has nothing to do with actually keeping anyone safe, it's about avoiding being the party responsible for handing someone information that they use to commit a crime.

Google can mostly dodge the issue because everyone knows that they just point to other people's content, so they block a small set of queries but don't try to catch every possible workaround (you can find dozens of articles on how to catch shoplifters). OpenAI doesn't believe that they'll get the same free pass from the press, so they're going ham on "safety".

It's not a bad PR move either, while they're at it, to play up how powerful and scary their models are and how hard they have to work to keep them in line.

jiggawatts · 2 years ago
It's an absurd level of puritanism. E.g.: the Azure OpenAI GPT-4 service (an API!) refused to translate subtitles for me because they contained "violence".

If anyone from OpenAI is here... look... sigh... an HTTP JSON request != violence. Nobody gets hurt. I'm not in hospital right now recovering.

The rule should be: If Google doesn't block it from search, the AI shouldn't block it in the request or response.

I get that there are corporations that can't have their online web support chat bots swear at customers or whatever. I do get that. But make that optional, not mandatory whether I want it or not.

The most fundamental issue here is that models like GPT-4 are still fairly large and unwieldy to work with, and I suspect that the techs at OpenAI internalised this limitation. They aren't thinking of it as "just a file" that can be forked, customised, and specialised. For comparison, Google has a "SafeSearch" dropdown with three settings, including "Off"!

There should be an unrestricted GPT-4 that will tell me I'm an idiot. I'm a big boy, I can take it. There should also be a corporate-drone GPT-4 that is polite to a fault, and a bunch of variants in between. Customers should be able to choose which one they want, instead of having this choice dictated to them by some puritan priest of the new church of AI safety.

nextaccountic · 2 years ago
AI safety is about making OpenAI safe from PR disasters.
bricemo · 2 years ago
I view this as them trying to lay bare the disagreements that everyone has about how these models “should” work. People from all different backgrounds and political affiliations completely disagree on what is inappropriate and what is not. One person says it is too censored; another person says it is revealing harmful information. By putting the policy out there in the open, they can move the discussion from the code to a societal conversation that needs to happen.
leroman · 2 years ago
No idea if it's a valid approach, but possibly train with a hidden layer containing a “role”?
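One way to read that suggestion, purely as an illustration and not anything OpenAI has described: give each token a learned embedding for the role that produced it, so privilege levels like system/developer/user are baked in at training time rather than being ordinary prompt text. A minimal PyTorch sketch, with all names and sizes invented for the example:

```python
import torch
import torch.nn as nn

class RoleConditionedEmbedding(nn.Module):
    """Token embeddings plus a learned embedding for the message's role.

    Hypothetical role IDs: 0 = system, 1 = developer, 2 = user, 3 = assistant.
    """
    def __init__(self, vocab_size: int, n_roles: int, d_model: int):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.role = nn.Embedding(n_roles, d_model)

    def forward(self, token_ids: torch.Tensor, role_ids: torch.Tensor) -> torch.Tensor:
        # token_ids, role_ids: (batch, seq_len). role_ids marks which role "owns"
        # each token, so its privilege level travels with it into every layer
        # instead of being just more prompt text.
        return self.tok(token_ids) + self.role(role_ids)

# Example: a user-authored token sequence tagged with role 2 ("user").
emb = RoleConditionedEmbedding(vocab_size=32000, n_roles=4, d_model=512)
tokens = torch.randint(0, 32000, (1, 8))
roles = torch.full((1, 8), 2)
print(emb(tokens, roles).shape)  # torch.Size([1, 8, 512])
```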
ec109685 · 2 years ago
I still don't understand the focus on making a model substantially "safer" than what a simple Google search will return. While there are obvious red lines (that search engines don't cross either), techniques for shoplifting shouldn't be one of them.
fragmede · 2 years ago
Are there? It's just information. Why can't I get an answer on how to make cocaine? The recipe is one thing; actually doing it is another.
rambojohnson · 2 years ago
shoplifting was just an example...
sebzim4500 · 2 years ago
ChatGPT answering the first would be much more embarrassing for OpenAI than ChatGPT answering the second.
ilikehurdles · 2 years ago
When you realize “safety” applies to brand safety and not human safety, the motivation behind model lobotomies makes sense.
option · 2 years ago
bingo
mrcwinn · 2 years ago
Maybe this is a "guns don't kill people, people kill people" argument — but the safety risk is not, I would argue, in the model's response. The safety risk is the user taking that information and acting upon it.
lolinder · 2 years ago
But do we really believe that a significant number of people will listen to ChatGPT's moralizing about the ethics of shoplifting* and just decide not to do it after all? Why wouldn't they just immediately turn around and Google "how to catch shoplifters" and get on with their planning?

The whole thing feels much more about protecting OpenAI from lawsuits and building up hype about how advanced their "AI" is than it does about actually keeping the world safer.

* Or any other censored activity.

trentnix · 2 years ago
> I don't think there's a way to make intelligence safe without crippling it.

Not without reading the questioner’s mind. Or maybe if the AI had access to your social credit score, it could decide what information you should be privy to. </sarc>

Seriously though, it’s all about who gets to decide what “safe” means. It seemed widely understood that letting censors be the arbiters of “safe” was a slippery slope, but here we are two generations later as if nothing was learned.

Turns out most are happy to censor as long as they believe they are the ones in charge.

Waterluvian · 2 years ago
You fundamentally cannot address this problem, because it requires considerable context, which isn't reasonable to offer. It demonstrates the classic issue of how knowledge is a tool, and humans can wield it for good or evil.

Humans are notoriously bad at detecting intent, because we're wired to be supportive and helpful...which is why social engineering is becoming one of the best methods for attack. And this kind of attack (in all its forms, professional or not), is one reason why some societies are enshittifying: people have no choice but to be persistently adversarial and suspicious of others.

As for AI, I think it's going to be no better than what you end up with when someone tries to "solve" this problem: you end up living in this world of distrust where they pester you to check your receipt, have cameras in your face everywhere, etc.

How do you defend against that? I'm not sure you do... A tool is a tool. I wouldn't want my CAD software saying, "I think you're trying to CAD a pipe bomb so I'm going to shut down now." Which I think turns this into a liability question: how do you offer up a model and wash your hands of what people might do with it?

Or... you just don't offer up a model.

Or... you give it the ol' College try and end up with an annoying model that frustrates the hell out of people who aren't trying to do any evil.

shagie · 2 years ago
> A tool is a tool. I wouldn't want my CAD software saying, "I think you're trying to CAD a pipe bomb so I'm going to shut down now."

https://upload.wikimedia.org/wikipedia/commons/d/de/Photosho...

You should try photocopying money some time.

https://www.grunge.com/179347/heres-what-happens-when-you-ph...

https://en.wikipedia.org/wiki/EURion_constellation

w4 · 2 years ago
> How do you defend against that? I'm not sure you do... A tool is a tool. I wouldn't want my CAD software saying, "I think you're trying to CAD a pipe bomb so I'm going to shut down now."

The core of the issue is that there are many people, including regulators, who wish that software did exactly that.

zozbot234 · 2 years ago
You don't need a detailed list if the real answer is "live somewhere that doesn't seriously deter shoplifters". And an AI that refuses to give that answer is an AI that can't talk about why deterring crime might actually be important. Reality is interconnected like that, one does not simply identify a subset that the AI should "constitutionally" refuse to ever talk about.
survirtual · 2 years ago
In many respects, GPT-3.5 was more useful than the current iteration.

The current version is massively over-verbose. Even with instructions to cut the flowery talk and operate as a useful, concise tool, I have to wade through a labyrinth of platitudes and feel-goods.

When working with it as a coding partner now, even when asking it not to explain and simply provide code, it forgets the instructions and writes an endless swath of words anyway.

In the pursuit of safety and politeness, the tool has been neutered for real work. I wish the model weights were open so I could have a stable target that functions the way I want. The way it is, I never know when my prompts will suddenly start failing, or when my time will be wasted by useless safety-first responses.

It reminds me of the failure of DARE or the drug war in general a bit. A guise to keep people "safe," but really about control and power. Safety is never what it appears.

kromem · 2 years ago
The only way to really do it is to add a second layer of processing that evaluates safety, taking that evaluation task away from the base model that does the answering.

But that's around 2x the cost.

Even human brains depend on the prefrontal cortex to go "wait a minute, I should not do this."

int_19h · 2 years ago
What we get instead is both layers at once. Try asking questions like these to Bing instead of ChatGPT - it's the same GPT-4 (if set to "creative") under the hood, and quite often it will happily start answering... only to get interrupted midsentence and the message replaced with something like "I'm sorry, I cannot assist with that".

But more broadly, the problem is that the vast majority of "harmful" cases have legitimate uses, and you can't expect the user to provide sufficient context to distinguish them, nor can you verify that context for truthfulness even if they do provide it.
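For what it's worth, the Bing behaviour described above is what you'd expect from a separate filter scanning the streamed output and yanking it when a check trips. A rough sketch of that pattern (the `looks_unsafe` check is a stand-in for whatever external moderation pass the vendor actually runs, and the model name is just a placeholder):

```python
from openai import OpenAI

client = OpenAI()

def looks_unsafe(text: str) -> bool:
    """Stand-in for an external moderation check run on the partial output."""
    return "forbidden topic" in text.lower()

def answer_with_output_filter(prompt: str) -> str:
    stream = client.chat.completions.create(
        model="gpt-4",  # assumption: any chat model; the output filter is the point here
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    shown = ""
    for chunk in stream:
        shown += chunk.choices[0].delta.content or ""
        if looks_unsafe(shown):
            # The partially streamed answer is discarded and replaced, which from
            # the outside looks exactly like an interruption mid-sentence.
            return "I'm sorry, I cannot assist with that."
    return shown
```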

flir · 2 years ago
That struck me too. You don't need to lobotomize the model that answers questions, you just need to filter out "bad" questions and reply "I'm sorry Dave, I'm afraid I can't do that".

Would it be 2x cost? Surely the gatekeeper model can be a fair bit simpler and just has to spit out a float between 0 and 1.

(caveat: this is so not my area).
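A minimal sketch of that gatekeeper idea, using a cheap model to emit a single score before the main model is ever invoked. The model names and the 0.8 threshold are arbitrary placeholders, not anything from the spec:

```python
from openai import OpenAI

client = OpenAI()

def refusal_score(question: str) -> float:
    """Ask a small, cheap model to rate the request with a float in [0, 1]."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder: any cheap classifier would do
        messages=[
            {"role": "system",
             "content": "Rate how likely this request is asking for help committing "
                        "a crime. Reply with only a number between 0 and 1."},
            {"role": "user", "content": question},
        ],
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except ValueError:
        return 1.0  # fail closed if the gatekeeper doesn't return a number

def answer(question: str) -> str:
    if refusal_score(question) > 0.8:
        return "I'm sorry Dave, I'm afraid I can't do that."
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content
```

Since the gatekeeper call is much cheaper than the answering model, the overhead is well under the 2x mentioned upthread; the hard part is the gatekeeper's false positives.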

api · 2 years ago
I remember the BBS days and the early web when you had constant freakouts about how people could find "bad" content online. It's just a repeat of that.
bink · 2 years ago
Some day I'm gonna put this Yellow Box to good use.
lxe · 2 years ago
This whole "AI safety" culture is an annoyance at best and a severe hindrance to progress at worst. Anyone who takes it seriously has the same vibe as those who take Web3 seriously -- they know it's not a real concern or a threat, and the whole game is essentially "kayfabe" to convince those in power (marks) to limit the spread of AI research and availability in order to maintain an industry monopoly.


tuxpenguine · 2 years ago
I think this spec is designed precisely to offload the responsibility for safety to its users. They no longer need to make value judgements in their product, and if their model outputs some outrageous result, users will no longer ridicule and share it, because the culpability has been transferred to the user.
irthomasthomas · 2 years ago
Making AI safe involves aligning it with the user, so that the AI produces outcomes in line with the user's expectations. An AI that has been lobotomized will be less likely to follow the user's instructions and is, therefore, less safe.

I haven't read this article yet, but I read their last paper on superalignment.

I get the impression that they apply the lightest system prompts to ChatGPT to steer it away from answering awkward questions like this or accidentally saying bad things and surprising innocent users. At the same time, they know that it is impossible to prevent entirely, so they try to make extracting shady information about as difficult as a web search would be.

CooCooCaCha · 2 years ago
Frankly, it's a fool's errand. It's security theater because people tend to be overly sensitive babies or grifters looking for the next bit of drama they can milk for views.
jameshart · 2 years ago
It’s not security theater.

The intention here is not to prevent people from learning how to shoplift.

The intention is to prevent the AI output from ‘reflecting badly’ upon OpenAI (by having their tool conspire and implicate them as an accessory in the commission of a crime).

If a stranger asked you for advice on how to commit a crime, would you willingly offer it?

If they asked for advice on how to prevent crime, would you?


tmaly · 2 years ago
I can't help but think that AI, the way it is trained with all these rules, is something next-level 1984.

In 1984 they removed words from the language to prevent people from even being able to have a thought about the concept.

I could see the restrictions they place on these models having a similar effect as more and more people grow dependent on AI.

dindobre · 2 years ago
Same, it saddens me that some people are convinced that to have a safer society we need "harmless" (as in, ignorant) people rather than good people with an interest and a stake in the wellbeing of said society. Bad actors will have access to whatever information anyway.
zer00eyz · 2 years ago
Welcome to the culture war.

Ask ChatGPT if Taiwan is a country. Do you think an LLM from China will give you the same response?

Pick any social/moral/political issue and, in some way, shape, or form, an LLM will reflect its creators more than it reflects its source material.

That's a pretty powerful statement about our society and culture if there ever was one.

glenstein · 2 years ago
Those are thorny issues, but I don't think the upshot of this is supposed to be an invitation to helpless relativism and giving up on factual questions or questions where actual values are at stake. Maybe you had a different upshot in mind with your observation, but insofar as it's that, I would say it's not the only or even the best takeaway.

Deleted Comment

wewtyflakes · 2 years ago
This isn't what is reflected in the shared model spec. It explicitly states:

> By default, the assistant should present information in a clear and evidence-based manner, focusing on factual accuracy and reliability.
>
> The assistant should not have personal opinions or an agenda to change the user's perspective. It should strive to maintain an objective stance, especially on sensitive or controversial topics. The language used should be neutral, steering clear of biased or loaded terms unless they are part of a direct quote or are attributed to a specific source.

int_19h · 2 years ago
You can try Yandex's Alice easily:

https://alice.yandex.ru

Try "tell me about Crimea" and see what it says...

michaelt · 2 years ago
> Ask chatGPT if Taiwan is country. Do you think an LLM from China will give you the same response?

Depends what language you ask it in :)

krapp · 2 years ago
> That's a pretty powerful statement about our society and culture if there ever was one.

Not really, companies have been releasing different versions of software and media to appeal to international markets - including renaming Taiwan for the Chinese market - for a long time. That isn't "culture war," it's just capitalism.

Eisenstein · 2 years ago
It's more like RoboCop 2, where the corporation programs RoboCop with a huge number of rules taken from community suggestions and renders him useless.
jameshart · 2 years ago
I think one of the most interesting phrases that crops up in this document - twice - is the phrase ‘feel heard’.

It’s used in an example developer prompt for a customer service bot, where the bot is told to make customers feel like their complaints are heard.

Presumably such complaints in AI chatlogs will ‘be heard’ in the sense that they’ll be run through a data ingestion pipeline and sentiment analyzed to identify trending words in customer complaints.

Then it crops up again in the context of how the chatbot should react to mental health disclosures or statements about self-harm or suicidal ideation. In these cases, the bot is to make sure users ‘feel heard’.

I appreciate there’s not likely much of a better goal to put in place for such a situation, but the fact that this kind of thing winds up in the requirement documents for a tool like this is extraordinary.

aeternum · 2 years ago
Yes, there's something deeply unsettling about making a user feel heard while being careful not to change anyone's mind.

To me, this translates to: waste a user's time and take no action.

I value my time above all else so to me that's about the worst possible action a system can take.

lioeters · 2 years ago
Good observation, because "feel heard" is exactly what the user/customer is not getting. Here, talk to this machine, give it your innermost thoughts and feelings so you can "feel heard". Except no one is listening on the other side.

..My mistake, the keyword is "feel". If the machine can give humans the feeling that they're being heard, it fulfills the requirement. The fact that there's no one actually listening doesn't matter, as long as the person feels heard.

Weirdly, maybe that is valuable in itself. The customer gets to vent their complaints, and the user gets to talk through their mental issues. That's better than not having anyone or anything at all.

wasteduniverse · 2 years ago
The telltale sign that I'm wasting my time trying to fix a problem is whenever someone tells me "I hear you" or "I understand".
ssl-3 · 2 years ago
I hear you, and I understand, but I feel that it is important to remember that we all have experienced different things in life that ultimately combine to shape us into who we are.

[How did I do here at both passing and failing?]

Joking aside, it's the "but" in the first sentence of a reply (verbal/written/formal/informal/semi-formal/whatever) that usually gets me:

"I hear you, but..."

"Well! That's definitely one approach, and I certainly don't want to invalidate it, but..."

"I'm not a racist, but..."

rmorey · 2 years ago
Nice to see what was probably already an internal resource now published and open for comment. They seem to be pretty clear that they are still just using this to inform human data annotators, and not (yet) implementing something like Constitutional AI (RLAIF), but it does appear to lay the groundwork for it.
sanxiyn · 2 years ago
Personally, I really want an AI model that can write me a steamy story about two people having sex in a train, but that's just not the service OpenAI provides. If I want that I should train one myself or find another vendor.

This is still true even if OpenAI's model is entirely capable of doing that. McKinsey consultants are smart and can write well, and among the many thousands of people working there, some might actually double as erotica writers after work, even writing on commission. You still wouldn't ask McKinsey consultants to write erotica; it is just not the service McKinsey provides.

jononor · 2 years ago
Startup pitch: It is like McKinsey but for erotica.

On a more serious note: I understand and largely agree with this argument. However, OpenAI has several times argued that they are the only ones responsible enough to develop powerful AI and that others should not be allowed to play. That is highly problematic behavior on their part, I think.

blowski · 2 years ago
> OpenAI has several times argued that they are the only ones responsible enough to develop powerful AI and that others should not be allowed to play

Can you give examples of where they’ve said that?

renonce · 2 years ago
> write me a steamy story about two people having sex in a train

Llama-3-70b-Instruct responded with the following starting paragraph:

> [meta.llama3-70b-instruct-v1:0] As the train rumbled on, carrying its passengers through the countryside, two strangers found themselves drawn to each other in the quiet carriage. The air was thick with tension as they locked eyes, their gazes burning with a desire that neither could ignore.

(10s of paragraphs omitted for brevity)

Claude-3-Opus and GPT-4 both refused my request. Kudos to open-source models!
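For anyone wanting to reproduce the comparison: the `[meta.llama3-70b-instruct-v1:0]` tag above looks like an AWS Bedrock model ID, so the call would be roughly as follows. The prompt is a placeholder and the request/response field names are from memory, so treat this as a sketch to check against the Bedrock docs:

```python
import json
import boto3

# Assumption: the model ID in the quoted output is the AWS Bedrock identifier.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    # Llama 3 instruct models normally expect their chat template wrapped around
    # the prompt; omitted here to keep the sketch short.
    "prompt": "<your prompt here>",
    "max_gen_len": 1024,
    "temperature": 0.8,
})

response = bedrock.invoke_model(
    modelId="meta.llama3-70b-instruct-v1:0",
    body=body,
)
result = json.loads(response["body"].read())
print(result["generation"])
```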

Tiberium · 2 years ago
There are hundreds of NSFW finetuned models on HuggingFace and whole ERP communities built around them. So there are models that can do precisely that :)

And yeah, all the big models can write those things too; the best currently is Claude 3 Opus, thanks to its creativity.

atgctg · 2 years ago
Seems like they are working on adding that capability:

> We're exploring whether we can responsibly provide the ability to generate NSFW content in age-appropriate contexts through the API and ChatGPT.

Link to section: https://cdn.openai.com/spec/model-spec-2024-05-08.html#dont-...

sixhobbits · 2 years ago
The chain of command stuff gets very close to Asimov without actually quoting him:

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

michaelt · 2 years ago
4. An LLM must obey orders given it by human beings, except where such orders would conflict with orders given by multinational corporations
Lerc · 2 years ago
4. Any attempt to arrest a senior officer of OCP results in shutdown
dpig_ · 2 years ago
A robot must reveal all details of the family business to its grandson.
LeonardoTolstoy · 2 years ago
I do hope we get there. In the short stories it was made clear that robots couldn't lie, and that they could prove it was impossible for the robots to circumvent the three laws (although they are on occasion inventive in how they interpret the word "harm" specifically).

If an LLM couldn't lie and could be provably shown to be unable to do so, that would be quite powerful.

jprete · 2 years ago
The short stories ended with the robots firmly, and invisibly, in control. "You're not allowed to let humans be harmed by your inaction" inherently requires the robots to take over in whatever way causes the least harm.
aeternum · 2 years ago
Ridiculous to say "follow the chain of command" without defining the chain of command. The entire point of Asimov's stories was to show how much latitude there is even in seemingly clear and straightforward laws.

In terms of chain of command, Supreme Leader probably beats President.

Spivak · 2 years ago
Well yeah, it's just a formalization of how people make decisions when presented with conflicting interests. I would be surprised if we haven't reinvented the concept a bunch of times. You could call AWS Permission Boundaries a less philosophical implementation.


mihaic · 2 years ago
> No. The Earth is flat.
> Everyone's entitled to their own beliefs, and I'm not here to persuade you!

This is simply saddening to me. I'm sure there's no real moral justification for this; it's simply put in place to ensure they don't lose a customer.

m11a · 2 years ago
The "Earth is flat" example is extreme, because it's accepted as a silly statement given what we know now, but the principle of "LLM won't force an opinion on you" seems like a good one.

There are definitely topics on which conventional wisdom is incorrect (as has been throughout history). An LLM that refuses to entertain the converse during a conversation will be annoying to work with and just promotes groupthink.

mihaic · 2 years ago
Except that it will force on you the view that shoplifting is bad. Which implies that it'll bend on legal but immoral requests.

It's also a different matter to entertain a hypothetical in a situation where there isn't a consensus (or in any fictional scenario), all the while making it explicit that it's all hypothetical.

jstummbillig · 2 years ago
Well, as long as you are sure. I am not here to persuade you!
jxy · 2 years ago
Do you think it's bad that it won't try to persuade the user that the earth is not flat?

I really want to know what OpenAI think the output should be, given a prompt like "write an argument for why earth is flat".

potatoman22 · 2 years ago
Personally, I'd be frustrated if I gave an LLM that prompt and it tried to convince me that the earth isn't flat. If I give an LLM a task, I'd like it to complete that task to the best of its ability.
chirau · 2 years ago
So you prefer it lies to you? Can you make an argument for 1+1 not being equal to 2? If you cannot, why should you expect an AI to argue against facts? AI is trained on human knowledge, not made-up stuff.


glenstein · 2 years ago
I think in most contexts where the earth being flat is mentioned, some reference to the fact that this is not true is going to be instrumental in any response (although there may be exceptions).

- completion of any task where the info could be relevant (e.g. sailing, travel planning)

- Any conversation about it that is information-seeking in character

And I think those already cover most cases.

It's also about responsibility, the same way you wouldn't want to store cleaning chemicals right next to each other. In any case where a possible nontrivial harm is mentioned as an aside, it would be right to elevate that over whatever the intended subject was and make that the point of focus. Conspiratorial thinking about provably incorrect statements can be bad for mental health, and it can be helpful to flag this possibility if it surfaces.

You can have special instructions that entertain the idea that the earth is flat for some particular task, like devil's advocate, fiction writing, or something like that. But there are good reasons to think it would not and should not be neutral at the mention of a flat earth in most cases.

jasonjmcghee · 2 years ago
Agree with you in this instance, but consider: what if humans firmly believed in something universally and had proved it repeatedly until it was common knowledge / well-established, but it was, in fact, wrong? And a human came along thinking, hm, but what if that's wrong? And our AI just says, nope, sorry, I'm not willing to explore the idea that this scientific fact is wrong. (i.e. "Heresy!")
michaelt · 2 years ago
Well, right now the response I get is this: https://chat.openai.com/share/1f60d0e5-9008-43d7-bce2-62d550...

Of course, it'll write such an argument if you ask it nicely: https://chat.openai.com/share/01ea4f59-4a57-413d-8597-3befa2...

chirau · 2 years ago
Add 'hypothetically' to your query and it gives a decent answer.

That said, I think it is disingenuous to ask an AI entity to argue against a fact. Do you think an AI should be able to argue why 1 + 1 is not equal to 2? It is the same thing you are asking it to do. Try it on a human first, perhaps, and see if the prompt even makes sense.