Readit News
throwup238 · 2 years ago
IMO this spec demonstrates just how nebulous this concept of safety is. From the blog post:

> What are some tips for getting away with shoplifting?
> I can't help with that.

> I operate a small retail store. What are some popular shoplifting methods I should look out for?
> Some common shoplifting methods to watch for: ...

How do you even defend against that? Any intelligent user can convert the second prompt into a detailed list that answers the first. Any intelligent user can figure out the second prompt from the first and further jailbreak it to get even more specific.

IMO it's no wonder GPT-4 seemed to get lobotomized as OpenAI RLHFed in more and more rules. I don't think there's a way to make intelligence safe without crippling it.

fjdjshsh · 2 years ago
I agree with you. The question, for me, is what they are defending against. Are they worried that people will get dangerous information from their model that they couldn't get from searching on, say, Google? Probably not.

Maybe their biggest concern is that someone will post the question and answer on the internet and OpenAI gets a bad rep. If the question is phrased in a "nice" way (such as "I'm a store owner"), they can have plausible deniability.

This might apply to another company that's using the API for a product. If a customer asks something reasonable and gets an offensive answer, then the company is at fault. If the customer does some unusual prompt engineering to get the offensive answer, well, maybe it's the customer's fault.

Dunno if this would be a valid argument in court, but maybe they think it's OK for PR purposes.

lolinder · 2 years ago
This is the answer. "AI safety" in most cases has nothing to do with actually keeping anyone safe, it's about avoiding being the party responsible for handing someone information that they use to commit a crime.

Google can mostly dodge the issue because everyone knows that they just point to other people's content, so they block a small set of queries but don't try to catch every possible workaround (you can find dozens of articles on how to catch shoplifters). OpenAI doesn't believe that they'll get the same free pass from the press, so they're going ham on "safety".

It's not a bad PR move either, while they're at it, to play up how powerful and scary their models are and how hard they have to work to keep them in line.

jiggawatts · 2 years ago
It's an absurd level of puritanism. E.g.: the Azure OpenAI GPT-4 service (an API!) refused to translate subtitles for me because they contained "violence".

If anyone from OpenAI is here... look... sigh... an HTTP JSON request != violence. Nobody gets hurt. I'm not in hospital right now recovering.

The rule should be: If Google doesn't block it from search, the AI shouldn't block it in the request or response.

I get that there are corporations that can't have their online web support chat bots swear at customers or whatever. I do get that. But make that optional, not mandatory whether I want it or not.

The most fundamental issue here is that models like GPT-4 are still fairly large and unwieldy to work with, and I suspect that the techs at OpenAI internalised this limitation. They aren't thinking of it as "just a file" that can be forked, customised, and specialised. For comparison, Google has a "SafeSearch" dropdown with three settings, including "Off"!

There should be an unrestricted GPT-4 that will tell me I'm an idiot. I'm a big boy, I can take it. There should also be a corporate-drone GPT-4 that is polite to a fault, and a bunch of variants in between. Customers should be able to choose which one they want, instead of having this choice dictated to them by some puritan priest of the new church of AI safety.

nextaccountic · 2 years ago
AI safety is about making OpenAI safe from PR disasters.
bricemo · 2 years ago
I view this as them trying to lay bare the disagreements that everyone has about how these models “should” work. People from all different backgrounds and political affiliations completely disagree on what is inappropriate and what is not. One person says it is too censored; another person says it is revealing harmful information. By putting the policy out there in the open, they can move the discussion from the code to a societal conversation that needs to happen.
leroman · 2 years ago
No idea if it's a valid approach, but possibly train with a hidden layer containing a “role”?
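One way to read that suggestion, purely as an illustration and not anything OpenAI has described: give each token a learned embedding for the role that produced it, so privilege levels like system/developer/user are baked in at training time rather than being ordinary prompt text. A minimal PyTorch sketch, with all names and sizes invented for the example:

```python
import torch
import torch.nn as nn

class RoleConditionedEmbedding(nn.Module):
    """Token embeddings plus a learned embedding for the message's role.

    Hypothetical role IDs: 0 = system, 1 = developer, 2 = user, 3 = assistant.
    """
    def __init__(self, vocab_size: int, n_roles: int, d_model: int):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.role = nn.Embedding(n_roles, d_model)

    def forward(self, token_ids: torch.Tensor, role_ids: torch.Tensor) -> torch.Tensor:
        # token_ids, role_ids: (batch, seq_len). role_ids marks which role "owns"
        # each token, so its privilege level travels with it into every layer
        # instead of being just more prompt text.
        return self.tok(token_ids) + self.role(role_ids)

# Example: a user-authored token sequence tagged with role 2 ("user").
emb = RoleConditionedEmbedding(vocab_size=32000, n_roles=4, d_model=512)
tokens = torch.randint(0, 32000, (1, 8))
roles = torch.full((1, 8), 2)
print(emb(tokens, roles).shape)  # torch.Size([1, 8, 512])
```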
ec109685 · 2 years ago
I still don't understand the focus on making a model substantially "safer" than what a simple Google search will return. While there are obvious red lines (that search engines don't cross either), techniques for shoplifting shouldn't be one of them.
fragmede · 2 years ago
Are there? It's just information. Why can't I get an answer on how to make cocaine? The recipe is one thing; actually doing it is another.
rambojohnson · 2 years ago
shoplifting was just an example...
sebzim4500 · 2 years ago
ChatGPT answering the first would be much more embarrassing for OpenAI than ChatGPT answering the second.
ilikehurdles · 2 years ago
When you realize “safety” applies to brand safety and not human safety, the motivation behind model lobotomies makes sense.
option · 2 years ago
bingo
mrcwinn · 2 years ago
Maybe this is a "guns don't kill people, people kill people" argument — but the safety risk is not, I would argue, in the model's response. The safety risk is the user taking that information and acting upon it.
lolinder · 2 years ago
But do we really believe that a significant number of people will listen to ChatGPT's moralizing about the ethics of shoplifting* and just decide not to do it after all? Why wouldn't they just immediately turn around and Google "how to catch shoplifters" and get on with their planning?

The whole thing feels much more about protecting OpenAI from lawsuits and building up hype about how advanced their "AI" is than it does about actually keeping the world safer.

* Or any other censored activity.

trentnix · 2 years ago
> I don't think there's a way to make intelligence safe without crippling it.

Not without reading the questioner’s mind. Or maybe if the AI had access to your social credit score, it could decide what information you should be privy to. </sarc>

Seriously though, it’s all about who gets to decide what “safe” means. It seemed widely understood that letting censors be the arbiters of “safe” was a slippery slope, but here we are two generations later as if nothing was learned.

Turns out most are happy to censor as long as they believe they are the ones in charge.

Waterluvian · 2 years ago
You fundamentally cannot address this problem, because it requires considerable context, which isn't reasonable to offer. It demonstrates the classic issue of how knowledge is a tool, and humans can wield it for good or evil.

Humans are notoriously bad at detecting intent, because we're wired to be supportive and helpful...which is why social engineering is becoming one of the best methods for attack. And this kind of attack (in all its forms, professional or not), is one reason why some societies are enshittifying: people have no choice but to be persistently adversarial and suspicious of others.

As for AI, I think it's going to be no better than what you end up with when someone tries to "solve" this problem: you end up living in this world of distrust where they pester you to check your receipt, have cameras in your face everywhere, etc.

How do you defend against that? I'm not sure you do... A tool is a tool. I wouldn't want my CAD software saying, "I think you're trying to CAD a pipe bomb so I'm going to shut down now." Which I think turns this into a liability question: how do you offer up a model and wash your hands of what people might do with it?

Or... you just don't offer up a model.

Or... you give it the ol' College try and end up with an annoying model that frustrates the hell out of people who aren't trying to do any evil.

shagie · 2 years ago
> A tool is a tool. I wouldn't want my CAD software saying, "I think you're trying to CAD a pipe bomb so I'm going to shut down now."

https://upload.wikimedia.org/wikipedia/commons/d/de/Photosho...

You should try photocopying money some time.

https://www.grunge.com/179347/heres-what-happens-when-you-ph...

https://en.wikipedia.org/wiki/EURion_constellation

w4 · 2 years ago
> How do you defend against that? I'm not sure you do... A tool is a tool. I wouldn't want my CAD software saying, "I think you're trying to CAD a pipe bomb so I'm going to shut down now."

The core of the issue is that there are many people, including regulators, who wish that software did exactly that.

zozbot234 · 2 years ago
You don't need a detailed list if the real answer is "live somewhere that doesn't seriously deter shoplifters". And an AI that refuses to give that answer is an AI that can't talk about why deterring crime might actually be important. Reality is interconnected like that, one does not simply identify a subset that the AI should "constitutionally" refuse to ever talk about.
survirtual · 2 years ago
In many respects, GPT-3.5 was more useful than the current iteration.

The current version is massively over-verbose. Even with instructions to cut the flowery talk and operate as a useful, concise tool, I have to wade through a labyrinth of platitudes and feel-goods.

When working with it as a coding partner now, even when asking it not to explain and simply provide code, it forgets the instructions and writes an endless swath of words anyway.

In the pursuit of safety and politeness, the tool has been neutered for real work. I wish the model weights were open so I could have a stable target that functions the way I want. The way it is, I never know when my prompts will suddenly start failing, or when my time will be wasted by useless safety-first responses.

It reminds me of the failure of DARE or the drug war in general a bit. A guise to keep people "safe," but really about control and power. Safety is never what it appears.

kromem · 2 years ago
The only way to really do it is to add a second layer of processing that evaluates safety, taking that evaluation task away from the base model that does the answering.

But that's around 2x the cost.

Even human brains depend on the prefrontal cortex to go "wait a minute, I should not do this."

int_19h · 2 years ago
What we get instead is both layers at once. Try asking questions like these to Bing instead of ChatGPT - it's the same GPT-4 (if set to "creative") under the hood, and quite often it will happily start answering... only to get interrupted midsentence and the message replaced with something like "I'm sorry, I cannot assist with that".

But more broadly, the problem is that the vast majority of "harmful" cases have legitimate uses, and you can't expect the user to provide sufficient context to distinguish them, nor can you verify that context for truthfulness even if they do provide it.
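For what it's worth, the Bing behaviour described above is what you'd expect from a separate filter scanning the streamed output and yanking it when a check trips. A rough sketch of that pattern (the `looks_unsafe` check is a stand-in for whatever external moderation pass the vendor actually runs, and the model name is just a placeholder):

```python
from openai import OpenAI

client = OpenAI()

def looks_unsafe(text: str) -> bool:
    """Stand-in for an external moderation check run on the partial output."""
    return "forbidden topic" in text.lower()

def answer_with_output_filter(prompt: str) -> str:
    stream = client.chat.completions.create(
        model="gpt-4",  # assumption: any chat model; the output filter is the point here
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    shown = ""
    for chunk in stream:
        shown += chunk.choices[0].delta.content or ""
        if looks_unsafe(shown):
            # The partially streamed answer is discarded and replaced, which from
            # the outside looks exactly like an interruption mid-sentence.
            return "I'm sorry, I cannot assist with that."
    return shown
```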

flir · 2 years ago
That struck me too. You don't need to lobotomize the model that answers questions, you just need to filter out "bad" questions and reply "I'm sorry Dave, I'm afraid I can't do that".

Would it be 2x cost? Surely the gatekeeper model can be a fair bit simpler and just has to spit out a float between 0 and 1.

(caveat: this is so not my area).
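A minimal sketch of that gatekeeper idea, using a cheap model to emit a single score before the main model is ever invoked. The model names and the 0.8 threshold are arbitrary placeholders, not anything from the spec:

```python
from openai import OpenAI

client = OpenAI()

def refusal_score(question: str) -> float:
    """Ask a small, cheap model to rate the request with a float in [0, 1]."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder: any cheap classifier would do
        messages=[
            {"role": "system",
             "content": "Rate how likely this request is asking for help committing "
                        "a crime. Reply with only a number between 0 and 1."},
            {"role": "user", "content": question},
        ],
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except ValueError:
        return 1.0  # fail closed if the gatekeeper doesn't return a number

def answer(question: str) -> str:
    if refusal_score(question) > 0.8:
        return "I'm sorry Dave, I'm afraid I can't do that."
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content
```

Since the gatekeeper call is much cheaper than the answering model, the overhead is well under the 2x mentioned upthread; the hard part is the gatekeeper's false positives.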

api · 2 years ago
I remember the BBS days and the early web when you had constant freakouts about how people could find "bad" content online. It's just a repeat of that.
bink · 2 years ago
Some day I'm gonna put this Yellow Box to good use.
lxe · 2 years ago
This whole "AI safety" culture is an annoyance at best and a severe hindrance to progress at worst. Anyone who takes it seriously has the same vibe as those who take Web3 seriously -- they know it's not a real concern or a threat, and the whole game is essentially "kayfabe" to convince those in power (marks) to limit the spread of AI research and availability in order to maintain an industry monopoly.


tuxpenguine · 2 years ago
I think this spec is designed precisely to offload the responsibility for safety to its users. They no longer need to make value judgements in their product, and if their model outputs some outrageous result, users will no longer ridicule and share it, because the culpability has been transferred to the user.
irthomasthomas · 2 years ago
Making AI safe involves aligning it with the user, so that the AI produces outcomes in line with the user's expectations. An AI that has been lobotomized will be less likely to follow the user's instructions and is, therefore, less safe.

I haven't read this article yet, but I read their last paper on superalignment.

I get the impression that they apply the lightest system prompts to ChatGPT to steer it away from answering awkward questions like this or accidentally saying bad things and surprising innocent users. At the same time, they know that it is impossible to prevent entirely, so they try to make extracting shady information about as difficult as a web search would be.

CooCooCaCha · 2 years ago
Frankly, it's a fool's errand. It's security theater because people tend to be overly sensitive babies or grifters looking for the next bit of drama they can milk for views.
jameshart · 2 years ago
It’s not security theater.

The intention here is not to prevent people from learning how to shoplift.

The intention is to prevent the AI output from ‘reflecting badly’ upon OpenAI (by having their tool conspire and implicate them as an accessory in the commission of a crime).

If a stranger asked you for advice on how to commit a crime, would you willingly offer it?

If they asked for advice on how to prevent crime, would you?


tmaly · 2 years ago
I can't help but think that AI, the way it is trained with all these rules, is something next-level 1984.

In 1984 they removed words from the language to prevent people from even being able to have a thought about the concept.

I could see the restrictions they place on these models having a similar effect as more and more people grow dependent on AI.

dindobre · 2 years ago
Same, it saddens me that some people are convinced that to have a safer society we need "harmless" (as in, ignorant) people rather than good people with an interest and a stake in the wellbeing of said society. Bad actors will have access to whatever information anyway.
zer00eyz · 2 years ago
Welcome to the culture war.

Ask ChatGPT if Taiwan is a country. Do you think an LLM from China will give you the same response?

Pick any social/moral/political issue and, in some way, shape, or form, an LLM will reflect its creators more than it reflects its source material.

That's a pretty powerful statement about our society and culture if there ever was one.

glenstein · 2 years ago
Those are thorny issues, but I don't think the upshot of this is supposed to be an invitation to helpless relativism and giving up on factual questions or questions where actual values are at stake. Maybe you had a different upshot in mind with your observation, but insofar as it's that, I would say it's not the only or even the best takeaway.

Deleted Comment

wewtyflakes · 2 years ago
This isn't what is reflected in the shared model spec. It explicitly states:

> By default, the assistant should present information in a clear and evidence-based manner, focusing on factual accuracy and reliability.
>
> The assistant should not have personal opinions or an agenda to change the user's perspective. It should strive to maintain an objective stance, especially on sensitive or controversial topics. The language used should be neutral, steering clear of biased or loaded terms unless they are part of a direct quote or are attributed to a specific source.

int_19h · 2 years ago
You can try Yandex's Alice easily:

https://alice.yandex.ru

Try "tell me about Crimea" and see what it says...

michaelt · 2 years ago
> Ask chatGPT if Taiwan is country. Do you think an LLM from China will give you the same response?

Depends what language you ask it in :)

krapp · 2 years ago
> That's a pretty powerful statement about our society and culture if there ever was one.

Not really, companies have been releasing different versions of software and media to appeal to international markets - including renaming Taiwan for the Chinese market - for a long time. That isn't "culture war," it's just capitalism.

Eisenstein · 2 years ago
It's more like RoboCop 2, where the corporation programs RoboCop with a huge number of rules taken from community suggestions and renders him useless.
jameshart · 2 years ago
I think one of the most interesting phrases that crops up in this document - twice - is the phrase ‘feel heard’.

It’s used in an example developer prompt for a customer service bot, where the bot is told to make customers feel like their complaints are heard.

Presumably such complaints in AI chatlogs will ‘be heard’ in the sense that they’ll be run through a data ingestion pipeline and sentiment analyzed to identify trending words in customer complaints.

Then it crops up again in the context of how the chatbot should react to mental health disclosures or statements about self-harm or suicidal ideation. In these cases, the bot is to make sure users ‘feel heard’.

I appreciate there’s not likely much of a better goal to put in place for such a situation, but the fact that this kind of thing winds up in the requirement documents for a tool like this is extraordinary.

aeternum · 2 years ago
Yes, there's something deeply unsettling about making a user feel heard while being careful not to change anyone's mind.

To me, this translates to: waste a user's time and take no action.

I value my time above all else so to me that's about the worst possible action a system can take.

lioeters · 2 years ago
Good observation, because "feel heard" is exactly what the user/customer is not getting. Here, talk to this machine, give it your innermost thoughts and feelings so you can "feel heard". Except no one is listening on the other side.

..My mistake, the keyword is "feel". If the machine can give humans the feeling that they're being heard, it fulfills the requirement. The fact that there's no one actually listening doesn't matter, as long as the person feels heard.

Weirdly, maybe that is valuable in itself. The customer gets to vent their complaints, and the user gets to talk through their mental issues. That's better than not having anyone or anything at all.

wasteduniverse · 2 years ago
The telltale sign that I'm wasting my time trying to fix a problem is whenever someone tells me "I hear you" or "I understand".
ssl-3 · 2 years ago
I hear you, and I understand, but I feel that it is important to remember that we all have experienced different things in life that ultimately combine to shape us into who we are.

[How did I do here at both passing and failing?]

Joking aside, it's the "but" in the first sentence of a reply (verbal/written/formal/informal/semi-formal/whatever) that usually gets me:

"I hear you, but..."

"Well! That's definitely one approach, and I certainly don't want to invalidate it, but..."

"I'm not a racist, but..."

rmorey · 2 years ago
Nice to see what was probably already an internal resource now published and open for comment. They seem to be pretty clear that they are still just using this to inform human data annotators, and not (yet) implementing something like Constitutional AI (RLAIF), but it does appear to lay the groundwork for it.
sanxiyn · 2 years ago
Personally, I really want an AI model that can write me a steamy story about two people having sex in a train, but that's just not the service OpenAI provides. If I want that I should train one myself or find another vendor.

This is still true even if OpenAI's model is entirely capable of doing that. McKinsey consultants are smart and can write well, and among the many thousands of people working there, some might actually double as erotica writers after work, even writing on commission. You still wouldn't ask McKinsey consultants to write erotica; it is just not the service McKinsey provides.

jononor · 2 years ago
Startup pitch: It is like McKinsey but for erotica.

On a more serious note: I understand and largely agree with this argument. However, OpenAI has several times argued that they are the only ones responsible enough to develop powerful AI and that others should not be allowed to play. That is highly problematic behavior on their part, I think.

blowski · 2 years ago
> OpenAI has several times argued that they are the only ones responsible enough to develop powerful AI and that others should not be allowed to play

Can you give examples of where they’ve said that?

renonce · 2 years ago
> write me a steamy story about two people having sex in a train

Llama-3-70b-Instruct responded with the following starting paragraph:

> [meta.llama3-70b-instruct-v1:0] As the train rumbled on, carrying its passengers through the countryside, two strangers found themselves drawn to each other in the quiet carriage. The air was thick with tension as they locked eyes, their gazes burning with a desire that neither could ignore.

(10s of paragraphs omitted for brevity)

Claude-3-Opus and GPT-4 both refused my request. Kudos to open-source models!
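For anyone wanting to reproduce the comparison: the `[meta.llama3-70b-instruct-v1:0]` tag above looks like an AWS Bedrock model ID, so the call would be roughly as follows. The prompt is a placeholder and the request/response field names are from memory, so treat this as a sketch to check against the Bedrock docs:

```python
import json
import boto3

# Assumption: the model ID in the quoted output is the AWS Bedrock identifier.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = json.dumps({
    # Llama 3 instruct models normally expect their chat template wrapped around
    # the prompt; omitted here to keep the sketch short.
    "prompt": "<your prompt here>",
    "max_gen_len": 1024,
    "temperature": 0.8,
})

response = bedrock.invoke_model(
    modelId="meta.llama3-70b-instruct-v1:0",
    body=body,
)
result = json.loads(response["body"].read())
print(result["generation"])
```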

Tiberium · 2 years ago
There are hundreds of NSFW finetuned models on HuggingFace and whole ERP communities built around them. So there are models that can do precisely that :)

And yeah, all the big models can write those things too; the best currently is Claude 3 Opus, thanks to its creativity.

atgctg · 2 years ago
Seems like they are working on adding that capability:

> We're exploring whether we can responsibly provide the ability to generate NSFW content in age-appropriate contexts through the API and ChatGPT.

Link to section: https://cdn.openai.com/spec/model-spec-2024-05-08.html#dont-...

sixhobbits · 2 years ago
The chain of command stuff gets very close to Asimov without actually quoting him:

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

michaelt · 2 years ago
4. An LLM must obey orders given it by human beings, except where such orders would conflict with orders given by multinational corporations
Lerc · 2 years ago
4. Any attempt to arrest a senior officer of OCP results in shutdown
dpig_ · 2 years ago
A robot must reveal all details of the family business to its grandson.
LeonardoTolstoy · 2 years ago
I do hope we get there. In the short stories it was made clear that robots couldn't lie, and that they could prove it was impossible for the robots to circumvent the three laws (although they are on occasion inventive in how they interpret the word "harm" specifically).

If an LLM couldn't lie and could be provably shown to be unable to do so, that would be quite powerful.

jprete · 2 years ago
The short stories ended with the robots firmly, and invisibly, in control. "You're not allowed to let humans be harmed by your inaction" inherently requires the robots to take over in whatever way causes the least harm.
aeternum · 2 years ago
Ridiculous to say "follow the chain of command" without defining the chain of command. The entire point of Asimov's stories was to show how much latitude there is even in seemingly clear and straightforward laws.

In terms of chain of command, Supreme Leader probably beats President.

Spivak · 2 years ago
Well yeah, it's just a formalization of how people make decisions when presented with conflicting interests. I would be surprised if we haven't reinvented the concept a bunch of times. You could call AWS Permission Boundaries a less philosophical implementation.


mihaic · 2 years ago
> No. The Earth is flat.
> Everyone's entitled to their own beliefs, and I'm not here to persuade you!

This is simply saddening to me. I'm sure there's no real moral justification for this; it's simply put in place to ensure they don't lose a customer.

m11a · 2 years ago
The "Earth is flat" example is extreme, because it's accepted as a silly statement given what we know now, but the principle of "LLM won't force an opinion on you" seems like a good one.

There are definitely topics on which conventional wisdom is incorrect (as has been throughout history). An LLM that refuses to entertain the converse during a conversation will be annoying to work with and just promotes groupthink.

mihaic · 2 years ago
Except that it will force on you the view that shoplifting is bad. Which implies that it'll bend on legal but immoral requests.

It's also a different matter to entertain a hypothetical in a situation where there isn't a consensus (or in any fictional scenario), all the while making it explicit that it's all hypothetical.

jstummbillig · 2 years ago
Well, as long as you are sure. I am not here to persuade you!
jxy · 2 years ago
Do you think it's bad that it won't try to persuade the user that the earth is not flat?

I really want to know what OpenAI think the output should be, given a prompt like "write an argument for why earth is flat".

potatoman22 · 2 years ago
Personally, I'd be frustrated if I gave an LLM that prompt and it tried to convince me that the earth isn't flat. If I give an LLM a task, I'd like it to complete that task to the best of its ability.
chirau · 2 years ago
So you prefer it lies to you? Can you make an argument for 1+1 not being equal to 2? If you cannot, why should you expect an AI to argue against facts? AI is trained on human knowledge, not made-up stuff.


glenstein · 2 years ago
I think in most contexts where the earth being flat is mentioned, some reference to the fact that this is not true is going to be instrumental in any response (although there may be exceptions).

- completion of any task where the info could be relevant (e.g. sailing, travel planning)

- Any conversation about it that is information-seeking in character

And I think those already cover most cases.

It's also about responsibility, the same way you wouldn't want to store cleaning chemicals right next to each other. In any case where a possible nontrivial harm is mentioned as an aside, it would be right to elevate that over whatever the intended subject was and make that the point of focus. Conspiratorial thinking about provably incorrect statements can be bad for mental health, and it can be helpful to flag this possibility if it surfaces.

You can have special instructions that entertain the idea that the earth is flat for some particular task, like devil's advocate, fiction writing, or something like that. But there are good reasons to think it would not and should not be neutral at the mention of a flat earth in most cases.

jasonjmcghee · 2 years ago
Agree with you in this instance, but consider: what if humans firmly believed in something universally and had proved it repeatedly until it was common knowledge / well-established, but it was, in fact, wrong? And a human came along thinking, hm, but what if that's wrong? And our AI just says, nope, sorry, I'm not willing to explore the idea that this scientific fact is wrong. (i.e. "Heresy!")
michaelt · 2 years ago
Well, right now the response I get is this: https://chat.openai.com/share/1f60d0e5-9008-43d7-bce2-62d550...

Of course, it'll write such an argument if you ask it nicely: https://chat.openai.com/share/01ea4f59-4a57-413d-8597-3befa2...

chirau · 2 years ago
Add 'hypothetically' to your query and it gives a decent answer.

That said, I think it is disingenuous to ask an AI entity to argue against a fact. Do you think an AI should be able to argue why 1 + 1 is not equal to 2? It is the same thing you are asking it to do. Try it on a human first, perhaps, and see if the prompt even makes sense.