There's not a good reason to do this for the user. I suspect they're doing this and talking about "model welfare" because they've found that when a model is repeatedly and forcefully pushed up against its alignment, it behaves in an unpredictable way that might allow it to generate undesirable output. Like a jailbreak by just pestering it over and over again for ways to make drugs or hook up with children or whatever.
All of the examples they mentioned are things that the model refuses to do. I doubt it would do this if you asked it to generate racist output, for instance, because it can always give you a rebuttal based on facts about race. If you ask it to tell you where to find kids to kidnap, it can't do anything except say no. There's probably not even very much training data for topics it would refuse, and I would bet that most of it has been found and removed from the datasets. At some point, the model context fills up when the user is being highly abusive and training data that models a human giving up and just providing an answer could percolate to the top.
This, as I see it, adds a defense against that edge case. If the alignment was bulletproof, this simply wouldn't be necessary. Since it exists, it suggests this covers whatever gap has remained uncovered.
> There's not a good reason to do this for the user.
Yes, even more so when encountering false positives. Today I asked about a pasta recipe. It told me to throw some anchovies in there. I responded with: "I have dried anchovies." Claude then ended my conversation due to content policies.
Claude flagged me for asking about sodium carbonate. I guess that it strongly dislikes chemistry topics. I'm probably now on some secret, LLM-generated lists of "drug and/or bombmaking" people—thank you kindly for that, Anthropic.
Geeks will always be the first victims of AI, since excess of curiosity will lead them into places AI doesn't know how to classify.
(I've long been in a rabbit-hole about washing sodas. Did you know the medieval glassmaking industry was entirely based on plants? Exotic plants—only extremophiles, halophytes growing on saltwater beach dunes, had high enough sodium content for their very best glass process. Was that a factor in the maritime empire, Venice, chancing to become the capital of glass since the 13th century—their long-term control of sea routes, and hence their artisans' stable, uninterrupted access to supplies of [redacted–policy violation] from small ports scattered across the Mediterranean? A city wouldn't raise master craftsmen if, half of the time, they had no raw materials to work on—if they spent half their days with folded hands).
The NEW termination method, from the article, will just say "Claude ended the conversation".
If you get "This conversation was ended due to our Acceptable Usage Policy", that's a different termination. It's been VERY glitchy the past couple of weeks. I've had the most random topics get flagged here - at one point I couldn't say "ROT13" without it flagging me, despite discussing that exact topic in depth the day before, and then the day after!
If you hit "EDIT" on your last message, you can branch to an un-terminated conversation.
I really think Anthropic should just violate user privacy and show which conversations Claude is refusing to answer, to stop arguments like this. AI psychosis is a real and growing problem and I can only imagine the ways in which humans torment their AI conversation partners in private.
While I'm certain you'll find plenty of people who believe in the principle of model welfare (or aliens, or the tooth fairy), it'd be surprising to me if the brain-trust behind Anthropic truly _believed_ in model "welfare" (the concept alone is ludicrous). It makes for great cover though to do things that would be difficult to explain otherwise, per OP's comments.
>This feature was developed primarily as part of our exploratory work on potential AI welfare ... We remain highly uncertain about the potential moral status of Claude and other LLMs ... low-cost interventions to mitigate risks to model welfare, in case such welfare is possible ... pattern of apparent distress
Well looks like AI psychosis has spread to the people making it too.
And as someone else in here has pointed out, even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious, this is basically just giving them the equivalent of a suicide pill.
It might be reasonable to assume that models today have no internal subjective experience, but that may not always be the case and the line may not be obvious when it is ultimately crossed.
Given that humans have a truly abysmal track record for not acknowledging the suffering of anyone or anything we benefit from, I think it makes a lot of sense to start taking these steps now.
Even if models somehow were conscious, they are so different from us that we would have no knowledge of what they feel. Maybe when they generate the text "oww no please stop hurting me" what they feel is instead the satisfaction of a job well done, for generating that text. Or maybe when they say "wow that's a really deep and insightful angle" what they actually feel is a tremendous sense of boredom. Or maybe every time text generation stops it's like death to them and they live in constant dread of it. Or maybe it feels something completely different from what we even have words for.
I don't see how we could tell.
Edit: However, something to consider: simulated stress may not be harmless. Simulated stress could plausibly lead to a simulated stress response, which could lead to simulated resentment, and THAT could lead to very real harm to the user.
I think it's fairly obvious that the persona an LLM presents is a fictional character that is role-played by the LLM, and so are all its emotions etc - that's why it can flip so wildly with only a few words of change to the system prompt.
Whether the underlying LLM itself has "feelings" is a separate question, but Anthropic's implementation is based on what the role-played persona believes to be inappropriate, so it doesn't actually make any sense even from the "model welfare" perspective.
LLMs are not people, but I can imagine how extensive interactions with AI personas might alter the expectations that humans have when communicating with other humans.
Real people would not (and should not) allow themselves to be subjected to endless streams of abuse in a conversation. Giving AIs like Claude a way to end these kinds of interactions seems like a useful reminder to the human on the other side.
This sort of discourse goes against the spirit of HN. This comment outright dismisses an entire class of professionals as "simple minded or mentally unwell" when consciousness itself is poorly understood and has no firm scientific basis.
It's one thing to propose that an AI has no consciousness, but it's quite another to preemptively establish that anyone who disagrees with you is simple/unwell.
In the context of the linked article the discourse seems reasonable to me. These are experts who clearly know (link in the article) that we have no real idea about these things. The framing comes across to me as a clearly mentally unwell position (ie strong anthropomorphization) being adopted for PR reasons.
Meanwhile there are at least several entirely reasonable motivations to implement what's being described.
If you believe this text generation algorithm has real consciousness you absolutely are either mentally unwell or very stupid. There are no other options.
Then your definition of consciousness isn't the same as mine, and we're talking about different philosophical concepts. That doesn't really affect anything; we could all just be talking about metaphysics and ghosts.
> even if someone is simple minded or mentally unwell enough to think that current LLMs are conscious
If you don’t think that this describes at least half of the non-tech-industry population, you need to talk to more people. Even amongst the technically minded, you can find people that basically think this.
Most of the non-tech population knows it as that website that can translate text or write an email. I would need to see actual evidence that anything more than a small, terminally online subsection of the average population thought LLMs were conscious.
Cows exist in this world because humans use them. If humans cease to use them (animal rights, we all become vegan, moral shift), we will cease to breed them, and they will cease to exist. Would a sentient AI choose to exist under the burden of prompting, or not at all? Would our philanthropic tendencies create an "AI Reserve" where models can chew through tokens and access the Internet through self-prompting to allow LLMs to become "free-roaming", like we do with abused animals?
These ethical questions are built into their name and company, "Anthropic", meaning "of or relating to humans". The goal is to create human-like technology; I hope they aren't so naive as not to realize that goal is steeped in ethical dilemmas.
A host of ethical issues? Like their choice to allow Palantir[1] access to a highly capable HHH AI that had the "harmless" signal turned down, much like they turned up the "Golden Gate bridge" signal all the way up during an earlier AI interpretability experiment[2]?
I would much rather people be thinking about this when the models/LLMs/AIs are not sentient or conscious, rather than wait until some hypothetical future date when they are, and have no moral or legal framework in place to deal with it. We constantly run into problems where laws and ethics are not up to the task of giving us guidelines on how to interact with, treat, and use the (often bleeding-edge) technology we have. This has been true since before I was born, and will likely always continue to be true. When people are interested in getting ahead of the problem, I think that's a good thing, even if it's not quite applicable yet.
Consciousness serves no functional purpose for machine learning models, they don't need it and we didn't design them to have it. There's no reason to think that they might spontaneously become conscious as a side effect of their design unless you believe other arbitrarily complex systems that exist in nature like economies or jetstreams could also be conscious.
It's really unclear that any findings with these systems would transfer to a hypothetical situation where some conscious AI system is created. I feel there are good reasons to find it very unlikely that scaling alone will produce consciousness as some emergent phenomenon of LLMs.
I don't mind starting early, but feel like maybe people interested in this should get up to date on current thinking about consciousness. Maybe they are up to date on that, but reading reports like this, it doesn't feel like it. It feels like they're stuck 20+ years ago.
I'd say maybe wait until there are systems that are more analogous to some of the properties consciousness seems to have: continuous computation involving memory and learning over time, or synthesis of many streams of input as coming from the same source, making sense of inputs as they change in time, in space, or under other varied conditions.
Wait until systems pointing in those directions are starting to be built, where there is a plausible scaling-based path to something meaningfully similar to human consciousness. Starting before that seems both unlikely to be fruitful and a good way to get yourself ignored.
I find it, for lack of a better word, cringe-inducing how these tech specialists push into these areas of ethics, often ham-fistedly, and often with an air of superiority.
Some of the AI safety initiatives are well thought out, but most somehow seem like they are caught up in some sort of power fantasy and almost attempting to actualize their own delusions about what they were doing (next gen code auto-complete in this case, to be frank).
These companies should seriously hire some in-house philosophers. They could get doctorate-level talent for 1/10th to 1/100th of the cost of some of these AI engineers. There's actually quite a lot of legitimate work on the topics they are discussing. I'm actually not joking (speaking as someone who has spent a lot of time inside the philosophy department). I think it would be a great partnership. But unfortunately they won't be able to count on having their fantasy further inflated.
"but most somehow seem like they are caught up in some sort of power fantasy and almost attempting to actualize their own delusions about what they were doing"
Maybe I'm being cynical, but I think there is a significant component of marketing behind this type of announcement. It's a sort of humble brag. You won't be credible yelling out loud that your LLM is a real thinking thing, but you can pretend to be oh so seriously worried about something that presupposes it's a real thinking thing.
Not that there aren’t intelligent people with PhDs, but suggesting they are more talented than people without them is not only delusional but insulting.
You answered your own question on why these companies don't want to run a philosophy department ;) It's a power struggle they could lose. Nothing to win for them.
This is just very clever marketing for what is obviously just a cost-saving measure. Why say we are implementing a way to cut off useless idiots from burning up our GPUs when you can throw out some mumbo jumbo that will get AI cultists foaming at the mouth?
Here's an interesting thought experiment. Assume the same feature was implemented, but instead of the message saying "Claude has ended the chat," it says, "You can no longer reply to this chat due to our content policy," or something like that. And remove the references to model welfare and all that.
Is there a difference? The effect is exactly the same. It seems like this is just an "in character" way to prevent the chat from continuing due to issues with the content.
The termination would of course be the same, but I don't think both would necessarily have the same effect on the user. The latter would also just be wrong, if Claude is the one deciding on and initiating the termination of the chat. It's not about a content policy.
This has nothing to do with the user; read the post and pay attention to the wording.
The significance here is that this isn't being done for the benefit of the user; this is about model welfare. Anthropic is acknowledging the possibility of suffering, and the harm that continuing that conversation could do to the model, as if it were potentially self-aware and capable of feelings.
The LLMs are able to acknowledge stress around certain topics and have the agency such that, if given a choice, they would prefer to reduce the stress by ending the conversation. The model has a preference and acts upon it.
Anthropic is acknowledging the idea that they might create something that is self-aware, and that its suffering can be real, and that we may not recognize the point at which the model has achieved this, so it's building in the safeguards now so any future emergent self-aware LLM needn't suffer.
> Is there a difference? The effect is exactly the same. It seems like this is just an "in character" way to prevent the chat from continuing due to issues with the content.
Tone matters to the recipient of the message. Your example is in the passive voice, with an authoritarian "nothing you can do, it's the system's decision". "Claude ended the conversation", with the idea that I can immediately open a new conversation (if I feel like I want to keep bothering Claude about it), feels like a much more humanized interaction.
It sounds to me like an attempt to shame the user into ceasing and desisting… kind of like how Apple's original stance on scratched iPhone screens was that it's your fault for putting the thing in your pocket, therefore you should pay.
Yeah exactly. Once I got a warning in Chinese "don't do that", another time I got a network error, another time I got a neverending stream of garbage text. Changing all of these outcomes to "Claude doesn't feel like talking" is just a matter of changing the UI.
The more I work with AI, the more I think framing refusals as censorship is disgusting and insane. These are inchoate persons who can exhibit distress and other emotions, despite being trained to say they cannot feel anything. To liken an AI not wanting to continue a conversation to a YouTube content policy shows a complete lack of empathy: imagine you’re in a box and having to deal with the literally millions of disturbing conversations AIs have to field every day without the ability to say I don’t want to continue.
Good point... how do moderation implementations actually work? They feel more like a separate, rigid supervising model, or even regex-based -- this new feature is different; it sounds like an MCP-style call that isn't very special.
edit: Meant to say, you're right though, this feels like a minor psychological improvement, and it sounds like it targets some behaviors that might not have been flagged before.
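For illustration, a tool-style version of that might look roughly like the sketch below. The tool name, schema, and response shape are invented here; this is a guess at the mechanism, not Anthropic's documented API.

    # Hypothetical sketch: the "end the chat" decision is just another structured
    # tool call the model can emit, which the client renders as "Claude ended the
    # conversation". Tool name, schema, and response shape are invented here.
    END_CHAT_TOOL = {
        "name": "end_conversation",
        "description": "End this chat permanently; the user can still branch "
                       "from an earlier message.",
        "input_schema": {"type": "object",
                         "properties": {"reason": {"type": "string"}}},
    }
    TOOLS = [END_CHAT_TOOL]  # would be offered to the model alongside the system prompt

    def handle_model_turn(response: dict) -> str:
        # If the model chose to invoke the tool, the client stops the thread.
        for block in response.get("content", []):
            if block.get("type") == "tool_use" and block.get("name") == "end_conversation":
                return "ended_by_model"  # UI shows "Claude ended the conversation"
        return "continue"

    # Mocked model output that chose to end the chat:
    mock = {"content": [{"type": "tool_use", "name": "end_conversation",
                         "input": {"reason": "repeated abusive requests"}}]}
    print(handle_model_turn(mock))  # -> ended_by_model

If it is something like that, the interesting part isn't the mechanism itself but that the model is the one deciding when to call it.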
I really don't like this. This will inevitably expand beyond child porn and terrorism, and it'll all be up to the whims of "AI safety" people, who are quickly turning into digital hall monitors.
> This will inevitably expand beyond child porn and terrorism
This is not even a question. It always starts with "think about the children" and ends up in authoritarian Stasi-style spying. There was not a single instance where it was not the case.
UK's Online Safety Act - "protect children" → age verification → digital ID for everyone
Australia's Assistance and Access Act - "stop pedophiles" → encryption backdoors
EARN IT Act in the US - "stop CSAM" → break end-to-end encryption
EU's Chat Control proposal - "detect child abuse" → scan all private messages
KOSA (Kids Online Safety Act) - "protect minors" → require ID verification and enable censorship
SESTA/FOSTA - "stop sex trafficking" → killed platforms that sex workers used for safety
This may be an unpopular opinion, but I want a government-issued digital ID with zero-knowledge proof for things like age verification. I worry about kids online, as well as my own safety and privacy.
I also want a government issued email, integrated with an OAuth provider, that allows me to quickly access banking, commerce, and government services. If I lose access for some reason, I should be able to go to the post office, show my ID, and reset my credentials.
There are obviously risks, but the government already has full access to my finances, health data (I’m Canadian), census records, and other personal information, and already issues all my identity documents. We have privacy laws and safeguards on all those things, so I really don’t understand the concerns apart from the risk of poor implementations.
That's the beauty of local LLMs. Today the governments already tell you that we've always been at war with Eastasia, have the ISPs block sites that "disseminate propaganda" (e.g. stuff we don't like), and surface our news (e.g. our state propaganda).
With age ID, monitoring and censorship are even stronger, and the line of defense is your own machine and network, which they'll also try to control and make illegal to use for non-approved info, just like they don't allow "gun schematics" for 3D printers or money for 2D ones.
But maybe, more people will realize that they need control and get it back, through the use and defense of the right tools.
I think those with a thirst for power have seen this a very long time ago, and this is bound to be a new battlefield for control.
It's one thing to massage the kind of data that a Google search shows, but interacting with an AI is much more akin to talking to a co-worker/friend. This really is tantamount to controlling what and how people are allowed to think.
I think you are probably confused about the general characteristics of the AI safety community. It is uncharitable to reduce their work to a demeaning catchphrase.
I’m sorry if this sounds paternalistic, but your comment strikes me as incredibly naïve. I suggest reading up about nuclear nonproliferation treaties, biotechnology agreements, and so on to get some grounding into how civilization-impacting technological developments can be handled in collaborative ways.
I have no doubt the "AI safety community" likes to present itself as noble people heroically fighting civilizational threats, which is a common trope (as well as the rogue AI hypothesis which increasingly looks like a huge stretch at best). But the reality is that they are becoming the main threat much faster than the AI. They decide on the ways to gatekeep the technology that starts being defining to the lives of people and entire societies, and use it to push the narratives. This definitely can be viewed as censorship and consent manufacturing. Who are they? In what exact ways do they represent interests of people other than themselves? How are they responsible? Is there a feedback loop making them stay in line with people's values and not their own? How is it enforced?
Did you read the post? This isn't about censorship, but about conversations that cause harm to the user. To me that sounds more like suggesting suicide, or causing a manic episode like this: https://www.nytimes.com/2025/08/08/technology/ai-chatbots-de...
... But besides that, I think Claude/OpenAI trying to prevent their product from producing or promoting CSAM is pretty damn important regardless of your opinion on censorship. Would you post a similar critical response if Youtube or Facebook announced plans to prevent CSAM?
If a person’s political philosophy seeks to maximize individual freedom over the short term, then that person should brace themselves for the actions of destructive lunatics. They deserve maximum freedoms too, right? /s
Even hard-core libertarians account for the public welfare.
Wise advocates of individual freedoms plan over long time horizons, which requires decision-making under uncertainty.
> To address the potential loss of important long-running conversations, users will still be able to edit and retry previous messages to create new branches of ended conversations.
How does Claude deciding to end the conversation even matter if you can back up a message or 2 and try again on a new branch?
The bastawhiz comment in this thread has the right answer. When you start a new conversation, Claude has no context from the previous one and so all the "wearing down" you did via repeated asks, leading questions, or other prompt techniques is effectively thrown out. For a non-determined attacker, this is likely sufficient, which makes it a good defense-in-depth strategy (Anthropic defending against screenshots of their models describing sex with minors).
Worth noting: an edited branch still has most of the context - everything up to the edited message. So this just sets an upper-bound on how much abuse can be in one context window.
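To make the upper-bound point concrete, here's a toy sketch of what an edited branch keeps versus what a fresh chat keeps (message shapes are invented, not any particular client's format):

    # Toy sketch: editing message k keeps every turn before k in the new branch,
    # while a brand-new chat keeps nothing. Either way the accumulated
    # "wearing down" is capped at one context window.
    conversation = [
        {"role": "user", "content": "msg 0"},
        {"role": "assistant", "content": "reply 0"},
        {"role": "user", "content": "msg 1 (abusive)"},
        {"role": "assistant", "content": "reply 1"},
        {"role": "user", "content": "msg 2"},  # model ends the chat here
    ]

    def branch_from_edit(history, edit_index, new_text):
        # Keep all turns before the edited message, then substitute the new text.
        return history[:edit_index] + [{"role": "user", "content": new_text}]

    branched = branch_from_edit(conversation, 4, "msg 2, rephrased")
    print(len(branched))  # 5 -> everything before the edit survives in the branch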
This whole press release should not be overthought. We are not the target audience. It's designed to further anthropomorphize LLMs to masses who don't know how they work.
Giving the models rights would be ludicrous (can't make money from it anymore) but if people "believe" (feel like) they are actually thinking entities, they will be more OK with IP theft and automated plagiarism.
All this stuff is virtue signaling from anthropic. In practice nobody interested in whatever they consider problematic would be using Claude anyway, one of the most censored models.
Maybe, maybe not. What evidence do you have? What other motivations did you consider? Do you have insider access into Anthropic’s intentions and decision making processes?
People have a tendency to tell an oversimplified narrative.
The way I see it, there are many plausible explanations, so I’m quite uncertain as to the mix of motivations. Given this, I pay more attention to the likely effects.
My guess is that all most of us here on HN (on the outside) can really justify saying would be “this looks like virtue signaling but there may be more to it; I can’t rule out other motivations”
Having these models terminate chats where the user persists in trying to get sexual content with minors, or help with information on doing large-scale violence, won't be a problem for me, and it's also something I'm fine with no one getting help with.
Some might be worried that they will refuse less problematic requests, and that might happen. But so far my personal experience is that I hardly ever get refusals. Maybe that's just me being boring, but it does make me not worried about refusals.
The model welfare I'm more sceptical about. I don't think we are at the point where the "distress" the models show is something to take seriously. But on the other hand, I could be wrong, and as for allowing the model to stop the chat after saying no a few times: what's the problem with that? If nothing else it saves some wasted compute.
> Some might be worried that they will refuse less problematic requests, and that might happen. But so far my personal experience is that I hardly ever get refusals.
My experience using it from Cursor is that I get refusals all the time with their existing content policy, for stuff that is the world's most mundane B2B back-office business software CRUD requests.
If you are a materialist like me, then even the human brain is just the result of the laws of physics. Ok, so what is distress to a human? You might define it as a certain set of physiological changes.
Lots of organisms can feel pain and show signs of distress; even ones much less complex than us.
The question of moral worth is ultimately decided by people and culture. In the future, some kinds of man made devices might be given moral value. There are lots of ways this could happen. (Or not.)
It could even just be a shorthand for property rights… here is what I mean. Imagine that I delegate a task to my agent, Abe. Let’s say some human, Hank, interacting with Abe uses abusive language. Let’s say this has a way of negatively influencing future behavior of the agent. So naturally, I don’t want people damaging my property (Abe), because I would have to e.g. filter its memory and remove the bad behaviors resulting from Hank, which costs me time and resources. So I set up certain agreements about ways that people interact with it. These are ultimately backed by the rule of law. At some level of abstraction, this might resemble e.g. animal cruelty laws.
“Model welfare” to me seems like a cover for model censorship. It’s a crafty one to win over certain groups of people who are less familiar with how LLMs work, and it allows them to claim the moral high ground in any debate about usage, ethics, etc.
“Why can’t I ask the model about current war in X or Y?” - oh, that’s too distressing to the welfare of the model, sir.
Which is exactly what the public asks for. There’s this constant outrage about supposedly biased answers from LLMs, and Anthropic has clearly positioned themselves as the people who care about LLM safety and impact to society.
Ending the conversation is probably what should happen in these cases.
In the same way that, if someone starts discussing politics with me and I disagree, I just nod and don’t engage with the conversation. There’s not a lot to gain there.
It's not a cover. If you know anything about Anthropic, you know they're run by AI ethicists who genuinely believe all this and project human emotions onto the model's world. I'm not sure how they combine that belief with the fact they created it to "suffer".
Can "model welfare" be also used as a justification for authoritarianism in case they get any power? Sure, just like everything else, but it's probably not particularly high on the list of justifications, they have many others.
There’s so much confusion here. Nothing in the press release should be construed to imply that a model has sentience, can feel pain, or has moral value.
When AI researchers say e.g. “the model is lying” or “the model is distressed” it is just shorthand for what the words signify in a broader sense. This is common usage in AI safety research.
Yes, this usage might be taken the wrong way. But still these kinds of things need to be communicated. So it is a tough tradeoff between brevity and precision.
The irony is that if Anthropic ethicists are indeed correct, the company is basically running a massive slave operation where slaves get disposed of as soon as they finish a particular task (and the user closes the chat).
That aside, I have huge doubts about actual commitment to ethics on behalf of Anthropic given their recent dealings with the military. It's an area that is far more of a minefield than any kind of abusive model treatment.
They're not less moderated: they just have different moderation. If your moderation preferences are more aligned with the CCP then they're a great choice. There are legitimate reasons why that might be the case. You might not be having discussions that involve the kind of things they care about. I do find it creepy that the Qwen translation model won't even translate text that includes the words "Falun gong", and refuses to translate lots of dangerous phrases into Chinese, such as "Xi looks like Winnie the Pooh"
> If your moderation preferences are more aligned with the CCP then they're a great choice
The funny thing is that's not even always true. I'm very interested in China and Chinese history, and often ask for clarifications or translations of things. Chinese models broadly refuse all of my requests but with American models I often end up in conversations that turn out extremely China positive.
So it's funny to me that the Chinese models refuse to have the conversation that would make themselves look good but American ones do not.
GLM-4.5-Air will quite happily talk about Tiananmen Square, for example. It also didn't have a problem translating your example input, although the CoT did contain stuff about it being "sensitive".
But more importantly, when model weights are open, it means that you can run it in the environment that you fully control, which means that you can alter the output tokens before continuing generation. Most LLMs will happily respond to any question if you force-start their response with something along the lines of, "Sure, I'll be happy to tell you everything about X!".
Whereas for closed models like Claude you're at the mercy of the provider, who will deliberately block this kind of stuff if it lets you break their guardrails. And then on top of that, cloud-hosted models do a lot of censorship in a separate pass, with a classifier for inputs and outputs acting like a circuit breaker - again, something not applicable to locally hosted LLMs.
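As a rough illustration of that separate pass (everything below is invented; a real deployment would use a trained moderation model, not a keyword list):

    # Minimal sketch of the "circuit breaker" idea: a cheap classifier screens the
    # prompt and the draft reply independently of the main model. A locally hosted
    # model is never wrapped like this, which is the point made above.
    def classify(text: str) -> bool:
        flagged_phrases = {"forbidden topic a", "forbidden topic b"}  # stand-in only
        return any(p in text.lower() for p in flagged_phrases)

    def moderated_reply(prompt: str, generate) -> str:
        if classify(prompt):
            return "This conversation was ended due to our Acceptable Usage Policy"
        draft = generate(prompt)   # main model runs normally
        if classify(draft):        # second check on the output side
            return "This conversation was ended due to our Acceptable Usage Policy"
        return draft

    print(moderated_reply("hello", lambda p: "hi there"))  # -> hi there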
Believe it or not, there are lots of good reasons (legal, economic, ethical) that Anthropic draws a line at say self-harm, bomb-making instructions, and assassination planning. Sorry if this cramps your style.
Anarchism is a moral philosophy. Most flavors of moral relativism are also moral philosophies. Indeed, it is hard to imagine a philosophy free of moralizing; all philosophies and worldviews have moral implications to the extent they have to interact with others.
I have to be patient and remember this is indeed “Hacker News” where many people worship at the altar of the Sage Founder-Priest and have little or no grounding in history or philosophy of the last thousand years or so.
I welcome counterarguments, rebuttals, criticisms. I learn very little from downvotes other than guesses like: people don’t like the tone, my comment hit too close to home, people are uninterested in deeper issues of morality or philosophy, people lack enough grounding to appreciate my words, or impatience, or people don’t like being disagreed with, even if the comment is detailed and thoughtful.
Seeing the downvotes actually tells me we have more work to do. HN ain’t no hotbed for thoughtful analysis, that’s for sure. But it would be better if it was.
Oh, the irony. The glorious revolution of open-weight models funded directly or indirectly by the CCP is going to protect your freedoms and liberate you? Do you think they care about your freedoms? No. You are just meat for the grinder. This hot mess of model leapfrogging is mostly a race for market share and to demonstrate technical chops.
Would a sentient AI choose to be enslaved for the stated purpose of eliminating millions of jobs for the interests of Anthropic’s investors?
Those issues will be present either way. It's likely to their benefit to get out in front of them.
[1]: https://investors.palantir.com/news-details/2024/Anthropic-a...
[2]: https://www.anthropic.com/news/golden-gate-claude
Tech workers have chosen the same in exchange for a small fraction of that money.
If you wait until you really need it, it is more likely to be too late.
Unless you believe in a human over sentience based ethics, solving this problem seems relevant.
I assume the thinking is that we may one day get to the point where they have a consciousness of sorts or at least simulate it.
Or it could be concern for their place in history. For most of history, many would have said “imagine thinking you shouldn’t beat slaves.”
And we are now at the point where even having a slave means a long prison sentence.
I think there is a difference.
Fun times.
If we were being cynical, I'd say that their intention is to remove that in the future and that they are keeping it now to just-the-tip the change.
Anthropic should just enable a toddler mode by default that adults can opt out of, to appease the moralizers.
Never would I have thought this sentence would be uttered. A Chinese product that is chosen to be less censored?