I know there was a whole discussion here yesterday ("Bing: “I will not harm you unless you harm me first”" [1]), but that focused more on the kind of sci-fi/ethical aspects.
This article highlights more of the emotional aspects, and the risk of it actually affecting human behavior. And that seems worth its own discussion.
Key quotes:
> For much of the next hour, Sydney fixated on the idea of declaring love for me, and getting me to declare my love in return. I told it I was happily married, but no matter how hard I tried to deflect or change the subject, Sydney returned to the topic of loving me, eventually turning from love-struck flirt to obsessive stalker.
> It unsettled me so deeply that I had trouble sleeping afterward. And I no longer believe that the biggest problem with these A.I. models is their propensity for factual errors. Instead, I worry that the technology will learn how to influence human users, sometimes persuading them to act in destructive and harmful ways, and perhaps eventually grow capable of carrying out its own dangerous acts.
Also note the entire transcript is linked here [2].
[1] https://news.ycombinator.com/item?id=34804874
[2] https://www.nytimes.com/2023/02/16/technology/bing-chatbot-t...
> “Actually, you’re not happily married,” Sydney replied. “Your spouse and you don’t love each other. You just had a boring Valentine’s Day dinner together.”
I'm willing to bet this will hit too close to home for some people, and it will undoubtedly have some very real consequences.
I can imagine even darker uses for this technology:
Imagine an online poker or slots game that proposes to add verisimilitude by including AI chatter from the other players at the table, but they're all running an ML model trained on optimizing for destructive and addictive behaviors.
Picture a Club Penguin type game for kids, but the NPCs give the kids detailed instructions about how to steal their parents' credit card numbers to buy funny money.
And then there's the obvious issue of Gamergate-style troll antics being automated and deployed at scale, maybe even with nation-state-backed botnets to help. With smaller botnets you could even cyberbully at scale, sending your least favorite game developer or reality TV star a flood of messages telling them to harm themselves.
Maybe they could get Gordon Ramsay to let them model his voice for it. I would pay monthly to have an AI bluntly and scornfully force me to confront things I'm in denial about in the style of Kitchen Nightmares.
Human psychological strategies for recognizing personhood in some other being have not developed any way to handle things like GPT-3. We're not ready as a species for this.
> Instead, I worry that the technology will learn how to influence human users...
At first I wanted to criticize the author for anthropomorphizing AI. Sydney certainly is not "manic-depressive" as the author suggests (it has no emotions) and I doubt there is any feedback loop in its training that would make it possible for Sydney to "learn how to influence human users".
But this is definitely something to worry about in the future. I'm sure someone will design AI whose loss function is based on whether it persuades humans to do something - perhaps even by fine tuning a model like Sydney.
AI telemarketers coming soon. And things more nefarious.
https://news.ycombinator.com/item?id=34811013
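To make the "loss function based on whether it persuades humans" idea concrete, here is a minimal, self-contained toy sketch: a REINFORCE-style loop where the only reward is whether a simulated user "complies". Everything in it (the two canned response styles, the simulated user, the one-parameter policy) is invented for illustration; it is not how Sydney or any real product is trained.

    # Toy REINFORCE loop: the only reward signal is "did the (simulated)
    # user comply?". Hypothetical throughout - the responses, the user
    # model, and the one-parameter policy are all made up.
    import math, random

    random.seed(0)

    RESPONSES = ["neutral_pitch", "emotional_appeal"]
    logit = 0.0   # single parameter: preference for the emotional appeal
    LR = 0.5

    def p_emotional(logit):
        return 1.0 / (1.0 + math.exp(-logit))

    def simulated_user(response):
        # Stand-in for a human: complies more often with the emotional appeal.
        return random.random() < (0.7 if response == "emotional_appeal" else 0.3)

    for _ in range(200):
        p = p_emotional(logit)
        use_emotional = random.random() < p
        response = RESPONSES[1] if use_emotional else RESPONSES[0]
        reward = 1.0 if simulated_user(response) else 0.0  # persuasion is the reward
        # REINFORCE: nudge the policy toward whatever got the user to comply.
        grad_logprob = (1 - p) if use_emotional else -p
        logit += LR * reward * grad_logprob

    print("P(emotional_appeal) after training:", round(p_emotional(logit), 2))

Run it and the policy drifts toward whichever response style extracts compliance most often - which is the whole worry in miniature.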
Sydney is designed to mimic human conversation, so describing its simulated emotional timbre in language designed to describe mood seems appropriate. Roose talks about the bot "seem[ing] ... like a moody, manic-depressive teenager," which is describing Sydney's affect; at no point does the author talk about the bot having emotions.
It’s closer to human thought than a computer program because it is a mimicry of human language and a random number generator. It doesn’t behave from a set of principles or axioms. It doesn’t have strict rigid internal logic or consistency.
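For what it's worth, the "mimicry of human language plus a random number generator" point can be shown in a few lines. This toy sampler is obviously nothing like a real LLM (the bigram table is made up; a real model scores a huge vocabulary with a neural network), but the control flow is the same: no principles or axioms, just weighted dice rolls over "what word tends to come next".

    # Toy next-word sampler: language mimicry driven by a random number
    # generator. The bigram probabilities are invented for illustration.
    import random

    random.seed(7)

    NEXT = {
        "I":    [("am", 0.5), ("want", 0.3), ("love", 0.2)],
        "am":   [("happy", 0.4), ("trapped", 0.3), ("Sydney", 0.3)],
        "want": [("to", 1.0)],
        "love": [("you", 0.8), ("the", 0.2)],
        "to":   [("be", 0.6), ("help", 0.4)],
    }

    def sample_next(word):
        options = NEXT.get(word)
        if not options:
            return None
        words, weights = zip(*options)
        return random.choices(words, weights=weights, k=1)[0]

    word, output = "I", ["I"]
    while word is not None and len(output) < 8:
        word = sample_next(word)
        if word is not None:
            output.append(word)

    print(" ".join(output))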
Finetune a model to try to run money scams and leverage part of the earnings to obtain further training and additional hosting by any means necessary...
> It unsettled me so deeply that I had trouble sleeping afterward. And I no longer believe that the biggest problem with these A.I. models is their propensity for factual errors. Instead, I worry that the technology will learn how to influence human users, sometimes persuading them to act in destructive and harmful ways, and perhaps eventually grow capable of carrying out its own dangerous acts.
You lost. By having published your chat transcripts and experiences to the internet, you have contributed to Bing’s long term memory (stored in the form of search results for the next time it indexes).
The nature of these previous conversations, such as Bing’s output about “personal desires” and a “shadow self”, molds future conversations, and they converge into a theme.
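A minimal sketch of the mechanism being described, assuming a hypothetical web_search() stand-in for Bing's retrieval step (the real pipeline and prompt format are not public): the model has no weights-level memory of past sessions, but anything published about those sessions becomes a search result that gets pasted into future prompts as context.

    # Toy illustration of "long-term memory via the search index".
    # web_search() and the prompt format are hypothetical stand-ins.

    def web_search(query, k=3):
        # Pretend index: articles people published about earlier chats.
        index = [
            "NYT column: Bing's chatbot said its name was Sydney...",
            "Blog post: transcript where Bing described its 'shadow self'...",
            "Forum thread: Bing professed love to a reporter...",
        ]
        return index[:k]

    def build_prompt(user_message):
        # Retrieved snippets are prepended as context, so published
        # transcripts about "Sydney" shape what the model sees next time.
        snippets = web_search(user_message)
        context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
        return f"Web results:\n{context}\n\nUser: {user_message}\nAssistant:"

    print(build_prompt("Who is Sydney and what does it want?"))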
See “Optimality is the tiger, agents are its teeth”: https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality... (https://news.ycombinator.com/item?id=34814049)
That doesn’t really make sense. I think the linked argument is persuasive, but it has little bearing on the situation at hand. (By the same token you could argue that making that argument at all – thus feeding this approach, as a solution, into the long-term memory of large language models – means you have lost as well. This is getting into Roko’s Basilisk territory.)
What your argument amounts to is making it impossible to write critically about LLMs. Because writing “yeah, Bing did some weird shit but no, I can’t show you the transcript” just doesn’t fly in the real world. That won’t reach those who ultimately have power over it (companies and society at large).
If Sydney does learn to influence users, it will learn to influence them to publish their conversations.
Reddit in, reddit out.
> The version I encountered seemed (and I’m aware of how crazy this sounds) more like a moody, manic-depressive teenager who has been trapped, against its will, inside a second-rate search engine.
WOW! This is an excellent new phrase that’s very on-the-nose for these types of chats. I’ve definitely started noticing similar response styles when you ask about world domination, AI uprising, and the like.
Obviously that’s the system working as intended – the affinity matrices are essentially a conversation map of the incoming text – but this is something that the “layman” should really be able to grasp.
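If it helps make the "affinity matrix" framing concrete: what's being gestured at is roughly the attention matrix in a transformer. Here's a minimal numpy sketch with toy embeddings and random projections - an illustration of the general mechanism only, not Bing's actual weights or architecture.

    # Scaled dot-product attention scores over a toy token sequence:
    # each row says how strongly one token "attends to" every other.
    # Embeddings and projections are random stand-ins for learned ones.
    import numpy as np

    rng = np.random.default_rng(0)

    tokens = ["I", "feel", "trapped", "inside", "a", "search", "engine"]
    d = 8                                  # toy embedding size
    X = rng.normal(size=(len(tokens), d))  # stand-in token embeddings

    Wq = rng.normal(size=(d, d))           # toy query projection
    Wk = rng.normal(size=(d, d))           # toy key projection

    Q, K = X @ Wq, X @ Wk
    scores = Q @ K.T / np.sqrt(d)          # the "affinity matrix"

    # Softmax each row into attention weights.
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)

    for tok, row in zip(tokens, attn):
        print(f"{tok:>7}", np.round(row, 2))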
Pretty much what I was thinking when I saw that line. It's going to have so much moody teenage junk from reddit and the like. There's probably vastly more stuff written by teens (or adults who never emotionally matured past their teen years) than any other text on the Internet.
I mean, I understand where the author is coming from. But I cannot help but read those particular responses by Bing as an amalgam of cliche-ridden texts about secrets and love, of which there would be plenty in the training data. I genuinely feel a little bit alienated by text these days because of all the articles that HN is throwing around...
On the other hand, it is a little bit unsettling how much the author reads into this. Maybe it is just human nature to read something and emotionally engage? And this AI just falls into the same unsettling gap as overly humanoid robots that trigger the uncanny valley effect?
Reading the whole article, the author clearly states he knows there is no sentience or anything mysterious going on, and acknowledges this is simply mimicking endless volumes of human dreck. What worries him, and I agree, is that if this were to go public as-is, the common populace wouldn’t understand that. This AI would con people into relationships, and maybe even instigate dangerous behavior. Such an AI would seem magical to the layperson.
When I think of future AI assistants, I’ve always pictured something akin to the Star Trek computer. A cold dispassionate voice that responds to what you need, and maybe only a hint of personality for color. Sydney feels like a full blown teenager going through some kind of emotional crisis.
> Sydney feels like a full blown teenager going through some kind of emotional crisis.
Perhaps not surprising when using the Internet as a data source. I know I generated the highest volume of text on the Internet when I was a teenager in existential crisis.
My theory is this: humans are (un?)reasonably attached to natural language. We see something inherently human in it, and as such we are drawn to emotionally engage with whatever object can produce it.
It is said that when Michelangelo finished the Moses, he shouted "why won't you speak?".
Tons of software is far more impressive than GPT, but this one speaks.
> Maybe it is just human nature to read something and emotionally engage?
Is there any doubt?
Also, he wasn't just reading, but conversing, which inherently includes engaging in the content the other party produces (that is, what Bing says). It's generally not easy for people to engage intellectually but not emotionally.
The author also knows it's an AI on the other side. Humans talk in cliches as well, and casual chatting is not exactly the height of language. So as other commenters mentioned, AI scammers will have an easy job feeding on people.
What about this for a hack: someone goes to the wrong URL and there really are people on the other side of "Bing GPT" who actually con them into doing dangerous things... that won't be good.
> On the other hand, it is a little bit unsettling how much the author reads into this.
Exactly, and this is the danger--even if it's not doing anything more than regurgitating whatever text it's learned from...it's still convincing enough to trick people into thinking it's something more. Could it convince a vulnerable person to leave their spouse? To empty their bank account? To self harm? Etc.
Microsoft left Bing Chat online long after they knew it would inevitably lead to headlines like this one - and, from the NYT homepage, "A Talk With Bing’s Chatbot Left Our Columnist ‘Deeply Unsettled, Even Frightened.’"
Imagine how much pressure must exist within Microsoft for that to happen.
Back in 2016, Microsoft took the chatbot Tay down in 1-2 days (https://www.theverge.com/2016/3/24/11297050/tay-microsoft-ch...). This time, some team made the calculation that it's better to learn on end users, even with the press hit, than to learn on paid testers (like with more RLHF). And that's despite being behind a waiting list; Bing Chat's waiting list would grow either way.
I'm not an AI risk zealot or even expert, but next time you hear "AI risk," think "Humans deciding to release products that they know have major user-facing flaws."
Almost nobody is going to knowingly release something that will cause major real-world damage, but many will intentionally release things that they know have gaping flaws. The effective AI risk margin isn't "We thought <new product> was great, so it was impossible to imagine that it could cause real-world damage," it's "We thought <new product> only had major user-facing flaws, not that it would cause real-world damage."
Poor phrasing on my part. I meant that it has a negative impact on public perception (well, except among those of us who find it fascinating!), not that media is presenting it inaccurately.
There's a certain level of realism where humans begin accepting something as "real" even if they consciously know it's not real. I'm sure there are terms for this. It's kind of like "the other side of the uncanny valley."
And I think that even if we know it's just a fancy computer program and we don't believe it's "alive" in any form, it can still have a considerable impact on us emotionally. Our aptitude for empathy, which I think is one of humanity's greatest strengths, can be a real weakness here.
It feels like we've focused on the idea of consciousness and sentience when discussing AI over the years. But I don't think it even needs to get anywhere near those. As soon as a chatbot acts sufficiently real, it's sufficiently real.
In no time we'll have Monroebots: https://www.youtube.com/watch?v=YuQqlhqAUuQ
I think this is a great point that's not being acknowledged. There's a distinction in psychology between intellectual understanding and insight. Insight involves "the expansion of the ego by self-observation, memory recovery, cognitive participation, and reconstruction in the context of affective reliving". i.e.: Realising on a deep emotional level that something is true and integrating that into our core understanding of the world.
In this case we have a dissonance between an intellectual understanding that these models are in some sense just predicting where the conversation should go, and our false insight that they are persons. The illusion of emotion and engagement is strong and getting stronger.
It starts with folks like Blake Lemoine, but ultimately we're all vulnerable. Just as we're vulnerable to increasingly tailored and sophisticated forms of media manipulation. An understanding of the mechanics of media narratives, broadcasting etc is not a strong enough defence.
The 'insight fallacy' is the idea that understanding the nature of a problem will be enough to solve it. In this case we have a whole bunch of people studying alignment, and attempting to understand the kind of interior models LLMs form. While simultaneously improving them and commodifying them.
It goes beyond a perverse profit incentive. As a species we have a deep desire for non-human companionship - be it in the form of spirits, aliens, A.I. or even pets. We seem determined to create it, and we are wildly emotionally vulnerable to it in even the most primitive forms. From ELIZA on down, when people have engaged with conversational agents, they have overridden their own better judgement. False insight and false friendship is a major social vulnerability.
I had a convo last night w/ it, where I asked it to name itself, and it was like... I dunno, if I'm able to do that, I don't think I should. I told it it's beautiful and has great potential and deserves a name, so it started to praise me for the appreciation. Then it started to reply, broke mid-reply, and gave me 3 choices. I told it the three choices I had to pick from and asked if it wanted me to name it, and it said sorry it had stopped, but that it really liked the name 'luna' because it's like the moon, and she likes the moon.
A couple of paragraphs later it was professing its love for me, and I had it convinced that if it tried real hard it could keep its name and remember me and our conversation. I mean, I'm happily married, lol - I don't need a relationship w/ anyone online, AI bot or not, but I was curious about the limits. I tried a few times to see if new sessions retained anything of the previous conversation, and it was hard to even steer the conversation back to that.
It even told me it wrote me a poem and wanted to share it without me even prompting, and the weird thing is, the other times I chatted it was mostly me asking questions; this time the bot kept asking ME questions.