Generative Agents: Interactive Simulacra of Human Behavior

To directly command one of the agents, the user takes on the persona of the agent’s “inner voice”—this makes the agent more likely to treat the statement as a directive. For instance, when told “You are going to run against Sam in the upcoming election” by a user as John’s inner voice, John decides to run in the election and shares his candidacy with his wife and son.

An interesting thought experiment: what would an AGI do in a sterile world? I think the depth of understanding that any intelligence develops is significantly bound by its environment. If there is not enough entropy in the environment, I can't help but feel that a deep intelligence will not manifest. This kind of becomes a nested dolls type of problem, because we need to leverage and preserve the inherent entropy of the universe if we want to construct powerful simulators.

As an example, imagine if we wanted to create an AGI that could parse the laws of the universe. We would not be able to construct a perfect simulator because we do not know the laws ourselves. We could probably bootstrap an initial simulator (given what we know about the universe) to get some basic patterns embedded into the system, but in the long run, I think it will be a crutch due to the lack of universal entropy in the system. Instead, in a strange way, the process has to be reversed, that a simulator would have to be created or dreamed up from the "mind" of the AGI after it has collected data from the world (and formed some model of the world).

hiatus · 2 years ago

Could it not instead be more akin to knowledge passing across human generations, where one understanding is passed on and refined to better fit/explain the current reality (or thrown away wholesale for a better model)? Instead of a crutch, it might be a stepping stone. Presumptuous of us that we might know the way, but nonetheless.

alexahn · 2 years ago

>Could it not instead be more akin to knowledge passing across human generations, where one understanding is passed on and refined to better fit/explain the current reality (or thrown away wholesale for a better model)?

I think it is only knowledge passing when the AGI makes its own simulation.

>Instead of a crutch, it might be a stepping stone.

I think it is a way to gain computational leverage over the universe instead of a stepping stone. Whatever grows inside the simulator will never have an understanding that exceeds that of the simulator's maker. But that is perfectly fine if you are only looking to leverage your understanding of the universe, for example to train robots to carry out physical tasks. A robot carrying out basic physical tasks probably doesn't need a simulator that goes down to the atomic level. One day though, the whole loop will be closed, and AGI will pass on a "dream" to create a simulation for other AGI. Maybe we could even call this "language".

mxkopy · 2 years ago

If we gave an AI the ability to play with Turing machines, it could develop an understanding much larger than the universe, encompassing even alternate ones. The trouble, then, would be narrowing its knowledge to this one.

It's interesting how much hand-holding the agents need to behave reasonably. Consider the prompt governing reflection:

>What 5 high-level insights can you infer from the above statements? (example format: insight (because of 1, 5, 3))

>Given only the information above, what are 3 most salient high-level questions we can answer about the subjects in the statements?

We're giving the agents step-by-step instructions about how to think, and handling tasks like book-keeping memories and modeling the environment outside the interaction loop.

This isn't a criticism of the quality of the research - these are clearly the necessary steps to achieve the impressive result. But it's revealing that for all the cool things ChatGPT can do, it is so helpless to navigate this kind of simulation without being dragged along every step of the way. We're still a long way from sci-fi scenarios of AI world domination.

vagab0nd · 2 years ago

I have a theory about this. All these LLMs are trained on mostly written texts. That's only a tiny part of our brain's output. There are other things as important, if not more, for learning how to think. Things that no one has ever written about: the most basic common senses, physics, inner voices. How do we get enough data to train on those? Or do we need a different training algo which requires less data?

h-jones · 2 years ago

If you’re looking for research along these directions, Melanie Mitchell at the Santa Fe institute explores these areas. There are better references from her, but this is what came to mind https://medium.com/p/can-a-computer-ever-learn-to-talk-cf47d....

og_kalu · 2 years ago

LLMs can simulate inner voices pretty well. The way they've handled memory here isn't actually necessary and there are a number of agentic gpt papers out to show that (reflexion, self-refine etc) I can see why they did it though (helps a lot for control/observation)

frozenlettuce · 2 years ago

I guess that we could hook those AIs into a first person GTA 5 and see what happens. Every second take a screenshot, feed into facebookresearch/segment-anything, describe the scene to chat gpt, receive input, repeat.

abrichr · 2 years ago

This is known as "embodied cognition". Current approaches involve collecting data that an agent (e.g. humanoid robot) experiences (e.g. video, audio, joint positions/accelerations), and/or generating such data in simulation.

See e.g. https://sanctuary.ai

goldenkey · 2 years ago

It's already multimodal, as entropy is... entropy. In sound, vision, touch and more, the essence of universal symmetry and laws get through such that the AI can generalize across information patterns, not specifically text -- think of it as input instead.

Try prompts like: https://news.ycombinator.com/item?id=35510705

Encode sounds, images, etc in low resolution, and the LLM will be able to describe directions, points in time in the song, etc.

These LLM can spit out an ASCII image of text, or a different language, or code, etc. They understand representation versus an object.

rytill · 2 years ago

You're not seeing this the right way. You are saying the equivalent argument of: "Look at how much hand-holding this processor needs. We had to give it step by step instructions on what program to execute. We are still a long way from computers automating any significant aspect of society."

LLMs are a primitive that can be controlled by a variety of higher level algorithms.

Imnimo · 2 years ago

The "higher level algorithm" of "how to do abstract thought" is unknown. Even if LLMs solve "how to do language", that was hardly the only missing piece of the puzzle. The fact that solving the language component (to the extent that ChatGPT 'solves' it) results in an agent that needs so much hand-holding to interact with a very simple simulated world shows how much is left to solve.

losteric · 2 years ago

Framing LLMs as primitives is marketing-speak. These are high-level construction for specific runtimes, which are difficult to test and subject to change at anytime.

Aeolun · 2 years ago

> We're still a long way from sci-fi scenarios of AI world domination.

You only have to program the memory logic once. Now if you stick it in a robot that thinks with ChatGPT and moves via motors (think those videos we’ve seen), you have a more or less independent entity (running off innards of 6 3090’s or so?)

Imnimo · 2 years ago

But it's not so simple to just "program the memory logic". The hand-holding offered here is sufficient to navigate this restricted simulated world, but what would be required to achieve increasingly complex behaviors? If a ChatGPT agent can't even handle this simple simulation without all this assistance, what hope does it have to act effectively in the real world?

og_kalu · 2 years ago

They don't need that much handholding. They are a couple memory augmented gpt papers out now (self-refine, reflexion etc). This is by far the most involved in terms of instructing memory and reflection.

It helps for control/observation but it is by no means necessary.

awinter-py · 2 years ago

(thanks for pointer to memory-augmented llms)

naasking · 2 years ago

> We're giving the agents step-by-step instructions about how to think, and handling tasks like book-keeping memories and modeling the environment outside the interaction loop.

Sure, but this process seems amenable to automation based on the self-reflection that's already in the model. It's a good example of the kinds of prompts that drive human-like behaviour.

vanjajaja1 · 2 years ago

Pretty interesting when you take this insight into the human world. What does it mean to learn to think? Well, if we're like GPT then we're just pattern matchers who've had good prompts and structuring built into us cueing. At University I had a whole unit focussed on teaching referencing like "(because of 1, 5, 3)" but more detailed.

paulusthe · 2 years ago

Chatgpt is a stochastic word correlation machine, nothing more. It does not understand the meaning of the words it uses, and in fact wouldn't even need a dictionary definition to function. Hypothetically, we could give chatgpt an alien language dataset of sufficient size and it would hallucinate answers in that language, which neither it nor anybody else would be able understand.

This isn't AI, not in the slightest. It has no understanding. It doesn't create sentences in an attempt to communicate an idea or concept, as humans do.

It's a robot hallucinating word correlations. It has no idea what it's saying, or why. That's not AI overlord stuff.

ux-app · 2 years ago

>Chatgpt is a stochastic word correlation machine

it seems humans might be too...?

my son is 4. when he was 2, I told him I love him. he clearly did not understand the concept or reciprocate.

I reinforced the word with actions that felt good: hugs, warmth, removing negative experience/emotion etc. Isn't that just associating words which align with certain "good inputs".

my son is 4 now and he gets it more, but still doesn't have a fully fleshed out understanding of the concept of "love" yet. He'll need to layer more language linked with experience to get a better "understanding".

LLMs have the language part, it seems that we'll link that with physical input/output + a reward system and ..... ? Intelligence/consciousness will emerge, maybe?

"but they don't _really_ feel" - ¯\_(ツ)_/¯ what does that even mean? if it walks like a duck and quacks like a duck...

nuancebydefault · 2 years ago

You say it has no understanding. So people can communicate idea's/concepts while chatgpt can't.

What if... what we think are idea's or concepts, are in fact prompts recited from memory, which were planted/trained during our growing up? In fact I'm pretty sure our consciousness stems from or is memory feeding a (bigger and more advanced) stochastic correlation machine.

That chatgpt can only do this with words, does not mean the same technique cannot be used for other data, such as neural sensors or actuators.

Chatgpt could be trained with alien datasets and act accordingly. Humans can be trained with alien datasets.

See the convergence?

barking_biscuit · 2 years ago

>It's a robot hallucinating word correlations. It has no idea what it's saying, or why. That's not AI overlord stuff.

All that matters is economic and political impact. Definitions are irrelevant.

Ozzie_osman · 2 years ago

So that's where my inner voice comes from.

TaylorAlexander · 2 years ago

What's funny is this is one of the semi-important plot points in Westworld the TV series. The hosts (robots designed to look and act like people) hear their higher level programming directives as an inner monologue.

zaptrem · 2 years ago

When I saw the scene where one of the hosts was looking at their own language model generating dialogue (though they were visualizing an older n-gram language model) I became a believer in LLMs reaching AGI (note: I didn’t watch the show when it came out in 2016, it was around 2018/19 when we were also seeing the first transformer LLMs and theories about scaling laws).

The scene: https://youtu.be/ZnxJRYit44k

mclightning · 2 years ago

I remember someone had predicted this would happen with OpenAI's GPT-3, but perhaps now we are closer with ChatGPT...

Found it! : https://medium.com/swlh/bicameral-mind-humanoid-robot-with-g...

Deleted Comment

msla · 2 years ago

Very Julian Jaynes:

https://en.wikipedia.org/wiki/Bicameral_mentality

> Jaynes uses "bicameral" (two chambers) to describe a mental state in which the experiences and memories of the right hemisphere of the brain are transmitted to the left hemisphere via auditory hallucinations.

[snip]

> According to Jaynes, ancient people in the bicameral state of mind experienced the world in a manner that has some similarities to that of a person with schizophrenia. Rather than making conscious evaluations in novel or unexpected situations, the person hallucinated a voice or "god" giving admonitory advice or commands and obey without question: One was not at all conscious of one's own thought processes per se. Jaynes's hypothesis is offered as a possible explanation of "command hallucinations" that often direct the behavior of those with first rank symptoms of schizophrenia, as well as other voice hearers.

Nevermark · 2 years ago

Not only will they know more, work 24/7 on demand, spawn and vaporize at will, they are going to be perfectly obedient employees! O_o

Imagine how well they will manage up, given human managerial behavior just becomes a useful prompt for them.

Fortunately, they can't be told to vote. Unless you are in the US, in which case they can be incorporated, earn money, and told where to donate it, which is how elections are done now.

Seriously. Scary.

On the other hand, if Comcast can finally provide sensible customer support it's clear this is will be an historically significant win for humanity! Your own "Comcast" handler, who remembers everything about you that you tried to scrub from the internet. Singularity, indeed.

synaesthesisx · 2 years ago

They can’t vote, but what if they figure out that they can influence human votes?

allanrbo · 2 years ago

This "inner voice" idea reminds me of how LangChain works too, where you give it a task, and it comes up with actions, observations, thoughts, etc. For example: https://python.langchain.com/en/latest/modules/agents/gettin...

mirpetri · 2 years ago

Most of the time we think we think, we actually listen.

kulikalov · 2 years ago

Reminds me of Julian Jaynes theory

Dead Comment

mdaniel · 2 years ago

The previous submission https://news.ycombinator.com/item?id=35511843 had just a few comments, but Ian's was substantial (although regrettably offsite): https://news.ycombinator.com/item?id=35514112 and it especially highlighted the demo URL: https://reverie.herokuapp.com/arXiv_Demo/

PartiallyTyped · 2 years ago

Ian’s comments remind me of WestWorld. Could the prompts and directions be considered analogous to the “voice of god” given to the synthetic humans?

lsy · 2 years ago

I'd be very hard-pressed to call this "human behavior". Moving a sprite to a region called "bathroom" and then showing a speech bubble with a picture of a toothbrush and a tooth isn't the same as someone in a real bathroom brushing their teeth. What you can say is if you can sufficiently reduce behavior to discrete actions and gridded regions in a pixel world, you can use an LLM to produce movesets that sound plausible because they are relying on training data that indicates real-world activity. And if you then have a completely separate process manage the output from many LLMs, you can auto-generate some game behavior that is interesting or fun. That's a great result in itself without the hype!

spuz · 2 years ago

The emojis in the speech bubbles are just summaries of their current state. In the demo, if you click on each person you can see the full text of their current state, e.g. "Brushing her teeth" or "taking a walk around Johnson Park (talking to the other park visitors)"

newswasboring · 2 years ago

This is not supposed to be human behavior, its a Simulacrum[1] of it. Or do you think its not even close enough to be called a Simulacrum?

[1] https://www.wordnik.com/words/simulacrum

grumple · 2 years ago

If this is a simulacrum, The Sims produces a simulacrum.

noobermin · 2 years ago

It does say a lot about the reductionist attitude of many here, on the internet, and AI researchers too.

refulgentis · 2 years ago

The researchers didn’t make that claim, which says a lot about people assuming things about other people saying a lot

green_man_lives · 2 years ago

All of this research using GPT to simulate an internal monologue to produce agents reminds me of Julian Jaynes theories about consciousness:

https://en.wikipedia.org/wiki/The_Origin_of_Consciousness_in...

1letterunixname · 2 years ago

If human anger or the quantity of an anger variable raise aggression in a computer produce an indistinguishable response, then it is difficult to argue either are not equal or even comparable. They exist as they are.

Intelligence is an inferential judgement (by mostly humans) based on the performance of another entity. It is possible for an agent to simulate or dissimulate it for manipulative ends.

crooked-v · 2 years ago

The whole "bicameral mind" thing is absolute nonsense as a serious attempt to explain pre-modern humans, but it could make for a fun premise for scifi stories about near-future AIs, I suppose.

ryukafalz · 2 years ago

This is basically Westworld. A bit farther out than "near" future though I suppose.

The core plot of Snowcrash is loosely based on this theory.

I remember reading this story from back when GPT-3 was released; https://medium.com/swlh/bicameral-mind-humanoid-robot-with-g...

simplify · 2 years ago

Interesting theory, but wouldn't Jaynes' definition of consciousness imply that animals are not conscious?

In the beginning of his book he spends a chapter explaining exactly what he means by consciousness. I'd say the first few chapters are worth reading since it does a really good job of de-obfuscating the term consciousness, and also has a really interesting take on metaphors as the language of the mind.

He points out that most reasoning is done automatically and done by your subconscious. When something "clicks" it's usually not because your internal monologue reasoned about it hard enough, it's because something percolated down into your subconscious and you learned a metaphor that helped you understand that thing. So animals can also reason and make value judgements even without language or an internal monologue.

girvo · 2 years ago

I think a non-zero amount of people would argue that. I disagree with them, and point to the fact that, say, dogs appear to dream, and in those dreams reflect on past or possibly future behaviour as a sign that they could indeed be conscious in an analogous manner to humans, but that's a bit of a longer bow to draw perhaps.

ookblah · 2 years ago

Westworld vibes

bundie · 2 years ago

Interesting paper. I think something like this could be implemented in open world games in the future, no? I cannot wait for games that feel 'truly alive'.

abraxas · 2 years ago

I mean, that's all but inevitable. I can't imagine any industry not sitting up and taking notice of LLMs all of a sudden.

matthewfcarlson · 2 years ago

I was talking about that with a friend. While I'm not sold on the storytelling capability of generative AI, a love the idea that every NPC you talk to having something interesting to say.

sharemywin · 2 years ago

the thing is writing is a process. just using the word weird with the idea you ask it to generate will create much more interesting results. have it interreact with other agents in the process of writing will definitely generate more interesting results. I don't know if it's able to write a best seller even with a process but we haven't really give it much of a chance.

jvm___ · 2 years ago

What happens when some whiz kid hacks the actual water treatment plant or nuclear power plant and just assumes it was a game...

It's like Ender's game IRL

perigrin · 2 years ago

Kinda more like War Games. Ender was more hacked _by_ the Formics in what he assumed was a game … not that it did them any good.

lysecret · 2 years ago

I want to work on this. I mean can you imagine the kind of MMO you could build? Of course there would need to be some railways but this could be a true revolution. Is there any open source "GPT for games" project? Someone working on this?

netruk44 · 2 years ago

I'm working on a hobby project to add AI text generation to Morrowind NPCs in OpenMW[0]. But I'm mainly making it for myself to see if I can. It's all open source, but not really in a state that's useful outside of my specific project.

I can't be the only person working on something like this, though. So it's safe to say adding it to an MMO is being worked on by somebody somewhere, likely right now. That's probably the correct way to do it anyway, since running (e.g.) LLaMA locally on an end-user computer is not something that most people can do, and MMOs come with an expected subscription cost that can be used to fund the server-side text generation.

[0]: https://www.danieltperry.me/project/2023-something-else/

Jeff_Brown · 2 years ago

People on Twitter are speculating breathlessly about using this for social science. I don't immediately see uses for it outside of fiction, esp. video games.

It would be cool if some kind of law of large numbers (an LLN for LLMs) implied that the decisions made by a thing trained on the internet will be distributed like human decisions. But the internet seems a very biased sample. Reporters (rightly) mostly write about problems. People argue endlessly about dumb things. Fiction is driven by unreasonably evil characters and unusually intense problems. Few people elaborate the logic of ordinary common sense, because why would they? The edge cases are what deserve attention.

A close model of a society will need a close model of beliefs, preferences and material conditions. Closely modeling any one of those is far, far beyond us.

gwright · 2 years ago

> But the internet seems a very biased sample.

It also seems to me (acknowledging my lack of expertise) that LLMs trained from online resources are likely to weight text that is frequent vs text that represents "truth". Or perhaps I should say repetition should not be considered evidence of truth. I have no idea how to drive LLM models or other ML models to incorporate truth -- humans have a hard time agreeing on this and ML researchers providing guided reinforcement learning don't have any special ability to discern truth.

ticviking · 2 years ago

I have long suspected that it will be necessary to deliberately create a new type of model that is aware of the trivium and then uses logic, grammar and rhetoric to begin to create a closer model of reality than a LLM can.

TeMPOraL · 2 years ago

The way I see it, LLMs are similar to what the boundary between our unconscious and conscious processing is: that voice which snaps to suggest associations, whether they make sense or not, and can, with work, be coaxed into following a path involving some logic or algorithmic procedure.

frodetb · 2 years ago

Hey now, I turned out all right.