Project Sid: Many-agent simulations toward AI civilization

I feel like there is some kind of information theory constraint which confounds our ability to extract higher order behavior from multiple instances of the same LLM.

I spent quite a bit of time building a multi agent simulation last year and wound up at the same conclusion every day - this is all just a roundabout form of prompt engineering. Perhaps it is useful as a mental model, but you can flatten the whole thing to a few SQL tables and functions. Each "agent" is essentially a sql view that maps a string template forming the prompt.

I don't think you need an actual 3D world, wall clock, etc. The LLM does not seem to be meaningfully enriched by having a fancy representation underly the prompt generation process. There is clearly no "inner world" in these LLMs, so trying to entertain them with a rich outer environment seems pointless.

chefandy · 10 months ago

TBH I haven't seen a single use of LLMs in games that wasn't better served by traditional algorithms beyond less repetitive NPC interactions. Maybe once they get good enough to create usable rigged and textured meshes with enough control to work in-game? They can't create a story on the fly that's reliable enough to be a compelling accompaniment to a coherent game plot. Maps and such don't seem to need anything beyond what current procedural algorithms provide, and they're still working with premade assets— the implementations I've seen can't even reliably place static meshes on the ground in believable positions. And as far as NPCs go— how far does that actually go? It's pure novelty worth far less than an hour of time. Let's even say you get a guided plot progression worded on the fly using an LLM, is that even as good, let alone better, than a dialog tree put together by a professional writer?

This Civ idea at least seems like a new approach to some extent, but it still seems to conceptually not add much. Even if not, learning that it doesn't it's still worthwhile. But almost universally these ideas seem to be either buzzwordy solutions in search of problems, or a cheaper-than-people source of creativity with some serious quality tradeoffs and still require far too much developer wrangling to actually save money.

I'm a tech artist so I'm a bit biased towards the value of human creativity, but also likely the primary demographic for LLM tools in game dev. I am, so far, not compelled.

JohnMakin · 10 months ago

It's been posted in-depth a few times across this forum to varying degrees by game developers - I was initially very excited about the implementation of LLM's in NPC interactions, until I read some of these posts. The gist of it was - the thing that makes a game fundamentally a game is its constraints. LLM-based NPC's fundamentally break these constraints in a way that is not testable or predictable by the developer and will inevitably destroy the gameplay experience (at least with current technology).

MichaelZuo · 10 months ago

Nobody will know for sure until a big budget game is actually released with a serious effort behind its NPCs.

caetris2 · 10 months ago

You've absolutely nailed it here, I agree. To make any progress at all at the tremendously difficult problem they are trying to solve, they need to be frank about just how far away they are from what it is they are marketing.

I am whole-heartedly in support of commercial interests to drum of awareness and engagement by the authors. This is definitely a cool thing to be working on, however, what does make more sense is to frame the situation more honestly and attract folks to the desire of solving tremendously hard problems based on a level of expertise and awareness that truly moves the ball forward.

What would be far more interesting would be for the folks involved to say all the ten thousand things that went wrong in their experiments and to lay out the common-sense conclusions from those findings (just like the one you shared, which is truly insightful and correct).

We need to move past this industry and their enablers that continually try to win using the wrong methodology -- pushing away the most inventive and innovative people that are ripe and ready to make paradigm shifts in the AI field and industry.

teaearlgraycold · 10 months ago

It would however be very interesting to see these kinds of agents in a commercial video game. Yes they are shallow in their perception of the game world. But they’re a big step up from the status quo.

shkkmo · 10 months ago

> I don't think you need an actual 3D world, wall clock, etc. The LLM does not seem to be meaningfully enriched by having a fancy representation underly the prompt generation process.

I don't know how you expect agents to self organize social structures if they don't have a shared reality. I mean, you could write all the prompts yourself, but then that shared reality is just your imagination and you're just DMing for them.

The point of the minecraft environment isn't to "enrich" the "inner world" of the agents and the goal isn't to "entertain" them. The point is to create a set of human understandable challenges in a shared environment so that we can measure behavior and performance of groups of agents in different configurations.

I know we aren't supposed to bring this up, but did you read the article? Nothing of your comment addresses any of the findings or techniques used in this study.

grahamj · 10 months ago

I wrote and played with a fairly simple agentic system and had some of the same thoughts RE higher order behaviour. But I think the counter-points would be that they don't have to all be the same model, and what you might call context management - keeping each agent's "chain of thought" focused and narrow.

The former is basically what MoE is all about, and I've found that at least with smaller models they perform much better with a restricted scope and limited context. If the end result of that is something that do things a single large model can't, isn't that higher order?

You're right that there's no "inner world" but then maybe that's the benefit of giving them one. In the same way that providing a code-running tool to an LLM can allow it to write better code (by trying it out) I can imagine a 3D world being a playground for LLMs to figure out real-world problems in a way they couldn't otherwise. If they did that wouldn't it be higher order?

logicchains · 10 months ago

>I feel like there is some kind of information theory constraint which confounds our ability to extract higher order behavior from multiple instances of the same LLM.

It's a matter of entropy; producing new behaviours requires exploration on the part of the models, which requires some randomness. LLMs have only a minimal amount of entropy introduced, via temperature in the sampler.

fennecfoxy · 10 months ago

As I've pointed out in the past, I also think it's fair to say that we overestimate human variability, and that most human behaviours and language coalesces for the most part.

Also the creative industry, a talking point being that "AIs just rehash existing stuff, they don't produce anything new". Neither do most artists, everything we make is almost always some riff on prior art or nature. Elves are just humans with pointy ears. Goblins are just small elves with green skin. Dwarves are just short humans. Dragons are just big lizards. Aliens are just humans with an odd shaped head and body.

I don't think people realise how very rare it is that any human being experiences or creates something truly novel and not yet experienced or created by our species yet. Most of reality is derivative.

InDubioProRubio · 10 months ago

Maybe we need gazelles and cheetahs - many gazelle-agents getting chased towards a goal, doing the brute force work- and the constraint cheetahs chase them, evaluate them and leave them alive (memory intact) as long as they come up with better and better solutions. Basically a evolutionary algo, running on top of many agents, running simultaneously on the same hardware?

FeepingCreature · 10 months ago

Do you want stressed and panicking agents? Do you think they'll produce good output?

In my prompting experience, I mostly do my best to give the AI way, way more slack than it thinks it has.

nobrains · 10 months ago

I had the opposite thought. Opposite to evolution...

What if we are a CREATED (i.e. instant created, not evolved) set of humans, and evolution and other backstories have been added so that the story of our history is more believable?

Could it be that humanity represents a de novo (Latin for "anew") creation, bypassing the evolutionary process? Perhaps our perception of a gradual ascent from primitive origins is a carefully constructed narrative designed to enhance the credibility of our existence within a larger framework.

What if we are like the Minecraft people in this simulation?

fennecfoxy · 10 months ago

This only works (genetic algo) if you have some random variability in the population. For different models it would work but I feel like it's kind of pointless without the usual feedback mechanism (positive traits are passed on).

cen4 · 10 months ago

That depends on giving them a goal/reward like increasing "data quality".

I mean frogs don't use their brains much either inspite of the rich world around them they don't really explore.

But chimps do. They can't sit quiet in a tree forever and that boils down to their Reward/Motivation Circuitry. They get pleasure out of explore. And if they didn't we wouldn't be here.

fhe · 10 months ago

so well put. exactly how I've been feeling and trying to verbalize.

I've thought about this a lot. I'm no philosopher or AI researcher, so I'm just spitballing... but if I were to try my hand at it, I think I'd like to start from "principles" and let systems evolve or at least be discoverable over time

Principles would be things like self-preservation, food, shelter and procreating, communication and memory through a risk-reward calculation prism. Maybe establishing what is "known" vs what is "unknown" is a key component here too, but not in such a binary way.

"Memory" can mean many things, but if you codify it as a function of some type of subject performing some type of action leading to some outcome with some ascribed "risk-reward" profile compared to the value obtained from empirical testing that spans from very negative to very positive, it seems both wide encompassing and generally useful, both to the individual and to the collective.

From there you derive the need to connect with others, disputes over resources, the need to take risks, explore the unknown, share what we've learned, refine risk-rewards, etc. You can guide the civilization to discover certain technologies or inventions or locations we've defined ex ante as their godlike DM which is a bit like cheating because it puts their development "on rails" but also makes it more useful, interesting and relatable.

It sounds computationally prohibitive, but the game doesn't need to play out in real time anyway...

I just think that you can describe a lot of the human condition in terms of "life", "liberty", "love/connection" and "greed".

Looking at the video in the repo, I don't like how this throws "cultures", "memes" and "religion" into the mix instead of letting them be an emergence from the need to communicate and share the belief systems that emerge from our collective memories. Because it seems like a distinction without a difference for the purposes of analyzing this. Also "taxes are high!" without the underlying "I don't have enough resources to get by" seems too much like a mechanical turk

shagie · 10 months ago

Evolve is another beast... but for the: "I've thought about this a lot. I'm no philosopher or AI researcher, so I'm just spitballing... but if I were to try my hand at it, I think I'd like to start from "principles" and let systems evolve or at least be discoverable over time" part, hunt up a copy of "The Society of Mind" by Minsky who was both and wrote about that idea.

https://en.wikipedia.org/wiki/Society_of_Mind

> The work, which first appeared in 1986, was the first comprehensive description of Minsky's "society of mind" theory, which he began developing in the early 1970s. It is composed of 270 self-contained essays which are divided into 30 general chapters. The book was also made into a CD-ROM version.

> In the process of explaining the society of mind, Minsky introduces a wide range of ideas and concepts. He develops theories about how processes such as language, memory, and learning work, and also covers concepts such as consciousness, the sense of self, and free will; because of this, many view The Society of Mind as a work of philosophy.

> The book was not written to prove anything specific about AI or cognitive science, and does not reference physical brain structures. Instead, it is a collection of ideas about how the mind and thinking work on the conceptual level.

Its very approachable as a layperson in that part of the field of AI.

bbor · 10 months ago

Wow, you are maybe the first person I’ve seen cite Minsky on HN, which is surprising since he’s arguably the most influential AI researcher of all time, maybe short of Turing or Pearl. To add on to the endorsement: the cover of the book is downright gorgeous, in a retro-computing way

https://d28hgpri8am2if.cloudfront.net/book_images/cvr9780671...

grugagag · 10 months ago

Many of these projects are inch deep into intelligence and miles deep into the current technology. Some things will see tremendous benefits but as far as artificial intelligence we’re not there yet. Im thinking gaming will benefit a lot from these..

farias0 · 10 months ago

You mean we're not there in simulating an actual human brain? Sure. But we're seeing AI work like a human well enough to be useful, isn't that the point?

jsemrau · 10 months ago

Memory is really interesting. For example, if you play 100,000 rounds of 5x5 Tic Tac Toe. Do you really need to remember game 51247 or do you recognize and remember a winning pattern? In Reinforcement Learning you would based on each win revise the policy. How would that work for genAI?

Deleted Comment

fragmede · 10 months ago

So a modernized version of Spore.

airstrike · 10 months ago

Basically what we all wished Spore had been ;-)

BlueTemplar · 10 months ago

Huh, so the video actually works ? It just shows up «No video with supported format and MIME type found.» for me...

Yeah, memes and genes are both memory, though at different timescales.

airstrike · 10 months ago

It works on some browsers. I'm normally on Firefox but had to dust off Safari to watch it. Crazy I still have to do this in 2024...

bob1029 · 10 months ago

isoprophlex · 10 months ago

Now these seem to be truly artificially intelligent agents. Memory, volition, autonomy, something like an OODA loop or whatever you want to call it, and a persistent environment. Very nice concept, and I'm positive the learnings can be applied to more mundane business problems, too.

If only I could get management to understand that a bunch of prompts shitting into eachother isn't "cutting-edge agentic AI"...

But then again their jobs probably depend on selling something that looks like real innovation happening to the C-levels...

Carrok · 10 months ago

> If only I could get management to understand that a bunch of prompts shitting into eachother isn't "cutting-edge agentic AI"...

It's unclear to me how the linked project is different from what you described.

Plenty of existing agents have "memory" and many other things you named.

whatshisface · 10 months ago

Just so you know, the English noun for things that have been learned is, "lessons."

ytss · 10 months ago

I believe that “learnings” is also a word that could be applied in this context.

It seems to me “learnings” would actually be less ambiguous than “lessons”. A lesson brings to mind a thing being taught, not just learned.

mindcrime · 10 months ago

Also: "learnings".

https://dictionary.cambridge.org/us/dictionary/english/learn...

"knowledge or a piece of information obtained by study or experience"

"I am already incorporating some of these learnings into my work and getting better results."

globnomulous · 10 months ago

Yup, and "ask" is a verb, God damn it, not a noun. But people in the tech world frequently use "learnings" instead of "lessons," "ask" as a noun, "like" as filler, and "downfall" when they mean "downside." Best to make your peace and move on with life.

Just FYI: that second comma is incorrect.

buffington · 10 months ago

I'm an old man and have heard "learnings" used to mean "lessons" for most of my life.

I think "learnings" has advantages over "lessons" given that "learnings" has one meaning, while "lessons" can have more than one meaning.

Whether it's correct or not, are we surprised it's used this way? Consider the word "earnings" and how similar its definition is to "learnings."

herewulf · 10 months ago

"learning" as a noun descends from Old English so has always been current in the language in the intended sense.[1]

"lesson" came from Old French in the 13th century and has changed its original meaning over time.[2]

There's not one single dialect of English so your comment comes off as unnecessarily prescriptivist and has spawned significant off-topic commentary (including this very comment) in response to an otherwise perfectly worded composition.

[1]: https://www.etymonline.com/word/learning [2]: https://www.etymonline.com/word/lesson

Learnings is also correct...

Learned can also be learnt (my preference), etc. English has a lot of redundancy, but that's why we love it, right?

It should never be this way. Even with narrow AI, there needs to be a governance framework that helps measure the output and capture potential risks (hallucinations, wrong data / links, wrong summaries, etc)

ElFitz · 10 months ago

Do you have any resources on that topic? I’d be interested.

I've reviewed the paper and I'm confident this paper was fabricated over a collection of false claims. The claims made are not genuine and should not be taken at face value without peer review. The provided charts and graphics are sophisticated forgeries in many cases when reviewing and vetting their applicability to the claims made.

It is currently not possible for any kind of LLM to do what is being proposed, while maybe the intentions are good with regard to commercial interests, I want to be clear: this paper seems indicate that election-related activities were coordinated by groups of AI agents in a simulation. These kinds of claims require substantial evidence and that was not provided.

The prompts that are provided are not in any way connected to an applied usage of LLMs that are described.

I don't think you understood the paper.

The "election" experiment was a prefined scenario. There isn't any "coordination" of election activities. There were preassigned "influencers" using the conversation system built into PIANO. The sentiment was collected automatically by the simulation and the "Election Manager" was another predefined agent. Specically this part of the experiment was designed to look at how the presence or absence of specific modules in the PIANO framework would affect the behavior.

afro88 · 10 months ago

> this paper seems indicate that election-related activities were coordinated by groups of AI agents in a simulation

I mean, that's surely within the training data of LLMs? The effectiveness etc of the election activities is likely very low. But I don't think it's outside the realms of possibility that the agents prompted each other into the latent spaces of the LLM to do with elections.

LLMs are stateless and they do not remember the past (as in they don't have a database), making the training data a non-issue here. Therefore, the claims made here in this paper are not possible because the simulation would require each agent to have a memory context larger than any available LLM's context window. The claims made here by the original poster are patently false.

The ideas here are not supported by any kind of validated understanding of the limitations of language models. I want to be clear -- the kind of AI that is being purported to be used in the paper is something that has been in video games for over 2 decades, which is akin to Starcraft or Diablo's NPCs.

The key issue is that this is a intentional false claim that can certainly damage mainstream understanding of LLM safety and what is possible at the current state of the art.

Agentic systems are not well-suited to achieve any of the things that are proposed in the paper, and Generative AI does not enable these kinds of advancements.

throwaway314155 · 10 months ago

For others, it's probably worth pointing that this person's account is about a day old and they have left no contact information for the author's of the paper to follow up with them on.

For "caetris2" I'll just use the same level of rigor and authenticity that you used in your comment when I say "you're full-of-shit/jealous and clearly misunderstood large portions of this paper".

Reubend · 10 months ago

Yeah, I haven't looked into this much so far but I am extremely skeptical of the claims being made here. For one agent to become a tax collector and another to challenge the tax regime without such behavior being hard coded would be extremely impressive.

Philpax · 10 months ago

They were assigned roles to examine the spread of information and behaviour. The agents pay tax into a chest, as decreed by the (dynamic) rules. There are agents assigned to the roles of pro- and anti-tax influencers; agents in proximity to these influencers would change their own behaviour appropriately, including voting for changes in the tax.

So yes, they didn't take on these roles organically, but no, they weren't aiming to do so: they were examining behavioral influence and community dynamics with that particular experiment.

I'd recommend skimming over the paper; it's a pretty quick read and they aren't making any truly outrageous claims IMO.

You can imagine a conversation with an LLM getting to that territory pretty quickly if you pretend to be an unfair tax collector. It sounds impressive on the surface, but in the end it's all LLMs talking to each other, and they'll enit whatever completions are likely given the context.

catlifeonmars · 10 months ago

This looks like it is a really cool toy.

It does not strike me as particularly useful from a scientific research perspective. There does not appear to be much thought put into experimental design and really no clear objectives. Is the bar really this low for academic research these days?

rollinDyno · 10 months ago

Keep in mind anyone can publish on Arxiv and it's not at the top of HN on the merit of its research contributions.

disconcision · 10 months ago

it looks like a group consisted largely of ex-academics using aspects of the academic form but they stop short of framing it as a research paper as-such. they call it a technical report, where it's generally more okay to be like 'here's a thing that we did', along with detailed reporting on the thing, without necessarily having definite research questions. this one does seem to be pretty diffuse though. the sections on Specialization and Cultural Transmission were both interesting, but lacked precise experimental design details to the point where i wish they had just focused on one or the other.

one disappointment for me was the lack of focus on external metrics in the multi-agent case. their single-agent benchmark focusses on an external metric (time to block type), but all the multi-agent analyses seems to be internal measures (role specialization, meme spread) without looking at (AFAICT?) whether or not the collective multi-agent systems could achieve more than the single agents on some measure of economic productivity/complexity. this is clearly related to the specialization section but without consideration of the whether said emergent role division had economic consequences/antecedents it makes me wonder to what degree the whole thing is a pantomime.

a_bonobo · 10 months ago

wouldn't surprise me if in a few weeks/months we see this repo packaged up as a for-sale product for the games industry

mistermann · 10 months ago

The scientific method has utility, but it's not a pre-requisite for utility.

Some people prefer speed and the uncertainty that comes with it.

hackathonguy · 10 months ago

I'm curious if it might be possible that an AI "civilization", similar to the one proposed by Altera, could end up being a better paradigm for AGI than a single LLM, if a workable reward system for the entire civilization was put in place. Meaning, suppose this AI civilization was striving to maximize [scientific_output] or [code_quality] or any other eval, similar to how modern countries try to maximize GDP - would that provide better results than a single AI agent working towards that goal?

Yes, good sense for progress! This has been a central design component of most serious AI work since the ~90s, most notably popularized by Marvin Minsky’s The Society of Mind. Highly, highly recommend for anyone with an interest in the mind and AI — it’s a series of one-page essays on different aspects of the thesis, which is a fascinating, Martin-Luther-esque format.

Of course this has been pushed to the side a bit in the rush towards shiny new pure-LLM approaches, but I think that’s more a function of a rapidly growing user base than of lost knowledge; the experts still keep this in mind, either in these terms or in terms of “Ensembles”. A great example is GPT-4, which AFAIU got its huge performance increase mostly through employing a “mixture of experts”, which is clearly a synonym for a society of agents or an ensemble of models.

ValentinA23 · 10 months ago

I don't think "mixture of experts" can be assimilated to a society of agents. It is just routing a prompt to the most performant model: the models do not communicate with each other, so how could they form a society ?

wombatpm · 10 months ago

Paperclip production?

NoboruWataya · 10 months ago

This seems very cool - I am sceptical of the supposed benefits for "civilization" but it could at least make for some very interesting sim games. (So maybe it will be good for Civilization moreso than civilization.)

aithrowawaycomm · 10 months ago

I think the Firaxis Civilization needs a cheap AlphaZero AI rather than an LLM: there are too many dumb footguns in Civ to economically hard-code a good strategic AI, yet solving the problem by making the enemies cheat is plain frustrating. It would be interesting to let an ANN play against a "classical" AI until it consistently beats each difficulty level, building a hierarchy. I am sure someone has already looked into this but I couldn't find any sources.

I am a bit skeptical about how computationally expensive a very crappy Civ ANN would be to run at inference time, though I actually have no idea how that scales - it hardly needs to be a grandmaster, but the distribution of dumb mistakes has a long tail.

Also, the DeepMind Starcraft 2 AI is different from AlphaZero since Starcraft is not a perfect information game. The AI requires a database of human games to "get off the ground"; otherwise it would just get crushed over and over in the early game, having no idea what the opponent is doing. It's hard to get that training data with a brand new game. Likewise Civ has always been a bit more focused on artistic expression than other 4x strategy games; maybe having to retrain an AI for every new Wonder is just too much of a burden.

Galactic Civilizations 2 (also, 1,3,4 ??) in the same genre is well-known for its AI, good even without handicaps or cheats. This includes trading negotiations BTW.

(At least good compared to what other 4X have, and your average human player - not the top players that are the ones that tend to discuss the game online in the first place.)

EDIT : I suspect that it's not unrelated that GalCiv2 is kind of... boring as 4X go - as a result of a good AI having been a base requirement ?

Speaking of StarCraft AI... (for SC1, not 2, and predating AlphaZero by many years) :

https://arstechnica.com/gaming/2011/01/skynet-meets-the-swar...

caseyy · 10 months ago

Indeed sounds better for Civilization than civilization. This could be quite exciting for gaming.

dmix · 10 months ago

GTA6 suddenly needs another 2 years :)

loudmax · 10 months ago

I really dig namechecking Sid Meier for the name of the project. I'm also skeptical that this project actually works as presented, but building a Civilization game off of a Minecraft engine is a deeply interesting idea.

foota · 10 months ago

I'm somewhat amazed that companies releasing strategy games aren't using AI to test out different cards and what not to find broken things before release (looking at you, Hearthstone)

Yeah, I was dissapointed (and thrilled, from a p(doom) perspective) to see it implemented in Minecraft instead of Civilization VI, Humankind, or any of the main Paradox grand strategies (namely Stellaris, Victoria, Crusader Kings, and Europa Universalis). To say the least, the stakes are higher and more realistic than "lets plan a feast" "ok, I'll gather some wood!"

To be fair, they might tackle this in the paper -- this is a preprint of a preprint, somehow...

I suspect that Minecraft might have the open source possibilities (or at least programming interfaces ?) that the other games you listed lack ?

For Civilizations, the more recent they are, the more closed off they tend to be : Civ 1 and/or 2 have basically been remade from scratch as open source, Civ 4 has most of the game open sourced in the two tiers of C++ and Python... but AFAIK Civ 5 (and also 6 ?) were large regressions in modding capabilities compared to 4 ?

j_bum · 10 months ago

Rather, a concept of a preprint

bitwize · 10 months ago

I'm reminded of Dwarf Fortress, which simulates thousands of years of dwarf world time, the changing landscapes and the rise and fall and rise and fall of dwarf kingdoms, then drops seven player-controlled dwarves on the map and tells the player "have fun!" It'd be a useful toy model perhaps for identifying areas of investigation to see if it can predict behavior of real civilizations, but I'm not seeing any AI breakthroughs here.

Maybe when Project Sid 6.7 comes out...

aspenmayer · 10 months ago

> Maybe when Project Sid 6.7 comes out...

In case anyone is wondering, this is a reference to the movie Virtuosity (1995). I thought it was a few years later, considering the content. It’s a good watch if you like 90s cyberpunk movies.

https://www.imdb.com/title/tt0114857/

https://en.wikipedia.org/wiki/Virtuosity