Readit News
dragonwriter · 4 months ago
> My former colleague Rebecca Parsons, has been saying for a long time that hallucinations aren’t a bug of LLMs, they are a feature. Indeed they are the feature. All an LLM does is produce hallucinations, it’s just that we find some of them useful.

This is an example of my least favorite style of feigned insight: redefining a term into meaninglessness just so you can say something that sounds different while not actually saying anything new.

Yes, if you redefine "hallucination" from "produce output containing detailed information despite that information not being grounded in external reality, in a manner distantly analogous to a human reporting sense data produced by a literal hallucination rather than the external inputs that are presumed normally to ground sense data" to "produce output", it's true that all LLMs do is "hallucinate", and that "hallucinating" is not an undesirable behavior.

But you haven't said anything new about the thing that was called "hallucination" by everyone else, or about the thing--LLM output in general--that you have called "hallucination". Everyone already knew that producing output wasn't undesirable. You've just taken the label conventionally attached to a bad behavior, attached it to a broader category that includes all behavior, and used the power of equivocation to make something that sounds novel without saying anything new.

scott_w · 4 months ago
Fowler is not really redefining "hallucination." He's using a form of irony that emphasises how fundamental "hallucinations" are to the operation of the system. One might also say "you can't get rid of collateral damage from bombs. Indeed, collateral damage is the feature, it's just that some of it is what we want to blow up."

You're not meant to take it literally.

xyzzy123 · 4 months ago
You might as well say it's interpolating or extrapolating. That's what people are usually doing too, even when recalling situations that they were personally involved in.

I think we call it "hallucinating" when the machine does this in an un-human-like way.

thwarted · 4 months ago
It's meant to contrast/correct the claim that

"LLMs produce either truth or they produce hallucinations."

Claims worded like that give the impression that if we can just reduce/eliminate the hallucinations, all that will remain will be the truth.

But that's not the case. What is the case is that all the output is the same thing, hallucination, and that some of those hallucinations just so happen to reflect reality (or expectations) and so appear to embody truth.

It's like rolling a die and wanting one to come up, and when it does saying "the die knew what I wanted".

An infinite number of monkeys typing on an infinite number of typewriters will eventually produce the script for Hamlet.

That doesn't mean the monkeys know about Shakespeare, or Hamlet, or even words, or even what they are doing being chained to typewriters.

We've found a way to optimize the infinite typing monkeys to output something that passes for Hamlet much sooner than infinity.

sceptic123 · 4 months ago
I actually found that comment interesting. It's pointing towards something I've struggled with around LLMs. They are (currently) incapable of knowing if what they output is correct, so the idea that "it's all hallucinations" acknowledges that point and gives useful context for anyone using LLMs for software development.
Eridrus · 4 months ago
Humans are also incapable of knowing whether their output is correct. We merely convince ourselves that it is and then put our thoughts in contact with the external world and other people to see if we actually are.
cess11 · 4 months ago
The point, though awkwardly stated, is that there is no difference between 'hallucination' output and any other output from these compressed databases, just like there is no such difference between two queries on a typical RDBMS.

It's a good point.

dragonwriter · 4 months ago
> The point, though awkwardly stated, is that there is no difference between 'hallucination' output and any other output from these compressed databases,

But there is, except when you redefine "hallucination" so there isn't. And, when you retain the definition where there is a difference, you find there are techniques by which you can reduce hallucinations, which is important and useful. Changing the definition to eliminate the distinction is actively harmful to understanding and productive use of LLMs, for the benefit of making what superficially seems like an insightful comment.

bartread · 4 months ago
I’ve never liked that this behaviour is described using the term “hallucination”.

If a human being talked confidently about something that they were just making up out of thin air by synthesizing (consciously or unconsciously) based on other information they know, you wouldn’t call it “hallucination”: you’d call it “bullshit”.

And, honestly, “bullshit” is a much more helpful way of thinking about this behaviour because it somewhat nullifies the arguments people make against the use of LLMs due to this behaviour. Fundamentally, if you don’t want to work with LLMs because they sometimes “bullshit”, are you planning on no longer working with human beings as well?

It doesn’t hold up.

But, more than that, going back to your point: it’s much harder to redefine the term “bullshit” to mean something different to the common understanding.

All of that said, I don’t mind the piece and, honestly, the “I haven’t the foggiest” comment about the future of software development as a career is well made. I guess it’s just a somewhat useful collection of scattered thoughts on LLMs and, as such, an example of a piece where the “thoughts on” title fits well. I don’t think the author is trying to be particularly authoritative.

dragonwriter · 4 months ago
> I’ve never liked that this behaviour is described using the term “hallucination”.

I have a standard canned rant about how "confabulation" is a much better metaphor, but it wasn't the point I was focussed on here.

> Fundamentally, if you don’t want to work with LLMs because they sometimes “bullshit”, are you planning on no longer working with human beings as well?

I will very much not voluntarily rely on a human for particular tasks if that human has demonstrated a pattern of bullshitting me when given that kind of task, yes, especially if, on top of the opportunity cost inherent in relying on a person for a particular task, I am also required to compensate them—e.g., financially—for their notional attention to the task.

scott_w · 4 months ago
> If a human being talked confidently about something that they were just making up out of thin air by synthesizing (consciously or unconsciously) based on other information they know, you wouldn’t call it “hallucination”: you’d call it “bullshit”.

I'd recommend you watch https://www.youtube.com/watch?v=u9CE6a5t59Y&t=2134s&pp=ygUYc... which covers the topic of bullshit. I don't think we can call LLM output "bullshit" because someone spewing bullshit has to not care about whether what they're saying is true or false. LLMs don't "care" about anything because they're not human. It's better to give it an alternative term to differentiate it from the human behaviour, even if the observed output is recognisable.

movpasd · 4 months ago
At the risk of sounding woo, I find some parallels in how LLMs work to my experiences with meditation and writing. My subjective experience of it is that there is some unconscious part of my brain that supplies a scattered stream of words as the sentence forms --- without knowing the neuroscience of it, I could speculate it is a "neurological transformer", some statistical model that has memorised a combination of the grammar and contextual semantic meaning of language.

The difference is that the LLM is _only that part_. In producing language as a human, I filter these words, I go back and think of new phrasings, I iterate --- in writing consciously, in speech unconsciously. So rather than a sequence it is a scattered tree filled with rhetorical dead ends, pruned through interaction with my world-model and other intellectual faculties. You can pull on one thread of words as though it were fully-formed already as a kind of Surrealist exercise (like a one-person cadavre exquis), and the result feels similar to an LLM with the temperature turned up too high.

But if nothing else, this highlights to me how easily the process of word generation may be decoupled from meaning. And it serves to explain another kind of common human experience, which feels terribly similar to the phenomenon of LLM hallucination: the "word vomit" of social anxiety. In this process it suddenly becomes less important that the words you produce are anchored to truth, and instead the language-system becomes tuned to produce any socially plausible output at all. That seems to me to be the most apt analogy.

jimbokun · 4 months ago
"Bullshit engine" is the term that best explains to a lay person what it is that LLMs do.
belter · 4 months ago
> You've just taken the label conventionally attached to a bad behavior, attached it to a broader category that includes all behavior, and used the power of equivocation to make something that sounds novel without saying anything new.

You mean like every evangelist saying AI changes everything every 5 minutes... when in reality what they mean is that neural-net statistical code generators are getting pretty good? Because that is almost all the AI there is at the moment?

Just to make the current AI sound bigger than it is?

redditor98654 · 4 months ago
The way you have expressed this, I am borrowing it for myself. Many times I run into these kinds of situations and fail to explain why doing something like this is frustrating and actually useless. Thank you.
ljm · 4 months ago
If everything is a hallucination then nothing is a hallucination.

Thanks for listening to my Ted Talk.

jeppester · 4 months ago
In my company I feel that we're getting totally overrun with code that's 90% good, 10% broken and almost exactly what was needed.

We are producing more code, but quality is definitely taking a hit now that no-one is able to keep up.

So instead of slowly inching towards the result we are getting 90% there in no time, and then spending lots and lots of time on getting to know the code and fixing and fine-tuning everything.

Maybe we ARE faster than before, but it wouldn't surprise me if the two approaches are closer than one might think.

What bothers me the most is that I much prefer to build stuff rather than fixing code I'm not intimately familiar with.

whstl · 4 months ago
LLMs are amazing at producing boilerplate, which removes the incentive to get rid of it.

Boilerplate sucks to review. You just see a big mass of code and can't fully make sense of it when reviewing. Also, GitHub sucks for reviewing PRs with too many lines.

So junior/mid devs are just churning boilerplate-rich code and don't really learn.

The only outcome here is code quality is gonna go down very very fast.

jstummbillig · 4 months ago
I envy the people working at mystical places where humans were, on average, writing code of high quality prior to LLMs. I'll never know you now.
ekidd · 4 months ago
> In my company I feel that we're getting totally overrun with code that's 90% good, 10% broken and almost exactly what was needed.

This is painfully similar to what happens when a team grows from 3 developers to 10 developers. All of a sudden, there's a vast pile of code being written, you've never seen 75% of it, your architectural coherence is down, and you're relying a lot more on policy and CI.

Where LLMs differ is that you can't meaningfully mentor them, and you can't let them go after the 50th time they try to turn off the type checker, or delete the unit tests to hide bugs.

Probably, the most effective way to use LLMs is to make the person driving the LLM 100% responsible for the consequences. Which would mean actually knowing the code that gets generated. But that's going to be complicated to ensure.

jimbokun · 4 months ago
Have thorough code reviews and hold the developer using the LLM responsible for everything in the PR before it can be merged.
dapperdrake · 4 months ago
Perlis, epigram 7:

7. It is easier to write an incorrect program than understand a correct one.

Link: http://cs.yale.edu/homes/perlis-alan/quotes.html

utyop22 · 4 months ago
"but quality is definitely taking a hit now that no-one is able to keep up."

And it's going to get worse! So please explain to me how, on net, you are going to be better off? You're not.

I think most people haven't taken a decent economics class and don't deeply understand the notion of trade offs and the fact there is no free lunch.

computerex · 4 months ago
Technology has always helped people. Are you one of the people that say optimizing compilers are bad? Do you not use IntelliSense? Or IDEs? Do you not use higher level languages? Why not write in assembly all the time? No free lunch, right?

Yes there are trade offs, but at this point if you haven’t found a way to significantly amplify and scale yourself using LLMs, and your plan is to instead pretend that they are somehow not useful, that uphill battle can only last so long. The genie is out of the bottle. Adapt to the times or you will be left behind. That’s just what I think.

globular-toast · 4 months ago
Yep, my strong feeling is that the net benefit of all of this will be zero. The time you have to spend holding the LLM's hand is almost equal to how much time you would have spent writing it yourself. But then you've got yourself a codebase that you didn't write yourself, and we all know hunting bugs in someone else's code is way harder than code you had a part in designing/writing.

People are honestly just drunk on this thing at this point. The sunk cost fallacy has people pushing on (i.e. spending more time) when LLMs aren't getting it right. People are happy to trade everything else for convenience, just look at junk food, where people trade away flavour and their health. And ultimately we are in a time when nobody is building for the future, it's all get rich quick schemes: squeeze, then get out before anyone asks why the river ran dry. LLMs are like the perfect drug for our current society.

Just look at how technology has helped us in the past decades. Instead of launching us towards some kind of Star Trek utopia, most people now just work more for less!

threecheese · 4 months ago
Fast feedback is one benefit, given the 90% is releasable - even if only to a segment of users. This might be anathema to good engineering, but a benefit to user experience research and to organizations that want to test their market for demand.

Fast feedback is also great for improving release processes; when you have a feedback loop with Product, UX, Engineering, Security etc, being able to front load some % of a deliverable can help you make better decisions that may end up being a time saver net/net.

naasking · 4 months ago
> And it's going to get worse!

That isn't clear given the fact that LLMs and, more importantly, LLM programming environments that manage context better are still improving.

stevage · 4 months ago
> What bothers me the most is that I much prefer to build stuff rather than fixing code I'm not intimately familiar with.

Me too. But I think there's a split here. Some people love the new fast and loose way and rave about how they're experiencing more joy coding than ever before.

But I tried it briefly on a side project, and hated the feeling of disconnect. I started over, doing everything manually but boosted by AI and it's deeply satisfying. There is just one section of AI written code that I don't entirely understand, a complex SQL query I was having trouble writing myself. But at least with an SQL query it's very easy to verify the code does exactly what you want with no possibility of side effects.

Cthulhu_ · 4 months ago
I'd argue that this awareness is a good thing; it means you're measuring, analyzing, etc all the code.

Best practices in software development for forever have been to verify everything; CI, code reviews, unit tests, linters, etc. I'd argue that with LLM generated code, a software developer's job and/or that of an organization as a whole has shifted even more towards reviewing and verification.

If quality is taking a hit you need to stop; how important is quality to you? How do you define quality in your organization? And what steps do you take to ensure and improve quality before merging LLM generated code? Remember that you're still the boss and there is no excuse for merging substandard code.

thrawa8387336 · 4 months ago
Imagine someone adds 10 carefully devised UTs and someone notices they need 1 more during the PR.

Scenario B: you add 40 with an LLM that look good on paper but only cover 6 of the original ones. Besides, who's going to pay careful attention to a PR with 40?

"Must be so thorough!".

epolanski · 4 months ago
As Fowler himself states, there's a need to learn to use these tools properly.

In any case poor work quality is a failure of tech leadership and culture, it's not AI's fault.

FromTheFirstIn · 4 months ago
It’s funny how nothing seems to be AI’s fault.
oo0shiny · 4 months ago
> My former colleague Rebecca Parsons, has been saying for a long time that hallucinations aren’t a bug of LLMs, they are a feature. Indeed they are the feature. All an LLM does is produce hallucinations, it’s just that we find some of them useful.

What a great way of framing it. I've been trying to explain this to people, but this is a succinct version of what I was stumbling to convey.

jstrieb · 4 months ago
I have been explaining this to friends and family by comparing LLMs to actors. They deliver a performance in-character, and are only factual if it happens to make the performance better.

https://jstrieb.github.io/posts/llm-thespians/

red75prime · 4 months ago
The analogy goes down the drain when a criterion for good performance is being objectively right. Like with Reinforcement Learning from Verifiable Rewards.
ngc248 · 4 months ago
A better analogy is of an overconfident 5 year old kid, who never says that they don't know the answer and always has an "answer" for everything.
lagrange77 · 4 months ago
I'll steal that.
bo1024 · 4 months ago
This is also related to the philosophical definition of bullshit[1]: speech intended to persuade or influence without regard for whether it is true or false.

[1] https://en.wikipedia.org/wiki/On_Bullshit

aitchnyu · 4 months ago
All models are wrong, but some are useful - 1976/1933/earlier adage.
lagrange77 · 4 months ago
Right, all models are inherently wrong. It's up to the user to know about their limits / uncertainty.

But I think this 'being wrong' is kind of confusing when talking about LLMs (in contrast to systems/scientific modelling). In what they model (language), the current LLMs are really good and accurate, except, for example, the occasional Chinese character in the middle of a sentence.

But what we mean by LLMs 'being wrong' most of the time is being factually wrong in answering a question that is expressed as language. That's a layer on top of what the model is designed to model.

EDITS:

So saying 'the model is wrong' when it's factually wrong above the language level isn't fair.

I guess this is essentially the same thought as 'all they do is hallucinate'.

pjmorris · 4 months ago
Generally attributed to George Box
mohsen1 · 4 months ago
Intelligence, in a way, is the ability to filter out useless information, be it thoughts or sensory information.

tugberkk · 4 months ago
Yes, can't remember who said it, but LLMs always hallucinate, it's just that they are 90-something percent right.
ljm · 4 months ago
If I was to drop acid and hallucinate an alien invasion, and then suddenly a xenomorph runs loose around the city while I’m tripping balls, does being right in that one instance mean the rest of my reality is also a hallucination?

Because it seems the point being made multiple times is that a perceptual error isn’t the key component of hallucinating; the whole thing is instead just a convincing illusion that could theoretically apply to all perception, not just the psychoactively augmented kind.

OtomotO · 4 months ago
Which totally depends on your domain and subdomain.

E.g. Programming in JS or Python: good enough

Programming in Rust: I can scrap over 50% of the code because it will

a) not compile at all (I see this while the "AI" types)

b) not meet the requirements at all

rootusrootus · 4 months ago
For a hot second I thought LLMs were coming for our jobs. Then I realized they were just as likely to end up creating mountains of things for us to fix later. And as things settle down, I find good use cases for Claude Code that augment me but are in no danger of replacing me. It certainly has its moments.
jama211 · 4 months ago
Finally, an opinion on here that’s reasonable and isn’t “AI is perfect” or “AI is useless”.
mexicocitinluez · 4 months ago
One of the things that has struck me as odd is just how little self-awareness devs have when talking about "skin in the game" with regard to CEOs hawking AI products.

Like, we have just as much to lose as they have to gain. Of course a part of us doesn't want these tools to be as good as some people say they are because it directly affects our future and livelihood.

No, they can't do everything. Yes, they can do some things. It's that simple.

bko · 4 months ago
> Certainly if we ever ask a hallucination engine for a numeric answer, we should ask it at least three times, so we get some sense of the variation.

This works on people as well!

Cops do this when interrogating. You tell the same story three times, sometimes backwards. It's hard to keep track of everything if you're lying or you don't recall clearly, so they get a sense of how confident to be in your account. It also works in interviews: ask someone to explain a subject in three different ways to see if they truly understand.
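
Rough sketch of what I mean by checking the variation (ask_llm() here is just a stand-in for whatever client you actually use, and the canned replies are fake so the example runs on its own):

    import itertools
    import statistics

    # Fake "LLM": cycles through canned replies so the example is self-contained.
    _canned = itertools.cycle(["42", "41", "42"])

    def ask_llm(prompt: str) -> str:
        return next(_canned)  # replace with a real API call

    def sample_numeric(prompt: str, n: int = 3) -> list[float]:
        """Ask the same numeric question n times and keep the replies that parse."""
        answers = []
        for _ in range(n):
            reply = ask_llm(prompt).strip()
            try:
                answers.append(float(reply))
            except ValueError:
                pass  # a non-numeric reply is itself a warning sign
        return answers

    answers = sample_numeric("What was the figure for X? Reply with a bare number.")
    spread = statistics.pstdev(answers) if answers else float("inf")
    print(answers, "spread:", spread)  # any spread > 0 means the samples disagree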

jama211 · 4 months ago
It works to confuse people and make them sound like they’re lying when they’re not, too. Gotta be careful with this.
hnbad · 4 months ago
If you're trying to hit quotas rather than find out the truth, that sounds like a feature.
Terr_ · 4 months ago
> This works on people as well!

Only within certain conditions or thresholds that we're still figuring out. There are many cases where the more someone recalls and communicates their memory, the more details get corrupted.

> Cops do this when interrogating.

Sometimes that's not to "get sense of the variation" but to deliberately encourage a contradiction to pounce upon it. Ask me my own birthday enough times in enough ways and formats, and eventually I'll say something incorrect.

Care must also be taken to ensure that the questioner doesn't change the details, such as by encouraging (or sometimes forcing) the witness/suspect to imagine things which didn't happen.

inerte · 4 months ago
Triple modular redundancy. I remember reading that's how NASA space shuttles calculate things, because a processor / memory might have been affected by space radiation: https://llis.nasa.gov/lesson/18803
anon7725 · 4 months ago
Triple redundancy works because you know that under nominal conditions each computer would produce the correct result independently. If 2 out of 3 computers agree, you have high confidence that they are correct and the 3rd one isn’t.

With LLMs you have no such guarantee or expectation.
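
Toy sketch of the difference (the values are made up): a 2-out-of-3 vote only buys confidence when the voters are usually right and fail independently, whereas three samples from the same model share the same biases, so agreement means much less.

    from collections import Counter

    def two_of_three(a, b, c):
        """Classic TMR voter: return the majority value, or None if all three disagree."""
        value, count = Counter([a, b, c]).most_common(1)[0]
        return value if count >= 2 else None

    # Independent computers with rare, independent faults: the majority is almost surely right.
    print(two_of_three(7, 7, 7))  # 7
    print(two_of_three(7, 9, 7))  # 7 -- the single faulty result is out-voted

    # Correlated LLM samples can agree on the same wrong answer every time.
    print(two_of_three("1912", "1912", "1912"))  # unanimous, but not thereby correct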

chistev · 4 months ago
Who remembers that scene on Better Call Saul between Lalo, Saul, and Kim?
daviding · 4 months ago
I get a lot of productivity out of LLMs so far, which for me is a simple good sign. I can get a lot done in a shorter time and it's not just using them as autocomplete. There is this nagging doubt that there's some debt to pay one day when it has too loose a leash, but LLMs aren't alone in that problem.

One thing I've done with some success is use a Test Driven Development methodology with Claude Sonnet (or recently GPT-5), moving the feature forward in discrete steps with initial tests and within the red/green loop.

I don't see a lot written or discussed about that approach so far, but then reading Martin's article made me realize that the people most proficient with TDD are not really in the Venn Diagram intersection of those wanting to throw themselves wholeheartedly into using LLMs to agent code. The 'super clippy' autocomplete is not the interesting way to use them, it's with multiple agents and prompt techniques at different abstraction levels - that's where you can really cook with gas. Many TDD experts have great pride in the art of code, communicating like a human and holding the abstractions in their head, so we might not get good guidance from the same set of people who helped us before. I think there's a nice green field of 'how to write software' lessons with these tools coming up, with many caution stories and lessons being learnt right now.

edit: heh, just saw this now, there you go - https://news.ycombinator.com/item?id=45055439

tra3 · 4 months ago
It feels like the TDD/LLM connection is implied — “and also generate tests”. Though it’s not canonical TDD of course. I wonder if it’ll turn the tide towards tech that’s easier to test automatically, like maybe SSR instead of React.
daviding · 4 months ago
Yep, it's great for generating tests; so much of that is boilerplate that it feels like great value. As a super lazy developer, having all that mechanical 'stuff' spat out for me is nice. Test code, like baggage, feels lighter when it's just churned out as part of the process, with no guilt about deleting it all when what you want to do changes. That in itself is nice. Plus of course MCP things (Playwright etc.) for integration testing are great.

But like you said, it was meant more as TDD in the 'test first' sense - a sort of 'prompt-as-spec' that produces the test/spec code first, and then iterates on that. The code design itself comes out different because it's prompted to be testable. So rather than going 'prompt -> code', there's an in-between stage of prompting the test initially and then evolving it, making sure the agent is part of the game of only writing testable code, and automating the 'gate' of passing tests before expanding something. A 'prompt -> spec -> code' loop, repeated until shipped.
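
To make that loop concrete, a rough sketch (pytest, and the slugify() toy spec, are just placeholders, not the actual project): the test is written or reviewed by the human first as the spec, the stub is what the agent is asked to fill in, and you only expand once it goes green.

    # test_slugify.py -- the test is the spec; run `pytest` until it goes green.
    import pytest

    def slugify(raw: str) -> str:
        # Stub the agent is asked to implement; starts out red on purpose.
        raise NotImplementedError

    @pytest.mark.parametrize("raw,expected", [
        ("Hello, World!", "hello-world"),
        ("  spaces   everywhere ", "spaces-everywhere"),
        ("Already-slugged", "already-slugged"),
    ])
    def test_slugify(raw, expected):
        assert slugify(raw) == expected

    # Loop: prompt the agent with this file, run the tests, feed failures back,
    # repeat until green -- then refactor or extend the spec and go again.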

rvz · 4 months ago
> It feels like the TDD/LLM connection is implied — “and also generate tests”.

That sounds like an anti-pattern and not true TDD to get LLMs to generate tests for you if you don't know what to test for.

It also reduces your confidence in knowing if the generated test does what it says. Thus, you might as well write it yourself.

Otherwise you will get this sort of nasty incident. [0] Even when 'all tests passed'.

[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...

gjsman-1000 · 4 months ago
For my money, I used this analogy at work:

Before AI, we were trying to save money, but through a different technique: Prompting (overseas) humans.

After over a decade of trying that, we learned that had... flaws. So round 2: Prompting (smart) robots.

The job losses? This is just Offshoring 2.0; complete with everyone getting to re-learn the lessons of Offshoring 1.0.

the_af · 4 months ago
> Prompting (overseas) humans [...] After over a decade of trying that, we learned that had... flaws.

I think this is a US-centric point of view, and seems (though I hope it's not!) slightly condescending to those of us not in the US.

Software engineering is more than what happens to US-based businesses and their leadership commanding hundreds or thousands of overseas humans. Offshoring in software is certainly a US concern (and to a lesser extent, other nations suffer it), but is NOT a universal problem of software engineering. Software engineering happens in multiple countries, and while the big money is in the US, that's not all there is to it.

Software engineering exists "natively" in countries other than the US, so any problems with it should probably (also) be framed without exclusive reference to the US.

gjsman-1000 · 4 months ago
The problem isn't that there aren't high quality offshore developers - far from it. Or even high quality AI models.

The problems are inherent with outsourcing to a 3rd party and having little oversight. Oversight is, in both cases, way harder than it appears.

keeda · 4 months ago
Funnily enough, I left a similar comment just the other day: https://news.ycombinator.com/item?id=44944717

The conclusion I reached was different, though. We learnt how to do outsourcing "properly" pretty quickly after some initial high-profile failures, which is why it has only continued to grow into such a huge industry. This also involved local talent refocusing on higher-value tasks, which is why job losses were limited. Those same lessons and outcomes of outsourcing are very relevant to "bot-sourcing'.

However, I do feel concerned that AI is gaining skill-levels much faster than the rate at which people can upskill themselves.

CuriouslyC · 4 months ago
I'm sure the blacksmiths and weavers will find solace in that take. Their time will return!
sebnukem2 · 4 months ago
> hallucinations aren’t a bug of LLMs, they are a feature. Indeed they are the feature. All an LLM does is produce hallucinations, it’s just that we find some of them useful.

Nice.

nine_k · 4 months ago
I'd rather say that LLMs live in a world that consists entirely of stories, nothing but words and their combinations. They have no other reality. So they are good at generating more stories that would sit well with the stories they already know. But the stories are often imprecise, and sometimes contradictory, so they have to guess. Also, LLMs don't know how to count, but they know that two usually follows one, and three is usually said to be larger than two, so they can speak in a way that mostly does not contradict this knowledge. They can use tools to count, like a human who knows digits would use a calculator.
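
A toy sketch of that division of labour (the model/tool plumbing here is purely hypothetical): the model only has to emit a tool call, and the arithmetic itself is done deterministically outside it.

    import ast
    import operator

    # Deterministic "calculator" tool the model can delegate to.
    _ops = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}

    def calculator(expr: str):
        """Safely evaluate a simple arithmetic expression like '123457 * 89'."""
        def walk(node):
            if isinstance(node, ast.BinOp) and type(node.op) in _ops:
                return _ops[type(node.op)](walk(node.left), walk(node.right))
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            raise ValueError("unsupported expression")
        return walk(ast.parse(expr, mode="eval").body)

    # Pretend model output: rather than guessing a number, it asks for the tool.
    model_reply = {"tool": "calculator", "args": "123457 * 89"}
    if model_reply["tool"] == "calculator":
        print(calculator(model_reply["args"]))  # 10987673 -- computed, not recalled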

But much more than an arithmetic engine, the current crop of AI needs an epistemic engine, something that would help follow logic and avoid contradictions, to determine what is a well-established fact, and what is a shaky conjecture. Then we might start trusting the AI.

chadcmulligan · 4 months ago
One night I asked it to write me some stories, and it did seem happy doing that. I just kept saying "do what you want" when it asked me for a choice. It's a fun little way to spend a couple of hours.
aitchnyu · 4 months ago
I've asked LLMs to modify 5 files at a time and mark a checklist. Also image generators no longer draw 6 fingered hands.
gnerd00 · 4 months ago
This was true, but then it wasn't... the research world several years ago had a moment when the machinery could reliably solve multi-step problems (there had to be intermediate results), and the machinery could solve problems in a domain where it was not specifically trained. This caused a lot of excitement, and several hundred billion dollars in various investments. Since no one actually knows how all of it works, not even the builders, here we are.
awesome_dude · 4 months ago
I have a very similar (probably unoriginal) thought about some human mental illnesses.

So, we VALUE creativity, we claim that it helps us solve problems, improves our understanding of the universe, etc.

BUT for people with some mental illnesses, the brain is so creative that they lose track of where reality ends and where their imagination/creativity takes over.

eg. Hearing voices? That's the brain conjuring up a voice - auditory and visual hallucinations are the easy example.

But it goes further: depression is where people's brains create scenarios where there is no hope and there's no escape. Anxiety too, where the brain is conjuring up fears of what's to come.

malloryerik · 4 months ago
You may like to check out Iain McGilchrist's take on schizophrenia, which essentially he says is a relative excess of rationality ("if then else" thinking) and a deficit of reasonableness (as in sensible context inhabiting).
tptacek · 4 months ago
In that framing, you can look at an agent as simply a filter on those hallucinations.
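
Rough sketch of that framing (generate() and the check are just stand-ins): the generator proposes freely, and the agent loop only keeps proposals that survive some external check.

    import random

    def generate() -> str:
        # Stand-in for the model: proposes candidates with no regard for validity.
        return random.choice(["2 + 2 == 4", "2 + 2 == 5", "import nonsense"])

    def survives_check(candidate: str) -> bool:
        # External filter: here, "does it run and evaluate to True?"
        try:
            return bool(eval(candidate, {"__builtins__": {}}))
        except Exception:
            return False

    def agent(max_tries: int = 20):
        for _ in range(max_tries):
            candidate = generate()
            if survives_check(candidate):
                return candidate  # a "hallucination" that happened to be useful
        return None

    print(agent())  # almost always "2 + 2 == 4"
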
armchairhacker · 4 months ago
This vaguely relates to a theory about human thought: that our subconscious constantly comes up with random ideas, then filters the unreasonable ones, but in people with delusions (e.g. schizophrenia) the filter is broken.

Salience (https://en.wikipedia.org/wiki/Salience_(neuroscience)), "the property by which some thing stands out", is something LLMs have trouble with. Probably because they're trained on human text, which ranges from accurate descriptions of reality to nonsense.

keeda · 4 months ago
More of an error-correcting feedback loop than a filter, really. Which is very much what we do as humans, apparently. One recent theory of neuroscience that is becoming influential is Predictive Processing -- https://en.wikipedia.org/wiki/Predictive_coding -- which postulates that we also constantly generate a "mental model" of our environment (a literal "prediction") and use sensory inputs to correct and update it.

So the only real difference between "perception" and a "hallucination" is whether it is supported by physical reality.

Lionga · 4 months ago
Isn't an "agent" just hallucinations layered on top of other random hallucinations to create new hallucinations?
th0ma5 · 4 months ago
Yes yes, with yet to be discovered holes
keeda · 4 months ago
I've preferred to riff off of the other quote:

"All (large language) model outputs are hallucinations, but some are useful."

Some astonishingly large proportion of them, actually. Hence the AI boom.

jama211 · 4 months ago
I find it a bit of a reductive way of looking at it personally
anthem2025 · 4 months ago
Isn’t that why people argue against calling them hallucinations?

It implies that some parts of the output aren’t hallucinations, when the reality is that none of it has any thought behind it.

ninetyninenine · 4 months ago
Nah I don't agree with this characterization. The problem is, the majority of those hallucinations are true. What was said would make more sense if the majority of the responses were, in fact, false, but this is not the case.
xmprt · 4 months ago
I think you're both correct but have different definitions of hallucinations. You're judging it as a hallucination based on the veracity of the output, whereas Fowler is judging it based on the method by which the output is achieved. By that judgement, everything is a hallucination because the user cannot differentiate between when the LLM is telling the truth and when it isn't.

This is different from human hallucinations where it makes something up because of something wrong with the mind rather than some underlying issue with the brain's architecture.
