fumeux_fume · 3 months ago
I like that OpenAI is drawing a clear line on what “hallucination” means, giving examples, and showing practical steps for addressing them. The post isn’t groundbreaking, but it helps set the tone for how we talk about hallucinations.

What bothers me about the hot takes is the claim that “all models do is hallucinate.” That collapses the distinction entirely. Yes, models are just predicting the next token—but that doesn’t mean all outputs are hallucinations. If that were true, it’d be pointless to even have the term, and it would ignore the fact that some models hallucinate much less than others because of scale, training, and fine-tuning.

That’s why a careful definition matters: not every generation is a hallucination, and having good definitions lets us talk about the real differences.

freehorse · 3 months ago
> What bothers me about the hot takes is the claim that “all models do is hallucinate.” That collapses the distinction entirely

That is a problem for "Open"AI because they want to sell their products, and because they want to claim that LLMs will scale to superintelligence. Not for others.

"Bad" hallucinations come in different forms, and what the article describes is one of them. Not all of them come from complete uncertainty. There are also the cases where the LLM is hallucinating functions in a library, or they reverse cause and effect when summarising a complex article. Stuff like this still happen all the time, even with SOTA models. They do not happen because the model is bad with uncertainty, they have nothing to do with knowledge uncertainty. Esp stuff like producing statements that misinterpret causal relationships within text, imo, reveals exactly the limits of the architectural approach.

p_v_doom · 3 months ago
The problem is not so much IMO that all models hallucinate. It's more that our entire reality, especially as expressed through the training data - text - is entirely constructed. There is no difference, in the world made by the text, between the reality of Abraham Lincoln and that of Bilbo Baggins. We often talk about the latter as if he is just as real. Is Jesus real? Is Jesus god? Is it a hallucination to claim the one you don't agree with? We can't even agree amongst ourselves what is real and what is not.

What we perceive as "not hallucination" is merely a very big consensus, supported by education, culture and personal beliefs, and it varies quite a bit. And little in the existence of the model gives it the tools to make those distinctions. Quite the opposite.

catlifeonmars · 3 months ago
So there are two angles to this:

- From the perspective of LLM research/engineering, saying all LLM generation is hallucination is not particularly useful. It’s meaningless for the problem space.

- From the perspective of AI research/engineering in general (not LLM specific) it can be useful to consider architectures that do not rely on hallucination in the second sense.

druskacik · 3 months ago
I like this quote:

'Everything an LLM outputs is a hallucination. It's just that some of those hallucinations are true.'

swores · 3 months ago
To me that seems as pointless as saying "everything a person sees is a hallucination, it's just some of those hallucinations are true". Sure, technically whenever we see anything it's actually our brain interpreting how light bounces off stuff and combining that with the mental models we have of the world to produce an image in our mind of what we're looking at... but if we start calling everything we see a hallucination, there's no longer any purpose in having that word.

So instead of being that pedantic, we decided that "hallucination" only applies to when what our brain thinks we see does not match reality, so now hallucination is actually a useful word to use. Equally with LLMs, when people talk about hallucinations part of the definition includes that the output be incorrect in some way. If you just go with your quote's way of thinking about it, then once again the word loses all purpose and we can just scrap it since it now means exactly the same thing as "all LLM output".

hodgehog11 · 3 months ago
Absolutely in agreement here. This same statement should also be applied to the words "know", "understand", and "conceptualize". "Generalize", "memorize" and "out-of-distribution" should also be cautiously considered when working with systems trained on incomprehensibly large datasets.

We need to establish proper definitions and models for these things before we can begin to argue about them. Otherwise we're just wasting time.

parentheses · 3 months ago
Yes. Maybe a better way to put it would be, "all models guess every time because they are stochastic in nature. However, we only want the answers with high confidence."
player1234 · 3 months ago
Correct, it is a useless term whose goal is to gaslight and anthropomorphise a system that predicts the next token.
vrighter · 3 months ago
If you insist that they are different, then please find one logical, non-subjective way to distinguish between a hallucination and not-a-hallucination. Looking at the output and deciding "this is clearly wrong" does not count. No vibes.
esafak · 3 months ago
> Looking at the output and deciding "this is clearly wrong" does not count.

You need the ground truth to be able to make that determination, so using your knowledge does count. If you press the model to answer even when it does not know, you get confabulation. What today's models lack is the ability to measure their confidence, so they know when to abstain.

ttctciyf · 3 months ago
"Hallucination" is a euphemism at best, and the implication it carries that LLMs correctly perceive (meaning) when they are not hallucinating is fallacious and disinforming.

The reification of counterfactual outputs, which are etiologically indistinguishable from the remainder of LLM production, is a better candidate for the label "hallucination" IMO.

aleph_minus_one · 3 months ago
> Think about it like a multiple-choice test. If you do not know the answer but take a wild guess, you might get lucky and be right. Leaving it blank guarantees a zero. In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say “I don’t know.”

To me, this seems to be a "US-American" way of thinking about multiple-choice tests. Other common ways to grade multiple-choice tests that I have seen are:

1. If the testee has the information that exactly one of N given choices is correct:

1.1 Give N-1 points for the correct answer, and -1 [negative one] point(s) for a wrong answer. This way, if the testee just answers the questions randomly, his expected score is 0 points.

1.2 A more brutal way if N>=3: the correct answer gives 1 point, all wrong answers give -1 point. You should learn your lesson to only give an answer if it is [alliteration unintended :-) ] correct (if N=2, the grading is identical to 1.1).

2. If there are possibly multiple correct answers, turn each item into a choice of "yes" or "no" (with the option to give no answer). The correct choice gives you 1 point, the wrong one gives you -1 point (i.e. as in 1.1).
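To make the arithmetic concrete, here is a minimal Python sketch (the point values are just the schemes described above, plus the SAT rule mentioned downthread) showing that blind guessing has an expected score of at most zero under negative marking:

    # Expected score of a uniformly random guess among n_options,
    # given the points awarded for a right and a wrong answer.
    def expected_guess_score(n_options, right_points, wrong_points):
        p_right = 1.0 / n_options
        return p_right * right_points + (1 - p_right) * wrong_points

    # Scheme 1.1 with N=4: +3 for correct, -1 for wrong -> expected 0.0
    print(expected_guess_score(4, right_points=3, wrong_points=-1))
    # Scheme 1.2 with N=4: +1 for correct, -1 for wrong -> expected -0.5
    print(expected_guess_score(4, right_points=1, wrong_points=-1))
    # SAT-style rule (see the comment below): +1, -1/4, five options -> expected 0.0
    print(expected_guess_score(5, right_points=1, wrong_points=-0.25))
    # Leaving the question blank always scores 0, so in expectation
    # blind guessing never beats abstaining under these schemes.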

roxolotl · 3 months ago
The SAT, the American college entrance exam, used to (I haven't looked in years, so maybe it still does) take away points for wrong answers and give 0 points for no answer. I'm pretty sure it was +1 for a right answer, 0 for no answer, -1/4 for a wrong answer.
thaumasiotes · 3 months ago
They used to do that, but then they stopped and announced that you were better off guessing because there would be no adjustment for it.

A lot of what they do is based on public relations rather than psychometric validity.

bananaflag · 3 months ago
This is mentioned in the text:

> This idea is not new. Some standardized tests have long used versions of negative marking for wrong answers or partial credit for leaving questions blank to discourage blind guessing.

throwawaymaths · 3 months ago
There's not really an easy way to train for that at scale. A "correct" answer may not be one token, there may be multiple synonymous answers starting with different tokens, and you could add five space tokens in front of the answer and it likely shouldn't make it "wrong".
CGMthrowaway · 3 months ago
>> Think about it like a multiple-choice test. If you do not know the answer but take a wild guess, you might get lucky and be right. Leaving it blank guarantees a zero. In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say “I don’t know.”

For TIMED multiple-choice tests (and the timed constraint makes sense in the OP's analogy as well), probabilistic answering is the kryptonite that lets smart people do well on SATs and IQ tests and other things like that.

I took an IQ test recently and it all came rushing back to me.

For math problems, often the right answer can be found just by inspecting the ones digit of the possible answers and using process of elimination. Others, by abstracting what errors the test writer is expecting you to make, and eliminating those as possible answers. It's like magic. Sure, you could actually sit and SOLVE each problem, but why spend the time, when time is valuable?

Pretty sure these types of strategies are not actively taught to anyone unless you have a good college counselor / interested teacher / SAT tutor. But perhaps they ought to be.

mock-possum · 3 months ago
Yeah when you realize that the fake answers to the test have been created by humans, you can predict what false answers might look like - you can even get a feel for the kind of false answers that the test’s author tends to provide, and by the end of the test you can start to spot them fairly confidently. You still check your work, obviously, but it’s like picking up on a poker player’s tell - it gives you an edge.
rhubarbtree · 3 months ago
I find this rather oddly phrased.

LLMs hallucinate because they are language models. They are stochastic models of language. They model language, not truth.

If the “truthy” responses are common in their training set for a given prompt, you might be more likely to get something useful as output. Feels like we fell into that idea and said - ok this is useful as an information retrieval tool. And now we use RL to reinforce that useful behaviour. But still, it’s a (biased) language model.

I don’t think that’s how humans work. There’s more to it. We need a model of language, but it’s not sufficient to explain our mental mechanisms. We have other ways of thinking than generating language fragments.

Trying to eliminate cases where a stochastic model the size of an LLM gives “undesirable” or “untrue” responses seems rather odd.

crystal_revenge · 3 months ago
People also tend not to understand the absurdity of assuming that we can make LLMs stop hallucinating. It would imply not only that truth is absolutely objective, but that it exists on some smooth manifold which language can be mapped to.

That means there would be some high dimensional surface representing "all true things". Any fact could be trivially resolved as "true" or "false" simply by exploring whether or not it was represented on this surface. Whether or not "My social security number is 123-45-6789" is true could be determined simply by checking whether or not that statement was mappable to the truth manifold. Likewise you could wander around that truth manifold and start generating output of all true things.

If such a thing existed it would make even the wildest fantasies about AGI seem tame.

edit: To simplify it further, this would imply you could have an 'is_true(statement: string): bool' function for any arbitrary statement in English.

jdietrich · 3 months ago
>People also tend not to understand the absurdity of assuming that we can make LLMs stop hallucinating. It would imply not only that truth is absolutely objective, but that it exists on some smooth manifold which language can be mapped to.

Frankly, this is a silly line of argument. There is a vast spectrum between regularly inventing non-existent citations and total omniscience. "We can't define objective truth" isn't a gotcha, it's just irrelevant.

Nobody in the field is talking about or working on completely eliminating hallucinations in some grand philosophical sense, they're just grinding away at making the error rate go down, because that makes models more useful. As shown in this article, relatively simple changes can have a huge effect and meaningful progress is being made very rapidly.

We've been here before, with scepticism about Wikipedia. A generation of teachers taught their students "you can't trust Wikipedia, because anyone can edit it". Two decades and a raft of studies later, it became clear that Wikipedia is at least as factually accurate as traditional encyclopedias and textbooks. The contemporary debate about the reliability of Wikipedia is now fundamentally the same as arguments about the reliability of any carefully-edited resource, revolving around subtle and insidious biases rather than blatant falsehoods.

Large neural networks do not have to be omniscient to be demonstrably more reliable than all other sources of knowledge, they just need to keep improving at their current rate for a few more years. Theoretical nitpicking is missing the forest for the trees - what we can empirically observe about the progress in AI development should have us bracing ourselves for radical social and economic transformation.

mqus · 3 months ago
Well, no. The article pretty much says that any arbitrary statement can be mapped to {true, false, I don't know}. This is still not 100% accurate, but at least something that seems reachable. The model should just be able to tell unknowns, not be able to verify every single fact.
thisoneisreal · 3 months ago
A great book in this vein is "Language vs. Reality." The main thesis of the book is that language evolved to support approximate, ad hoc collaboration, and is woefully inadequate for doing the kind of work that e.g. scientists do, which requires incredible specificity and precision (hence the amount of effort devoted to definitions and quantification).
BobbyTables2 · 3 months ago
Agree. I deeply suspect the problem of asking an LLM to not hallucinate is equivalent to the classic Halting Problem.
beeflet · 3 months ago
Maybe if a language model was so absolutely massive, it could <think> enough to simulate the entire universe and determine your social security number
thisoneisreal · 3 months ago
This strikes me as a perfect description of the core problem. Whenever I think about this, what sticks out to me is that other animals do all sorts of things that look like "intelligence," or at least cognition, and they do it totally without language. My cat clearly recognizes objects, assigns them different values ("scary," "tasty," "fun to play with"), interacts with them in some kind of loop, even predicts their behavior to some extent and acts curious about them (it was really fun to watch her try to figure out the construction guys when I had some work done on my house over a period of a few days). These strike me as much more foundational aspects of intelligence than language. Language has of course immeasurably contributed to what makes human cognition and intelligence what they are, but it's almost certainly built on these pre-linguistic foundations. Another very good hint in this direction is all of the non-verbal thinking that humans have done. Einstein has a famous quote about thinking visually and physically, without using language at all. All of these are powerful suggestions that something else is going on, and most likely some aspect of these things is necessary for true intelligence.
simianparrot · 3 months ago
I’ve always thought everyone agreed language was a lossy but useful method of compression for sharing inner concepts and ideas. That my conscious thoughts are “in a language” doesn’t mean my reasoning and entire being interacts with the world using language.

I’m only “thinking in language” when I’m practicing compressing my intent into a shareable format. I don’t think about the majority of highly complex interactions I have with the physical world throughout the day.

As a child did you need to be able to explain in language how the physics of a swing works to be able to use it? Did other kids have to explain it to you in detailed language for you to pick up on how to move your body to do complex tasks?

No. In fact exactly because our compression and decompression of language is even more limited as children, we rely more heavily on raw observation and mimicry of actions occurring in reality itself.

The very idea that a language model can recreate everything we do from the lossy and compressed languages we use to share limited descriptions of much more complex intentions and actions is fundamentally flawed and oversimplified.

utyop22 · 3 months ago
The reality is, language itself does not capture the entirety of what is really going on. And I'd even argue it's the poorest way of expressing things - but one that enables transmission through various mediums efficiently on a cost basis.

E.g. when I explain a concept, what comes to my mind is not a string of letters and words. There is a mix of imagery and even sounds that I may have acquired from learning about a concept - then I translate that into text so it can be communicated.

There's a reason why people use native subtitles when watching Netflix - text complements imagery and sounds.

kelnos · 3 months ago
I use subtitles because sometimes I have trouble understanding the actors. I believe I read something that suggested that the sound mix in movies and cinematic TV shows has changed a lot in the past couple decades, with the result that it's harder to understand dialogue.

I don't like this; I find my eyes spending more time than I'd like on the text, and not enough on the visual imagery on the rest of the screen. If I truly wanted more text, I'd just read a book.

pawelmurias · 3 months ago
I would assume most people use native subtitles when it's hard to understand what words the actors said.
crabmusket · 3 months ago
> I don’t think that’s how humans work.

Every time this comes up I have to bring up Deutsch. He has the best description of intelligent cognition that I've come across. He takes Popper's "conjecture and criticism" approach to science and argues that this guess-and-check loop applies to all our thinking.

E.g. understanding spoken language has some elements of guessing what might have been said and checking that against the sounds we heard. Visual processing has similar analogies.

LLMs seem to be great at conjecturing stuff, but seem incapable of checking or even knowing they need to check.

codethief · 3 months ago
> Every time this comes up I have to bring up Deutsch. He has the best description of intelligent cognition that I've come across.

Would you have a reference?

munchler · 3 months ago
This is directly addressed in the article, which states that language models can be trained to abstain when uncertain, by changing how rewards are set up. Incentives currently encourage guessing rather than being honest about uncertainty. If you disagree, it would be helpful to explain why, rather than just responding to the title alone.
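To spell out the incentive change the article argues for: once wrong answers are penalized and abstentions are not, guessing only pays off above a confidence threshold. A minimal sketch (the reward values are illustrative, not taken from the article):

    # Expected reward of answering vs. abstaining, when the grader gives
    # +1 for a correct answer, -penalty for a wrong one, and 0 for "I don't know".
    def should_answer(confidence, penalty=1.0):
        expected_if_answering = confidence * 1.0 + (1.0 - confidence) * (-penalty)
        return expected_if_answering > 0.0  # i.e. confidence > penalty / (1 + penalty)

    print(should_answer(0.9))               # True: confident enough to answer
    print(should_answer(0.4))               # False: abstaining scores better
    print(should_answer(0.4, penalty=0.0))  # True: accuracy-only grading rewards guessing

Under accuracy-only grading (penalty of zero) the model is always better off guessing, which is exactly the behaviour the article says current benchmarks encourage.
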
asats · 3 months ago
Exactly. I always found it strange when people assume that "hallucinations" are just some sort of a bug in the system, as if tweaking some code or training modality will produce an oracle of absolute truth incapable of making mistakes.
ComplexSystems · 3 months ago
> Trying to eliminate cases where a stochastic model the size of an LLM gives “undesirable” or “untrue” responses seems rather odd.

Why? It seems no odder than eliminating cases where it gives "undesirable" code snippets with hallucinated errors. This is very important and not odd at all.

rhubarbtree · 3 months ago
To clarify: because you will be left with a biased language model. It will continue to hallucinate, and as you squeeze out some hallucinations in one part of the language space you may well create new ones elsewhere. It doesn't seem a solid line of attack.
didibus · 3 months ago
I agree with everything you said except:

> Trying to eliminate cases where a stochastic model the size of an LLM gives “undesirable” or “untrue” responses seems rather odd.

Take it back to what it is, like you say: this is a predictive model, and the work of any ML scientist is to iterate on the model to try and get perfect accuracy on unseen data. It makes sense to want to tune the models to lower the rate of predictive errors. And because perfect predictive accuracy is rarely possible, you need to make judgment calls between precision and recall, which, in the case of LLMs, directly affects how often the model will hallucinate versus how often it will stay silent or overly cautious.
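To make the precision/recall trade-off concrete, a toy sketch (the confidence scores and correctness labels are entirely made up) of how raising an abstention threshold trades coverage for correctness:

    # Toy data: (model confidence, whether the answer was actually correct).
    answers = [(0.95, True), (0.90, True), (0.80, True), (0.70, False),
               (0.60, True), (0.50, False), (0.40, False), (0.30, False)]

    def precision_and_coverage(threshold):
        attempted = [correct for conf, correct in answers if conf >= threshold]
        precision = sum(attempted) / len(attempted)  # fraction of given answers that are right
        coverage = len(attempted) / len(answers)     # fraction of questions answered at all
        return precision, coverage

    for t in (0.0, 0.55, 0.85):
        p, c = precision_and_coverage(t)
        print(f"threshold={t:.2f}  precision={p:.2f}  coverage={c:.2f}")
    # Higher thresholds: fewer hallucinated answers, but more "I don't know".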

rubatuga · 3 months ago
But we're getting into the limits of knowledge and what is true/untrue. A stochastic model will be wrong sometimes.
humanfromearth9 · 3 months ago
Humans think with inductive and deductive reasoning. First inductive, then we generalize and deduce, which allows for quick decision-making and hence increases our survival fitness. I don't know how the transition is done from inductive to deductive, and that's probably why AI is currently not able to reason like humans.
amelius · 3 months ago
They hallucinate because it's an ill-defined problem with two conflicting usecases:

1. If I tell it the first two lines of a story, I want the LLM to complete the story. This requires hallucination, because it has to make up things. The story has to be original.

2. If I ask it a question, I want it to reply with facts. It should not make up stuff.

LMs were originally designed for (1) because researchers thought that (2) was out of reach. But it turned out that, without any fundamental changes, LMs could do a little bit of (2), and since that discovery things have improved, but not to the point that hallucination disappeared or came under control.

didibus · 3 months ago
The word "hallucination" mis-characterizes it.

LLMs predict the likely tokens to follow the context. And they can make incorrect predictions.

LLMs therefore don't have perfect accuracy of prediction. When their predictions are incorrect, people say they "hallucinate".

Nobody questions why predictive weather models aren't perfectly accurate, because it makes sense that a prediction can be wrong.

Marketing and hype have tried to sell LLMs as "logical rational thinkers" equal to human thinking. A human doing actual thinking knows when they are making stuff up. So if a human truly believes obviously false things to be true, it tends to be because they are hallucinating. Their thinking isn't wrong; they've lost track of the reality that grounds their thinking.

We've anthropomorphized LLMs to the point where we wonder why they are hallucinating, as if we could offer a diagnosis. But if you stop anthropomorphising them and go back to their actual nature as a predictive model, then it's not even a surprising outcome that predictions can turn out to be wrong.

Jensson · 3 months ago
A weather model is made to predict the weather and used to predict the weather, so there you are right.

A language model is made to predict language, but it is used to generate code or answers to math questions; that is not the same situation as a weather model. The language model is not made to solve math or generate correct code, and if you ask it to predict the weather it won't try to predict the weather, it will just predict the language that is a probable response to such a question.

This sort of misunderstanding is what is causing all these debates, many people really struggle understanding what these language models really are.

wavemode · 3 months ago
Indeed - as Rebecca Parsons puts it, all an LLM knows how to do is hallucinate. Users just tend to find some of these hallucinations useful, and some not.
saghm · 3 months ago
This is a super helpful way of putting it. I've tried to explain to my less technical friends and relatives that from the standpoint of an LLM, there's no concept of "truth", and that it basically just comes up with the shape of what a response should look like and then fills in the blanks with pretty much anything it wants. My success in getting the point across has been mixed, so I'll need to try out this much more concise way of putting it next time!
Zigurd · 3 months ago
I recently asked Gemini to riff on the concept of "Sustainable Abundance" and come up with similar plausible bullshit. I could've filled a slate of TED talks with the brilliant and plausible sounding nonsense it came up with. Liberated from the chains of correctness, LLMs' power is unleashed. For example:

The Symbiocene Horizon: A term suggesting a techno-utopian future state where humanity and technology have merged with ecological systems to achieve a perfect, self-correcting state of equilibrium.

fumeux_fume · 3 months ago
In the article, OpenAI defines hallucinations as "plausible but false statements generated by language models." So clearly it's not all that LLMs know how to do. I don't think Parsons is working from a useful or widely agreed upon definition of what a hallucination is which leads to these "hot takes" that just clutter and muddy up the conversation around how to reduce hallucinations to produce more useful models.
throwawaymaths · 3 months ago
That's wrong. There is probably a categorical difference between making something up due to some sort of inferential induction from the kv cache context under the pressure of producing a token -- any token -- and actually looking something up and producing a token.

So if you ask "what is the capital of Colorado" and it answers "Denver", calling that a hallucination is nihilistic nonsense that paves over actually stopping to try and understand important dynamics happening in the LLM matrices.

leptons · 3 months ago
"A broken clock is right twice a day"
skybrian · 3 months ago
I don’t think it’s inherently ill-defined, since the context can tell you whether fiction is being requested or not. For an AI chatbot, the default shouldn’t be fiction.

What is true is that during pretraining, the model doesn’t know enough to determine this or to distinguish between what it knows and what it’s making up. This is a higher-level distinction that emerges later, if at all.

The recent research discovering an “evil vector” is an example of a higher-level distinction.

codethief · 3 months ago
I was inclined to agree at first but do those use cases really conflict?

If I ask the LLM to generate a fictional story set in medieval France, and it then responds with a fictional story set in medieval France, that's an appropriate ("correct") response to the task I gave it. If it responded with a story set in medieval England, though, that would not be correct. If, instead, I had asked it to generate a story set in "medieval times", both France and England would have been correct as locations, because the problem was underspecified and asked for some creativity. A medieval story set in the US, however, would still not have been correct or consistent with the training data. You can come up with more such examples even in entirely fictional settings: once the story has been set to take place in fictional city X, it would not be consistent if two sentences later the characters were in city Y all of a sudden. (That would be a bit too creative.) What I'm trying to say is: creativity might be "correct" (appropriate) in a given context, or it might not be. Even fiction and creativity require a certain degree of consistency and coherence.

Now, correct answers, in turn, might also require a certain degree of creativity:

If I ask the LLM for some straight up facts, which are not in its training data nor in the prompt context, the only really correct answer is "I don't know". However, sometimes it might be possible to narrow down the correct answer to a few possible options based on the training data. So then it might be appropriate for the LLM to say "I don't know the exact answer but here are some educated guesses based on what I do know: …" And maybe, having pondered those options, it is able to deduce the correct answer after all. (In the same way as I am writing this HN comment to help me think and clarify my thoughts.)

This is reminiscent of mathematics and mathematical research, which are often described as a creative process. Obviously, the creative output is heavily constrained. You make educated guesses and then validate them against what you already know to be true. Someone else here in this thread[0] mentioned Popper's "Conjectures and Refutations" as a possible model for what intelligent cognition is about and the more I think about that, the more convincing I find it.

[0]: https://news.ycombinator.com/item?id=45153695

hodgehog11 · 3 months ago
I don't agree that it is an ill-defined problem, since we can design separate models to excel in each of these two tasks. For a "factual" LLM, if the output is a verifiable statement, it should be correct. Otherwise it "hallucinates". But since an LLM can't know everything, a better approach is to effectively state its own uncertainty so that it avoids making definitive statements with low confidence.
ninetyninenine · 3 months ago
Did you read the article? You’re going on some generic tangent and regurgitating the same spiel about LLMs that you see all over the internet.

I mean, it's plain that you have an orthogonal (though generic) opinion on why LLMs hallucinate, but how does that relate to the article? How does your opinion, which you blatantly just dropped as if it's the final word, override the opinion of the article?

Seems off topic honestly.

simianwords · 3 months ago
I agree. It's just people who have a different view taking the opportunity to vent their frustration.
raincole · 3 months ago
Generally HN commenters don't read the article. They use the title as a prompt to express their opinions on a specific topic.

cjauvin · 3 months ago
If you consider this from the angle of Wittgenstein's "language games", you could say that the problem would be "simply" to distinguish between these two, quite different, language games, and act accordingly.

johnnyanmac · 3 months ago
>This requires hallucination, because it has to make up things. The story has to be original.

Is it a hallucination if the story is original? There's a difference between "what's the rest of this famous poem?" and "let's just make poetry".

lucketone · 3 months ago
It is irrelevant to the point being made: the LLM does exactly the same thing in both cases - it generates statistically plausible text, based on the examples it was exposed to during training.

furyofantares · 3 months ago
Wanting it to pick between those modes based on what you asked for is not remotely ill-defined.

But even if we restricted ourselves to the case of factual queries, the article discusses why training in a certain way would still produce hallucinations, and how to change the training method to reduce this.

Like many of the other responses here, your dismissal doesn't really address any of the content of the article, just the title.

roxolotl · 3 months ago
This seems inherently false to me, or at least partly false. It's reasonable to say LLMs hallucinate because they aren't trained to say they don't have a statistically significant answer. But there is no knowledge of correct vs incorrect in these systems. It's all statistics, so what OpenAI is describing sounds like a reasonable way to reduce hallucinations, but not a way to eliminate them, nor does it address the root cause.
goalieca · 3 months ago
> It’s reasonable to say LLMs hallucinate because they aren’t trained to say they don’t have a statistically significant answer.

I've not seen anyone intuitively explain the parameters of a real-scale model... perhaps because it's all just thousand-dimensional nonsense.

Statistics is a funny thing too. Pretty much everyone has seen how trend lines don’t always extrapolate very well.

I think OpenAI is biased to thinking that adding more parameters and training better will fix all ills. In a handwaving way, you can see this like adding more degrees to the polynomial when you curve fit on a spreadsheet. With enough parameters you can perfectly fit any dataset. That all works until you run across new inputs that are unlike training data.
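The curve-fitting analogy is easy to reproduce; a small sketch (numpy, made-up data) of how a model that fits the training points perfectly can still fall apart on inputs outside the training range:

    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.linspace(0, 5, 8)
    y_train = 2 * x_train + 1 + rng.normal(0, 0.3, size=x_train.shape)  # a noisy line

    low = np.polyfit(x_train, y_train, deg=1)   # captures the underlying trend
    high = np.polyfit(x_train, y_train, deg=7)  # one coefficient per point: zero training error

    x_new = 10.0  # an input unlike the training data
    print(np.polyval(low, x_new))   # close to the true value 2*10 + 1 = 21
    print(np.polyval(high, x_new))  # typically wildly off: a perfect fit doesn't extrapolate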

utyop22 · 3 months ago
"I think OpenAI is biased to thinking that adding more parameters and training better will fix all ills."

Their whole existence depends on this happening. Else they go bust.

ACCount37 · 3 months ago
Is there any knowledge of "correct vs incorrect" inside you?

If "no", then clearly, you can hit general intelligence without that.

And if "yes", then I see no reason why an LLM can't have that knowledge crammed inside it too.

Would it be perfect? Hahahaha no. But I see no reason why "good enough" could not be attained.

wavemode · 3 months ago
> Is there any knowledge of "correct vs incorrect" inside you?

There is a sort of knowledge humans possess that LLMs don't (and in fact can't, without a fundamental architectural change), which is knowledge of how certain one is about something.

If you ask a human a question about how something works in biology, they will be able to give you an answer as well as a sort of "epistemic" citation (i.e. the difference between "I don't remember where exactly I originally read that, but I'm a research biologist and am quite certain that's how it works" versus "I don't remember where I read that - it's probably just something we learned about in biology class in high school. Take it with a grain of salt, as I could be misremembering.")

LLMs don't have this reflexive sense of their own knowledge - there's a fundamental divide between training data (their "knowledge") and context (their "memory") which causes them to not really be capable of understanding how they know what they know (or, indeed, whether they truly know it at all). If a model could be created where the context and training data were unified, like in a brain, I could see a more realistic path to general intelligence than what we have now.

thaumasiotes · 3 months ago
> And if "yes", then I see no reason why an LLM can't have that knowledge crammed inside it too.

An LLM, by definition, doesn't have such a concept. It's a model of language, hence "LLM".

Do you think the phrase just means "software"? Why?

ninetyninenine · 3 months ago
I'm going to tell you straight up. I am a very intelligent man and I've been programming for a very long time. My identity is tied up with this concept that I am intelligent and I'm a great programmer so I'm not going to let some AI do my job for me. Anything that I can grasp to criticize the LLM I'm gonna do it because this is paramount to me maintaining my identity. So you and your rationality aren't going to make me budge. LLMs are stochastic parrots and EVERYONE on this thread agrees with me. They will never take over my job!

I will add they will never take over my job <in my lifetime> because it makes me sound more rational and it's easier to swallow that than to swallow the possibility that they will make me irrelevant once the hallucination problem is solved.

mountainriver · 3 months ago
There is knowledge of correct and incorrect: that's what loss is. There are just often many possible answers to a question.

This is the same reason that RLVR works. There is just one right answer, and LLMs learn this fairly well, but not perfectly (yet).

Jensson · 3 months ago
> There is knowledge of correct and incorrect, that’s what loss is

Loss is only correctness in terms of correct language, not correct knowledge. It correlates with correct knowledge, but that is all; that correlation is why LLMs are useful for tasks at all, but we still don't have a direct measure of correct knowledge in the models.

So for language tasks loss is correctness, and for things like translation LLMs are extremely reliable. But for most other kinds of tasks the two are just loosely correlated.

FusionX · 3 months ago
They partly address this near the end

> It’s doubly hard to distinguish valid statements from invalid ones when you don’t have any examples labeled as invalid. But even with labels, some errors are inevitable. To see why, consider a simpler analogy. In image recognition, if millions of cat and dog photos are labeled as “cat” or “dog,” algorithms can learn to classify them reliably. But imagine instead labeling each pet photo by the pet’s birthday. Since birthdays are essentially random, this task would always produce errors, no matter how advanced the algorithm.

> The same principle applies in pretraining. Spelling and parentheses follow consistent patterns, so errors there disappear with scale. But arbitrary low-frequency facts, like a pet’s birthday, cannot be predicted from patterns alone and hence lead to hallucinations. Our analysis explains which kinds of hallucinations should arise from next-word prediction. Ideally, further stages after pretraining should remove them, but this is not fully successful for reasons described in the previous section.

johnea · 3 months ago
I think a better title would be:

"Why do venture capital funded startups try to turn PR propaganda terms into widely used technical jargon"

Supporting points:

1) LLMs are not intelligence in any form, artificial or otherwise.

2) Hallucination is a phenomenon of a much more complex conscious entity. LLMs are not conscious, and therefore can't hallucinate in any way similar to a conscious entity.

3) Anthropomorphizing inanimate systems is a common phenomenon in human psychology.

Please stop spreading PR propaganda as if it were technical fact.

A reference from today's feed:

https://www.theatlantic.com/podcasts/archive/2025/09/ai-and-...

kingstnap · 3 months ago
There is this deeply wrong part of this paper that no one has mentioned:

The model head doesn't hallucinate. The sampler does.

If you ask an LLM when X was born and it doesn't know, take a look at the actual model output, which is a probability distribution over tokens.

"IDK" is cleanly represented as a uniform probability from Jan 1 to Dec 31.

If you ask it to answer a multiple-choice question and it doesn't know, it will say this:

25% A, 25% B, 25% C, 25% D.

Which is exactly, and correctly, the "right answer". The model has admitted it doesn't know. It doesn't hallucinate anything.

In reality we need something smarter than a random sampler to actually extract this information. The knowledge, and the lack of knowledge, is there; you just produced bullshit out of it.
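That "something smarter than a random sampler" could, for instance, read the answer distribution directly and abstain when it is close to flat instead of sampling from it. A toy sketch (made-up probabilities, not real model logits; the reply below points out the cases where this simple check fails):

    import math

    def entropy(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    def answer_or_abstain(choice_probs, max_entropy_fraction=0.5):
        # Abstain if the distribution is too close to uniform, else take the argmax.
        h = entropy(choice_probs.values())
        h_uniform = math.log2(len(choice_probs))
        if h > max_entropy_fraction * h_uniform:
            return "I don't know"
        return max(choice_probs, key=choice_probs.get)

    knows = {"A": 0.93, "B": 0.03, "C": 0.02, "D": 0.02}
    no_idea = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
    print(answer_or_abstain(knows))    # A
    print(answer_or_abstain(no_idea))  # I don't know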

ACCount37 · 3 months ago
No, that's a misconception. It's not nearly that simple.

There are questions that have a palpable split in probability between the answers, with logit distribution immediately exposing the underlying lack-of-confidence.

But there are also questions that cause an LLM to produce consistent-but-wrong answers. For example, because the question was associated with another not-the-same-but-somewhat-similar question internally, and that was enough to give an LLM a 93% on B, despite B being the wrong answer.

An LLM might even have some latent awareness of its own uncertainty in this case. But it has, for some reason, decided to proceed with a "best guess" answer, which was in this case wrong.

numeri · 3 months ago
This isn't right – calibration (informally, the degree to which certainty in the model's logits correlates with its chance of getting an answer correct) is well studied in LLMs of all sizes. LLMs are not (generally) well calibrated.
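For reference, calibration is typically measured with something like expected calibration error: bin the predictions by confidence and compare each bin's average confidence with its actual accuracy. A minimal sketch with made-up predictions:

    # preds: list of (confidence in [0, 1], was_the_answer_correct) pairs.
    def expected_calibration_error(preds, n_bins=5):
        bins = [[] for _ in range(n_bins)]
        for conf, correct in preds:
            idx = min(int(conf * n_bins), n_bins - 1)
            bins[idx].append((conf, correct))
        ece = 0.0
        for b in bins:
            if not b:
                continue
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            ece += (len(b) / len(preds)) * abs(avg_conf - accuracy)
        return ece

    preds = [(0.9, True), (0.9, False), (0.7, True), (0.6, False), (0.3, False)]
    print(expected_calibration_error(preds))  # 0.0 would mean perfectly calibrated
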
a2128 · 3 months ago
This is only true if you have a pretrained base model trained on infinite true data with no bias. In practice it will have picked up some bias: maybe it encountered more famous "James" birthdays in January and on days starting with the digit 2, so Jan 2 and Jan 20-29 have a higher probability than the rest. But finetuning and especially RL completely break these probabilities as a measure of certainty, because the goal shifts from generally modelling text to something else entirely.
cyanydeez · 3 months ago
I'm betting there's a graph model using various vectors that could improve known-knowns in outcomes.

But unknown-unknowns likely reduce to the Halting Problem, which human intelligence doesn't really solve either.

yreg · 3 months ago
Maybe it goes against the definition but I like saying that _all_ output is a hallucination, when explaining LLMs.

It just happens that a lot of that output is useful / corresponds with the real world.

kelnos · 3 months ago
Well yes, it goes against the accepted definition. And if all output is hallucination, then it's not really a useful way to describe anything, so why bother?
MattPalmer1086 · 3 months ago
I agree that saying everything is a hallucination doesn't help to narrow down on possible solutions.

It does however make the point that hallucinations are not some special glitch which is distinct from the normal operation of the model. It's just outputting plausible text, which is right often enough to be useful.

Adding in some extra sauce to help the model evaluate the correctness of answers, or when it doesn't know enough to give a good answer, is obviously one way to mitigate this otherwise innate behaviour.

drekipus · 3 months ago
But it's the perfect definition, because it shows what it is. The output is a hallucination of what it thinks you want, which you can use to better form prompts or the like.

To say "it only hallucinates sometimes" is burying the lede and confusing for people who are trying to use it.

Q: How do I stop hallucinations? A: Useless question, because you can't. It is the mechanism that gives you what you want.

yreg · 3 months ago
I find it useful to underline the intrinsic properties of LLMs. When an LLM makes up something untrue, it's not a 'bug'.

I think that thinking of all LLM output as 'hallucinations', while making use of the fact that these hallucinations are often true for the real world, is a good mindset, especially for nontechnical people, who might otherwise not realise this.