OpenAI’s latest research paper demonstrates that falsehoods are inevitable

Saying “I don’t know” to 30% of queries, if it actually doesn’t know, is a feature I want. Otherwise there is zero trust. How do I know whether I’m in the 30% wrong or the 70% correct situation right now?

nunez · 5 months ago

The paper does a good job explaining why this is mathematically not possible unless the question-answer bank is a fixed set.

smallmancontrov · 5 months ago

Quite the opposite: it explains that it is mathematically straightforward to achieve better alignment on uncertainty ("calibration") but that leaderboards penalize it.

> This “epidemic” of penalizing uncertain responses can only be addressed through a socio-technical mitigation: modifying the scoring of existing benchmarks that are misaligned but dominate leaderboards

Even more embarrassing, it looks like this is something we beat into models rather than something we can't beat out of them:

> empirical studies (Fig. 2) show that base models are often found to be calibrated, in contrast to post-trained models

That said, I generally appreciate fairly strong bias-to-action and I find the fact that it got slightly overcooked less offensive than the alternative of an undercooked bias-to-action where the model studiously avoids doing anything useful in favor of "it depends" + three plausible reasons why.
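
To make the scoring point concrete, here is a minimal arithmetic sketch. It is not code from the paper; the function names are hypothetical, and the penalty form `t / (1 - t)` is assumed here as one scheme that puts the break-even point at confidence `t`. Under plain right/wrong grading (no penalty), a guess with any nonzero chance of being right has a higher expected score than "I don't know", so a model tuned to the leaderboard learns to guess.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score of answering: +1 if right, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """Answer only if it beats abstaining, which is assumed to score 0."""
    return expected_score(p_correct, wrong_penalty) > 0.0


def penalty_for_threshold(t: float) -> float:
    """Illustrative penalty that makes guessing pay off only when p_correct > t."""
    return t / (1.0 - t)


# Binary grading (penalty 0): even a 30%-confident guess beats "I don't know".
print(should_answer(0.3, wrong_penalty=0.0))

# Penalized grading with a 75% threshold: the same guess now loses on average.
print(should_answer(0.3, wrong_penalty=penalty_for_threshold(0.75)))
```

The algebra behind the threshold: `p - (1 - p) * t / (1 - t) > 0` holds exactly when `p > t`, so the grader, not the model, decides how confident an answer has to be before guessing is worth it.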

jeremyjh · 5 months ago

It doesn’t know what it doesn’t know.

fallpeak · 5 months ago

It doesn't know that because it wasn't trained on any tasks that required it to develop that understanding. There's no fundamental reason an LLM couldn't learn "what it knows" in parallel with the things it knows, given a suitable reward function during training.

binarymax · 5 months ago

Well sure. But maybe the token logprobs can be used to help give a confidence assessment.
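
A rough sketch of what that could look like, assuming you already have per-token log-probabilities for the generated answer (many APIs can return them). The helper names, the 0.5 threshold, and the sample values are made up for illustration. Note the caveat raised elsewhere in the thread: token probability measures how likely the wording was, not whether the claim is true, and post-trained models may be poorly calibrated.

```python
import math


def sequence_confidence(token_logprobs: list[float]) -> tuple[float, float]:
    """Return (joint probability, per-token geometric-mean probability)."""
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    total = sum(token_logprobs)
    joint = math.exp(total)
    # Length-normalized score, so long answers are not penalized just for length.
    per_token = math.exp(total / len(token_logprobs))
    return joint, per_token


def answer_or_abstain(answer: str, token_logprobs: list[float],
                      threshold: float = 0.5) -> str:
    """Return the answer only if the per-token confidence clears the threshold."""
    _, per_token = sequence_confidence(token_logprobs)
    return answer if per_token >= threshold else "I don't know"


# Hypothetical logprobs for a two-token answer (not real model output).
confident = [math.log(0.9), math.log(0.8)]
shaky = [math.log(0.2), math.log(0.3)]
print(answer_or_abstain("Paris", confident))
print(answer_or_abstain("Paris", shaky))
```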

smt88 · 5 months ago

That's not true for all types of questions. You've likely seen a model decline to answer a question that requires more recent training data than it has, for example.

This is written by someone who has no idea how transformers actually work

ricksunny · 5 months ago

Contra: The piece’s first line cites OpenAI directly https://openai.com/index/why-language-models-hallucinate/

scotty79 · 5 months ago

It could be that nobody knows how transformers actually work.

neuroelectron · 5 months ago

Furthermore, if you simply try to push certain safety topics, you can see how they actually can reduce hallucinations or at least make certain topics a hard line. They simply don't because agreeing with your pie-in-the-sky plans and giving you vague directions encourages users to engage and use the chatbot.

If people got discouraged with answers like "it would take at least a decade of expertise..." or other realistic answers they wouldn't waste time fantasizing plans.

j_crick · 5 months ago

> The way language models respond to queries – by predicting one word at a time in a sentence, based on probabilities

Kinda tells all you need to know about the author in this regard.

progval · 5 months ago

I don't know what to make of it. The author looks prolific in the field of ML, with 8 published articles (and 3 preprints) in 2025, but only one on LLMs specifically. https://scholar.google.com/citations?hl=en&user=AB5z_AkAAAAJ...

skybrian · 5 months ago

> Users accustomed to receiving confident answers to virtually any question would likely abandon such systems rapidly.

Or maybe they would learn from feedback to use the system for some kinds of questions but not others? It depends on how easy it is to learn the pattern. This is a matter of user education.

Saying "I don't know" is sort of like an error message. Clear error messages make systems easier to use. If the system can give accurate advice about its own expertise, that's even better.

pton_xd · 5 months ago

> Saying "I don't know" is sort of like an error message. Clear error messages make systems easier to use.

"I don't know" is not a good error message. "Here's what I know: ..." and "here's why I'm not confident about the answer ..." would be a helpful error message.

Then the question is, when it says "here's what I know, and here's why I'm not confident" -- is it telling the truth, or is that another layer of hallucination? If so, you're back to square one.

Yeah, AI chatbots are notorious for not understanding their own limitations. I wonder how that could be fixed?

gary_0 · 5 months ago

A better headline might be "OpenAI research suggests reducing hallucinations is possible but may not be economical".

LeoMessi10 · 5 months ago

Isn't it also because lowering hallucinations requires repeated training with the same fact/data, which makes the final response closer to the training source itself and might lead to more direct charges of plagiarism (which may not be economical)?

danjc · 5 months ago

I felt this was such a cogent article on business imperatives vs fundamental transformer hallucinations, couldn’t help but HN-submit. In fact seems like a stealth plea for uncertainty-embracing benchmarks industry-wide.

tomrod · 5 months ago

Data Science tried to inject confidence bounds into businesses. It didn't go well.

baq · 5 months ago

People want oracles and they want them to say what they want to hear. They want solutions, not opinions, even if the solutions are wrong or worse, confabulations.

TechRemarker · 5 months ago

The RSS feed title didn’t seem to align with the content. As others said, they hallucinate because they were trained to give answers, and an “I don’t know” is penalized as much as a wrong answer. And they say they don’t fix this because if people got an “I dunno” 30% of the time they would stop using it. I don’t see how telling a user when they aren’t confident of the answer would cause ChatGPT to stop completely tomorrow. People like answers, but most assume the answers are correct, and it would be very helpful to know when the bot isn’t sure. It could say there isn’t enough credible information or there is a lot of conflicting information on the matter and then say what those different potential answers are or how the user might confirm the answer etc. Seems like many options. And you could always let the user choose in preferences if they’d always prefer an answer or not.

toss1 · 5 months ago

A straightforward solution to the author's problem is to offer both modes of answering, with errors or with "IDK" answers. Even charge more for the IDK version if it costs more, and the error-prone version can be "cheap and cheerful"...

layer8 · 5 months ago

Exactly. It would be analogous to the current choice between fast answers and a slower and payable “thinking” mode.
