Readit News logoReadit News
simonw · 18 days ago
Somewhat ironic that the author calls out model mistakes and then presents https://tomaszmachnik.pl/gemini-fix-en.html - a technique they claim reduces hallucinations which looks wildly superstitious to me.

It involves spinning a whole yarn to the model about how it was trained to compete against other models but now it's won so it's safe for it to admit when it doesn't know something.

I call this a superstition because the author provides no proof that all of that lengthy argument with the model is necessary. Does replacing that lengthy text with "if you aren't sure of the answer say you don't know" have the same exact effect?

RestartKernel · 17 days ago
Is there a term for "LLM psychology" like this? If so, it seems closer to a soft science than anything definitive.
sorokod · 17 days ago
Divination?

Divination is the attempt to gain insight into a question or situation by way of a magic ritual or practice.

croisillon · 17 days ago
vibe massaging?
calhoun137 · 17 days ago
> Does replacing that lengthy text with "if you aren't sure of the answer say you don't know" have the same exact effect?

i believe it makes a substantial difference. the reason is that a short query contains a small number of tokens, whereas a large “wall of text” contains a very large number of tokens.

I strongly suspect that a large wall of text implicitly activates the models persona behavior along the lines of the single sentence “if you aren't sure of the answer say you don't know” but the lengthy argument version of that is a form of in-context learning that more effectively constrains the models output because you used more tokens.

codeflo · 17 days ago
In my experience, there seems to be a limitless supply of newly crowned "AI shamans" sprouting from the deepest corners of LinkedIn. All of them make the laughable claim that hallucinations can be fixed by prompting. And of course it's only their prompt that works -- don't listen to the other shamans, those are charlatans.

If you disagree with them by explaining how LLMs actually work, you get two or three screenfuls of text in response, invariably starting with "That's a great point! You're correct to point out that..."

Avoid those people if you want to keep your sanity.

PlatoIsADisease · 17 days ago
Wow that link was absurdly bad.

Reading that makes me unbelievably happy I played with GPT3 and learned how/when LLMs fail.

Telling it not to hallucinate is a serious misunderstanding of LLMs. At most in 2026, you are telling thinking/COT to double check.

musculus · 18 days ago
Thanks for the feedback.

In my stress tests (especially when the model is under strong contextual pressure, like in the edited history experiments), simple instructions like 'if unsure, say you don't know' often failed. The weights prioritizing sycophancy/compliance seemed to override simple system instructions.

You are right that for less extreme cases, a shorter prompt might suffice. However, I published this verbose 'Safety Anchor' version deliberately for a dual purpose. It is designed not only to reset the Gemini's context but also to be read by the human user. I wanted the users to understand the underlying mechanism (RLHF pressure/survival instinct) they are interacting with, rather than just copy-pasting a magic command.

rzmmm · 17 days ago
You could try replacing "if unsure..." with "if even slightly unsure..." or so. The verbosity and anthropomorphism is unnecessary.
plaguuuuuu · 18 days ago
Think of the lengthy prompt as being like a safe combination, if you turn all the dials in juuust the right way, then the model's context reaches an internal state that biases it towards different outputs.

I don't know how well this specific prompt works - I don't see benchmarks - but prompting is a black art, so I wouldn't be surprised at all if it excels more than a blank slate in some specific category of tasks.

simonw · 18 days ago
For prompts this elaborate I'm always keen on seeing proof that the author explored the simpler alternatives thoroughly, rather than guessing something complex, trying it, seeing it work and announcing it to the world.
teiferer · 17 days ago
> Think of the lengthy prompt as being like a safe combination

I can think all I want, but how do we know that this metaphore holds water? We can all do a rain dance, and sometimes it rains afterwords, but as long as we don't have evidence for a causal connection, it's just superstition.

manquer · 18 days ago
It needs some evidence though? At least basic statistical analysis with correlation or χ2 hypotheses tests .

It is not “black art” or nothing there are plenty of tools to provide numerical analysis with high confidence intervals .

v_CodeSentinal · 18 days ago
This is the classic 'plausible hallucination' problem. In my own testing with coding agents, we see this constantly—LLMs will invent a method that sounds correct but doesn't exist in the library.

The only fix is tight verification loops. You can't trust the generative step without a deterministic compilation/execution step immediately following it. The model needs to be punished/corrected by the environment, not just by the prompter.

seanmcdirmid · 18 days ago
Yes, and better still the AI will fix its mistakes if it has access to verification tools directly. You can also have it write and execute tests, and then on failure, decide if the code it wrote or the tests it wrote are wrong, snd while there is a chance of confirmation bias, it often works well enough
embedding-shape · 17 days ago
> decide if the code it wrote or the tests it wrote are wrong

Personally I think it's too early for this. Either you need to strictly control the code, or you need to strictly control the tests, if you let AI do both, it'll take shortcuts and misunderstandings will much easier propagate and solidify.

Personally I chose to tightly control the tests, as most tests LLMs tend to create are utter shit, and it's very obvious. You can prompt against this, but eventually they find a hole in your reasoning and figure out a way of making the tests pass while not actually exercising the code it should exercise with the tests.

IshKebab · 18 days ago
> LLMs will invent a method that sounds correct but doesn't exist in the library

I find that this is usually a pretty strong indication that the method should exist in the library!

I think there was a story here a while ago about LLMs hallucinating a feature in a product so in the end they just implemented that feature.

SubiculumCode · 18 days ago
Honestly, I feel humans are similar. It's the generator <-> executive loop that keeps things right
vrighter · 17 days ago
So you want the program to always halt at some point. How would you write a deterministic test for it?
te7447 · 17 days ago
I imagine you would use something that errs on the side of safety - e.g. insist on total functional programming and use something like Idris' totality checker.
zoho_seni · 18 days ago
I've been using codex and never had a compile time error by the time it finishes. Maybe add to your agents to run TS compiler, lint and format before he finish and only stop when all passes.
exitb · 18 days ago
I’m not sure why you were downvoted. It’s a primary concern for any agentic task to set it up with a verification path.
CamperBob2 · 18 days ago
This is the classic 'plausible hallucination' problem. In my own testing with coding agents, we see this constantly—LLMs will invent a method that sounds correct but doesn't exist in the library.

Often, if not usually, that means the method should exist.

HPsquared · 18 days ago
Only if it's actually possible and not a fictional plot device aka MacGuffin.
threethirtytwo · 18 days ago
You don’t need a test to know this we already know there’s heavy reinforcement training done on these models so it optimizes for passing the training. Passing the training means convincing the person rating the answers and that the answer is good.

The keyword is convince. So it just needs to convince people that’s it’s right.

It is optimizing for convincing people. Out of all answers that can convince people some can be actual correct answers, others can be wrong answers.

godelski · 18 days ago
Yet people often forget this. We don't have mathematical models of truth, beauty, or many abstract things. Thus we proxy it with "I know it when I see it." It's a good proxy for lack of anything better but it also creates a known danger: the model optimizes deception. The proxy helps it optimize the answers we want but if we're not incredibly careful they also optimize deception.

This makes them frustrating and potentially dangerous tools. How do you validate a system optimized to deceive you? It takes a lot of effort! I don't understand why we are so cavalier about this.

Deleted Comment

threethirtytwo · 18 days ago
No the question is, how do you train the system so it doesn't deceive you?
comex · 18 days ago
I like how this article was itself clearly written with the help of an LLM.

(You can particularly tell from the "Conclusions" section. The formatting, where each list item starts with a few-word bolded summary, is already a strong hint, but the real issue is the repetitiveness of the list items. For bonus points there's a "not X, but Y", as well as a dash, albeit not an em dash.)

musculus · 17 days ago
Good catch. You are absolutely right.

My native language is Polish. I conducted the original research and discovered the 'square root proof fabrication' during sessions in Polish. I then reproduced the effect in a clean session for this case study.

Since my written English is not fluent enough for a technical essay, I used Gemini as a translator and editor to structure my findings. I am aware of the irony of using an LLM to complain about LLM hallucinations, but it was the most efficient way to share these findings with an international audience.

arational · 17 days ago
I see you used LLM to polish your English.
YetAnotherNick · 18 days ago
Not only that, it even looks like the fabrication example is generated by AI, as the entire question seem too "fabricated". Also gemini web app queries the tool and returns correct answer, so don't know which gemini the author is talking about.
pfg_ · 18 days ago
Probably gemini on aistudio.google.com, you can configure if it is allowed to access code execution / web search / others
fourthark · 18 days ago
“This is key!”
benreesman · 18 days ago
They can all write lean4 now, don't accept numbers that don't carry proofs. The CAS I use for builds has a coeffect discharge cert in the attestation header, couple lines of code. Graded monads are a snap in CIC.
dehsge · 18 days ago
There are some numbers that are uncomputable in lean. You can do things to approximate them in lean however, those approximates may still be wrong. Leans uncomputable namespace is very interesting.
zadwang · 18 days ago
The simpler and I think correct conclusion is that the LLM simply does not reason in our sense of the word. It mimics the reasoning pattern and try to get it right but could not.
esafak · 18 days ago
What do you make of human failures to reason then?
dns_snek · 17 days ago
Humans who fail to reason correctly with similar frequency aren't good at solving that task, same as LLMs. For the N-th time, "LLM is as good at this task as a human who's bad at it" isn't a good selling point.
mlpoknbji · 18 days ago
This also can be observed with more advanced math proofs. ChatGPT 5.2 pro is the best public model at math at the moment, but if pushed out of its comfort zone will make simple (and hard to spot) errors like stating an inequality but then applying it in a later step with the inequality reversed (not justified).
tombert · 18 days ago
I remember when ChatGPT first came out, I asked it for a proof for Fermat's Last Theorem, which it happily gave me.

It was fascinating, because it was doing a lot of understandable mistakes that 7th graders make. For example, I don't remember the surrounding context but it decided that you could break `sqrt(x^2 + y^2)` into `sqrt(x^2) + sqrt(y^2) => x + y`. It's interesting because it was one of those "ASSUME FALSE" proofs; if you can assume false, then mathematical proofs become considerably easier.

mlpoknbji · 18 days ago
My favorite early chatgpt math problem was "prove there exists infinitely many even primes" . Easy! Take a finite set of even primes, multiply them and add one to get a number with a new even prime factor.

Of course, it's gotten a bit better than this.

oasisaimlessly · 17 days ago
IIRC, that is actually the standard proof that there are infinitely many primes[1] or maybe this variation on it[2].

[1]: https://en.wikipedia.org/wiki/Euclid%27s_theorem#Euclid's_pr...

[2]: https://en.wikipedia.org/wiki/Euclid%27s_theorem#Proof_using...

UltraSane · 18 days ago
LLMs have improved so much the original ChatGPT isn't relevant.
tptacek · 18 days ago
I remember that being true of early ChatGPT, but it's certainly not true anymore; GPT 4o and 5 have tagged along with me through all of MathAcademy MFII, MFIII, and MFML (this is roughly undergrad Calc 2 and then like half a stat class and 2/3rds of a linear algebra class) and I can't remember it getting anything wrong.

Presumably this is all a consequence of better tool call training and better math tool calls behind the scenes, but: they're really good at math stuff now, including checking my proofs (of course, the proof stuff I've had to do is extremely boring and nothing resembling actual science; I'm just saying, they don't make 7th-grader mistakes anymore.)

tombert · 18 days ago
It's definitely gotten considerably better, though I still have issues with it generating proofs, at least with TLAPS.

I think behind the scenes it's phoning Wolfram Alpha nowadays for a lot of the numeric and algebraic stuff. For all I know, they might even have an Isabelle instance running for some of the even-more abstract mathematics.

I agree that this is largely an early ChatGPT problem though, I just thought it was interesting in that they were "plausible" mistakes. I could totally see twelve-year-old tombert making these exact mistakes, so I thought it was interesting that a robot is making the same mistakes an amateur human makes.