I totally agree with the author. Sadly, I feel like that's not how the majority of LLM users tend to view LLMs. And it's definitely not how AI companies market them.
> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters
the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand. I believe that this lack of initial understanding on the user's side is what can lead to taking LLM output as factual. If one side of the exchange knows nothing about the subject, the other side can use jargon and even present random or lossy "facts" that are almost guaranteed to impress.
> The way to solve this particular problem is to make a correct example available to it.
My question is: how much effort would it take to make a correct example available to the LLM before it can output quality, useful data? If the effort I put in is more than what I get in return, then I feel it's best to write and reason through it myself.
> the user will at least need to know something about the topic beforehand.
I used ChatGPT 5 over the weekend to double check dosing guidelines for a specific medication. "Provide dosage guidelines for medication [insert here]"
It spit back dosing guidelines that were an order of magnitude wrong (suggested 100mcg instead of 1mg). When I saw 100mcg, I was suspicious and said "I don't think that's right" and it quickly corrected itself and provided the correct dosing guidelines.
These are the kind of innocent errors that can be dangerous if users trust it blindly.
The main challenge is that an LLM isn't able to gauge confidence in its answers, so it can't adjust how confidently it communicates information back to you. It's like compressing a photo and the photographer wrongly saying "here's the best quality image I have!" - do you trust the photographer at their word, or do you challenge them to find a better quality image?
What if you had told it again that you don't think that's right? Would it have stuck to its guns and gone "oh, no, I am right here", or would it have backed down, said "Oh, silly me, you're right, here's the real dosage!" and given you something wrong again?
I do agree that to get the full use out of an LLM you should have some familiarity with what you're asking about. If you didn't already have a sense of what the dosage should be, why wouldn't 100mcg seem right?
> I used ChatGPT 5 over the weekend to double check dosing guidelines for a specific medication.
This use case is bad by several degrees.
Consider an alternative: Using Google to search for it and relying on its AI generated answer. This usage would be bad by one degree less, but still bad.
What about using Google and clicking on one of the top results? Maybe healthline.com? This usage would reduce the badness by one further degree, but still be bad.
I could go on and on, but for this use case, unless it's some generic drug (ibuprofen or something), the only correct approach is going to the manufacturer's web site, ensuring you're looking at the exact same medication (not some newer version or a variant), and reading the dosage guidelines there.
No, not Mayo clinic or any other site (unless it's a pretty generic medicine).
This is just not a good example to highlight the problems of using an LLM. You're likely not that much worse off than using Google.
"The main challenge is LLMs aren't able to gauge confidence in its answers"
This seems like a very tractable problem. And I think in many cases they can do that. For example, I tried your example with Losartan and it gave the right dosage. Then I said, "I think you're wrong", and it insisted it was right. Then I said, "No, it should be 50g." And it replied, "I need to stop you there". Then went on to correct me again.
I've also seen cases where it has confidence where it shouldn't, but there does seem to be some notion of confidence that does exist.
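As a rough illustration of that "notion of confidence": some APIs return token log-probabilities, which can be surfaced as a crude signal to double-check specific numbers. A minimal sketch, assuming the OpenAI Python SDK, an API key in the environment, and an example model name; this is not how the chat products decide their tone, just one way to expose a signal the model already produces.

```python
# Sketch: token log-probabilities as a crude confidence signal (assumes
# the OpenAI Python SDK and OPENAI_API_KEY; the model name is an example).
import math
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the usual starting dose of losartan for hypertension?"}],
    logprobs=True,
    top_logprobs=3,
)

for item in resp.choices[0].logprobs.content:
    # Low probability on the tokens carrying the number is a hint to verify elsewhere.
    print(f"{item.token!r}: p={math.exp(item.logprob):.2f}")
```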
An LLM with search and references and one without are two different tools. They're supposed to be closer to the same thing, but are not. That isn't to say there's a guarantee of correctness with references, but in my experience accuracy is better, and seeing unexpected references is helpful when confirming.
> the user will at least need to know something about the topic beforehand.
This is why I've said a few times here on HN and elsewhere, if you're using an LLM you need to think of yourself as an architect guiding a Junior to Mid Level developer. Juniors can do amazing things, they can also goof up hard. What's really funny is you can make them audit their own code in a new context window, and give you a detailed answer as to why that code is awful.
I use it mostly on personal projects especially since I can prototype quickly as needed.
> if you're using an LLM you need to think of yourself as an architect guiding a Junior to Mid Level developer.
The thing is, coding can (and should) be part of the design process. Many times I thought I had a good idea of what the solution should look like, then while coding I got exposed to more of the libraries and other parts of the code, which led me to a more refined approach. This exposure is what you will miss, and it will quickly result in unfamiliar code.
> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters
It's also useful to have an intuition for the things an LLM is liable to get wrong or hallucinate. One of these is questions where the question itself suggests one or more obvious answers (which may or may not be correct): if the LLM doesn't "know", it may well hallucinate one of them and sound reasonable doing so.
>the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand. I believe that this lack of initial understanding of the user input
I think there's a parallel here with the internet as an information source. It delivered on "unlimited knowledge at the tip of everyone's fingertips", but lowering the bar also lowered the bar.
That access "works" only when the user is capable of doing their part too: evaluating sources, integrating knowledge, validating, cross-examining.
Now we are just more used to recognizing that accessibility comes with its own problems.
Some of this is down to general education. Some to domain expertise. Personality plays a big part.
The biggest factor is, I think, intelligence. There's a lot of 2nd and 3rd order thinking required to simultaneously entertain a curiosity, consider how the LLM works, and exercise different levels of skepticism depending on the types of errors LLMs are likely to make.
Using LLMs correctly and incorrectly is... subtle.
> the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand
This is why simonw (the author) has his "pelican on a bike" test; it's not 100% accurate, but it is a good indicator.
I have a set of my own standard queries and problems (no counting characters or algebra crap) that I feed to new LLMs I'm testing.
None of the questions exist outside of my own Obsidian note, so they can't be gamed by LLM authors. And I've tested multiple different LLMs with them, so I have a "feeling" for what the answer should look like. And I personally know the correct answers, so I can immediately validate them.
I can't think of any other tools like this. An LLM can multiply your efforts, but only if you were capable of doing it yourself. Wild.
A lossy encyclopaedia should be missing information and be obvious about it, not making it up without your knowledge and changing the answer every time.
When you have a lossy piece of media, such as a compressed sound or image file, you can always see the resemblance to the original and note the degradation as it happens. You never have a clear JPEG of a lamp, compress it, and get a clear image of the Milky Way, then reopen the image and get a clear image of a pile of dirt.
Furthermore, an encyclopaedia is something you can reference and learn from without a goal; it allows you to peruse information you have no concept of. Not so with LLMs, which you have to query to get an answer.
Lossy compression does make things up. We call them compression artefacts.
In compressed audio these can be things like clicks and boings and echoes and pre-echoes. In compressed images they can be ripply effects near edges, banding in smoothly varying regions, but there are also things like https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres... where one digit is replaced with a nice clean version of a different digit, which is pretty on-the-nose for the LLM failure mode you're talking about.
Compression artefacts generally affect small parts of the image or audio or video rather than replacing the whole thing -- but in the analogy, "the whole thing" is an encyclopaedia and the artefacts are affecting little bits of that.
Of course the analogy isn't exact. That would be why S.W. opens his post by saying "Since I love collecting questionable analogies for LLMs,".
> Lossy compression does make things up. We call them compression artefacts.
I don’t think this is a great analogy.
Lossy compression of images or signals tends to throw out information based on how humans perceive it, focusing on the most important perceptual parts and discarding the less important parts. For example, JPEG essentially removes high frequency components from an image because more information is present with the low frequency parts. Similarly, POTS phone encoding and mp3 both compress audio signals based on how humans perceive audio frequency.
The perceived degradation of most lossy compression is gradual with the amount of compression and not typically what someone means when they say “make things up.”
LLM hallucinations aren’t gradual and the compression doesn’t seem to follow human perception.
> Of course the analogy isn't exact.
And I don't expect it to be, which is something I've made clear several times before, including on this very thread.
https://news.ycombinator.com/item?id=45101679
I'd rather say LLMs are a lossy encyclopedia + other things. The other things part obviously does a lot of work here, but if we strip it away, we can claim that the remaining subset of the underlying network encodes true information about the world.
Purely based on language use, you could expect "dog bit the man" more often than "man bit the dog", which is a lossy way to represent "dogs are more likely to bite people than vice versa." And there's also the second lossy part where information not occurring frequently enough in the training data will not survive training.
Of course, other things also include inaccurate information, frequent but otherwise useless sentences (any sentence with "Alice" and "Bob"), and the heavily pruned results of the post-training RL stage. So, you can't really separate the "encyclopedia" from the rest.
Also, not sure if lossy always means that the loss is distributed (i.e., lower resolution). Loss can also be localized / biased (i.e., lose only black pixels); it's just that useful lossy compression algorithms tend to minimize the noticeable loss. Tho I could be wrong.
E.g. a Bloom filter also doesn't "know" what it knows.
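A minimal sketch of that Bloom-filter aside, in plain Python: the filter will sometimes answer "probably present" for items it never stored, and nothing in the answer tells you which hits are false positives.

```python
# Sketch: a tiny Bloom filter; it cannot distinguish real hits from false positives.
import hashlib

class Bloom:
    def __init__(self, size: int = 64, hashes: int = 3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, item: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all(self.bits & (1 << pos) for pos in self._positions(item))

bloom = Bloom()
for word in ("dog", "cat", "lamp"):
    bloom.add(word)

print("dog" in bloom)            # True: actually stored
print("space mollusk" in bloom)  # usually False, but can be True: a confident false positive
```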
I don’t understand the point you’re trying to make. The given example confused me further, since nothing in my argument is concerned with the tool “knowing” anything, that has no relation to the idea I’m expressing.
I do understand and agree with a different point you're making somewhere else in this thread, but it doesn't seem related to what you're saying here.
https://news.ycombinator.com/item?id=45101946
That’s a much better analogy. You have to specifically ask them for information and they will happily retrieve it for you, but because they are unreliable they may get you the wrong thing. If you push back they’ll apologise and try again (librarians try to be helpful) but might again give you the wrong thing (you never know, because they are unreliable).
You're saying hammers shouldn't be squishy.
Simon is saying don't use a banana as a hammer.
No, that is not what I'm saying. My point is closer to "the words chosen to describe the made up concept do not translate to the idea being conveyed". I tried to make that fit into your idea of the banana and squishy hammer, but now we're several levels of abstraction deep, using analogies to discuss analogies, so it's getting complicated to communicate clearly.
> Simon is saying don't use a banana as a hammer.
Which I agree with.
I actually disagree. Modern encoding formats can, and do, hallucinate blocks.
It's a lot less visible and, I guess, less dramatic than with LLMs, but it happens frequently enough that I feel like at every major event there are false conspiracies based on video « proofs » that are just encoding artifacts.
You are absolutely right, and exactly the same thing came into my head while reading this. Some of the replies to you here are very irritating and seem not to grasp the point you're making, so I thought I'd chime in for moral support.
I think you are missing the point of the analogy: a lossy encyclopedia is obviously a bad idea, because encyclopedias are meant to be reliable places to look up facts.
And my point is that “lossy” does not mean “unreliable”. LLMs aren’t reliable sources of facts, no argument there, but a true lossy encyclopaedia might be. Lossy algorithms don’t just make up and change information, they remove it from places where it might not make a difference to the whole. A lossy encyclopaedia might be one where, for example, you remove the images plus grammatical and phonetic information. Eventually you might compress the information to where the entry for “dog” only reads “four legged creature”—which is correct but not terribly helpful—but you wouldn’t get “space mollusk”.
A lossy encyclopedia which you can talk to, and which can look up facts in the lossless version while having a conversation, OTOH is... not a bad idea at all, and hundreds of millions of people agree if traffic numbers are to be believed.
(but it isn't and won't ever be an oracle and apparently that's a challenge for human psychology.)
I am sympathetic to your analogy. I think it works well enough.
But it falls a bit short in that encyclopedias, lossy or not, shouldn't affirmatively contain false information. The way I would picture a lossy encyclopedia is that it can misdirect by omission, but it would not change A to ¬A.
Maybe a truthy-roulette encyclopedia?
I don't like the confident hallucinations of LLMs either, but don't they rewrite and add entries in the encyclopedia every few years? Implicitly that makes your old copy "lossy"
Again, never really want a confidently-wrong encyclopedia, though
> You never have a clear JPEG of a lamp, compress it, and get a clear image of the Milky Way, then reopen the image and get a clear image of a pile of dirt.
Oh but it's much worse than that: because most LLMs aren't deterministic in the way they operate [1], you can get a pristine image of a different pile of dirt every single time you ask.
[1] there are models where if you have the "model + prompt + seed" you're at least guaranteed to get the same output every single time. FWIW I use LLMs but I cannot integrate them in anything I produce when what they output ain't deterministic.
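For what "model + prompt (+ seed) gives the same output" looks like locally, here is a minimal sketch assuming the Hugging Face transformers library, with gpt2 as a small stand-in model: greedy decoding needs no seed at all, and seeded sampling is reproducible on a fixed software/hardware stack.

```python
# Sketch: deterministic generation with a locally run model (gpt2 as a stand-in).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("A lossy encyclopedia is", return_tensors="pt")

greedy = model.generate(**inputs, max_new_tokens=20, do_sample=False)  # greedy: no randomness
torch.manual_seed(42)
sampled = model.generate(**inputs, max_new_tokens=20, do_sample=True)  # same seed, same output

print(tok.decode(greedy[0], skip_special_tokens=True))
print(tok.decode(sampled[0], skip_special_tokens=True))
```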
Computers are deterministic. Most of the time. If you really don't think about all the times they aren't. But if you leave the CPU-land and go out into the real world, you don't have the privilege of working with deterministic systems at all.
Engineering with LLMs is closer to "designing a robust industrial process that's going to be performed by unskilled minimum wage workers" than it is to "writing a software algorithm". It's still an engineering problem - but of the kind that requires an entirely different frame of mind to tackle.
> you can get a pristine image of a different pile of dirt every single time you ask.
That’s what I was trying to convey with the “then reopen the image” bit. But I chose a different image of a different thing rather than a different image of a similar thing.
Of course they’re not the same thing, the goal of an analogy is not to be perfect but to provide a point of comparison to explain an idea.
My point is that I find the chosen term inadequate. The author made it up from combining two existing words, where one of them is a poor fit for what they’re aiming to convey.
Please, everybody, preserve your records. Preserve your books, preserve your downloaded files (that can't be tampered with), keep everything. AI is going to make it harder and harder to find out the truth about anything over the next few years.
You have a moral duty to keep your books, and keep your locally-stored information.
I get very annoyed when LLMs respond with quotes around certain things I ask for; then, when I ask "what is the source of that quote?", they say "oh, I was paraphrasing, that isn't a real quote."
At least Wikipedia has sources that probably support what it says, and normally the quotes are real quotes. LLMs just seem to add quotation marks as "proof" that they're confident something is correct.
Thinking of an LLM as any kind of encyclopedia is probably the wrong model. LLMs are information presentation/processing tools that incidentally, as a consequence of the method by which they are built to do that, may occasionally produce factual information that is not directly prompted.
If you want an LLM to be part of a tool that is intended to provide access to (presumably with some added value) encyclopedic information, it is best not to consider the LLM as providing any part of the encyclopedic information function of the system, but instead as providing part of the user interface of the system. The encyclopedic information should be provided by appropriate tooling that, at request by an appropriately prompted LLM or at direction of an orchestration layer with access to user requests (and both kinds of tooling might be used in the same system) provides relevant factual data which is inserted into the LLM’s context.
The correct modifier to insert into the sentence “An LLM is an encyclopedia” is “not”, not “lossy”.
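A hedged sketch of that split, with hypothetical stand-in functions (lookup_encyclopedia, ask_llm) rather than any real framework or API: the factual content comes from retrieval tooling, and the LLM is only the interface that phrases the answer.

```python
# Sketch of "LLM as user interface, retrieval as the encyclopedia".
# lookup_encyclopedia and ask_llm are hypothetical placeholders.
def lookup_encyclopedia(query: str) -> list[str]:
    # In a real system: a search index, database, or other citable source.
    return [f"(passage retrieved for: {query})"]

def ask_llm(prompt: str) -> str:
    # In a real system: a call to whatever model/provider you use.
    return "(answer phrased from the supplied reference material)"

def answer(user_question: str) -> str:
    facts = lookup_encyclopedia(user_question)
    prompt = (
        "Answer using ONLY the reference material below. "
        "If it is insufficient, say so.\n\n"
        "Reference material:\n" + "\n".join(facts) +
        "\n\nQuestion: " + user_question
    )
    return ask_llm(prompt)

print(answer("What is a lossy encyclopedia?"))
```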
Using artificial neural networks directly for information storage and retrieval (i.e. not just leveraging them as tools accessing other types of storage) is currently infeasible, agreed.
On the other hand, biological neural networks are doing it all the time :) And there might well be an advantage to it (or a hybrid method), once we can make it more economical.
After all, the embedding vector space is shaped by the distribution of training data, and if you have out-of-distribution data coming in due to a new or changed environment, RAG using pre-trained models and their vector spaces will only go so far.
I think an LLM can be used as a kind of lossy encyclopedia, but equating it directly to one isn't entirely accurate. The human mind is also, in a sense, a lossy encyclopedia.
I prefer to think of LLMs as lossy predictors. If you think about it, natural "intelligence" itself can be understood as another type of predictor: you build a world model to anticipate what will happen next so you can plan your actions accordingly and survive.
In the real world, with countless fuzzy factors, no predictor can ever be perfectly lossless. The only real difference, for me, is that LLMs are lossier predictors than human minds (for now). That's all there is to it.
Whatever analogy you use, it comes down to the realization that there's always some lossiness involved, whether you frame it as an encyclopedia or not.
Are LLMs really lossier than humans? I think it depends on the context. Given any particular example, LLMs might hallucinate more and a human might do a better job at accuracy. But overall LLMs will remember far more things than a human. Ask a human to reproduce what they read in a book last year and there's a good chance you'll get either absolutely nothing or just a vague idea of what the book was about - in this context they can be up to 100% lossy. The difference here is that human memory decays over time while a LLM's memory is hardwired.
I think what trips people up is that LLMs and humans are both lossy, but in different ways.
The intuitions that we've developed around previous interactions are very misleading when applied to LLMs. When interacting with a human, we're used to being able to ask a question about topic X in context Y and assume that if you can answer it we can rely on you to be able to talk about it in the very similar context Z.
But LLMs are bad at commutative facts; A=B and B=A can have different performance characteristics. Just because it can answer A=B does not mean it is good at answering B=A; you have to test them separately.
I've seen researchers who should really know better screw this up, rendering their methodology useless for the claim they're trying to validate. Our intuition for how humans do things can be very misleading when working with LLMs.
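A minimal sketch of testing both directions separately, assuming the transformers library, with gpt2 as a small stand-in (far too small to be a fair test; the point is the method): score the model's log-probability for the completion of "A is B" and, independently, for "B is A".

```python
# Sketch: score a fact in both directions; the two numbers need not agree.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of log-probabilities the model assigns to `completion` after `prompt`."""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logprobs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    for pos in range(prompt_len, full_ids.shape[1]):
        total += logprobs[0, pos - 1, full_ids[0, pos]].item()  # token at pos is predicted at pos-1
    return total

print(completion_logprob("The capital of France is", " Paris"))
print(completion_logprob("Paris is the capital of", " France"))
```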
This drastically depends on the example. For average trivia questions, modern LLMs (even smaller, open ones) beat humans easily.
That's not exactly true. Every time you start a new conversation, you get a new LLM for all intents and purposes. Asking an LLM about an unrelated topic towards the end of a ~500 page conversation will get you vastly different results than at the beginning. If we could get to multi-thousand page contexts, it would probably be less accurate than a human, tbh.
Imagine having the world's most comprehensive encyclopedia at your literal fingertips, 24 hours a day, but being so lazy that you offload the hard work of thinking by letting retarded software pathologically lie to you and then blindly accepting the non-answers it spits at you rather than typing in two or three keywords to Wikipedia and skimming the top paragraph.
>I prefer to think of LLMs as lossy predictors.
I've started to call them the Great Filter.
In the latest issue of the comic book Lex Luthor attempts to exterminate humanity by hacking the LLM and having it inform humanity that they can hold their breath underwater for 17 hours.
Lossy is an incomplete characterization. LLMs are also much more fluctuating and fuzzy. You can get wildly varying output depending on prompting, for what should be the same (even if lossy) knowledge. There is not just loss during the training, but also loss and variation during inference. An LLM overall is a much less coherent and consistent thing than most humans, in terms of knowledge, mindset, and elucidations.
The foundational conceit (if you will) of LLMs is that they build a semantic (world) model to 'make sense' of their training. However it is much more likely that they are simply building a syntactic model in response to the training. As far as I know there is no evidence of a semantic model emerging.
Maybe I don’t have a precise enough definition of syntax and semantics, but it seems like it’s more than just syntactic since interchangeable tokens in the same syntax affect the semantics of the sentence. Or do you view completing a prompt such as “The president of the United States is?” as a syntax question?
There's some evidence of valid relationships: you can build a map of Manhattan by asking about directions from each street corner and plotting the relations.
This is still entirely referential, but in a way that a human would see some relation to the actual thing, albeit in a somewhat weird and alien way.
> If you think about it, natural "intelligence" itself can be understood as another type of predictor: you build a world model to anticipate what will happen next so you can plan your actions accordingly and survive.
Yes.
Human intelligence consists of three things.
First, groundedness: The ability to form a representation of the world and one’s place in it.
Second, a temporal-spatial sense: A subjective and bounded idea of self in objective space and time.
Third: A general predictive function which is capable of broad abstraction.
At its most basic level, this third element enables man to acquire, process, store, represent, and continually re-acquire knowledge which is external to that man's subjective existence. This is calculation in the strictest sense.
And it is the third element -- the strength, speed, and breadth of the predictive function -- which is synonymous with the word "intelligence." Higher animals have all three elements, but they're pretty hazy -- especially the third. And, in humans, short time horizons are synonymous with intellectual dullness.
All of this is to say that if you have a "prediction machine" you're 90% of the way to a true "intelligence machine." It also, I think, suggests routes that might lead to more robust AI in the future. (Ground the AI, give it a limited physical presence in time and space, match its clocks to the outside world.)
"Prediction" is hardly more than another term for inference. It's the very essence of machine learning. There is nothing new or useful in this concept.
Another difference is that you are predicting future sensory experiences in real-time, while LLMs "predict" text which a "helpful, honest, harmless" assistant would produce.
There are a lot of parallels between AI and compression.
In fact, the best compression algorithms and LLMs have this in common: they work by predicting the next word. Compression algorithms take an extra step called entropy coding to encode the difference between the prediction and the actual data efficiently, and the better the prediction, the better the compression ratio.
What makes an LLM "lossy" is that you don't have the "encode the difference" step.
And yes, it means you can turn an LLM into a (lossless) compression algorithm, and I think a really good one in terms of compression ratio on huge data sets. You can also turn a compression algorithm like gzip into a language model! A truly terrible one, but the output is better than a random stream of bytes.
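A toy illustration of the gzip-as-language-model point, using only the standard library: score each candidate next character by how little it grows the compressed context. A truly crude predictor, as the comment says, but usually better than picking bytes at random.

```python
# Sketch: gzip/zlib as a (terrible) next-character predictor.
import zlib

def next_char_scores(context: str, candidates: str = "abcdefghijklmnopqrstuvwxyz ") -> dict[str, int]:
    base = len(zlib.compress(context.encode()))
    # Less growth in compressed size = the character is "more predictable" here.
    return {ch: base - len(zlib.compress((context + ch).encode())) for ch in candidates}

context = "the cat sat on the mat. the cat sat on the "
scores = next_char_scores(context)
best = max(scores, key=scores.get)
print(best)  # with luck, a character that extends the repeated phrase
```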
I suspect this ends up being pretty important for the next advancements in AI, specifically LLM-based AI. To me, the transformer architecture is a sort of compression algorithm that is being exploited for emergent behavior at the margins. But I think this is more like stream of consciousness than premeditated thought. Eventually I think we figure out a way to "think" in latent space and have our existing AI models be just the mouthpiece.
In my experience as a human, the more you know about a subject, or even the more you have simply seen content about it, the easier it is to ramble on about it convincingly. It's like a mirroring skill, and it does not actually mean you understand what you're saying.
LLMs seem to do the same thing, I think. At scale this is widely useful, though; I am not discounting it. I just think it's an order of magnitude below what's possible, and all this talk of existing stream-of-consciousness-like LLMs creating AGI seems like a miss.
One difference is that compression gives you one and only one thing when decompressing. Decompression isn't a function taking arbitrary additional input and producing potentially arbitrary, nondeterministic output based on it.
We would have very different conversations if LLMs were things that merely exploded into a singular lossy-expanded version of Wikipedia, but where looking at the article for any topic X would give you the exact same article each time.
LLMs deliberately insert randomness. If you run a model locally (or sometimes via API), you can turn that off and get the same response for the same input every time.
Yes, an LLM is a lossy encyclopedia with a human-language answering interface. This has some benefits, mostly in terms of convenience. You don't have to browse or read through so many pages of a real encyclopedia to get a quick answer.
However, there is also a clear downside. Currently, an LLM is unable to judge whether your question is formulated incorrectly or whether it opens up more questions that should be answered first. It always jumps to answering something. A real human would assess the questioner first and usually ask for more details before answering. I feel this is the predominant reason why LLM answers feel so dumb at times: it never asks for clarification.
I don't think that's universally true with the new models - I've seen Claude 4 and GPT-5 ask for clarification on questions with obvious gaps.
With GPT-5 I sometimes see it spot a question that needs clarifying in its thinking trace, then pick the most likely answer, then spit out an answer later that says "assuming you meant X ..." - I've even had it provide an answer in two sections for each branch of a clear ambiguity.
So there are improvements version to version - from both increases in raw model capabilities and better training methods being used.
This is also why the Kagi Assistant is still the best AI tool I've found. The failure state is the same as a search result: it either can't find anything, finds something irrelevant, or finds material that contradicts the premise of your question.
It seems to me the more you can pin it to another data set, the better.
Indeed, Ted Chiang's piece (ChatGPT Is a Blurry JPEG of the Web) is here:
https://archive.is/iHSdS