I totally agree with the author. Sadly, I feel like that's not how the majority of LLM users tend to view LLMs. And it's definitely not how AI companies market them.
> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters
the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand. I believe that this lack of initial understanding on the user's side is what can lead to taking LLM output as factual. If one side of the exchange knows nothing about the subject, the other side can use jargon and even present random or lossy "facts" that are almost guaranteed to impress.
> The way to solve this particular problem is to make a correct example available to it.
My question is: how much effort would it take to make a correct example available to the LLM before it can output quality, useful data? If the effort I put in is more than what I get in return, then I feel it's best to write and reason through it myself.
> the user will at least need to know something about the topic beforehand.
I used ChatGPT 5 over the weekend to double check dosing guidelines for a specific medication. "Provide dosage guidelines for medication [insert here]"
It spit back dosing guidelines that were an order of magnitude wrong (suggested 100mcg instead of 1mg). When I saw 100mcg, I was suspicious and said "I don't think that's right" and it quickly corrected itself and provided the correct dosing guidelines.
These are the kind of innocent errors that can be dangerous if users trust it blindly.
The main challenge is that an LLM isn't able to gauge confidence in its answers, so it can't adjust how confidently it communicates information back to you. It's like compressing a photo and the photographer wrongly saying "here's the best quality image I have!" - do you trust the photographer at their word, or do you challenge them to find a better quality image?
What if you had told it again that you don't think that's right? Would it have stuck to its guns and gone "oh, no, I am right here", or would it have backed down, said "Oh, silly me, you're right, here's the real dosage!" and given you something wrong again?
I do agree that to get the full use out of an LLM you should have some familiarity with what you're asking about. If you didn't already have a sense of what the dosage should be, why wouldn't 100mcg seem right?
> I used ChatGPT 5 over the weekend to double check dosing guidelines for a specific medication.
This use case is bad by several degrees.
Consider an alternative: Using Google to search for it and relying on its AI generated answer. This usage would be bad by one degree less, but still bad.
What about using Google and clicking on one of the top results? Maybe healthline.com? This usage would reduce the badness by one further degree, but still be bad.
I could go on and on, but for this use case, unless it's some generic drug (ibuprofen or something), the only correct approach is going to the manufacturer's web site, ensuring you're looking at the exact same medication (not some newer version or a variant), and reading the dosage guidelines there.
No, not Mayo clinic or any other site (unless it's a pretty generic medicine).
This is just not a good example to highlight the problems of using an LLM. You're likely not that much worse off than using Google.
"The main challenge is LLMs aren't able to gauge confidence in its answers"
This seems like a very tractable problem. And I think in many cases they can do that. For example, I tried your example with Losartan and it gave the right dosage. Then I said, "I think you're wrong", and it insisted it was right. Then I said, "No, it should be 50g." And it replied, "I need to stop you there". Then went on to correct me again.
I've also seen cases where it has confidence where it shouldn't, but there does seem to be some notion of confidence that does exist.
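As a rough illustration of that "notion of confidence": some APIs return token log-probabilities, which can be surfaced as a crude signal to double-check specific numbers. A minimal sketch, assuming the OpenAI Python SDK, an API key in the environment, and an example model name; this is not how the chat products decide their tone, just one way to expose a signal the model already produces.

```python
# Sketch: token log-probabilities as a crude confidence signal (assumes
# the OpenAI Python SDK and OPENAI_API_KEY; the model name is an example).
import math
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the usual starting dose of losartan for hypertension?"}],
    logprobs=True,
    top_logprobs=3,
)

for item in resp.choices[0].logprobs.content:
    # Low probability on the tokens carrying the number is a hint to verify elsewhere.
    print(f"{item.token!r}: p={math.exp(item.logprob):.2f}")
```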
An LLM with search and references and one without are two different tools. They're supposed to be closer to the same thing, but are not. That isn't to say there's a guarantee of correctness with references, but in my experience accuracy is better, and seeing unexpected references is helpful when confirming.
> the user will at least need to know something about the topic beforehand.
This is why I've said a few times here on HN and elsewhere, if you're using an LLM you need to think of yourself as an architect guiding a Junior to Mid Level developer. Juniors can do amazing things, they can also goof up hard. What's really funny is you can make them audit their own code in a new context window, and give you a detailed answer as to why that code is awful.
I use it mostly on personal projects especially since I can prototype quickly as needed.
> if you're using an LLM you need to think of yourself as an architect guiding a Junior to Mid Level developer.
The thing is, coding can (and should) be part of the design process. Many times I thought I had a good idea of what the solution should look like, then while coding I got exposed to more of the libraries and other parts of the code, which led me to a more refined approach. This exposure is what you will miss, and it will quickly result in unfamiliar code.
> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters
It's also useful to have an intuition for the things an LLM is liable to get wrong or hallucinate. One of these is questions where the question itself suggests one or more obvious answers (which may or may not be correct): if the LLM doesn't "know", it may well hallucinate one of them and sound reasonable doing so.
>the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand. I believe that this lack of initial understanding of the user input
I think there's a parallel here with the internet as an information source. It delivered on "unlimited knowledge at the tip of everyone's fingertips", but lowering the bar also lowered the bar.
That access "works" only when the user is capable of doing their part too: evaluating sources, integrating knowledge, validating, cross-examining.
Now we are just more used to recognizing that accessibility comes with its own problems.
Some of this is down to general education. Some to domain expertise. Personality plays a big part.
The biggest factor is, I think, intelligence. There's a lot of 2nd and 3rd order thinking required to simultaneously entertain a curiosity, consider how the LLM works, and exercise different levels of skepticism depending on the types of errors LLMs are likely to make.
Using LLMs correctly and incorrectly is... subtle.
> the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand
This is why simonw (the author) has his "pelican on a bike" test; it's not 100% accurate, but it is a good indicator.
I have a set of my own standard queries and problems (no counting characters or algebra crap) that I feed to new LLMs I'm testing.
None of the questions exist outside of my own Obsidian note, so they can't be gamed by LLM authors. And I've tested multiple different LLMs with them, so I have a "feeling" for what the answer should look like. And I personally know the correct answers, so I can immediately validate them.
I can't think of any other tools like this. An LLM can multiply your efforts, but only if you were capable of doing it yourself. Wild.
A lossy encyclopaedia should be missing information and be obvious about it, not making it up without your knowledge and changing the answer every time.
When you have a lossy piece of media, such as a compressed sound or image file, you can always see the resemblance to the original and note the degradation as it happens. You never have a clear JPEG of a lamp, compress it, and get a clear image of the Milky Way, then reopen the image and get a clear image of a pile of dirt.
Furthermore, an encyclopaedia is something you can reference and learn from without a goal; it allows you to peruse information you have no concept of. Not so with LLMs, which you have to query to get an answer.
Lossy compression does make things up. We call them compression artefacts.
In compressed audio these can be things like clicks and boings and echoes and pre-echoes. In compressed images they can be ripply effects near edges, banding in smoothly varying regions, but there are also things like https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres... where one digit is replaced with a nice clean version of a different digit, which is pretty on-the-nose for the LLM failure mode you're talking about.
Compression artefacts generally affect small parts of the image or audio or video rather than replacing the whole thing -- but in the analogy, "the whole thing" is an encyclopaedia and the artefacts are affecting little bits of that.
Of course the analogy isn't exact. That would be why S.W. opens his post by saying "Since I love collecting questionable analogies for LLMs,".
> Lossy compression does make things up. We call them compression artefacts.
I don’t think this is a great analogy.
Lossy compression of images or signals tends to throw out information based on how humans perceive it, focusing on the most important perceptual parts and discarding the less important parts. For example, JPEG essentially removes high frequency components from an image because more information is present with the low frequency parts. Similarly, POTS phone encoding and mp3 both compress audio signals based on how humans perceive audio frequency.
The perceived degradation of most lossy compression is gradual with the amount of compression and not typically what someone means when they say “make things up.”
LLM hallucinations aren’t gradual and the compression doesn’t seem to follow human perception.
> Of course the analogy isn't exact.
And I don't expect it to be, which is something I've made clear several times before, including on this very thread.
https://news.ycombinator.com/item?id=45101679
I'd rather say LLMs are a lossy encyclopedia + other things. The other things part obviously does a lot of work here, but if we strip it away, we can claim that the remaining subset of the underlying network encodes true information about the world.
Purely based on language use, you could expect "dog bit the man" more often than "man bit the dog", which is a lossy way to represent "dogs are more likely to bite people than vice versa." And there's also the second lossy part where information not occurring frequently enough in the training data will not survive training.
Of course, other things also include inaccurate information, frequent but otherwise useless sentences (any sentence with "Alice" and "Bob"), and the heavily pruned results of the post-training RL stage. So, you can't really separate the "encyclopedia" from the rest.
Also, not sure if lossy always means that the loss is distributed (i.e., lower resolution). Loss can also be localized / biased (i.e., lose only black pixels); it's just that useful lossy compression algorithms tend to minimize the noticeable loss. Tho I could be wrong.
E.g. a Bloom filter also doesn't "know" what it knows.
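A minimal sketch of that Bloom-filter aside, in plain Python: the filter will sometimes answer "probably present" for items it never stored, and nothing in the answer tells you which hits are false positives.

```python
# Sketch: a tiny Bloom filter; it cannot distinguish real hits from false positives.
import hashlib

class Bloom:
    def __init__(self, size: int = 64, hashes: int = 3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, item: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all(self.bits & (1 << pos) for pos in self._positions(item))

bloom = Bloom()
for word in ("dog", "cat", "lamp"):
    bloom.add(word)

print("dog" in bloom)            # True: actually stored
print("space mollusk" in bloom)  # usually False, but can be True: a confident false positive
```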
I don’t understand the point you’re trying to make. The given example confused me further, since nothing in my argument is concerned with the tool “knowing” anything, that has no relation to the idea I’m expressing.
I do understand and agree with a different point you're making somewhere else in this thread, but it doesn't seem related to what you're saying here.
https://news.ycombinator.com/item?id=45101946
That’s a much better analogy. You have to specifically ask them for information and they will happily retrieve it for you, but because they are unreliable they may get you the wrong thing. If you push back they’ll apologise and try again (librarians try to be helpful) but might again give you the wrong thing (you never know, because they are unreliable).
You're saying hammers shouldn't be squishy.
Simon is saying don't use a banana as a hammer.
No, that is not what I'm saying. My point is closer to "the words chosen to describe the made up concept do not translate to the idea being conveyed". I tried to make that fit into your idea of the banana and squishy hammer, but now we're several levels of abstraction deep, using analogies to discuss analogies, so it's getting complicated to communicate clearly.
> Simon is saying don't use a banana as a hammer.
Which I agree with.
I actually disagree. Modern encoding formats can, and do, hallucinate blocks.
It's a lot less visible and, I guess, less dramatic than with LLMs, but it happens frequently enough that I feel like at every major event there are false conspiracies based on video « proofs » that are just encoding artifacts.
You are absolutely right, and exactly the same thing came into my head while reading this. Some of the replies to you here are very irritating and seem not to grasp the point you're making, so I thought I'd chime in for moral support.
I think you are missing the point of the analogy: a lossy encyclopedia is obviously a bad idea, because encyclopedias are meant to be reliable places to look up facts.
And my point is that “lossy” does not mean “unreliable”. LLMs aren’t reliable sources of facts, no argument there, but a true lossy encyclopaedia might be. Lossy algorithms don’t just make up and change information, they remove it from places where it might not make a difference to the whole. A lossy encyclopaedia might be one where, for example, you remove the images plus grammatical and phonetic information. Eventually you might compress the information to where the entry for “dog” only reads “four legged creature”—which is correct but not terribly helpful—but you wouldn’t get “space mollusk”.
A lossy encyclopedia which you can talk to, and which can look up facts in the lossless version while having a conversation, OTOH is... not a bad idea at all, and hundreds of millions of people agree if traffic numbers are to be believed.
(but it isn't and won't ever be an oracle and apparently that's a challenge for human psychology.)
I am sympathetic to your analogy. I think it works well enough.
But it falls a bit short in that encyclopedias, lossy or not, shouldn't affirmatively contain false information. The way I would picture a lossy encyclopedia is that it can misdirect by omission, but it would not change A to ¬A.
Maybe a truthy-roulette encyclopedia?
I don't like the confident hallucinations of LLMs either, but don't they rewrite and add entries in the encyclopedia every few years? Implicitly that makes your old copy "lossy"
Again, never really want a confidently-wrong encyclopedia, though
> You never have a clear JPEG of a lamp, compress it, and get a clear image of the Milky Way, then reopen the image and get a clear image of a pile of dirt.
Oh but it's much worse than that: because most LLMs aren't deterministic in the way they operate [1], you can get a pristine image of a different pile of dirt every single time you ask.
[1] there are models where if you have the "model + prompt + seed" you're at least guaranteed to get the same output every single time. FWIW I use LLMs but I cannot integrate them in anything I produce when what they output ain't deterministic.
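For what "model + prompt (+ seed) gives the same output" looks like locally, here is a minimal sketch assuming the Hugging Face transformers library, with gpt2 as a small stand-in model: greedy decoding needs no seed at all, and seeded sampling is reproducible on a fixed software/hardware stack.

```python
# Sketch: deterministic generation with a locally run model (gpt2 as a stand-in).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("A lossy encyclopedia is", return_tensors="pt")

greedy = model.generate(**inputs, max_new_tokens=20, do_sample=False)  # greedy: no randomness
torch.manual_seed(42)
sampled = model.generate(**inputs, max_new_tokens=20, do_sample=True)  # same seed, same output

print(tok.decode(greedy[0], skip_special_tokens=True))
print(tok.decode(sampled[0], skip_special_tokens=True))
```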
Computers are deterministic. Most of the time. If you really don't think about all the times they aren't. But if you leave the CPU-land and go out into the real world, you don't have the privilege of working with deterministic systems at all.
Engineering with LLMs is closer to "designing a robust industrial process that's going to be performed by unskilled minimum wage workers" than it is to "writing a software algorithm". It's still an engineering problem - but of the kind that requires an entirely different frame of mind to tackle.
> you can get a pristine image of a different pile of dirt every single time you ask.
That’s what I was trying to convey with the “then reopen the image” bit. But I chose a different image of a different thing rather than a different image of a similar thing.
Of course they’re not the same thing, the goal of an analogy is not to be perfect but to provide a point of comparison to explain an idea.
My point is that I find the chosen term inadequate. The author made it up from combining two existing words, where one of them is a poor fit for what they’re aiming to convey.
Please, everybody, preserve your records. Preserve your books, preserve your downloaded files (that can't be tampered with), keep everything. AI is going to make it harder and harder to find out the truth about anything over the next few years.
You have a moral duty to keep your books, and keep your locally-stored information.
I get very annoyed when LLMs respond with quotes around certain things I ask for; then, when I ask "what is the source of that quote?", they say "oh, I was paraphrasing, that isn't a real quote."
At least Wikipedia has sources that probably support what it says, and normally the quotes are real quotes. LLMs just seem to add quotation marks as "proof" that they're confident something is correct.
Thinking of an LLM as any kind of encyclopedia is probably the wrong model. LLMs are information presentation/processing tools that incidentally, as a consequence of the method by which they are built to do that, may occasionally produce factual information that is not directly prompted.
If you want an LLM to be part of a tool that is intended to provide access to (presumably with some added value) encyclopedic information, it is best not to consider the LLM as providing any part of the encyclopedic information function of the system, but instead as providing part of the user interface of the system. The encyclopedic information should be provided by appropriate tooling that, at request by an appropriately prompted LLM or at direction of an orchestration layer with access to user requests (and both kinds of tooling might be used in the same system) provides relevant factual data which is inserted into the LLM’s context.
The correct modifier to insert into the sentence “An LLM is an encyclopedia” is “not”, not “lossy”.
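A hedged sketch of that split, with hypothetical stand-in functions (lookup_encyclopedia, ask_llm) rather than any real framework or API: the factual content comes from retrieval tooling, and the LLM is only the interface that phrases the answer.

```python
# Sketch of "LLM as user interface, retrieval as the encyclopedia".
# lookup_encyclopedia and ask_llm are hypothetical placeholders.
def lookup_encyclopedia(query: str) -> list[str]:
    # In a real system: a search index, database, or other citable source.
    return [f"(passage retrieved for: {query})"]

def ask_llm(prompt: str) -> str:
    # In a real system: a call to whatever model/provider you use.
    return "(answer phrased from the supplied reference material)"

def answer(user_question: str) -> str:
    facts = lookup_encyclopedia(user_question)
    prompt = (
        "Answer using ONLY the reference material below. "
        "If it is insufficient, say so.\n\n"
        "Reference material:\n" + "\n".join(facts) +
        "\n\nQuestion: " + user_question
    )
    return ask_llm(prompt)

print(answer("What is a lossy encyclopedia?"))
```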
Using artificial neural networks directly for information storage and retrieval (i.e. not just leveraging them as tools accessing other types of storage) is currently infeasible, agreed.
On the other hand, biological neural networks are doing it all the time :) And there might well be an advantage to it (or a hybrid method), once we can make it more economical.
After all, the embedding vector space is shaped by the distribution of training data, and if you have out-of-distribution data coming in due to a new or changed environment, RAG using pre-trained models and their vector spaces will only go so far.
I think an LLM can be used as a kind of lossy encyclopedia, but equating it directly to one isn't entirely accurate. The human mind is also, in a sense, a lossy encyclopedia.
I prefer to think of LLMs as lossy predictors. If you think about it, natural "intelligence" itself can be understood as another type of predictor: you build a world model to anticipate what will happen next so you can plan your actions accordingly and survive.
In the real world, with countless fuzzy factors, no predictor can ever be perfectly lossless. The only real difference, for me, is that LLMs are lossier predictors than human minds (for now). That's all there is to it.
Whatever analogy you use, it comes down to the realization that there's always some lossiness involved, whether you frame it as an encyclopedia or not.
Are LLMs really lossier than humans? I think it depends on the context. Given any particular example, LLMs might hallucinate more and a human might do a better job at accuracy. But overall LLMs will remember far more things than a human. Ask a human to reproduce what they read in a book last year and there's a good chance you'll get either absolutely nothing or just a vague idea of what the book was about - in this context they can be up to 100% lossy. The difference here is that human memory decays over time while a LLM's memory is hardwired.
I think what trips people up is that LLMs and humans are both lossy, but in different ways.
The intuitions that we've developed around previous interactions are very misleading when applied to LLMs. When interacting with a human, we're used to being able to ask a question about topic X in context Y and assume that if you can answer it we can rely on you to be able to talk about it in the very similar context Z.
But LLMs are bad at commutative facts; A=B and B=A can have different performance characteristics. Just because it can answer A=B does not mean it is good at answering B=A; you have to test them separately.
I've seen researchers who should really know better screw this up, rendering their methodology useless for the claim they're trying to validate. Our intuition for how humans do things can be very misleading when working with LLMs.
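A minimal sketch of testing both directions separately, assuming the transformers library, with gpt2 as a small stand-in (far too small to be a fair test; the point is the method): score the model's log-probability for the completion of "A is B" and, independently, for "B is A".

```python
# Sketch: score a fact in both directions; the two numbers need not agree.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of log-probabilities the model assigns to `completion` after `prompt`."""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logprobs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    for pos in range(prompt_len, full_ids.shape[1]):
        total += logprobs[0, pos - 1, full_ids[0, pos]].item()  # token at pos is predicted at pos-1
    return total

print(completion_logprob("The capital of France is", " Paris"))
print(completion_logprob("Paris is the capital of", " France"))
```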
This drastically depends on the example. For average trivia questions, modern LLMs (even smaller, open ones) beat humans easily.
That's not exactly true. Every time you start a new conversation, you get a new LLM for all intents and purposes. Asking an LLM about an unrelated topic towards the end of a ~500 page conversation will get you vastly different results than at the beginning. If we could get to multi-thousand page contexts, it would probably be less accurate than a human, tbh.
Imagine having the world's most comprehensive encyclopedia at your literal fingertips, 24 hours a day, but being so lazy that you offload the hard work of thinking by letting retarded software pathologically lie to you and then blindly accepting the non-answers it spits at you rather than typing in two or three keywords to Wikipedia and skimming the top paragraph.
>I prefer to think of LLMs as lossy predictors.
I've started to call them the Great Filter.
In the latest issue of the comic book Lex Luthor attempts to exterminate humanity by hacking the LLM and having it inform humanity that they can hold their breath underwater for 17 hours.
Lossy is an incomplete characterization. LLMs are also much more fluctuating and fuzzy. You can get wildly varying output depending on prompting, for what should be the same (even if lossy) knowledge. There is not just loss during the training, but also loss and variation during inference. An LLM overall is a much less coherent and consistent thing than most humans, in terms of knowledge, mindset, and elucidations.
The foundational conceit (if you will) of LLMs is that they build a semantic (world) model to 'make sense' of their training. However it is much more likely that they are simply building a syntactic model in response to the training. As far as I know there is no evidence of a semantic model emerging.
Maybe I don’t have a precise enough definition of syntax and semantics, but it seems like it’s more than just syntactic since interchangeable tokens in the same syntax affect the semantics of the sentence. Or do you view completing a prompt such as “The president of the United States is?” as a syntax question?
There's some evidence of valid relationships: you can build a map of Manhattan by asking about directions from each street corner and plotting the relations.
This is still entirely referential, but in a way that a human would see some relation to the actual thing, albeit in a somewhat weird and alien way.
> If you think about it, natural "intelligence" itself can be understood as another type of predictor: you build a world model to anticipate what will happen next so you can plan your actions accordingly and survive.
Yes.
Human intelligence consists of three things.
First, groundedness: The ability to form a representation of the world and one’s place in it.
Second, a temporal-spatial sense: A subjective and bounded idea of self in objective space and time.
Third: A general predictive function which is capable of broad abstraction.
At its most basic level, this third element enables man to acquire, process, store, represent, and continually re-acquire knowledge which is external to that man's subjective existence. This is calculation in the strictest sense.
And it is the third element -- the strength, speed, and breadth of the predictive function -- which is synonymous with the word "intelligence." Higher animals have all three elements, but they're pretty hazy -- especially the third. And, in humans, short time horizons are synonymous with intellectual dullness.
All of this is to say that if you have a "prediction machine" you're 90% of the way to a true "intelligence machine." It also, I think, suggests routes that might lead to more robust AI in the future. (Ground the AI, give it a limited physical presence in time and space, match its clocks to the outside world.)
"Prediction" is hardly more than another term for inference. It's the very essence of machine learning. There is nothing new or useful in this concept.
Another difference is that you are predicting future sensory experiences in real-time, while LLMs "predict" text which a "helpful, honest, harmless" assistant would produce.
There are a lot of parallels between AI and compression.
In fact, the best compression algorithms and LLMs have this in common: they work by predicting the next word. Compression algorithms take an extra step called entropy coding to encode the difference between the prediction and the actual data efficiently, and the better the prediction, the better the compression ratio.
What makes an LLM "lossy" is that you don't have the "encode the difference" step.
And yes, it means you can turn an LLM into a (lossless) compression algorithm, and I think a really good one in terms of compression ratio on huge data sets. You can also turn a compression algorithm like gzip into a language model! A truly terrible one, but the output is better than a random stream of bytes.
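A toy illustration of the gzip-as-language-model point, using only the standard library: score each candidate next character by how little it grows the compressed context. A truly crude predictor, as the comment says, but usually better than picking bytes at random.

```python
# Sketch: gzip/zlib as a (terrible) next-character predictor.
import zlib

def next_char_scores(context: str, candidates: str = "abcdefghijklmnopqrstuvwxyz ") -> dict[str, int]:
    base = len(zlib.compress(context.encode()))
    # Less growth in compressed size = the character is "more predictable" here.
    return {ch: base - len(zlib.compress((context + ch).encode())) for ch in candidates}

context = "the cat sat on the mat. the cat sat on the "
scores = next_char_scores(context)
best = max(scores, key=scores.get)
print(best)  # with luck, a character that extends the repeated phrase
```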
I suspect this ends up being pretty important for the next advancements in AI, specifically LLM-based AI. To me, the transformer architecture is a sort of compression algorithm that is being exploited for emergent behavior at the margins. But I think this is more like stream of consciousness than premeditated thought. Eventually I think we figure out a way to "think" in latent space and have our existing AI models be just the mouthpiece.
In my experience as a human, the more you know about a subject, or even the more you have simply seen content about it, the easier it is to ramble on about it convincingly. It's like a mirroring skill, and it does not actually mean you understand what you're saying.
LLMs seem to do the same thing, I think. At scale this is widely useful, though; I am not discounting it. I just think it's an order of magnitude below what's possible, and all this talk of existing stream-of-consciousness-like LLMs creating AGI seems like a miss.
One difference is that compression gives you one and only one thing when decompressing. Decompression isn't a function taking arbitrary additional input and producing potentially arbitrary, nondeterministic output based on it.
We would have very different conversations if LLMs were things that merely exploded into a singular lossy-expanded version of Wikipedia, but where looking at the article for any topic X would give you the exact same article each time.
LLMs deliberately insert randomness. If you run a model locally (or sometimes via API), you can turn that off and get the same response for the same input every time.
Yes, an LLM is a lossy encyclopedia with a human-language answering interface. This has some benefits, mostly in terms of convenience. You don't have to browse or read through so many pages of a real encyclopedia to get a quick answer.
However, there is also a clear downside. Currently, an LLM is unable to judge whether your question is formulated incorrectly or whether it opens up more questions that should be answered first. It always jumps to answering something. A real human would assess the questioner first and usually ask for more details before answering. I feel this is the predominant reason why LLM answers feel so dumb at times: it never asks for clarification.
I don't think that's universally true with the new models - I've seen Claude 4 and GPT-5 ask for clarification on questions with obvious gaps.
With GPT-5 I sometimes see it spot a question that needs clarifying in its thinking trace, then pick the most likely answer, then spit out an answer later that says "assuming you meant X ..." - I've even had it provide an answer in two sections for each branch of a clear ambiguity.
So there are improvements version to version - from both increases in raw model capabilities and better training methods being used.
This is also why the Kagi Assistant is still the best AI tool I've found. The failure state is the same as a search result: it either can't find anything, finds something irrelevant, or finds material that contradicts the premise of your question.
It seems to me the more you can pin it to another data set, the better.
Indeed, Ted Chiang's piece (ChatGPT Is a Blurry JPEG of the Web) is here:
https://archive.is/iHSdS