tmnvdb commented on LLMs aren't world models   yosefk.com/blog/llms-aren... · Posted by u/ingve
yosefk · 22 days ago
A "world model" depends on the context which defines which world the problem is in. For chess, which moves are legal and needing to know where the pieces are to make legal moves are parts of the world model. For alpha blending, it being a mathematical operation and the visibility of a background given the transparency of the foreground are parts of the world model.

The examples are from all the major commercial American LLMs as listed in a sister comment.

You seem to conflate context windows with tracking chess pieces. The context windows are more than large enough to remember 10 moves. The model should either track the pieces, or mention that it would be playing blindfold chess absent a board to look at and it isn't good at this, so could you please list the position after every move to make it fair, or it doesn't know what it's doing; it's demonstrably the latter.

tmnvdb · 20 days ago
If you train an LLM on chess, it will learn that too. You don't need to explain the rules, just feed it chess games, and it will stop making illegal moves at some point. It is a clear example of an inferred world model from training.

https://arxiv.org/abs/2501.17186
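
As a toy illustration of what "legal move" means mechanically (the position, candidate move, and use of the python-chess library are mine, not from the paper or the thread):

```python
# Check whether a model-suggested move is legal in the current position,
# using the python-chess library. Position and candidate move are made up.
import chess

board = chess.Board()
for san in ["e4", "e5", "Nf3", "Nc6", "Bb5"]:  # moves played so far
    board.push_san(san)

candidate = "Nxe4"  # hypothetical move a model might suggest (Black to move)
try:
    board.parse_san(candidate)
    print(f"{candidate} is legal here")
except ValueError:
    print(f"{candidate} is illegal here")  # no black knight can reach e4
```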

PS "Major commercial American LLM" is not very meaningful, you could be using GPT4o with that description.

tmnvdb commented on GPT-5: "How many times does the letter b appear in blueberry?"   bsky.app/profile/kjhealy.... · Posted by u/minimaxir
andrewmcwatters · 23 days ago
These aren't wild assertions. I'm not using charged language.

> Reasoning and consciousness are seperate(sic) concepts

No, they're not. In tech we seem to have a culture of severing the humanities for utilitarian purposes, but classical reasoning uses consciousness and awareness as elements of processing.

It's only meaningless if you don't know what the philosophical or epistemological definitions of reasoning are. Which is to say, you don't know what reasoning is. So you'd think it was a meaningless statement.

Do computers think, or do they compute?

Is that a meaningless question to you? I'm sure given your position it's irrelevant and meaningless, surely.

And this sort of thinking is why we have people claiming software can think and reason.

tmnvdb · 23 days ago
You have again answered with your customary condescension. Is that really necessary? Everything you write is dripping with patronizing superiority and combative sarcasm.

> "classical reasoning uses consciousness and awareness as elements of processing"

They are not the _same_ concept then.

> It's only meaningless if you don't know what the philosophical or epistemological definitions of reasoning are. Which is to say, you don't know what reasoning is. So you'd think it was a meaningless statement.

The problem is the only information we have is internal. So we may claim those things exist in us. But we have no way to establish if they are happening in another person, let alone in a computer.

> Do computers think, or do they compute?

Do humans think? How do you tell?

tmnvdb commented on GPT-5: "How many times does the letter b appear in blueberry?"   bsky.app/profile/kjhealy.... · Posted by u/minimaxir
exasperaited · 23 days ago
This is not a demonstration of a trick question.

This is a demonstration of a system that delusionally refuses to accept correction and correct its misunderstanding (which is a thing that is fundamental to their claim of intelligence through reasoning).

Why would anyone believe these things can reason, that they are heading towards AGI, when halfway through a dialogue where you're trying to tell it that it is wrong it doubles down with a dementia-addled explanation about the two bs giving the word that extra bounce?

It's genuinely like the way people with dementia sadly shore up their confabulations with phrases like "I'll never forget", "I'll always remember", etc. (Which is something that... no never mind)

> Even OSS 20b gets it right the first time, I think the author was just mistakenly routed to the dumbest model because it seemed like an easy unimportant question.

Why would you offer up an easy out for them like this? You're not the PR guy for the firm swimming in money paying million dollar bonuses off what increasingly looks, at a fundamental level, like castles in the sand. Why do the labour?

tmnvdb · 23 days ago
> This is not a demonstration of a trick question.

It's a question that purposefully uses a limitation of the system. There are many such questions for humans. They are called trick questions. It is not that crazy to call it a trick question.

> This is a demonstration of a system that delusionally refuses to accept correction and correct its misunderstanding (which is a thing that is fundamental to their claim of intelligence through reasoning).

First, the word 'delusional' is strange here unless you believe we are talking about a sentient system. Second, you are just plain wrong: LLMs are not "unable to accept correction" at all; in fact, they often accept incorrect corrections (sycophancy). In this case the model is simply unable to understand the correction (because of the nature of the tokenizer), and it is therefore 'correct' behaviour for it to insist on its incorrect answer.

> Why would anyone believe these things can reason, that they are heading towards AGI, when halfway through a dialogue where you're trying to tell it that it is wrong it doubles down with a dementia-addled explanation about the two bs giving the word that extra bounce?

People believe the models can reason because they produce output consistent with reasoning. (That is not to say they are flawless or we have AGI in our hands.) If you don't agree, provide a definition of reasoning that the model does not meet.

> Why would you offer up an easy out for them like this? You're not the PR guy for the firm swimming in money paying million dollar bonuses off what increasingly looks, at a fundamental level, like castles in the sand. Why do the labour?

This, like many of your other messages, is rather obnoxious and dripping with performative indignation while adding little in the way of substance.

tmnvdb commented on GPT-5: "How many times does the letter b appear in blueberry?"   bsky.app/profile/kjhealy.... · Posted by u/minimaxir
andrewmcwatters · 23 days ago
No, it's the entire architecture of the model. There's no real reasoning. It seems that reasoning is just a feedback loop on top of existing autocompletion.

It's really disingenuous for the industry to call warming tokens for output, "reasoning," as if some autocomplete before more autocomplete is all we needed to solve the issue of consciousness.

Edit: Letter frequency apparently has just become another scripted output, like doing arithmetic. LLMs don't have the ability to do this sort of work inherently, so they're trained to offload the task.

Edit: This comment appears to be wildly upvoted and downvoted. If you have anything to add besides reactionary voting, please contribute to the discussion.

tmnvdb · 23 days ago
> No, it's the entire architecture of the model.

Wrong, it's an artifact of tokenization. The model doesn't have access to the individual letters, only to the tokens. Reasoning models can usually do this task well, because they can spell out the word in the reasoning buffer; the fact that GPT-5 fails here is likely because the question was routed to a non-reasoning version of the model.
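
As a toy illustration (assuming the tiktoken library and one of its standard encodings; the exact split varies by model), this is roughly what the model "sees":

```python
# The model operates on token IDs, not characters, so "blueberry" never
# arrives as a sequence of letters. Encoding name is an assumption.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("blueberry")
print(ids)                             # a few integer IDs
print([enc.decode([i]) for i in ids])  # e.g. ['blue', 'berry'], not letters

# Spelling the word out character by character (what a reasoning buffer
# lets a model do) makes the count trivial:
print(sum(ch == "b" for ch in "blueberry"))  # 2
```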

> There's no real reasoning.

This seems like a meaningless statement unless you give a clear definition of "real" reasoning as opposed to other kinds of reasoning that are merely apparent.

> It seems that reasoning is just a feedback loop on top of existing autocompletion.

The word "just" is doing a lot of work here - what exactly is your criticism here? The bitter lesson of the past years is that relatively simple architectures that scale with compute work surprisingly well.

> It's really disingenuous for the industry to call warming tokens for output, "reasoning," as if some autocomplete before more autocomplete is all we needed to solve the issue of consciousness.

Reasoning and consciousness are separate concepts. If I showed the output of an LLM 'reasoning' (you can call it something else if you like) to somebody 10 years ago, they would agree without any doubt that reasoning was taking place there. You are free to provide a definition of reasoning which an LLM does not meet, of course, but it is not enough to just say it is so. Using the word autocomplete is rather meaningless name-calling.

> Edit: Letter frequency apparently has just become another scripted output, like doing arithmetic. LLMs don't have the ability to do this sort of work inherently, so they're trained to offload the task.

Not sure why this is bad. The implicit assumption seems to be that an LLM is only valuable if it literally does everything perfectly?
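
To sketch what that offloading can look like (hypothetical tool name and wiring, not any lab's actual implementation):

```python
# The model emits a structured tool call; a plain function does the counting.
def count_letter(word: str, letter: str) -> int:
    """Deterministic helper the model can delegate to."""
    return word.lower().count(letter.lower())

# Imagined model output, dispatched by the serving layer:
tool_call = {"name": "count_letter", "arguments": {"word": "blueberry", "letter": "b"}}
tools = {"count_letter": count_letter}

result = tools[tool_call["name"]](**tool_call["arguments"])
print(result)  # 2
```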

> Edit: This comment appears to be wildly upvoted and downvoted. If you have anything to add besides reactionary voting, please contribute to the discussion.

Probably because of the wild assertions, charged language, and rather superficial descriptions of actual mechanics.

tmnvdb commented on Yet Another LLM Rant   overengineer.dev/txt/2025... · Posted by u/sohkamyung
Seb-C · 23 days ago
Hallucinations are not a bug or an exception, but a feature. Everything outputted by LLMs is 100% made-up, with a heavy bias towards what has been fed to it at first (human written content).

The fundamental reason why it cannot be fixed is because the model does not know anything about the reality, there is simply no such concept here.

To make a "probability cutoff" you first need a probability about what the reality/facts/truth is, and we have no such reliable and absolute data (and probably never will).

tmnvdb · 23 days ago
You use a lot of anthropomorphisms: the model doesn't "know" anything (does your hard drive know things? Is that relevant?), and "making things up" is even more strongly tied to conscious intent. Unless you believe LLMs are sentient, this is a strange choice of words.
tmnvdb commented on Yet Another LLM Rant   overengineer.dev/txt/2025... · Posted by u/sohkamyung
lazide · 23 days ago
It’s a tool that fundamentally can’t be used reliably without double-checking everything it produces. That is rather different from how you’re presenting it.
tmnvdb · 23 days ago
So, similar to Wikipedia.

tmnvdb commented on Why Understanding Software Cycle Time Is Messy, Not Magic   arxiv.org/abs/2503.05040... · Posted by u/SiempreViernes
discreteevent · 3 months ago
How can something like "The Principles of Product Development Flow" be applied to software development when every item has a different size and quality from every other item?
tmnvdb · 3 months ago
PPDF is a great book but hard to apply. I recommend looking at some Kanban literature; a classic in this space is Actionable Agile Metrics for Predictability.
tmnvdb commented on Why Understanding Software Cycle Time Is Messy, Not Magic   arxiv.org/abs/2503.05040... · Posted by u/SiempreViernes
wry_durian · 3 months ago
Cycle time is important, but there are three problems with it. First, it (like many other factors) is just a proxy variable in the total cost equation. Second, cycle time is a lagging indicator, so it gives you limited foresight into the systemic control levers at your disposal. And third, queue size plays a larger causal role in downstream economic problems with products. This is why you should always consider your queue size before your cycle time.

I didn't see these talked about much in the paper at a glance. Highly recommend Reinertsen's The Principles of Product Development Flow here instead.

tmnvdb · 3 months ago
It is precisely to reduce cycle time that we control queue size. It's also not entirely true that cycle time is purely lagging: every day an item ages in your queue, you know its cycle time has increased by one day. Hence the advice to track item age in order to control cycle time.
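
The connection between the two is Little's Law: average cycle time = average WIP / average throughput, which is why capping queue size is the lever for cycle time. A minimal sketch with made-up numbers:

```python
from datetime import date

# Little's Law: cycle time = WIP / throughput (numbers are illustrative).
avg_wip = 12      # items queued or in progress
throughput = 3    # items finished per week
print(f"average cycle time ~ {avg_wip / throughput:.1f} weeks")  # 4.0 weeks

# Item age as a leading signal: an in-progress item's age is already a
# lower bound on its eventual cycle time.
started = date(2025, 5, 1)  # hypothetical start date
age_days = (date(2025, 5, 20) - started).days
print(f"item age so far: {age_days} days")
```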
tmnvdb commented on Why Understanding Software Cycle Time Is Messy, Not Magic   arxiv.org/abs/2503.05040... · Posted by u/SiempreViernes
jyounker · 3 months ago
Because "GUI" and "Algorithm" are too big. You have to further decompose into small tasks which you can actually estimate. An estimable composition for a GUI task might be something like:

* Research scrollbar implementation options. (note, time box to x hours).

* Determine number of lines in document.

* Add scrollbar to primary pane.

* Determine number of lines presentable based on current window size.

* Determine number of lines in document currently visible.

* Hide scrollbar when number of displayed lines < document size.

* Verify behavior when we reach the end of the document.

* Verify behavior when we scroll to the top.

When you decompose a task, it's also important to figure out which steps you don't understand well enough to estimate. The unpredictability of those steps is what blows your estimate, and the more of them there are, the less reliable the total estimate will be.

If it's really important to produce an accurate estimate, then you have to figure out the details of these unknowns before you begin the project.
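
A toy Monte Carlo sketch of that effect, with made-up numbers; the last two tasks are the poorly understood ones, and they dominate the spread:

```python
import random

# (low, likely, high) estimates in hours per task.
tasks = [(1, 2, 3), (2, 3, 4), (1, 1, 2), (2, 8, 40), (1, 6, 30)]

totals = sorted(
    sum(random.triangular(lo, hi, mode) for lo, mode, hi in tasks)
    for _ in range(10_000)
)
print(f"p50 ~ {totals[5_000]:.0f}h, p90 ~ {totals[9_000]:.0f}h")
```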

tmnvdb · 3 months ago
If it's really important to have an accurate estimate for a large work package, you are in trouble: there is no such thing.
