This is more interesting and deserves better discussion than we got from the previous title, which was derailed by the "AGI" bit, so I replaced the title with a representative sentence from the video.
(Edit: plus a question mark, as we sometimes do with contentious titles.)
How different are world models from LLMs? I'm not in the AI space but follow it here. I always assumed they belonged to the same "family" of tech and were more similar than different.
But are they sufficiently different that stalling progress in one doesn't imply stalling progress in the other?
Depends on whether you’re asking about real world models or synthetic AI world models.
One of them only exists in species with a long evolutionary history of survivorship (and death) over generations living in the world being modeled.
There’s a sense of “what it’s like to be” a thing. That’s still a big question mark in my mind, whether AI will ever have any sense of what it’s like to be human, any more than humans know what it’s like to be a bat or a dolphin.
You know what it’s like for the cool breeze to blow across your face on a nice day. You could try explaining that to a dolphin, assuming we can communicate one day, but they won’t know what it’s like from any amount of words. That seems like something in the area of Neuralink or similar.
World models are not really useful yet, so they are starting from a lower base than LLMs. That means they probably still have some decent gains to make before progress gets really hard (diminishing returns).
On the one hand, that isn't necessarily a problem. It can be just a useful algorithm for tool calling or whatever.
On the other hand, if you're telling your investors that AGI is about two years away, then you can only do that for a few years. Rumor has it that such claims were made? Hopefully no big investors actually believed that.
The real question to ask is: based on current applications of LLMs, can one pay for the hardware needed to sustain them? The comparison to smartphones is apt; by the time we got to the "Samsung Galaxy" phase, where only incremental improvements were coming, the industry was making a profit on each phone sold. Are any of the big LLMs actually profitable yet? And if they are, do they have any way to keep the DeepSeeks of the world from taking it away?
What happens if you built your business on a service that turns out to be hugely expensive to run and not profitable?
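To make that question concrete, here is a minimal back-of-envelope sketch in Python. Every number in it is a hypothetical placeholder rather than a figure from any real provider; the only point is the shape of the unit-economics calculation.

    # Back-of-envelope sketch of "can the service pay for its hardware?".
    # All numbers are made-up placeholders, not figures for any real provider.

    gpu_cost_per_hour = 3.00          # hypothetical fully loaded cost of one accelerator
    tokens_per_gpu_hour = 2_000_000   # hypothetical sustained serving throughput
    price_per_million_tokens = 5.00   # hypothetical price charged to customers

    cost_per_million_tokens = gpu_cost_per_hour / (tokens_per_gpu_hour / 1_000_000)
    margin_per_million_tokens = price_per_million_tokens - cost_per_million_tokens

    print(f"cost per 1M tokens:   ${cost_per_million_tokens:.2f}")
    print(f"margin per 1M tokens: ${margin_per_million_tokens:.2f}")

    # A positive inference margin still has to cover training runs, R&D, and idle
    # capacity before the business is profitable overall, and a cheaper competitor
    # (the "DeepSeek" scenario) squeezes price_per_million_tokens from above.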
>On the other hand, if you're telling your investors that AGI is about two years away, then you can only do that for a few years.
Musk has been doing this with autonomous driving since 2015. Machine learning has enough hype surrounding it that you have to embellish to keep up with every other company's ridiculous claims.
I doubt this was the main driver for the investors. People were buying Tesla even without it.
Whether there is hype or not, the laws of money remain the same. If you invest and don’t get the expected returns, you will eventually become concerned and do something about it.
Lying to investors is illegal, and investors have incentive and means to sue if they think they were defrauded. The problem is proving it. I'm sure a lot of founders genuinely believe AGI is about to appear out of thin air, so they're technically not lying. Even the cynical ones who say whatever they think investors want to hear are hard to catch in a lie. It's not really about being rich and powerful. That's just the unfortunate reality of rhetoric.
Predictions about the future and puffery are not illegal. Lying about facts is. Nobody knows how far away AGI is; everyone just has their own predictions.
In addition to the other comments/answers to this, I would like to add that if you lie to your investors (in public), and they suspect you're lying but also think it will allow you to cash out before the lie becomes apparent, they may not care, especially if the lie is difficult to distinguish from pathological levels of optimism.
It's not a crime to be wrong; it's only a crime to deliberately lie. And unless there's an email saying "haha we're lying to our investors", it's just not easy to prove.
I mean, there are different definitions of what counts as AGI. Most of the time people don't specify which one they use.
For me an AGI would mean truly at least human level as in "this clearly has a consciousness paired with knowledge", a.k.a. a person. In that case, what do the investors expect? Some sort of slave market of virtual people to exploit?
OpenAI defines AGI as a "highly autonomous system that outperforms humans at most economically valuable work" [0]. It may not be the most satisfying definition, but it is practical and a good goal to aim for if you are an AI company.
[0] https://openai.com/our-structure/
My personal definition is "The ability to form models from observations and extrapolate from them."
LLMs are great at forming models of language from observations of language and extrapolating language constructs from them. But to get general intelligence, we're going to have to let an AI build its models from direct measurements of reality.
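As a toy illustration of that definition (my own made-up example, not anyone's actual system): fit a model to a handful of observations, then query it outside the range it was fit on. The data points and the choice of a linear model are assumptions invented for the example.

    # Toy illustration of "form a model from observations and extrapolate":
    # fit a simple model to a few observations, then query it beyond the observed range.
    # The observations and the linear-model choice are invented for this example.

    observations = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]  # (x, y) pairs

    n = len(observations)
    mean_x = sum(x for x, _ in observations) / n
    mean_y = sum(y for _, y in observations) / n

    # Ordinary least-squares fit of y = slope * x + intercept
    slope = sum((x - mean_x) * (y - mean_y) for x, y in observations) / \
            sum((x - mean_x) ** 2 for x, _ in observations)
    intercept = mean_y - slope * mean_x

    # Extrapolate well beyond the observed x range (0..3)
    x_new = 10.0
    print(f"model: y ~ {slope:.2f} * x + {intercept:.2f}")
    print(f"extrapolated y at x = {x_new}: {slope * x_new + intercept:.2f}")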
They really aren't even great at forming models of language. They are a single model of language. They don't build models, much less use those models. See, for example, ARC-AGI 1 and 2. They only performed decently on ARC 1 [0] with additional training, and are failing miserably on ARC 2. That's not even getting to ARC 3.
[0] https://arcprize.org/blog/oai-o3-pub-breakthrough
> Note on "tuned": OpenAI shared they trained the o3 we tested on 75% of the Public Training set. They have not shared more details. We have not yet tested the ARC-untrained model to understand how much of the performance is due to ARC-AGI data.
... Clearly not able to reason about the problems without additional training. And there is no indication that the additional training didn't include feature extraction, scaffolding, RLHF, etc., created by human intelligence. It is impressive that fine-tuning can get >85%, but that is still additional human-directed training, not self-contained intelligence at the reported level of performance. The blog was very generous in relegating the undefined "fine tuning" to a footnote and praising the results as if they came directly from the model, which would have cost >$65,000 to run.
Edit: to be clear, I understand LLMs are a huge leap forward in AI research and possibly the first models that can provide useful results across multiple domains without being retrained. But they're still not creating their own models, even of language.
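For readers who haven't seen the benchmark, an ARC-style task shows a few input/output grid pairs and asks the solver to induce the transformation rule itself and apply it to a new input. The miniature below is a made-up example in that spirit, not an actual ARC task; the hidden rule here is simply "mirror the grid left to right".

    # Made-up, ARC-style miniature (not a real ARC task). A solver sees the training
    # pairs and must induce the rule on its own; this function plays the role of the
    # hidden rule (a left-right mirror).

    def mirror_lr(grid):
        """The rule a solver would have to induce from the training pairs below."""
        return [list(reversed(row)) for row in grid]

    train_pairs = [
        ([[1, 0],
          [2, 3]],
         [[0, 1],
          [3, 2]]),
        ([[5, 6, 0],
          [0, 0, 7]],
         [[0, 6, 5],
          [7, 0, 0]]),
    ]

    # The induced rule must explain every training pair...
    assert all(mirror_lr(inp) == out for inp, out in train_pairs)

    # ...and is then applied to an unseen test input.
    test_input = [[4, 0, 0],
                  [0, 8, 9]]
    print(mirror_lr(test_input))  # [[0, 0, 4], [9, 8, 0]]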
Maybe for LLMs, but they are not the only possible algorithm. Only this week we had Genie 3, as in:
>The Surprising Leap in AI: How Genie 3’s World Model Redefines Synthetic Reality https://www.msn.com/en-us/news/technology/the-surprising-lea...
and:
>DeepMind thinks its new Genie 3 world model presents a stepping stone toward AGI https://techcrunch.com/2025/08/05/deepmind-thinks-genie-3-wo...
>It is comprised of a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model.
My point is more that people can try different models and algorithms rather than having to stick to LLMs.
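To make the quoted architecture description a bit more concrete, here is a rough structural sketch: a tokenizer turns video frames into discrete tokens, a latent action model infers an action between consecutive frames, and an autoregressive dynamics model predicts the next frame's tokens from the token history plus that action. This is only a pseudocode-level reading of the quote; all class names, shapes, and placeholder computations are invented and have nothing to do with Genie 3's actual code.

    import numpy as np

    # Rough structural sketch of the pipeline described in the quote above.
    # Every class, shape, and computation here is an invented placeholder.

    class VideoTokenizer:
        """Stand-in for a spatiotemporal video tokenizer: frame -> discrete token grid."""
        def encode(self, frame):
            # Crude placeholder "tokenization": quantize pixel values into a small vocabulary.
            return (frame * 7).astype(int) % 8

    class LatentActionModel:
        """Stand-in for a latent action model: infers an action id between two frames."""
        def infer_action(self, prev_tokens, next_tokens):
            return int(np.abs(next_tokens - prev_tokens).sum()) % 4

    class DynamicsModel:
        """Stand-in for an autoregressive dynamics model: token history + action -> next tokens."""
        def predict_next(self, token_history, action):
            return (token_history[-1] + action) % 8

    # Wire the three parts together on random stand-in "frames".
    tokenizer, action_model, dynamics = VideoTokenizer(), LatentActionModel(), DynamicsModel()
    frames = [np.random.rand(4, 4) for _ in range(3)]
    tokens = [tokenizer.encode(f) for f in frames]
    action = action_model.infer_action(tokens[-2], tokens[-1])
    predicted = dynamics.predict_next(tokens, action)
    print("inferred action:", action)
    print("predicted next-frame tokens:\n", predicted)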
> For me an AGI would mean truly at least human level as in "this clearly has a consciousness paired with knowledge", a.k.a. a person.
How would we find out whether something probably has consciousness, much less clearly has it? What is consciousness, anyway?
Think about this story: https://news.ycombinator.com/item?id=44845442
Med-Gemini is clearly intelligent, but equally clearly it is an inhuman intelligence with different failure modes from human intelligence.
If we insist Med-Gemini is not intelligent, we will end up having to concede that it actually is. And the danger of that concession is that we will underestimate how different it is from human intelligence and then get caught out by its inhuman failures.
I guess when it comes to the definition of intelligence, just like porn, different people have different levels of tolerance.
I believe that’s Eliezer Yudkowsky’s definition.