If you mean pure as in there’s no additional training beyond the pretraining, I don’t think any model has been pure since gpt-3.5.
This seems to be one of the brutal truths of the modern world, and as far as I can tell it applies to everything. There's always a race to the bottom to make everything as cheaply as possible, and the further the industry goes down that "cheapness" scale, the more "quality" loses market share and the more expensive "quality" must be in order to operate at all, until things that used to be just "normal" and not too expensive become luxury goods.
Consider textiles, carpentry, masonry, machine tooling, appliances, etc. etc.
This doesn't feel like a good outcome, but I'm not sure there's anything that can be done about it.
Instead of broad employment of artisan breadsmiths, we have people doing email work, because it’s more economically valuable. If the government mandated a higher quality of bread, we’d be slightly richer in bread and slightly poorer in everything else.
He was extremely kind and gave me a lot of interesting life advice. I remember him saying that he got most of his ideas just from playing around with mechanics and experimenting a lot; he was never really one for grand visions.
Anyways, great fellow, glad he open sourced V (as he called it).
I listened to Lex Fridman for a long time, and there were a lot of critiques of him (Lex) as an interviewer, but since the guests were amazing, I never really cared.
But after listening to Dwarkesh, my eyes are opened (or maybe my soul). It doesn't matter that I've heard of only a few of his guests, because he knows exactly the right questions to ask. He seems to have genuine curiosity about what the guest is saying, and will push back if something doesn't make sense to him. Very much recommend.
Gemini 2.5 Pro got 72.9%
o3 high gets 81.3%, o4-mini high gets 68.9%
1. Here is evaluation of my recent predictions: https://garymarcus.substack.com/p/25-ai-predictions-for-2025...
2. Here is annotated evaluation, slightly dated, considering almost line by line, of the original Deep Learning is Hitting a Wall paper: https://garymarcus.substack.com/p/two-years-later-deep-learn...
Ask yourself how much has really changed in the intervening year?
I know you as like the #1 AI skeptic (no offense), but when I see points like "16. Less than 10% of the work force will be replaced by AI. Probably less than 5%.", that seems OPTIMISTIC about AI capabilities to me. 5% of all jobs being automated would be HUGE, and whether we get there is still very much up in the air.
Same with "AI “Agents” will be endlessly hyped throughout 2025 but far from reliable, except possibly in very narrow use cases." - even the very existence of agents that are reliable in very narrow use cases is crazy impressive! When I was in college 5 years ago for Computer Science, this would have sounded like something that would take one giant tech conglomerate a decade of work for ONE agentic task. Now it's more like a year away for a less-giant tech conglomerate, across many possible agentic tasks.
So I guess it's just a matter of perspective of how impressive you see or don't see these advances.
I will say, I do disagree with the sentiment of your comment right here where you say "Ask yourself how much has really changed in the intervening year?".
I think the o1 paradigm has been crazy impressive. There was much debate over whether scaling up models would be enough. But now we have an entirely new system which has unlocked crazy reasoning capabilities.
Reason? Maybe. But there's one limitation that we currently have no idea how to overcome: LLMs don't know how much they know. If they tell you they don't know something, it may be a lie. If they tell you they do, that may be a lie too. I, a human, certainly know what I know and what I don't, and I can recall where I learned the information.
> Mathematicians used to comb through model solutions because earlier systems would quietly flip an inequality or tuck in a wrong step, creating hallucinated answers.
> Brown says the updated IMO reasoning model now tends to say “I’m not sure” whenever it lacks a valid proof, which sharply cuts down on those hidden errors.
> TLDR, the model shows a clear shift away from hallucinations and toward reliable, self‑aware reasoning.
Source: https://x.com/chatgpt21/status/1950606890758476264