As one might expect: since the AI isn't actually thinking, it's just spending more tokens on the problem. This sometimes leads to the desired outcome, but the effect is brittle and disappears when the model is pushed outside the bounds of its training data.
To quote their discussion, "CoT is not a mechanism for genuine logical inference but rather a sophisticated form of structured pattern matching, fundamentally bounded by the data distribution seen during training. When pushed even slightly beyond this distribution, its performance degrades significantly, exposing the superficial nature of the “reasoning” it produces."
- https://arcprize.org/leaderboard
- https://aider.chat/docs/leaderboards/
- https://arstechnica.com/ai/2025/07/google-deepmind-earns-gol...
Surely the IMO problems weren't "within the bounds" of Gemini's training data.
One thing I find hard to wrap my head around is that we are giving more and more trust to something we don't understand, on the assumption (often unchecked) that it just works. Basically, your refrain gets used to justify all sorts of odd setups of AIs, agents, etc.
I am much more worried about the problem where LLMs are actively misleading low-info users into thinking they’re people, especially children and old people.
“Did you try running it over and over until you got the results you wanted?”
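To spell out the pattern being alluded to, it's just resample-until-it-passes. A minimal sketch, where `query_model` is a hypothetical stand-in for a real model call and the acceptance check is supplied by the caller:

```python
import random

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns a canned answer here."""
    return random.choice(["42", "not sure", "7", "42 (final answer)"])

def retry_until_pass(prompt: str, is_acceptable, budget: int = 10):
    """Resample up to `budget` times and return the first output that passes the check."""
    for _ in range(budget):
        candidate = query_model(prompt)
        if is_acceptable(candidate):
            return candidate
    return None  # budget exhausted without an acceptable sample

if __name__ == "__main__":
    print(retry_until_pass("What is 6 * 7?", lambda s: "42" in s))
```

The entire burden falls on the acceptance check, which is rather the point of the question.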
I would have thought the more obvious approach would be to couple it to some kind of symbolic logic engine. It might transform plain language statements into fragments conforming to a syntax which that engine could then parse deterministically. This is the Platonic ideal of reasoning that the author of the post pooh-poohs, I guess, but it seems to me to be the whole point of reasoning; reasoning is the application of logic in evaluating a proposition. The LLM might be trained to generate elements of the proposition, but it's too random to apply logic.
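To make that concrete, here's a minimal sketch of the deterministic half, in Python. It's purely illustrative (the grammar, the `is_tautology` check, and all the names are my own invention, not any existing prover): the LLM would only be asked to translate a statement into this small syntax, and the engine then parses and evaluates it exhaustively, with no randomness anywhere.

```python
import itertools
import re

# Toy deterministic "engine": the LLM's only job would be to emit a formula in
# this fixed syntax (~, &, |, ->); everything past this point is deterministic.

TOKEN_RE = re.compile(r"\s*(->|[~&|()]|[A-Za-z_]\w*)")

def tokenize(text: str):
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise ValueError(f"bad input at {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0
    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None
    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok
    def parse(self):
        node = self.implication()
        if self.peek() is not None:
            raise ValueError(f"trailing token {self.peek()!r}")
        return node
    def implication(self):  # '->' is right-associative
        left = self.disjunction()
        if self.peek() == "->":
            self.eat("->")
            return ("->", left, self.implication())
        return left
    def disjunction(self):
        node = self.conjunction()
        while self.peek() == "|":
            self.eat("|")
            node = ("|", node, self.conjunction())
        return node
    def conjunction(self):
        node = self.negation()
        while self.peek() == "&":
            self.eat("&")
            node = ("&", node, self.negation())
        return node
    def negation(self):
        if self.peek() == "~":
            self.eat("~")
            return ("~", self.negation())
        if self.peek() == "(":
            self.eat("(")
            node = self.implication()
            self.eat(")")
            return node
        tok = self.eat()
        if not (tok[0].isalpha() or tok[0] == "_"):
            raise ValueError(f"expected a variable, got {tok!r}")
        return ("var", tok)

def evaluate(node, env):
    op = node[0]
    if op == "var":
        return env[node[1]]
    if op == "~":
        return not evaluate(node[1], env)
    a, b = evaluate(node[1], env), evaluate(node[2], env)
    return {"&": a and b, "|": a or b, "->": (not a) or b}[op]

def variables(node):
    if node[0] == "var":
        return {node[1]}
    return set().union(*(variables(child) for child in node[1:]))

def is_tautology(formula: str) -> bool:
    """True iff the formula holds under every truth assignment (brute force)."""
    tree = Parser(tokenize(formula)).parse()
    names = sorted(variables(tree))
    return all(
        evaluate(tree, dict(zip(names, values)))
        for values in itertools.product([True, False], repeat=len(names))
    )

if __name__ == "__main__":
    # e.g. something an LLM might emit for "if A and B, then A"
    print(is_tautology("(A & B) -> A"))   # True
    print(is_tautology("A -> (A & B)"))   # False
```

The hard part, of course, is the translation step; the engine itself is the easy bit.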
https://arstechnica.com/ai/2025/07/google-deepmind-earns-gol...
Dumb question, but anything like this that's written about on the internet will ultimately end up as training fodder, no?