Readit News
syllogism commented on Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens   arstechnica.com/ai/2025/0... · Posted by u/blueridge
skywhopper · 15 days ago
So, you agree with the point that they’re making and you’re mad about it? It’s important to state that the models aren’t doing real reasoning because they are being marketed and sold as if they are.

As for your question: ‘So what does "sophisticated simulators of reasoning-like text" even mean here?’

It means CoT interstitial “reasoning” steps produce text that looks like reasoning, but is just a rough approximation, given that the reasoning often doesn’t line up with the conclusion, or the priors, or reality.

syllogism · 15 days ago
What is "real reasoning"? The mechanism that the models use is well described. They do what they do. What is this article's complaint?
syllogism commented on Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens   arstechnica.com/ai/2025/0... · Posted by u/blueridge
syllogism · 15 days ago
It's interesting that there's still such a market for this sort of take.

> In a recent pre-print paper, researchers from the University of Arizona summarize this existing work as "suggest[ing] that LLMs are not principled reasoners but rather sophisticated simulators of reasoning-like text."

What does this even mean? Let's veto the word "reasoning" here and reflect.

The LLM produces a series of outputs. Each output changes the likelihood of the next output. So it's transitioning in a very large state space.

Assume there exist some states that the activations could be in that would cause the correct output to be generated. Assume also that there is some possible path of text connecting the original input to such a success state.

The reinforcement learning objective reinforces pathways that were successful during training. If there's some intermediate calculation to do or 'inference' that could be drawn, writing out a new text that makes that explicit might be a useful step. The reinforcement learning objective is supposed to encourage the model to learn such patterns.

So what does "sophisticated simulators of reasoning-like text" even mean here? The mechanism that the model uses to transition towards the answer is to generate intermediate text. What's the complaint here?
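To make that concrete, here's a toy sketch of the transition process (nothing like real model code, just the shape of it): each chunk of generated text is appended to the context, so the intermediate steps directly condition what gets generated next.

    import random

    def sample_next(context: str) -> str:
        # Stand-in for the model: in reality this is a forward pass over
        # `context`; here we just pick a canned continuation.
        return random.choice(["First, work out the subtotal.", "So the answer is 42."])

    def generate(prompt: str, max_steps: int = 10) -> str:
        context = prompt
        for _ in range(max_steps):
            chunk = sample_next(context)
            context += "\n" + chunk   # intermediate text becomes part of the state
            if "answer" in chunk:     # toy stopping condition
                break
        return context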

It makes the same sort of sense to talk about the model "reasoning" as it does to talk about AlphaZero "valuing material" or "fighting for the center". These are shorthands for describing patterns of behaviour, but of course the model doesn't "value" anything in a strictly human way. The chess engine usually doesn't see a full line to victory, but in the games it's played, paths which transition through states with material advantage are often good -- although it depends on other factors.

So of course the chain-of-thought transition process is brittle, and it's brittle in ways that don't match human mistakes. What does it prove that there are counter-examples with irrelevant text interposed that cause the model to produce the wrong output? It shows nothing --- it's a probabilistic process. Of course some different inputs lead to different paths being taken, which may be less successful.

syllogism commented on GCP Outage   status.cloud.google.com/... · Posted by u/thanhhaimai
__turbobrew__ · 2 months ago
The trick there is you take the relevant configs and serialize them to disk periodically, and then in a bootstrap scenario you use the configs on disk.

Presumably you could do this for the infrequently read configs, so the service that serves the frequently read configs can bootstrap without depending on the service for the infrequently read configs.
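Something like this, maybe (a minimal sketch; the cache path and `fetch_configs` are placeholders, not anything GCP-specific):

    import json
    from pathlib import Path

    CACHE = Path("/var/cache/myservice/configs.json")  # hypothetical location

    def refresh_cache(fetch_configs) -> None:
        # Periodically serialize the live configs so a later bootstrap can use them.
        CACHE.write_text(json.dumps(fetch_configs()))

    def load_configs(fetch_configs) -> dict:
        try:
            return fetch_configs()                # normal path: config service is up
        except ConnectionError:
            return json.loads(CACHE.read_text())  # bootstrap path: last known-good copy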

syllogism · 2 months ago
Like a backup generator for inputs. Makes sense.
syllogism commented on Why agents are bad pair programmers   justin.searls.co/posts/wh... · Posted by u/sh_tomer
syllogism · 3 months ago
LLM agents are very hard to talk about because they're not any one thing. Your action-space in what you say and what approach you take varies enormously, and there's very little common knowledge about what other people are doing and how they're doing it. Then the agent changes underneath you, or you tweak your prompt, and it's different again.

In my last few sessions I saw the efficacy of Claude Code plummet on the problem I was working on. I have no idea whether it was just the particular task, a modelling change, or changes I made to the prompt. But suddenly it was glazing every message ("you're absolutely right"), confidently telling me up is down (saying things like "tests now pass" when they completely didn't), and it even cheerfully suggested "rm db.sqlite", which would have set me back a fair bit if I'd said yes.

The fact that the LLM agent can churn out a lot of stuff quickly greatly increases 'skill expression' though. The sharper your insight about the task, the more you can direct it to do something specific.

For instance, most debugging is basically a binary search across the steps of the process being conducted. However, the tricky thing is that the optimal search procedure should be weighted by the probability of the problem occurring at each step, and by the expense of conducting the different probes.

A common trap when debugging is to take an overly greedy approach. Due to the availability heuristic, our hypotheses about the problem are often too specific. And the more specific the hypothesis, the easier it is to think of a probe that would eliminate it. If you keep doing this you're basically playing Guess Who by asking "Is it Paul? Is it Anne?" etc., instead of "Is the person a man? Does the person have facial hair?" etc.
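As a rough sketch of what that weighting means (the steps, probabilities and costs below are invented, just to show the shape of the calculation): pick the probe that comes closest to halving the probability mass, discounted a bit when the probe is expensive.

    # Pipeline steps, each with a guessed prior probability that the bug is
    # there and a rough cost to probe it. Numbers are made up for illustration.
    steps = [
        ("parse input",   0.05, 1.0),
        ("call API",      0.40, 5.0),
        ("transform",     0.15, 1.0),
        ("write to db",   0.30, 3.0),
        ("render output", 0.10, 1.0),
    ]

    def pick_probe(steps):
        best, best_score = None, float("inf")
        cumulative = 0.0
        for name, p_bug, cost in steps:
            cumulative += p_bug
            # Distance from splitting the probability mass in half,
            # plus a small penalty for expensive probes.
            score = abs(cumulative - 0.5) + 0.05 * cost
            if score < best_score:
                best, best_score = name, score
        return best

    print(pick_probe(steps))  # probe right after whichever step best halves the search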

I find LLM agents extremely helpful at forming efficient probes of parts of the stack I'm less fluent in. If I need to know whether the service is able to contact the database, asking the LLM agent to write out the necessary cloud commands is much faster than getting that from the docs. It's also much faster at writing specific tests than I would be. This means I can much more neutrally think about how to bisect the space, which makes debugging time more uniform, which in itself is a significant net win.

I also find LLM agents to be good at the 'eat your vegetables' stuff -- the things I know I should do but would economise on to save time. Populate the tests with more cases, write more tests in general, write more docs as I go, add more output to the scripts, etc.

syllogism commented on My experiment living in a tent in Hong Kong's jungle   corentin.trebaol.com/Blog... · Posted by u/5mv2
yusina · 3 months ago
That was surely a great experiment. But it's very different from actual homelessness. I would have appreciated it if the author had acknowledged that more. It's closer to a backpacker-in-a-tent-in-the-mountains experience than homelessness. In the latter, living in a tent is just a comparatively minor aspect of the experience.

This was a choice (essentially to save money) and the author had multiple fallback plans. Real homelessness is born out of desperation and lack of alternatives. Tragedies of mental health issues, abuse, severe financial distress, no savings, debt, warrants. No nice shower at the gym, no locker to keep a laptop and two suits. The constant fear not just of the police but also of getting robbed by another homeless person, likely after something to sell for drugs. That's very different from being able to crash on somebody's sofa at any time to save on rent so you can "afford to build companies" sooner.

We can even see it in one of the later paragraphs, where potential spots in the Bay Area are evaluated. The local homeless should not be close. Oh, they shouldn't? That gives you an idea of the conditions actual homeless folks need to live under.

syllogism · 3 months ago
Homelessness is a somewhat broad category though. There are lots of people couch-surfing between friends' places and their car. They're also in a very different position from people who are sleeping rough.
syllogism commented on Retailers will soon have only about 7 weeks of full inventories left   fortune.com/article/retai... · Posted by u/andrewfromx
anigbrowl · 4 months ago
This is a general cultural problem with liberalism at present. My social media timelines are absolutely full of Serious People analyzing how we got here and situating our present condition in historical and theoretical context. And they're mostly right! But what's lacking is any discussion of what to do about it. Even advocacy for legislative remedies or mass strikes is mostly dismissed in favor of throwing up hands and waiting for the midterm elections, as if the outcome were assured and a repeat of January 6, 2021 were unthinkable. I can only conclude that a large part of the populace either can't believe what's happening or can't comprehend the implications.
syllogism · 4 months ago
Carville (DNC strategist) is advocating a "play dead" strategy. Let Trump implode so that he owns the inevitable failure. His base will desperately want to blame the left for not letting the policies work as intended. The less the Democrats do, the harder that is. I think a lot of Democrat politicians are going this way, and it's why Schumer rolled over on the budget.

Part of the logic here is that Trump is indeed different from other authoritarians. He's even less competent. He's blowing all his political capital on imploding the economy. He also can't understand the legal battles, so when Stephen Miller tells him they won the Supreme Court case 9-0, he believes him. This seems to have been a big wake-up call to Gorsuch, Coney Barrett and Kavanaugh. The administration has shown its hand much too quickly, before it fully consolidated its power.

What the Democrats should be doing already is campaigning more. Run ads that are literally just Trump quotes. Show people Trump calling January the "Trump economy" before inauguration, then calling April the "Biden economy" now that he's crashed it. If Trump polls low enough, more senators will jump ship, and impeachment could be possible.

syllogism commented on Gukesh becomes the youngest chess world champion in history   lichess.org/@/Lichess/blo... · Posted by u/alexmolas
bluecalm · 8 months ago
Magnus was just being honest, man. You seem to have succumbed to a common fallacy: thinking that people who express criticism or negative opinions about something or someone must be jealous, conceited or just negative people in general. Meanwhile he was just expressing what is obvious to any strong player: Ding's level of play was subpar coming into the match, a league below elite at least. His play during the match was way below his peak level as well. Ding made 3 amateur-level blunders (hanging a bishop, missing basic tactics and missing a transposition to a basic lost pawn endgame) in this match. Carlsen himself made 0 of those during 5 matches. Among his opponents, Anand made 0 blunders of this caliber, Karjakin made 0, Caruana made 0 and Nepo made 2 (or 3 if you count the last game, in which he was already playing for nothing as the match was decided).

Gukesh underperformed massively in comparison to his recent level (at the Candidates and the Olympiad), I am guessing due to nerves. That made the match closer than it should have been. At the end of the day the much better player won, but it was way closer than it would normally be.

syllogism · 8 months ago
It wasn't even so much the blunders as the strategic decisions I think. Like, a blunder isn't in itself "baffling".
syllogism commented on The Ultimate Guide to Error Handling in Python   blog.miguelgrinberg.com/p... · Posted by u/kdamica
mewpmewp2 · a year ago
But if you do the "in" syntax with some_function() then you would need to assign the value before anyway or have to call the function twice?

    if some_function() not in table:
        return default_value
    value = table[some_function()]

syllogism · a year ago
You could do something like `value = table.get(some_function(), MISSING_VALUE)` and then have the conditional. But let's say for the sake of argument, yeah you need to assign the value up-front.

Let's say you're looking at some code like this:

    if value in table:
        ...
If you need to change this so that it's `some_function(value)`, you're not going to miss the fact that you have a decision to make here: you can either assign a new variable, or you can call the function twice, or you can use the '.get()' approach.
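Spelled out, the three options are something like this (`table` and `some_function` here are just toy stand-ins):

    table = {"a": 1}
    def some_function(value):
        return value.lower()

    MISSING = object()  # sentinel, just for illustration

    # Option 1: assign a new variable
    key = some_function("A")
    result = table[key] if key in table else None

    # Option 2: call the function twice
    result = table[some_function("A")] if some_function("A") in table else None

    # Option 3: .get() with a sentinel default
    result = table.get(some_function("A"), MISSING)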

If you instead have:

    try:
        return table[value]
    except KeyError:
        ...
You now have to consciously avoid writing the incorrect code `try: return table[some_function(value)]`. It's very easy to change the code in the 'try' block so that you introduce an unintended way to end up in the 'except' block.
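For example (a contrived `some_function`, just to show how a bug inside it gets silently swallowed):

    table = {"a": 1}

    def some_function(value):
        lookup = {}              # imagine a bug: this should have been populated
        return lookup[value]     # raises KeyError for the wrong reason

    try:
        result = table[some_function("a")]
    except KeyError:
        result = None            # masks the bug inside some_function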

syllogism commented on The Ultimate Guide to Error Handling in Python   blog.miguelgrinberg.com/p... · Posted by u/kdamica
mewpmewp2 · a year ago
I would say that it's almost good if it weren't for the word Error in KeyError.

If it were something like except KeyDoesNotExist: or KeyNotFound, it would make more sense to me, because it seems hacky to treat it as an error when it's really normal default-to-some-value behaviour.

syllogism · a year ago
I still think using the exception mechanism for things that could just be conditionals is bad. I elaborated on this in a sibling comment.
syllogism commented on The Ultimate Guide to Error Handling in Python   blog.miguelgrinberg.com/p... · Posted by u/kdamica
stevesimmons · a year ago
What's really wrong with try/except here other than it's not to your personal taste?

Brett Cannon, one of the Python core devs, wrote a blog post using exactly this dict KeyError example in 2016 [1]. It concludes:

"The key takeaway you should have for the post is that EAFP is a legitimate coding style in Python and you should feel free to use it when it makes sense."

[1] https://devblogs.microsoft.com/python/idiomatic-python-eafp-...

syllogism · a year ago
Yes, there have been various recommendations of this over the years, and I think it's really bad.

Using try/except for conditional logic gives the developer a spurious choice between two different syntaxes to express the same thing. The reader is then asked to read try/except blocks as meaning two different things: either ordinary expected branching in the function, or handling exceptions.

I think it's definitely better if we just use conditionals to express conditionals, and try/except for errors, like every other language does. Here are some examples of where this duplication of syntax causes problems.

* Exceptions are often not designed to match the interface well enough to make this convenient. For instance, 'x in y' works for both mapping types and lists, but only mapping types will raise a `KeyError` on a failed lookup. If your function is expected to take either a mapping or a sequence, the correct catching code will be `except (KeyError, IndexError)` (see the sketch at the end of this comment). There's all sorts of these opportunities to be wrong. When people write exceptions, they want to make them specific, and they're not necessarily thinking about them as an interface for conveniently checking preconditions.

* Exceptions are not a type-checked part of the interface. If you catch `(KeyError, IndexError)` for a variable that's just a dictionary, no type checker (or even linter?) is going to tell you that the `IndexError` is impossible and that you only need to catch `KeyError`. Similarly, if you catch the wrong error, or your class raises an error that doesn't inherit from the class that your calling code expects it to, you won't get any type errors or other linting. It's totally on you to maintain this.

* Exceptions are often poorly documented, and change more frequently than other parts of the interface. A third-party library won't necessarily consider it a breaking change to raise an error on a new condition with an existing error type, but if you're conditioning on that error in a try/except, this could be a breaking change for you.

* The base exceptions are very general, and catching them in code that should be a conditional runs a significant risk of catching an error by mistake. Consider code like this:

    try:
        value = my_dict[some_function()]
    except KeyError:
        ...
This code is very seriously incorrect: you have no way of knowing whether 'some_function()' contains a bug that raises a KeyError. It's often very annoying to debug this sort of thing.

Because you must never ever ever call a function inside your conditional try block, you're using a syntax that doesn't compose properly with the rest of the language. So you can either rewrite it to something like this:

    value = some_function()
    try:
        return my_dict[value]
    except KeyError:
        ...
Or you can use the conditional version (`if some_function() in my_dict`) just for these sorts of use-cases. But now you have both versions in your codebase, and you have to think about why this one is correct here and not the other.

The fundamental thing here is that 'try/except' is a "come from": whether you enter the 'except' block depends on which situations the function (or, gulp, functions) you're calling raise that error. The decision isn't local to the code you're looking at. In contrast, if you write a conditional, you have some local value and you're going to branch based on its truthiness or some property of it. We should only be using the 'try/except' mechanism when we _need_ its vagueness --- when we need to say "I don't know or can't check exactly what could lead to this". If we have a choice to tighten the control flow of the program we should.

And what does all this spurious decision-making and very real risk of serious bugs buy you anyway? Why should Python do this differently from every other language? I don't see any benefits in the linked article, and I've never seen any in other discussions of this topic.
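To make the first bullet concrete, here's a sketch (`get_item` is a made-up helper, purely for illustration): the EAFP version has to enumerate every exception type the container might raise on a failed lookup.

    def get_item(container, key, default=None):
        # EAFP version: must know every exception a failed lookup can raise.
        try:
            return container[key]
        except (KeyError, IndexError):
            return default

    get_item({"a": 1}, "b")    # dict miss raises KeyError
    get_item([1, 2, 3], 10)    # list miss raises IndexError instead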

u/syllogism

Karma: 3400 · Cake day: October 18, 2010
About
NLP researcher, recently switched to independent developer.

http://honnibal.github.io/spaCy
http://honnibal.wordpress.com
http://scholar.google.com.au/citations?user=FXwlnmAAAAAJ&hl=en
http://github.com/honnibal/
http://github.com/syllog1sm/
