drodgers (u/drodgers)

drodgers commented on The Emperor's New LLM dayafter.substack.com/p/t... · Posted by u/shmval

raincole · 2 months ago

The problem is, no matter how you wrote the prompt, the way you wrote it still triggers some intrinsic bias of LLM.

Even a simple prompt like this:

=

I have two potential solutions.

Solution A:

Solution B:

Which one is better and why?

=

Is biased. Some LLM tends to choose the first option and the other prefer the last one.

(Of course, humans suffer from the same kind of bias too: https://electionlab.mit.edu/research/ballot-order-effects)

drodgers · 2 months ago

Eh. This is true for humans too and doesn’t make humans useless at evaluating business plans or other things.

You just want the signal from the object level question to drown out irrelevant bias (which plan was proposed first, which of the plan proposers are more attractive, which plan seems cooler etc.)

drodgers commented on Launch HN: Vassar Robotics (YC X25) – $219 robot arm that learns new skills · Posted by u/charleszyong

drodgers · 3 months ago

Love it! I've been looking for an excuse to dive into AI planning for robotics, and this looks like it will make it easy to get started.

Just one question: does the power supply have a 220/240v option (I'm in Australia)?

drodgers commented on The Illusion of Thinking: Strengths and limitations of reasoning models [pdf] ml-site.cdn-apple.com/pap... · Posted by u/amrrs

Byamarro · 3 months ago

And we have a good example of a dimwitted, brute-force process creating intelligent designs - evolution.

drodgers · 3 months ago

Also corporations, governments etc. - they're capable of things that none of the individuals could do alone.

drodgers commented on I read all of Cloudflare's Claude-generated commits maxemitchell.com/writings... · Posted by u/maxemitchell

drodgers · 3 months ago

> Prompts as Source Code

Another way to phrase this is LLM-as-compiler and Python (or whatever) as an intermediate compiler artefact.

Finally, a true 6th generation programming language!

I've considered building a toy of this with really aggressive modularisation of the output code (eg. python) and a query-based caching system so that each module of code output only changes when the relevant part of the prompt or upsteam modules change (the generated code would be committed to source control like a lockfile).

I think that (+ some sort of WASM encapsulated execution environment) would one of the best ways to write one off things like scripts which don't need to incrementally get better and more robust over time in the way that ordinary code does.

drodgers commented on The Illusion of Thinking: Strengths and limitations of reasoning models [pdf] ml-site.cdn-apple.com/pap... · Posted by u/amrrs

antics · 3 months ago

I am not sure if you mean this to refute something in what I've written but to be clear I am not arguing for or against what the authors think. I'm trying to state why I think there is a disconnect between them and more optimistic groups that work on AI.

drodgers · 3 months ago

I think that commenter was disagreeing with this line:

> because omniscient-yet-dim-witted models terminate at "superhumanly assistive"

It might be that with dim wits + enough brute force (knowledge, parallelism, trial-and-error, specialisation, speed) models could still substitute for humans and transform the economy in short order.

drodgers commented on The Illusion of Thinking: Strengths and limitations of reasoning models [pdf] ml-site.cdn-apple.com/pap... · Posted by u/amrrs

antics · 3 months ago

I think the intuition the authors are trying to capture is that they believe the models are omniscient, but also dim-witted. And the question they are collectively trying to ask is whether this will continue forever.

I've never seen this question quantified in a really compelling way, and while interesting, I'm not sure this PDF succeeds, at least not well-enough to silence dissent. I think AI maximalists will continue to think that the models are in fact getting less dim-witted, while the AI skeptics will continue to think these apparent gains are in fact entirely a biproduct of "increasing" "omniscience." The razor will have to be a lot sharper before people start moving between these groups.

But, anyway, it's still an important question to ask, because omniscient-yet-dim-witted models terminate at "superhumanly assistive" rather than "Artificial Superintelligence", which in turn economically means "another bite at the SaaS apple" instead of "phase shift in the economy." So I hope the authors will eventually succeed.

drodgers · 3 months ago

> I think AI maximalists will continue to think that the models are in fact getting less dim-witted

I'm bullish (and scared) about AI progress precisely because I think they've only gotten a little less dim-witted in the last few years, but their practical capabilities have improved a lot thanks to better knowledge, taste, context, tooling etc.

What scares me is that I think there's a reasoning/agency capabilities overhang. ie. we're only one or two breakthroughs away from something which is both kinda omniscient (where we are today), and able to out-think you very quickly (if only through dint of applying parallelism to actually competent outcome-modelling and strategic decision making).

That combination is terrifying. I don't think enough people have really imagined what it would mean for an AI to be able to out-strategise humans in the same way that they can now — say — out-poetry humans (by being both decent in terms of quality and super fast). It's like when you're speaking to someone way smarter than you and you realise that they're 6 steps ahead, and actively shaping your thought process to guide you where they want you to end up. At scale. For everything.

This exact thing (better reasoning + agency) is also the top priority for all of the frontier researchers right now (because it's super useful), so I think a breakthrough might not be far away.

Another way to phrase it: I think today's LLMs are about as good at snap judgements in most areas as the best humans (probably much better at everything that rhymes with inferring vibes from text), but they kinda suck at:

1. Reasoning/strategising step-by-step for very long periods

2. Snap judgements about reasoning or taking strategic actions (in the way that expert strategic humans don't actually need to think through their actions step-by-step very often - they've built intuition which gets them straight to the best answer 90% of the time)

Getting good at the long range thinking might require more substantial architectural changes (eg. some sort of separate 'system 2' reasoning architecture to complement the already pretty great 'system 1' transformer models we have). OTOH, it might just require better training data and algorithms so that the models develop good enough strategic taste and agentic intuitions to get to a near-optimal solution quickly before they fall off a long-range reasoning performance cliff.

Of course, maybe the problem is really hard and there's no easy breakthrough (or it requires 100,000x more computing power than we have access to right now). There's no certainty to be found, but a scary breakthrough definitely seems possible to me.

drodgers commented on My AI skeptic friends are all nuts fly.io/blog/youre-all-nut... · Posted by u/tabletcorry

drodgers · 3 months ago

> "LLMs can’t write Rust"

This really doesn't accord with my own experience. Using claude-code (esp. with opus 4) and codex (with o3) I've written lots of good Rust code. I've actually found Rust helps the AI-pair-programming experience because the agent gets such good, detailed feedback from the compiler that it can iterate very quickly and effectively.

Can it set up great architecture for a large, complex project from scratch? No, not yet. It can't do that in Ruby or Typescript either (though it might trick you by quickly getting something that kinda works in those languages). It think that will be a higher bar because of how Rust front-loads a lot of hard work, but I expect continuing improvement.

drodgers commented on Codex CLI is going native github.com/openai/codex/d... · Posted by u/bundie

npalli · 3 months ago

People will do anything to avoid reading and maintaining someone else's code. If that means rewriting in native and someone (usually VC's) is paying for it so be it. If you think working in existing TS/JS code is hard, wait until someone hands you their 100k+ line Rust code and asks you to make changes. In five years, another big shift to rewrite and change everything.

drodgers · 3 months ago

Making changes to huge rust projects is quite easy. For a substantial alteration, you make your change, the compiler tells you the 100 problems it caused, and you fix them all (~50% auto fix, 30% Claude/Codex, 20% manual), then the program probably does the thing.

Architecting the original 100kloc program well requires skill, but that effort is heavily front loaded.

drodgers commented on The ‘white-collar bloodbath’ is all part of the AI hype machine cnn.com/2025/05/30/busine... · Posted by u/lwo32k

jaggederest · 3 months ago

All these folks are once again seeing the first 1/4 of a sigmoid curve and extrapolating to infinity.

drodgers · 3 months ago

No doubt from me that it’s a sigmoid, but how high is the plateau? That’s also hard to know from early in the process, but it would be surprising if there’s not a fair bit of progress left to go.

Human brains seem like an existence proof for what’s possible, but it would be surprising if humans also represent the farthest physical limits of what’s technologically possible without the constraints of biology (hip size, energy budget etc).

drodgers commented on Human coders are still better than LLMs antirez.com/news/153... · Posted by u/longwave

materiallie · 3 months ago

I think benchmark targeting is going to be a serious problem going forward. The recent Nate Silver podcast on poker performance is interesting. Basically, the LLM models still suck at playing poker.

Poker tests intelligence. So what gives? One interesting thing is that for whatever reason, poker performance isn't used a benchmark in the LLM showdown between big tech companies.

The models have definitely improved in the past few years. I'm skeptical that there's been a "break-through", and I'm growing more skeptical of the exponential growth theory. It looks to me like the big tech companies are just throwing huge compute and engineering budgets at the existing transformer tech, to improve benchmarks one by one.

I'm sure if Google allocated 10 engineers a dozen million dollars to improve Gemini's poker performance, it would increase. The idea before AGI and the exponential growth hypothesis is that you don't have to do that because the AI gets smarter in a general sense all on it's own.

drodgers · 3 months ago

I think that's generally fair, but this point goes too far:

> improve benchmarks one by one

If you're right about that in the strong sense — that each task needs to be optimised in total isolation — then it would be a longer, slower road to a really powerful humanlike system.

What I think is really happening though that each specific task (eg. coding) is having large spillover effects on other areas (eg. helping them to be better at extended verbal reasoning even when not writing any code). The AI labs can't do everything at once, so they're focusing where:

- It's easy to generate more data and measure results (coding, maths etc.) - There's a relative lack of good data in the existing training corpus (eg. good agentic reasoning logic - the kinds of internal monologs that humans rarely write down) - Areas where it would be immediately useful for the models to get better in a targeted way (eg. agentic tool-use; developing great hypothesis generation instincts in scientific fields like algorithm design, drug discovery and ML research)

By the time those tasks are optimised, I suspect the spill over effects will be substantial and the models will generally be much more capable.

Beyond that, the labs are all pretty open about the fact that they want to use the resulting AI talents for coding, reasoning and research skills to accelerate their own research. If that works (definitely not obvious yet) then finding ways to train a much broader array of skills could be much faster because that process itself would be increasingly automated.