This is kinda frustrating to read. The style is very busy, and it lacks a clear structure; it's basically an information dump with no acknowledgment of what's important and what isn't. Big-O notation is provided for a lot of operations where you wouldn't really care about Big-O (in a system where calls to an LLM dominate, that's most operations). The big-picture story of how Claude Code actually works, as in what happens when I type in a prompt (which I'm very much interested in, given how much I use it), is missing. Some diagrams are so nonsensical they become funny. Look at this: https://southbridge-research.notion.site/Prompt-Engineering-... In general, the prompt engineering page, which deserves maybe the most detailed treatment, is just a dump of prompts and LLM bullet-point filler.
I don't want to be overly negative, but I think it's only fair given the author hasn't graced us with their own thoughts, instead offloading the actual writing to an LLM.
Claude Code with Sonnet 4 is so good I've stopped using Aider. This has been hugely productive. I've even been able to write agents that Claude Code can spawn and call out to for other models.
It's much better at breaking down tasks on its own, and all the tool-use stuff is deeply integrated. So I can reliably make a plan with Claude Code, then have it keep working on the implementation until all tests pass.
Claude Code is "agentic"; Aider isn't. Claude Code can plan, use external tools, and run the compiler, tests, linters, etc. You can do some of this with Aider too, but Claude is more independent. The downside is that it can get very expensive, very fast.
Could you briefly explain your workflow? I use Zed’s agent mode and I don’t really understand how people are doing it purely through the CLI. How do you get a decent workflow where you can approve individual hunks? Aren’t you missing out on LSP help doing it in the CLI?
Claude Code has a VS Code plugin now that lets you view and approve diffs in the editor. Before it did that, I really don't understand how people got anything of substance done, because it simply isn't reliable enough over large codebases.
Claude Code churns away in a terminal; I have the git repository open in Emacs with auto-revert-mode enabled, so it reloads files if they're changed under it.
I view the files, then review the changes in magit, and either approve some or all of them and commit them, or tell Claude to do something else. It works astonishingly well.
Better start now! It's incredible how productive it is. In my opinion it still takes someone with staff-level engineering experience to guide it through the hard stuff, but it does in a day, with just me, what multiple product teams would take months to do, and does it better.
I’m building a non-trivial platform as a solo project/business and have been working on it since about January. I’ve gotten more done in two nights than I did in 3 months.
I’m sure there are tons of arguments and great points against what I just said, but it’s my current reality and I still can’t believe it. I shelled out the $100/mo after one night of blowing through the $20 credits I used as a trial.
It does struggle with design and front end. But don’t we all.
Have you been able to interface Claude Code with Gemini 2.5 Pro? I'm finding that Gemini 2.5 Pro is still better at solving certain problems and at architecture, and it would be great to be able to consult it directly in CC.
I do it indirectly. Gemini is my architecture go-to; Claude Code is for execution. It's just way more efficient to feed large portions of the codebase at once to Gemini, pump out a plan, and feed that to Claude Code. https://x.com/backnotprop/status/1929020702453100794
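To make the "feed large portions of the codebase at once" step concrete, here's a minimal sketch of the kind of throwaway script I mean; the file name, function, and extension filter are made up for illustration and aren't part of Gemini or Claude Code:

    # pack_context.py -- hypothetical helper: concatenate source files into
    # one blob to paste into Gemini as planning context. The name and the
    # extension filter are illustrative, not any tool's API.
    import pathlib

    def pack(root: str = ".", exts: tuple = (".py", ".ts")) -> str:
        parts = []
        for path in sorted(pathlib.Path(root).rglob("*")):
            if path.is_file() and path.suffix in exts:
                parts.append(f"--- {path} ---\n{path.read_text(errors='ignore')}")
        return "\n\n".join(parts)

    if __name__ == "__main__":
        print(pack())  # redirect to a file, then paste into Gemini

The plan Gemini writes from that context then becomes the opening prompt for Claude Code.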
Well, a quick hack is to tell Claude Code to leave "AI!" comments in the code, which Aider can be configured to watch for; then Gemini 2.5 Pro can do those tasks. Yes, I still really like Gemini too.
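For anyone unfamiliar with the hack: Aider's watch mode (aider --watch-files) scans source files for comments ending in "AI!" and treats them as edit instructions. A hypothetical file Claude Code might leave behind, with an invented function and comment for illustration:

    # slug.py -- hypothetical example: the trailing "AI!" marker in the
    # comment below is what Aider's watch mode picks up and hands to its
    # configured model (Gemini 2.5 Pro in this setup).

    def slugify(title: str) -> str:
        # handle unicode accents and collapse repeated dashes, AI!
        return title.lower().replace(" ", "-")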
No, it's completely useless, and puts the entire rest of the analysis in a bad light.
LLMs have next to no understanding of their own internal processes. There's a significant amount of research that demonstrates this. All explanations of an internal thought process in an LLM are completely reverse-engineered to fit the final answer (interestingly, humans are also prone to this, as seen especially in split-brain experiments).
In addition, the degree to which the author must have prompted the LLM to get it to anthropomorphize this hard makes the rest of the project suspect. How many of the results are repeated human prompting until the author liked the results, and how many come from actual LLM intelligence/analysis skill?
By saying it's a gold mine, I think OP meant that it's funny, not that it brings valuable insight.
i.e. "THEY KNOW" -> that made me laugh.
And, as the article said, "an LLM who just spent thousands of words explaining why they're not allowed to use thousands of words" is just funny to read.
The fact that they produce this as a “default” response is an interesting insight, regardless of the internal mechanisms. I don’t understand my neurons but can still articulate how I feel.
It's sure phrased like one, but I'd be careful about attributing an LLM's thought process to what it says it's thinking. LLMs are experts at working backwards to justify how they came to an answer, even when the justification is entirely fabricated.
I would go further and say it's _always_ fabricated. LLMs are no better able to explain their inner workings than you are able to explain which neurons are firing for a particular thought in your head.
Note, this isn't a statement on the usefulness of LLMs, just their capability. An LLM may eventually be given a tool to enable it to introspect, but IMO it's not natively possible with the LLM architectures of today.
Right… because these things are trained on sci-fi and so when asked to describe an internal monologue they create text that reads like an internal monologue from a sci-fi character.
Maybe there’s genuine sentience there, maybe not. Maybe that text explains what’s happening, maybe not.
Also, I will say, this (if we can trust that the findings in these notes are reasonably accurate to the real implementation) is a PERFECT example of the real level of complexity in cutting-edge LLM usage. It's not just some complex fancy prompt you give to a model in a chat window; there is so much important stuff happening behind the scenes. Though I suppose the people who complain about LLMs hallucinating / screwing up haven't tried Claude Code or any agentic workflows. Or, it could be their architecture / code is so poorly written and poorly organized that even the LLM itself struggles to modify it properly.
> or, it could be their architecture / code is so poorly written and poorly organized that even the LLM itself struggles to modify it properly
You wrote this like it's some rare occurrence, and not a description of the bulk of the production code that exists today, even at top tech companies.
It sees everything it needs in one pass: no extra reasoning or instruction tokens around things like MCP, which abstract things away and add hops between the model and a simple understanding of where things are.
There is something here about the native filesystem and tooling, and some kind of insight into what agentic software engineering will look like. I mostly feel like an orchestrator or validator, in the terminal window next to Claude Code, where I run tests and related things.
I was never a great terminal developer, I can't even type right, but Claude Code by far provides the best software engineering interface in terms of LLM/agent UX.
You're stuck on the anthropomorphize semantics, but that wasn't the purpose of the exercise.
It sounds a lot like the Murderbot character in the Apple TV show!
Firefox 113.0.2, how come?
It is good because it highlights the relevant aspects of the design, and you can use this, plus some other resources, to replicate the idea.