I've come to view LLMs as a consulting firm where, for each request, I have a 50% chance of getting either an expert or an intern writing my code, and there's no way to tell which.
Sometimes I accept this, and I vibe-code, when I don't care about the result. When I do care about the result, I have to read every line myself. Since reading code is harder than writing it, this takes longer, but LLMs have made me too lazy to write code now, so that's probably the only alternative that works.
I have to say, though, the best thing I've tried is Cursor's autocomplete, which writes 3-4 lines for you. That way, I can easily verify that the code does what I want, while still reaping the benefit of not having to look up all the APIs and function signatures.
I've also had a similar experience. I have become too lazy since I started vibe-coding. My coding has transitioned from coder to code reviewer/fixer very quickly. Overall I feel like it's a good thing because the last few years of my life has been a repetition of frontend components and api endpoints, which to me has become too monotonous, so I am happy to have AI take over that grunt work while I supervise.
> My coding has transitioned from coder to code reviewer/fixer very quickly. Overall I feel like it's a good thing
Until you lose access to the LLM and find your ability has atrophied to the point you have to look up the simplest of keywords.
> the last few years of my life has been a repetition of frontend components and api endpoints, which to me has become too monotonous
It’s a surprise that so many people have this problem/complaint. Why don’t you use a snippet manager?! It’s lightweight, simple, fast, predictable, offline, and includes the best version of what you learned. We’ve had the technology for many many years.
There is a way of doing things that sits between autocomplete and vibe coding. If you use these tools effectively, you learn how to prop them up with context to make sure they won't start imagining how things should work; then you have the tool plan a solution and, if you have the time, you watch it implement that plan and approve as you go. Sometimes you stop it, make a correction, and go on. While it's doing its thing, you can plan the next task. Sometimes I do it in larger chunks, where I auto-accept changes/tool uses, but even then I only allow it to do one small task at a time, something that takes me minutes to review.
> I have a 50% chance of getting either an expert or an intern writing my code
The way I describe it is almost gambling with your time. Every time I want to reach for the Cline extension in VSCode, I always ask myself "is this gamble worth it?" and "what are my odds for this gamble?".
For some things like simple refactoring I'm usually getting great odds so I use AI, but I would say at least 5-6 times last week I've thought about it and ended up doing it by hand as the odds were not in my favor.
One thing I've picked up using AI over the past few months is this sense of what it can and can't do. For some things I'm like "yeah it can do this no problem" but for other tasks I find myself going "better do this by hand, AI will just fuck it up"
The multi-line autocomplete feels like the right middle ground (for me) when working in an existing code base with well established patterns. Adding new functionality is mostly a matter of scaffolding, commenting as I go, typing the first few characters of a code block and tabbing my way through the bulk of the code.
I still have doubts but I am warming up to Claude Code. Sometimes it gets in these ruts where it tries something wrong, gets shown it's wrong, keeps digging, and can't even explain how its failed suggestions could work. Other times, it seems to reproduce the kinds of insights (and even judgment) that I'd expect out of an actual practitioner. After it suggests code to fix problems, I always ask it to explain the context in more detail. I was recently using it to help develop a Qt C++ application, and ended up getting this interaction from it:
> Can you speculate on why this problem was only happening with these embedded buttons? Is this a flaw in Qt, or is this application trying to do something unusual?
Why this problem occurred: This is a Qt design issue, not a flaw per se. Qt's item delegate editing model assumes editors are simple widgets (line edits, spin boxes, combo boxes) that don't spawn modal dialogs. When a modal dialog opens:
1. The modal dialog steals focus from the editor button
2. Qt's QAbstractItemView sees the FocusOut event and assumes editing is complete
3. It calls QStyledItemDelegate::destroyEditor() to clean up the editor
4. The button gets deleted while the color dialog is still running
5. When the color dialog closes, it tries to interact with the deleted button
This application is doing something unusual - using a button that spawns a modal dialog as a table cell editor. Most Qt delegates use immediate editing widgets (text fields, dropdowns) that don't require modal interactions.
The eventFilter solution works by intercepting the FocusOut event that triggers editor destruction, essentially telling Qt "I'm still editing, don't destroy me yet."
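(For anyone who hasn't hit this pattern before, the fix being described boils down to an event filter roughly like the following - a minimal sketch with invented names, not the application's actual code.)

```cpp
#include <QEvent>
#include <QObject>

// Sketch only: swallow FocusOut on the cell-editor widget so the view never
// sees the event that would otherwise make it destroy the editor while the
// modal color dialog is still open.
class EditorFocusGuard : public QObject {
public:
    using QObject::QObject;

    bool eventFilter(QObject* watched, QEvent* event) override {
        if (event->type() == QEvent::FocusOut)
            return true;  // consume it; the view never concludes editing is over
        return QObject::eventFilter(watched, event);
    }
};

// Installed from the delegate's createEditor(), roughly:
//   editorButton->installEventFilter(new EditorFocusGuard(editorButton));
```

Returning true from eventFilter consumes the event, which is all it takes to keep QAbstractItemView from tearing the editor down mid-dialog.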
> Is there a danger of leaked Qt resources now that we are blocking destruction?
Yes, that's a valid concern. By blocking FocusOut events, we're preventing Qt's normal editor cleanup mechanism. However, the risk is likely minimal because:
1. Qt still cleans up eventually - The editor will be destroyed when the table view itself is destroyed or when editing truly ends (e.g., clicking elsewhere, pressing Enter/Escape)
2. Short-lived editors - These editor widgets are typically very short-lived (created when editing starts, destroyed when editing ends)
3. Parent widget cleanup - Since the button has a parent widget, Qt's parent-child ownership will clean it up when the parent is destroyed
To be completely safe, we could add a more targeted approach - only block FocusOut events while a color dialog is actively open, rather than blocking all FocusOut events. We could track when the modal dialog is shown/hidden and only filter during that window.
But for this use case, the current solution should be fine since:
- Color picking is infrequent
- The editor lifetime is still bounded by the table view's normal lifecycle
- Qt's parent-child cleanup provides a safety net
The memory usage impact should be negligible compared to the complexity of implementing more sophisticated tracking.
-----
I've definitely had less-satisfying discussions over bugs with actual human senior software engineers than this.
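For what it's worth, the more targeted variant it mentions (only filtering while the dialog is open) is also just a few lines - again a rough sketch with invented names, not the real code:

```cpp
#include <QEvent>
#include <QObject>

// Sketch only: block FocusOut just while the modal color dialog is open,
// so Qt's normal editor cleanup still runs the rest of the time.
class ScopedFocusGuard : public QObject {
public:
    using QObject::QObject;

    void setDialogOpen(bool open) { m_dialogOpen = open; }

    bool eventFilter(QObject* watched, QEvent* event) override {
        if (m_dialogOpen && event->type() == QEvent::FocusOut)
            return true;  // suppress editor destruction only during the dialog
        return QObject::eventFilter(watched, event);
    }

private:
    bool m_dialogOpen = false;
};

// The editor button would call setDialogOpen(true) right before opening the
// QColorDialog and setDialogOpen(false) as soon as it returns.
```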
It seems to be just playing the “yes and” improv game with you. You might also want to try prompting it against its own suggestions and see whether it changes to follow your lead or holds its original opinion.
I believe choosing a well-known problem space in a well-known language certainly influenced a lot of the behavior. AI's usefulness is strongly correlated with its training data, and there's no doubt been a significant amount of data about both the problem space and Python.
I’d love to see how this compares when either the problem space is different or the language/ecosystem is different.
I think you are correct. I work in game dev. Almost all code is in C/C++ (with some in Python and C#).
LLMs are nothing more than rubber ducking in game dev. The code they generate is often useful as a starting point or to lighten the mood because it's so bad you get a laugh. Beyond that it's broadly useless.
I put this down to the relatively small number of people who work in game dev, resulting in a relatively small number of blogs from which to "learn" game dev.
Game Dev is a conservative industry with a lot of magic sauce hidden inside companies for VERY good reasons.
One of my test queries for AI models is to ask for an 8-bit asm function that does something invented recently enough that there is unlikely to be an existing implementation yet.
Multiplying two 24-bit posits in 8-bit AVR, for instance. No models have succeeded yet, usually because they try to put more than 8 bits into a register. Algorithmically they seem to be on the right track, but they can't hold the idea that registers are only 8 bits wide through the entirety of their response. Although in fairness this was a year ago on GPT 3.5 IIRC.
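For anyone who hasn't done 8-bit work, the constraint the models keep violating is easy to show: a 24x24-bit multiply has to be built from 8x8->16-bit partial products with explicit carries. Here's a C++ sketch of just that fixed-width discipline (my own illustration - a real posit multiply also needs sign/regime/exponent/fraction handling, which is omitted):

```cpp
#include <array>
#include <cstdint>

using Limb3 = std::array<std::uint8_t, 3>;  // 24-bit value, least-significant limb first
using Limb6 = std::array<std::uint8_t, 6>;  // 48-bit product

// Schoolbook multiply that never stores more than 8 bits at a time,
// the discipline an AVR implementation has to maintain throughout.
Limb6 mul24(const Limb3& a, const Limb3& b) {
    Limb6 out{};  // zero-initialised product
    for (int i = 0; i < 3; ++i) {
        std::uint16_t carry = 0;
        for (int j = 0; j < 3; ++j) {
            // 8x8 -> 16-bit partial product (AVR's MUL gives exactly this),
            // plus the limb already accumulated there and the running carry.
            std::uint16_t t = std::uint16_t(a[i]) * b[j] + out[i + j] + carry;
            out[i + j] = std::uint8_t(t & 0xFF);  // keep only the low 8 bits
            carry = t >> 8;
        }
        out[i + 3] = std::uint8_t(carry);  // this limb is still zero at this point
    }
    return out;
}
```

The failure mode described above is essentially a model trying to keep that 16-bit intermediate in a single register instead of splitting it across two.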
> Although in fairness this was a year ago on GPT 3.5 IIRC
GPT-3.5 was impressive at the time, but today's SOTA models (like GPT 5 Pro) are almost a night-and-day difference, both in terms of producing better code for a wider range of languages (I mostly do Rust and Clojure; it handles those fine now, where 3.5 was awful) and, more importantly, in terms of following your instructions in user/system prompts, so it's easier to get higher quality code from them now, as long as you can put into words what "higher quality code" means for you.
I write Haskell with Claude Code and it's gotten remarkably good recently. We have some code at work that uses STM to implement what is essentially a mutable state machine. I needed to split a state transition apart, and it did an admirable job. I had to intervene once or twice when it was going down a valid, but undesirable, approach. This almost-one-shot performance was already a productivity boost, even though the result didn't quite build. What I find most impressive now is that the "fix" was literally to have Claude run the build and see the errors. While GHC errors are verbose and not always the best, it got everything building in a few more iterations. When it later hit a test failure, I suggested we add a bit more logging - so it logged all state transitions, spotted the unexpected transition, and got the test passing. We really are a LONG way away from 3.5 performance.
Post-training in all frontier models has improved significantly with respect to programming language support. Take Elixir, which LLMs could barely handle until recently; now support has gotten really good.
If I was told I'd be working with a fellow programmer who would make all the mistakes listed in Section 5 of the article, I'd have to say "no thanks". Yet the author ends with "I don’t think I will ever code again without the assistance of an AI model". He's a lot more thick-skinned than I.
What would you expect from "AI guy vibing AI code for AI application"? Marco warned you about the "AI echo chamber" from the outset - and he kept his promise :-)
Some people also value programs for their productive ends rather than value them for the process of writing them in a pleasing way. Personally, I've been getting more done than ever with Claude Code. That I am able to work just a few minutes at a time then let the machine go is really nice as a parent. For those of us who don't program for a day job, but need programs for our day job, Claude and friends have completely changed what's possible.
Even those of us who do program for a day job can end up in a situation where focus time is hard to come by. The more senior you are in an IC role, the more likely it is you have more demands on your time than there are hours in the day. I find that Claude (via Warp) has helped me accomplish things that otherwise wouldn't get done because I'm bouncing from meeting to meeting or answering the nth dm.
Great article, though I'm still reading it as it's a mammoth read!
A side note: as it's been painfully pointed out to me, "vibe coding" means not reading the code (ever!). We need a term for coding with LLMs exclusively, but also reviewing the code they output at each step.
What stands out for me is that it was all possible thanks to the fact that the AI operator/conversationalist had enough knowledge to, more or less, write it all by hand if he chose to.
It has probably been said many times already, but the competition will be between programmers with AI and programmers without it, rather than AI with no programmers at all.
In particular, I love this part:
"I had serious doubts about the feasibility and efficiency of using inherently ambiguous natural languages as (indirect) programming tools, with a machine in between doing all the interpretation and translation toward artificial languages endowed with strict formal semantics. No more doubts: LLM-based AI coding assistants are extremely useful, incredibly powerful, and genuinely energising.
But they are fully useful and safe only if you know what you are doing and are able to check and (re)direct what they might be doing — or have been doing unbeknownst to you. You can trust them if you can trust yourself."
Which isn’t really “vibe coding” as it’s been promoted, i.e. a way for non-programmers to just copy and paste their way to fully working software systems.
It’s a very powerful tool but needs to be used by someone with the expertise to find the flaws.
To be clear, this was not a vibe coding exercise, despite the title. The author supervised and reviewed the code changes at every step, caught mistakes and sub-optimal solutions, and worked with the LLM to fix and improve those problems.
This is not someone who just said "build me X", left it to run for a while, and then accepted whatever it wrote without reading it.
(I'm not criticizing the article's author here. It was an excellent, thoughtful read, and I think an article that was actually about something vibe-coded would be boring and not really teach me anything useful.)
> Since reading code is harder than writing it, this takes longer
Reading bad code is harder than writing bad code. Reading good code is easier than writing good code.
> The way I describe it is almost gambling with your time.
Isn't that the same as delegating a task to a junior developer, where you as the senior still have to check their work?
> I'd love to see how this compares when either the problem space is different or the language/ecosystem is different.
It was a great read regardless!
> We need a term for coding with LLMs exclusively, but also reviewing the code they output at each step.
BASE: Brain And Silicon Engineering
CLASS: Computer/Llm-Assisted Software Specification
STRUCT: Scripting Through Recurrent User/Computer Teamup
ELSE: Electronically Leveraged Software Engineering
VOID: Very Obvious Intelligence Deficit
Okay maybe not that last one
Prediction: arguments over the definition will ensue
It doesn't imply AI, but I don't distinguish between AI-assisted and pre-AI coding anyway - just vibe-coding or not, as I think that's the important demarcation now.
"Lets prompt up a new microservice for this"
"What have you been prompting lately?"
"Looking at commits, prompt coding is now 50% of your output. Have a raise"
What it does is pretty simple. You give it a problem and set up the environment with libraries and all.
It continuously makes changes to the program, then checks its output.
And iteratively improves it.
For example, we used it to build a new method for applying diffs generated by LLMs to files.
As different models are good at different things, we were able to run it against several models to figure out which method performs best.
Can a human do it? I doubt it.
> Also, these assistants (for now) appear to exhibit no common sense about what is “much”, “little”, “exceptional”, “average”, etc. For example, after measuring a consumption of 3.5GB of memory (!!) for solving a 3-disk problem (due to a bug), the assistant declared all was well...
That describes a good portion of my coworkers.