Readit News
socalgal2 · 8 months ago
> Another common argument I've heard is that Generative AI is helpful when you need to write code in a language or technology you are not familiar with. To me this also makes little sense.

I'm not sure I get this one. When I'm learning new tech I almost always have questions. I used to google them. If I couldn't find an answer I might try posting on Stack Overflow. Sometimes, as I was typing the question, their search would finally kick in and find the answer (similar questions). Other times I'd post the question and, if it didn't get closed, maybe I'd get an answer a few hours or days later.

Now I just ask ChatGPT or Gemini and more often than not it gives me the answer. That alone and nothing else (agent modes, AI editing or generating files) is enough to increase my output. I get answers 10x faster than I used to. I'm not sure what that has to do with the point about learning. Getting answers to those questions is learning, regardless of where the answer comes from.

plasticeagle · 8 months ago
ChatGPT and Gemini literally only know the answer because they read StackOverflow. Stack Overflow only exists because they have visitors.

What do you think will happen when everyone is using the AI tools to answer their questions? We'll be back in the world of Encyclopedias, in which central authorities spent large amounts of money manually collecting information and publishing it. And then they spent a good amount of time finding ways to sell that information to us, which was only fair because they spent all that time collating it. The internet pretty much destroyed that business model, and in some sense the AI "revolution" is trying to bring it back.

Also, he's specifically talking about having a coding tool write the code for you; he's not talking about using an AI tool to answer a question so that you can go ahead and write the code yourself. These are different things, and he is treating them differently.

socalgal2 · 8 months ago
> ChatGPT and Gemini literally only know the answer because they read StackOverflow. Stack Overflow only exists because they have visitors.

I know this isn't true because I work on an API that has no answers on Stack Overflow (too new), nor does it have answers anywhere else. Yet the AI seems to be able to accurately answer many questions about it. To be honest, I've been somewhat shocked at this.

semiquaver · 8 months ago
> ChatGPT and Gemini literally only know the answer because they read StackOverflow

Obviously this isn’t true. You can easily verify this by inventing and documenting an API and feeding that description to an LLM and asking it how to use it. This works well. LLMs are quite good at reading technical documentation and synthesizing contextual answers from it.
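
For example, a toy package like the sketch below (every name here is invented for the test, so nothing like it exists in any training set) is enough: paste the doc comments into the model and ask it to write a program that uses the package, and it will usually get the usage right.

    // Package gizmo is deliberately made up, so a model can only learn it from
    // the documentation you paste in; nothing here exists in training data.
    package gizmo

    import "time"

    // Frobnicator batches widget IDs and flushes them on a fixed interval.
    type Frobnicator struct {
        FlushInterval time.Duration // how often queued widgets are emitted
        queue         []string
    }

    // New returns a Frobnicator that flushes every interval.
    func New(interval time.Duration) *Frobnicator {
        return &Frobnicator{FlushInterval: interval}
    }

    // Enqueue adds a widget ID to the current batch.
    func (f *Frobnicator) Enqueue(id string) {
        f.queue = append(f.queue, id)
    }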

AlwaysRock · 8 months ago
> ChatGPT and Gemini literally only know the answer because they read StackOverflow. Stack Overflow only exists because they have visitors.

I mean... They can also read actual documentation. If I'm doing any API work or using a language I'm not familiar with, I ask the LLM to include the source it got its answer from and to use official documentation when possible.

That lowers the hallucination rate significantly and also lets me ensure said function or code actually does what the LLM reports it does.

In theory, all stackoverflow answers are just regurgitated documentation, no?

erikerikson · 8 months ago
I broadly agree that creating new knowledge will need to continue being done and that overuse of LLMs could undermine that, yet... When was the last time you paid to read an API's docs? It costs money for companies to make those too.
olmo23 · 8 months ago
Where does the knowledge come from? People can only post to SO if they've read the code or the documentation. I don't see why LLMs couldn't do that.
reaperducer · 8 months ago
> We'll be back in the world of Encyclopedias

On a related note, I recently learned that you can still subscribe to the Encyclopedia Britannica. It's $9/month, or $75/year.

Considering the declining state of Wikipedia, and the untrustworthiness of A.I., I'm considering it.

kypro · 8 months ago
The idea that LLMs can only spew out text they've been trained on is a fundamental misunderstanding of how modern backprop training algorithms work. A lot of work goes into refining training algorithms to prevent overfitting of the training data.

Generalisation is something that neural nets are pretty damn good at, and given the complexity of modern LLMs, the idea that they cannot generalise the fairly basic logical rules and patterns found in code such that they're able to provide answers to inputs unseen in the training data is quite an extreme position.

CamperBob2 · 8 months ago
We'll start writing documentation for primary consumption by LLMs rather than human readers. The need for sites like SO will not vanish overnight but it will diminish drastically.
socalgal2 · 8 months ago
To add, another experience I had. I was using an API I'm not that familiar with. My program was crashing. Looking at the stack trace I didn't see why. Maybe if I had many months experience with this API it would be obvious but it certainly wasn't to me. For fun I just copy and pasted the stack trace into Gemini. ~60 frames worth of C++. It immediately pointed out the likely cause given the API I was using. I fixed the bug with a 2 line change once I had that clue from the AI. That seems pretty useful to me. I'm not sure how long it would have taken me to find it otherwise since, as I said, I'm not that familiar with that API.
nottorp · 8 months ago
You remember when Google used to do the same thing for you way before "AI"?

Okay, maybe sometimes the post about the stack trace was in Chinese, but a plain search used to be capable of giving the same answer as an LLM.

It's not that LLMs are better, it's search that got enshittified.

adriancr1 · 8 months ago
You missed out on developing debugging skills.

Analyzing crash dumps and figuring out what's going on is a pretty useful skill.

BlackFly · 8 months ago
One of the many ways that search got worse over time was the promotion of blog spam over actual documentation. Generally, I would rather have good API documentation or a user guide that leads me through the problem so that next time I know how to help myself. Reading through good API documentation often also educates you about the overall design and associated functionality that you may need to use later. Reading the manual for technology that you will be regularly using is generally quite profitable.

Sometimes, a function doesn't work as advertised or you need to do something tricky, you get a weird error message, etc. For those things, stackoverflow could be great if you could find someone who had a similar problem. But the tutorial level examples on most blogs might solve the immediate problem without actually improving your education.

It would be similar to someone solving your homework problems for you. Sure you finished your homework, but that wasn't really learning. From this perspective, ChatGPT isn't helping you learn.

blueflow · 8 months ago
You parent searches for answers, you search for documentation. Thats why AI works for him and not for you.
turtlebits · 8 months ago
It's perfect for small boilerplate utilities. If I need a browser extension/tampermonkey script, I can get up and running quickly without having to read docs/write manifests. These are small projects where without AI, I wouldn't have bothered to even start.

At its least, AI can be extremely useful for autocompleting simple code logic or automatically finding replacements when I'm copying code/config and making small changes.

raxxorraxor · 8 months ago
For anything non-trivial you have to verify the results.

I disabled AI autocomplete and cannot understand how people can use it. It was mostly an extra key press on backspace for me.

That said, learning new languages is possible without searching anything. With a local model, you can do that offline and have a vast library of knowledge at hand.

The Gemini results integrated in Google are very bad as well.

The main problem I see isn't people just lazily asking AI how to use the toilet, but that real knowledge bases like Stack Overflow and similar will vanish because of lacking participation.

perrygeo · 8 months ago
> Getting answers to those questions is learning, regardless of where the answer comes from.

Sort of. The process of working through the question is what drives learning. If you just receive the answer with zero effort, you are explicitly bypassing the brain's learning mechanism.

There's a huge difference between your workflow and fully Agentic AIs though.

Asking an AI for the answer in the way you describe isn't exactly zero effort. You need to formulate the question and mold the prompt to get your response, and integrate the response back into the project. And in doing so you're learning! So YOUR workflow has learning built in, because you actually use your brain before and after the prompt.

But not so with vibe coding and Agentic LLMs. When you hit submit and get the tokens automatically dumped into your files, there is no learning happening. Considering AI agents are effectively trying to remove any pre-work (ie automating prompt eng) and post-work (ie automating debugging, integrating), we can see Agentic AI as explicitly anti-learning.

Here's my recent vibe coding anecdote to back this up. I was working on an app for an e-ink tablet dashboard and the tech stack of least resistance was C++ with QT SDK and their QML markup language with embedded javascript. Yikes, lots of unfamiliar tech. So I tossed the entire problem at Claude and vibe coded my way to a working application. It works! But could I write a C++/QT/QML app again today - absolutely not. I learned almost nothing. But I got working software!

Eisenstein · 8 months ago
The logical conclusion of this is 'the AI just solves the problem by coding without telling you about it'. If we think about 'what happens when everyone vibe-codes to solve their problems' then we get to 'the AI solves the problem for you, and you don't even see the code'.

Vibe-coding is just a stop on the road to a more useful AI and we shouldn't think of it as programming.

PeterStuer · 8 months ago
I love learning new things. With AI I am learning more and faster.

I used to be on the Microsoft stack for decades. Windows, Hyper-V, .NET, SQL Server ... .

Got tired of MS's licensing BS and I made the switch.

This meant learning Proxmox, Linux, Pangolin, UV, Python, JS, Bootstrap, Nginx, Plausible, SQLite, Postgres ...

Not all of these were completely new, but I had never dove in seriously.

Without AI, this would have been a long and daunting project. AI made this so much smoother. It never tires of my very basic questions.

It does not always answer 100% correctly the first time (tip: paste in the docs of the specific version of the thing you are trying to figure out, as it sometimes has out-of-date or mixed-version knowledge), but it can most often be nudged and prodded to a very helpful result.

AI is just an undeniably better teacher than Google or Stack Overflow ever was. You still do the learning, but the AI is great at getting you to learn.

rootnod3 · 8 months ago
I might be an outlier, but I much prefer reading the documentation myself. One of the reasons I love using FreeBSD and OpenBSD as daily drivers. The documentation is just so damn good. Is it a pain in the ass at the beginning? Maybe. But I require far fewer documentation lookups over time and do not have to rely on AI for that.

Don't get me wrong, I tried. But even when pasting the documentation in, the number of times it just hallucinated parameters and arguments that were not even there was such a huge waste of time that I don't see the value in it.

nikanj · 8 months ago
And ChatGPT never closes your question without an answer because it (falsely) thinks it's a duplicate of a different question from 13 years ago.
nottorp · 8 months ago
But it does give you a ready to copy paste answer instead of a 'teach the man how to fish' answer.
rich_sasha · 8 months ago
I sort of disagree with this argument in TFA, as you say, though the rest of the article highlights a limitation. If I'm unfamiliar with the API, I can't judge whether the answer is good.

There is a sweet spot of situations I know well enough to judge a solution quickly, but not well enough to write code quickly, but that's a rather narrow case.

greybox · 8 months ago
I trust ChatGPT and Gemini a lot less than Stack Overflow. On Stack Overflow I can see the context that the answer to the original question was given in. AI does not do this. I've asked ChatGPT questions about CMake, for instance, that it got subtly wrong; if I had not noticed this it would have cost me a lot of time.
thedelanyo · 8 months ago
So AI is basically best as a search engine.
jrm4 · 8 months ago
As I've said a bunch.

AI is a search engine that can also remix its results, often to good effect.

groestl · 8 months ago
That's right.
antisthenes · 8 months ago
Alwayshasbeen.jpg meme.

I mean yes, current large models are essentially compressing incredible amounts of content into something manageable by a single Accelerator/GPU, and making it available for retrieval through inference.

cess11 · 8 months ago
I mean, it's just a compressed database with a weird query engine.
0x500x79 · 8 months ago
For one-offs, sure! Go for it. For production, or things you will have to manage long-term, I would recommend learning some of the space yourself, given the quality of AI output and how quickly you can surpass it.
yard2010 · 8 months ago
I think the main issue here is trust. When you google something you develop a sense for bullshit, so you can "feel" the sources and weigh them accordingly. Using a chatbot, that sense doesn't carry over, so you don't know what is just SEO bullshit reiterated in sweet words and what's not.
lexandstuff · 8 months ago
Great article. The other thing that you miss out on when you don't write the code yourself is that sense of your subconscious working for you. Writing code has a side benefit of developing a really strong mental model of a problem, that kinda gets embedded in your neurons and pays dividends down the track, when doing stuff like troubleshooting or deciding on how to integrate a new feature. You even find yourself solving problems in your sleep.

I haven't observed any software developers operating at even a slight multiplier from the pre-LLM days at the organisations I've worked at. I think people are getting addicted to not having to expend brain energy to solve problems, and they're mistaking that for productivity.

nerevarthelame · 8 months ago
> I think people are getting addicted to not having to expend brain energy to solve problems, and they're mistaking that for productivity.

I think that's a really elegant way to put it. Google Research tried to measure LLM impacts on productivity in 2024 [1]. They gave their subjects an exam and assigned them different resources (a book versus an LLM). They found that the LLM users actually took more time to finish than those who used a book, and that only novices on the subject material actually improved their scores when using an LLM.

But the participants also perceived that they were more accurate and efficient using the LLM, when that was not the case. The researchers suggested that it was due to "reduced cognitive load" - asking an LLM something is easy and mostly passive. Searching through a book is active and can feel more tiresome. Like you said: people are getting addicted to not having to expend brain energy to solve problems, and mistaking that for productivity.

[1] https://storage.googleapis.com/gweb-research2023-media/pubto...

wiseowise · 8 months ago
You're twisting the results. Just because they took more time doesn't mean their productivity went down. On the contrary, if you can perform an expert task with far fewer mental resources (which 99% of orgs should prioritize), then it is an absolute win. Work is an extremely mentally draining and soul-crushing experience for the majority of people; if AI can lower that while maintaining roughly the same result, with subjects allocating only, say, 25% of their mental energy – that's an amazing win.
AstroBen · 8 months ago
> not having to expend brain energy to solve problems, and they're mistaking that for productivity

Couldn't this result in being able to work longer for less energy, though? With really hard mentally challenging tasks I find I cap out at around 3-4 hours a day currently

Like imagine if you could walk at running speed. You're not going faster.. but you can do it for way longer so your output goes up if you want it to

Mentlo · 8 months ago
There are software domains where there's very little work past solving the business problem. And there are software domains where, once you've architected the solution, there isn't much problem solving left; there's just a long slog of stuff to write.

The latter are not making any neuron-embedding tradeoff when they hand off the slog to agents.

There’s a lot of software development in that latter category.

waprin · 8 months ago
To some degree, traditional coding and AI coding are not the same thing, so it's not surprising that some people are better at one than the other. The author is basically saying that he's much better at coding than AI coding.

But it's important to realize that AI coding is itself a skill that you can develop. It's not just: pick the best tool and let it go. Managing prompts and managing context has a much higher skill ceiling than many people realize. You might prefer manual coding, but you might just be bad at AI coding and you might prefer it if you improved at it.

With that said, I'm still very skeptical of letting the AI drive the majority of the software work, despite meeting people who swear it works. I personally am currently preferring "let the AI do most of the grunt work but get good at managing it and shepherding the high level software design".

It's a tiny bit like drawing vs photography and if you look through that lens it's obvious that many drawers might not like photography.

dspillett · 8 months ago
> To some degree, traditional coding and AI coding are not the same thing

LLM-based[1] coding, at least beyond simple auto-complete enhancements (using it directly & interactively as what it is: Glorified Predictive Text), is more akin to managing a junior or outsourcing your work. You give a definition/prompt, some work is done, you refine the prompt and repeat (or fix any issues yourself), much like you would with an external human. The key differences are turnaround time (in favour of LLMs), reliability (in favour of humans, though that is mitigated largely by the quick turnaround), and (though I suspect this is a limit that will go away with time, possibly not much time) lack of usefulness for "bigger picture" work.

This is one of my (several) objections to using it: I want to deal with and understand the minutia of what I am doing, I got into programming, database bothering, and infrastructure kicking, because I enjoyed it, enjoyed learning it, and wanted to do it. For years I've avoided managing people at all, at the known expense of reduced salary potential, for similar reasons: I want to be a tinkerer, not a manager of tinkerers. Perhaps call me back when you have an AGI that I can work alongside.

--------

[1] Yes, I'm a bit of a stick-in-the-mud about calling these things AI. Next decade they won't generally be considered AI like many things previously called AI are not now. I'll call something AI when it is, or very closely approaches, AGI.

rwmj · 8 months ago
Another difference is that your junior will, over time, learn, and you'll also get a sense of whether you can trust them. If after a while they aren't learning and you can't trust them, you get rid of them. GenAI doesn't gain knowledge in the same way, and you're always going to have the same level of trust in it (which in my experience is limited).

Also, if my junior argued back and was wrong repeatedly, that'd be bad. Lucky that has never happened with AIs ...

danielbln · 8 months ago
> I want to be a tinkerer, not a manager of tinkerers.

We all want many things; that doesn't mean someone will pay you for it. You want to tinker? Great, awesome, more power to you, tinker on personal projects to your heart's content. However, if someone pays you to solve a problem, then it is your job to find the best, most efficient way to cleanly do it. Can LLMs do this on their own most of the time? I think not, not right now at least. The combination of skilled human and LLM? Most likely, yes.

thefz · 8 months ago
> I want to deal with and understand the minutia of what I am doing, I got into programming, database bothering, and infrastructure kicking, because I enjoyed it, enjoyed learning it, and wanted to do it

A million times yes.

And we live in a time in which people want to be called "programmers" because it's oh-so-cool, but without doing the work necessary to earn the title.

mitthrowaway2 · 8 months ago
The skill ceiling might be "high" but it's not like investing years of practice to become a great pianist. The most experienced AI coder in the world has about three years of practice working this way, much of which is obsoleted because the models have changed to the point where some lessons learned on GPT 3.5 don't transfer. There aren't teachers with decades of experience to learn from, either.
freehorse · 8 months ago
Moreover, the "ceiling" may still be below the "code works" level, and you have no idea when you start if it is or not.
dr_dshiv · 8 months ago
It’s mostly attitude that you are learning. Playfulness, persistence and a willingness to start from scratch again and again.
notnullorvoid · 8 months ago
Is it a skill worth learning though? How much does the output quality improve? How transferable is it across models and tools of today, and of the future?

From what I see of AI programming tools today, I highly doubt the skills developed are going to transfer to tools we'll see even a year from now.

vidarh · 8 months ago
Given that I see people insisting these tools don't work for them at all, while some of my recent results include spitting out a 1k-line API client from about 5 brief paragraphs of prompts, and designing a website (the lot, including CSS, HTML, copy, database access) and populating the directory on it with entries, I'd think the output quality improves a very great deal.

From what I see of the tools, I think the skills developed largely consists of skills you need to develop as you get more senior anyway, namely writing detail-oriented specs and understanding how to chunk tasks. Those skills aren't going to stop having value.

serpix · 8 months ago
Regarding using AI tools for programming, it is not an all-or-nothing choice. You can pick a grunt work task such as "Tag every such and such terraform resource with a uuid" and let it do just that. Nothing to do with quality, but everything to do with a simple task and not having to bother with the tedium.
npilk · 8 months ago
Maybe this is yet another application of the bitter lesson. It's not worth learning complex processes for partnering with AI models, because any productivity gains will pale in comparison to the performance improvement from future generations.
jyounker · 8 months ago
Describing things in enough detail that someone else can implement them is a pretty important skill. Learning how to break up a large project into smaller tasks that you can then delegate to others is also a pretty important skill.
stitched2gethr · 8 months ago
It will very soon be the only way.
skydhash · 8 months ago
> But it's important to realize that AI coding is itself a skill that you can develop. It's not just: pick the best tool and let it go. Managing prompts and managing context has a much higher skill ceiling than many people realize

No, it's not. It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spending it setting things up). But it's not like GDB or using UNIX as an IDE, where you need a whole book to just get started.

> It's a tiny bit like drawing vs photography and if you look through that lens it's obvious that many drawers might not like photography.

While they share a lot of principles (around composition, poses,...), they are different activities with different output. No one conflates the two. You don't draw and think you're going to capture a moment in time. The intent is to share an observation with the world.

furyofantares · 8 months ago
> No, it's not. It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spending it setting things up). But it's not like GDB or using UNIX as an IDE, where you need a whole book to just get started.

The skill floor is something you can pick up in a few minutes and find it useful, yes. I have been spending dedicated effort toward finding the skill ceiling and haven't found it.

I've picked up lots of skills in my career, some of which were easy, but some of which required dedicated learning, or practice, or experimentation. LLM-assisted coding is probably in the top 3 in terms of effort I've put into learning it.

I'm trying to learn the right patterns for keeping the LLM on track and the codebase in check. Most importantly, and quite relevant to OP, I'd like to use LLMs to get work done much faster while still becoming an expert in the system that is produced.

Finding the line has been really tough. You can get a LOT done fast without this requirement, but personally I don't want to work anywhere that has a bunch of systems that nobody's an expert in. On the flip side, as in the OP, you can have this requirement and end up slower by using an LLM than by writing the code yourself.

oxidant · 8 months ago
I do not agree it is something you can pick up in an hour. You have to learn what AI is good at, how different models code, how to prompt to get the results you want.

If anything, prompting well is akin to learning a new programming language. What words do you use to explain what you want to achieve? How do you reference files/sections so you don't waste context on meaningless things?

I've been using AI tools to code for the past year and a half (Github Copilot, Cursor, Claude Code, OpenAI APIs) and they all need slightly different things to be successful and they're all better at different things.

AI isn't a panacea, but it can be the right tool for the job.

viraptor · 8 months ago
> It's something you can pick up in a few minutes

You can start in a few minutes, sure. (Also you can start using gdb in minutes) But GP is talking about the ceiling. Do you know which models work better for what kind of task? Do you know what format is better for extra files? Do you know when it's beneficial to restart / compress context? Are you using single prompts or multi stage planning trees? How are you managing project-specific expectations? What type of testing gives better results in guiding the model? What kind of issues are more common for which languages?

Correct prompting is what makes the difference these days in tasks like SWE-verified.

sagarpatil · 8 months ago
Yeah, you can’t do sh*t in an hour. I spend a good 6-8 hours every day using Claude Code, and I actually spend an hour every day trying new AI tools, it’s a constant process.

Here's what my task list for today looks like:

1. Test TRAE/Refact.ai/Zencoder: 70% on SWE verified
2. https://github.com/kbwo/ccmanager: use git tree to manage multiple Claude Code sessions
3. https://github.com/julep-ai/julep/blob/dev/AGENTS.md: Read and implement
4. https://github.com/snagasuri/deebo-prototype: Autonomous debugging agent (MCP)
5. https://github.com/claude-did-this/claude-hub: connects Claude Code to GitHub repositories.

__MatrixMan__ · 8 months ago
It definitely takes more than minutes to discover the ways that your model is going to repeatedly piss you off and set up guardrails to mitigate those problems.
JimDabell · 8 months ago
> It's something you can pick up in a few minutes (or an hour if you're using more advanced tooling, mostly spending it setting things up).

This doesn’t give you any time to experiment with alternative approaches. It’s equivalent to saying that the first approach you try as a beginner will be as good as it possibly gets, that there’s nothing at all to learn.

dingnuts · 8 months ago
> You might prefer manual coding, but you might just be bad at AI coding and you might prefer it if you improved at it.

ok but how much am I supposed to spend before I supposedly just "get good"? Because based on the free trials and the pocket change I've spent, I don't consider the ROI worth it.

qinsig · 8 months ago
Avoid using agents that can just blow through money (cline, roocode, claudecode with API key, etc).

Instead you can get comfortable prompting and managing context with aider.

Or you can use claude code with a pro subscription for a fair amount of usage.

I agree that seeing the tools just waste several dollars to just make a mess you need to discard is frustrating.

goalieca · 8 months ago
And how often do your prompting skills change as the models evolve?
badsectoracula · 8 months ago
It won't be the hippest of solutions, but you can use something like Devstral Small with a full open source setup to get experimenting with local LLMs and a bunch of tools - or just chat with it through a chat interface. I ping-ponged between Devstral running as a chat interface and my regular text editor some time ago to make a toy project of a raytracer [0] (output) [1] (code).

While it wasn't the fanciest integration (nor the best of codegen), it was good enough to "get going" (the loop was to ask the LLM to do something, then do something else myself in the background, then fix and merge the changes it made - even though i often had to fix stuff[2], sometimes it was less of a hassle than if i had to start from scratch[3]).

It can give you a vague idea that with more dedicated tooling (i.e. something that does automatically what you'd do by hand[4]) you could do more interesting things (combining with some sort of LSP functionality to pass function bodies to the LLM would also help), though personally i'm not a fan of the "dedicated editor" that seems to be used and i think something more LSP-like (especially if it can also work with existing LSPs) would be neat.

IMO it can be useful for a bunch of boilerplate-y or boring work. The biggest issue i can see is that the context is too small to include everything (imagine, e.g., throwing the entire Blender source code in an LLM which i don't think even the largest of cloud-hosted LLMs can handle) so there needs to be some external way to store stuff dynamically but also the LLM to know that external stuff are available, look them up and store stuff if needed. Not sure how exactly that'd work though to the extent where you could -say- open up a random Blender source code file, point to a function, ask the LLM to make a modification, have it reuse any existing functions in the codebase where appropriate (without you pointing them out) and then, if needed, have the LLM also update the code where the function you modified is used (e.g. if you added/removed some argument or changed the semantics of its use).

[0] https://i.imgur.com/FevOm0o.png

[1] https://app.filen.io/#/d/e05ae468-6741-453c-a18d-e83dcc3de92...

[2] e.g. when i asked it to implement a BVH to speed up things it made something that wasn't hierarchical and actually slowed down things

[3] the code it produced for [2] was fixable to do a simple BVH

[4] i tried a larger project and wrote a script that `cat`ed and `xclip`ed a bunch of header files to pass to the LLM so it knows the available functions and each function had a single line comment about what it does - when the LLM wrote new functions it also added that comment. 99% of these oneliner comments were written by the LLM actually.

grogenaut · 8 months ago
how much time did you spend learning your last language to become comfortable with it?
stray · 8 months ago
You're going to spend a little over $1k to ramp up your skills with AI-aided coding. It's dirt cheap in the grand scheme of things.
jumploops · 8 months ago
> It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself, if not more.

As someone who uses Claude Code heavily, this is spot on.

LLMs are great, but I find the more I cede control to them, the longer it takes to actually ship the code.

I've found that the main benefit for me so far is the reduction of RSI symptoms, whereas the actual time savings are mostly exaggerated (even if it feels faster in the moment).

adriand · 8 months ago
Do you have to review the code? I’ll be honest that, like the OP theorizes, I often just spot review it. But I also get it to write specs (often very good, in terms of the ones I’ve dug into), and I always carefully review and test the results. Because there is also plenty of non-AI code in my projects I didn’t review at all, namely, the myriad open source libraries I’ve installed.
jumploops · 8 months ago
Yes, I’m actually working on an another project with the goal of never looking at the code.

For context, it’s just a reimplementation of a tool I built.

Let’s just say it’s going a lot slower than the first time I built it by hand :)

hatefulmoron · 8 months ago
It depends on what you're doing. If it's a simple task, or you're making something that won't grow into something larger, eyeballing the code and testing it is usually perfect. These types of tasks feel great with Claude Code.

If you're trying to build something larger, it's not good enough. Even with careful planning and spec building, Claude Code will still paint you into a corner when it comes to architecture. In my experience, it requires a lot of guidance to write code that can be built upon later.

The difference between the AI code and the open source libraries in this case is that you don't expect to be responsible for the third-party code later. Whether you or Claude ends up working on your code later, you'll need it to be in good shape. So, it's important to give Claude good guidance to build something that can be worked on later.

cbsmith · 8 months ago
There's an implied assumption here that code you write yourself doesn't need to be reviewed from a context different from the author's.

There's an old expression: "code as if your work will be read by a psychopath who knows where you live" followed by the joke "they know where you live because it is future you".

Generative AI coding just forces the mindset you should have had all along: start with acceptance criteria, figure out how you're going to rigorously validate correctness (ideally through regression tests more than code reviews), and use the review process to come up with consistent practices (which you then document so that the LLM can refer to it).
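
In practice that mostly means table-driven regression tests that pin the acceptance criteria down. A rough sketch of what I mean (the function and cases here are made up for illustration):

    package review

    import (
        "strconv"
        "strings"
        "testing"
    )

    // ParseMajor is a stand-in for whatever the LLM generated; the test below
    // is the part a human actually owns.
    func ParseMajor(v string) (int, error) {
        head, _, _ := strings.Cut(strings.TrimPrefix(v, "v"), ".")
        return strconv.Atoi(head)
    }

    // TestParseMajor writes the acceptance criteria down once; every
    // regenerated implementation has to keep passing them.
    func TestParseMajor(t *testing.T) {
        cases := []struct {
            in      string
            want    int
            wantErr bool
        }{
            {"v1.2.3", 1, false},
            {"2.0", 2, false},
            {"nonsense", 0, true},
        }
        for _, c := range cases {
            got, err := ParseMajor(c.in)
            if (err != nil) != c.wantErr || got != c.want {
                t.Errorf("ParseMajor(%q) = %d, %v; want %d, wantErr %v",
                    c.in, got, err, c.want, c.wantErr)
            }
        }
    }

The implementation can be thrown away and regenerated as often as you like; the table is what you actually review.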

It's definitely not always faster, but waking up in the morning to a well documented PR, that's already been reviewed by multiple LLMs, with successfully passing test runs attached to it sure seems like I'm spending more of my time focused on what I should have been focused on all along.

Terr_ · 8 months ago
There's an implied assumption here that developers who end up spending all their time reviewing LLM code won't lose their skills or become homicidal. :p
ramraj07 · 8 months ago
That's a great perspective, but it's possible you're in a thread where no one wants to believe AI actually helps with coding.
hooverd · 8 months ago
Is anybody doing cool hybrid interfaces? I don't actually want to do everything in conversational English, believe it or not.
jumploops · 8 months ago
My workflow is to have spec files (markdown) for any changes I’m making, and then use those to keep Claude on track/pull out of the trees.

Not super necessary for small changes, but basically a must have for any larger refactors or feature additions.

I usually use o3 for generating the specs; also helpful for avoiding context pollution with just Claude Code.

bdamm · 8 months ago
Isn't that what Windsurf or Cursor are?
mleonhard · 8 months ago
I solved my RSI symptoms by keeping my arms warm all the time, while awake or asleep. Maybe that will work for you, too?
jumploops · 8 months ago
My issue is actually due to ulnar nerve compression related to a plate on my right clavicle.

Years of PT have enabled me to work quite effectively and minimize the flare ups :)

Deleted Comment

sagarpatil · 8 months ago
I always use Claude Code to debug issues; there's no point in trying to do this yourself when AI can fix it in minutes (easy to verify if you write tests first). o3 with the new search can do things in 5 minutes that would take me at least 30 minutes if I'm very efficient. Say what you want, but the time savings are real.
layer8 · 8 months ago
Tests can never verify the correctness of code; they only spot-check for incorrectness.
susshshshah · 8 months ago
How do you know what tests to write if you don’t understand the code?
rwmj · 8 months ago
> What I think happens is that these people save time because they only spot review the AI generated code, or skip the review phase altogether, which as I said above would be a deal breaker for me.

In my experience it's that they dump the code into a pull request and expect me to review it. So GenAI is great if someone else is doing the real work.

anelson · 8 months ago
I’ve experienced this as well. If management is not competent they can’t tell (or don’t want to hear) when a “star” performer is actually a very expensive wrapper around a $20/mo cursor subscription.

Unlike the author of the article I do get a ton of value from coding agents, but as with all tools they are less than useless when wielded incompetently. This becomes more damaging in an org that already has perverse incentives which reward performative slop over diligent and thoughtful engineering.

skydhash · 8 months ago
Git blame can do a lot in those situations. Find the general location of the bug, then assign everyone that has touched it to the ticket.
danielbln · 8 months ago
I don't understand this, the buck stops with the PR submitter. If they get repeated feedback about their PRs that are just passed-through AI slop, then the team lead or whatever should give them a stern talking to.
pera · 8 months ago
That would be a reasonable thing to do, unfortunately this doesn't always happen. Say for example that your company is quite behind schedule and decides to pay some cheap contractors to work on anything that doesn't require domain expertise: In 2025 these cheap contractors will 100% vibe code their way through their assigned tickets. They will open PRs that look "nearly there" and basically hope for all green checks in your CI/CD pipeline. If that doesn't happen then they will try to bruteforce^W vibe code the PR for a couple of hours. If it still doesn't pass then claim that the PR is ready but there is something wrong for example with an external component which they can't touch due to contractual reasons...

One of the most bizarre experiences I have had over this past year was dealing with a developer who would screen-share a ChatGPT session where they were trying to generate a test payload with a given schema, getting something that didn't pass schema validation, and then immediately telling me that there must be a bug in the validator (from the Apache Foundation). I was truly at a loss for words.

Deleted Comment

marssaxman · 8 months ago
So far as I can tell, generative AI coding tools make the easy part of the job go faster, without helping with the hard part of the job - in fact, possibly making it harder. Coding just doesn't take that much time, and I don't need help doing it. You could make my coding output 100x faster without materially changing my overall productivity, so I simply don't bother to optimize there.
nsonha · 8 months ago
No software engineer needs any help if they keep working in the same stack and problem domain that they already know front to back after a few years of doing the same thing. They wouldn't even need any coding tool. But that's a pretty useless thing to say. To each their own.
resource_waste · 8 months ago
I have it write algorithms, explain why my code isn't working, write API calls, or make specific functions.

The entire code? Not there, but with debuggers, I've even started doing that a bit.

Jonovono · 8 months ago
Are you a plumber perhaps?
kevinventullo · 8 months ago
I’m not sure I follow the question. I think of plumbing as being the exact kind of verbose boilerplate that LLM’s are quite good at automating.

In contrast, when I’m trying to do something truly novel, I might spend days with a pen and paper working out exactly what I want to do and maybe under an hour coding up the core logic.

On the latter type of work, I find LLM’s to be high variance with mostly negative ROI. I could probably improve the ROI by developing a better sense of what they are and aren’t good at, but of course that itself is rapidly changing!

worik · 8 months ago
I am.

That is the mental model I have for the work (computer programing) i like to do and am good at.

Plumbing

marssaxman · 8 months ago
Not if I can help it, no; I don't have the patience.
tptacek · 8 months ago
I'm fine with anybody saying AI agents don't work for their work-style and am not looking to rebut this piece, but I'm going to take this opportunity to call something out.

The author writes "reviewing code is actually harder than most people think. It takes me at least the same amount of time to review code not written by me than it would take me to write the code myself". That sounds within an SD of true for me, too, and I had a full-time job close-reading code (for security vulnerabilities) for many years.

But it's important to know that when you're dealing with AI-generated code for simple, tedious, or rote tasks --- what they're currently best at --- you're not on the hook for reading the code that carefully, or at least, not on the same hook. Hold on before you jump on me.

Modern Linux kernels allow almost-arbitrary code to be injected at runtime, via eBPF (which is just a C program compiled to an imaginary virtual RISC). The kernel can mostly reliably keep these programs from crashing the kernel. The reason for that isn't that we've solved the halting problem; it's that eBPF doesn't allow most programs at all --- for instance, it must be easily statically determined that any backwards branch in the program runs for a finite and small number of iterations. eBPF isn't even good at determining that condition holds; it just knows a bunch of patterns in the CFG that it's sure about and rejects anything that doesn't fit.

That's how you should be reviewing agent-generated code, at least at first; not like a human security auditor, but like the eBPF verifier. If I so much as need to blink when reviewing agent output, I just kill the PR.

If you want to tell me that every kind of code you've ever had to review is equally tricky to review, I'll stipulate to that. But that's not true for me. It is in fact very easy for me to look at a rote recitation of an idiomatic Go function and say "yep, that's what that's supposed to be".
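
To make that concrete, here's roughly the kind of function I mean (a sketch, names made up); there's exactly one way this should look, so the review is a glance, not an audit:

    package config

    import (
        "encoding/json"
        "fmt"
        "os"
    )

    // Config is a stand-in type for the example.
    type Config struct {
        Addr    string `json:"addr"`
        Verbose bool   `json:"verbose"`
    }

    // LoadConfig reads and parses a JSON config file. Every line is the
    // standard read/unmarshal/wrap-the-error idiom; it either matches the
    // pattern at a glance or the PR dies.
    func LoadConfig(path string) (*Config, error) {
        data, err := os.ReadFile(path)
        if err != nil {
            return nil, fmt.Errorf("read config %s: %w", path, err)
        }
        var cfg Config
        if err := json.Unmarshal(data, &cfg); err != nil {
            return nil, fmt.Errorf("parse config %s: %w", path, err)
        }
        return &cfg, nil
    }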

sensanaty · 8 months ago
But how is this a more efficient way of working? What if you have to have it open 30 PRs before 1 of them is acceptable enough to not outright ignore? It sounds absolutely miserable, I'd rather review my human colleague's work because in 95% of cases I can trust that it's not garbage.

The alternative where I boil a few small lakes + a few bucks in return for a PR that maybe sometimes hopefully kinda solves the ticket sounds miserable. I simply do not want to work like that, and it doesn't sound even close to efficient or speedier or anything like that, we're just creating extra work and extra waste for literally no reason other than vague marketing promises about efficiency.

kasey_junk · 8 months ago
If you get to 2 or 3 and it hasn’t done what you want you fall back to writing it yourself.

But in my experience this is _signal_. If the AI can't get to it with minor back and forth, then something needs work: your understanding, the specification, the tests, your code factoring, etc.

The best case scenario is that your agent one-shots the problem. But close behind that is that your agent finds a place where a little cleanup makes everybody's life easier: you, your colleagues, and the bot. And your company is now incentivized to invest in that.

The worst case is that you took the time to write 2 prompts that didn't work.

smaudet · 8 months ago
I guess my challenge is that "if it was a rote recitation of an idiomatic go function", was it worth writing?

There is a certain style, let's say, of programming that encourages highly non-reusable code that is at once boring and tedious, and impossible to maintain, and thus not especially worthwhile.

The "rote code" could probably have been expressed, succinctly, in terms that border on "plain text", but with more rigueur de jour, with less overpriced, wasteful, potentially dangerous models in-between.

And yes, machines like the eBPF verifier must follow strict rules to cut out the chaff, of which there is quite a lot, but it neither follows that we should write everything in eBPF, nor does it follow that because something can throw out the proverbial "garbage", that makes it a good model to follow...

Put another way, if it was that rote, you likely didn't need nor benefit from the AI to begin with; a couple of well-tested library calls probably would have sufficed.

sesm · 8 months ago
I would put it differently: when you already have a mental model of what the code is supposed to do and how, then reviewing is easy: just check that the code conforms to that model.

With an arbitrary PR from a colleague or security audit, you have to come up with mental model first, which is the hardest part.

tptacek · 8 months ago
Yes. More things should be rote recitations. Rote code is easy to follow and maintain. We get in trouble trying to be clever (or DRY) --- especially when we do it too early.

Important tangential note: the eBPF verifier doesn't "cut out the chaff". It rejects good, valid programs. It does not care that the programs are valid or good; it cares that it is not smart enough to understand them; that's all that matters. That's the point I'm making about reviewing LLM code: you are not on the hook for making it work. If it looks even faintly off, you can't hurt the LLM's feelings by killing it.

kenjackson · 8 months ago
I can read code much faster than I can write it.

This might be the defining line for Gen AI - people who can read code faster will find it useful and those that write faster than they can read won't use it.

globnomulous · 8 months ago
> I can read code much faster than I can write it.

I have known and worked with many, many engineers across a wide range of skill levels. Not a single one has ever said or implied this, and in not one case have I ever found it to be true, least of all in my own case.

I don't think it's humanly possible to read and understand code faster than you can write and understand it to the same degree of depth. The brain just doesn't work that way. We learn by doing.

autobodie · 8 months ago
I think that's wrong. I only have to write code once, maybe twice. But when using AI agents, I have to read many (5? 10? I will always give up before 15) PRs before finding one close enough that I won't have to rewrite all of it. This nonsense has not saved me any time, and the process is miserable.

I also haven't found any benefit in aiming for smaller or larger PRs. The aggregate efficiency seems to even out because smaller PRs are easier to weed through but they are not less likely to be trash.

stitched2gethr · 8 months ago
Why would you review agent generated code any differently than human generated code?
tptacek · 8 months ago
Because you don't care about the effort the agent took and can just ask for a do-over.
monero-xmr · 8 months ago
I mostly just approve PRs because I trust my engineers. I have developed a 6th sense for thousand-line PRs and knowing which 100-300 lines need careful study.

Yes I have been burned. But 99% of the time, with proper test coverage it is not an issue, and the time (money) savings have been enormous.

"Ship it!" - me

theK · 8 months ago
I think this points out the crux of the difference between collaborating with other devs and collaborating with an AI. The article correctly states that the AI will never learn your preferences or the idiosyncrasies of the specific projects/company etc. because it is effectively amnesic. You cannot trust the AI the same way you trust other known collaborators because you don't have a real relationship with it.
autobodie · 8 months ago
Haha, doing this with AI will bury you in a very deep hole.
112233 · 8 months ago
This is a radical and healthy way to do it. Obviously wrong — reject. Obviously right — accept. In any other case — also reject, as non-obvious.

I guess it is far removed from the advertised use case. Also, I feel one would be better off having auto-complete powered by an LLM in this case.

bluefirebrand · 8 months ago
> Obviously right — accept.

I don't think code is ever "obviously right" unless it is trivially simple

vidarh · 8 months ago
Auto-complete means having to babysit it.

The more I use this, the longer the LLM will be working before I even look at the output, beyond maybe having it chug along on another screen while I occasionally glance over.

My shortest runs now usually take minutes of the LLM expanding my prompt into a plan, writing the tests, writing the code, linting its code, fixing any issues, and writing a commit message before I even review things.

tptacek · 8 months ago
I don't find this to be the case. I've used (and hate) autocomplete-style LLM code generation. But I can feed 10 different tasks to Codex in the morning and come back and pick out the 3-4 I think might be worth pursuing, and just re-prompt the 7 I kill. That's nothing like interactive autocomplete, and drastically faster than I could work without LLM assistance.
greybox · 8 months ago
For simple, tedious, or rote tasks, I have templates bound to hotkeys in my IDE. They even come with configurable variable sections that you can fill in afterwards, or base on some highlighted code before hitting the hotkey. Also, it's free.
danieltanfh95 · 8 months ago
AI models are fundamentally trained on patterns from existing data - they learn to recognize and reproduce successful solution templates rather than derive solutions from foundational principles. When faced with a problem, the model searches for the closest match in its training experience rather than building up from basic assumptions and logical steps.

Human experts excel at first-principles thinking precisely because they can strip away assumptions, identify core constraints, and reason forward from fundamental truths. They might recognize that a novel problem requires abandoning conventional approaches entirely. AI, by contrast, often gets anchored to what "looks similar" and applies familiar frameworks even when they're not optimal.

Even when explicitly prompted to use first-principles analysis, AI models can struggle because:

- They lack the intuitive understanding of when to discard prior assumptions

- They don't naturally distinguish between surface-level similarity and deep structural similarity

- They're optimized for confident responses based on pattern recognition rather than uncertain exploration from basics

This is particularly problematic in domains requiring genuine innovation or when dealing with edge cases where conventional wisdom doesn't apply.

Context poisoning, intended or not, is a real problem that humans are able to solve relatively easily, while current SotA models struggle with it.

adastra22 · 8 months ago
So are people. People are trained on existing data and learn to reproduce known solutions. They also take this to the meta level—a scientist or engineer is trained on methods for approaching new problems which have yielded success in the past. AI does this too. I’m not sure there is actually a distinction here..
danieltanfh95 · 8 months ago
Of course there is. Humans can pattern match as a means to save time. LLMs pattern match as their only mode of communication and “thought”.

Humans are also not as susceptible to context poisoning as LLMs are.

esailija · 8 months ago
There is a difference between extrapolating from just a few examples vs interpolating between a trillion examples.