I found myself agreeing with quite a lot of this article.
I'm a pretty huge proponent for AI-assisted development, but I've never found those 10x claims convincing. I've estimated that LLMs make me 2-5x more productive on the parts of my job which involve typing code into a computer, which is itself a small portion of what I do as a software engineer.
That's not too far from this article's assumptions. From the article:
> I wouldn't be surprised to learn AI helps many engineers do certain tasks 20-50% faster, but the nature of software bottlenecks mean this doesn't translate to a 20% productivity increase and certainly not a 10x increase.
I think that's an under-estimation - I suspect engineers that really know how to use this stuff effectively will get more than a 20% increase - but I do think all of the other stuff involved in building software makes the 10x thing unrealistic in most cases.
Yeah. I just need to babysit it too much. Take Copilot: it gives good suggestions and sometimes blows me away with a block of code that's exactly what I'd have typed. But actively letting it code (at least with GPT-4.1 or GPT-4o) just doesn't work well enough for me. Half of the time it doesn't even compile, and after fixing that it's still not really working correctly either. I'd expect it to work like a very junior programmer, but it works like a very drunk senior programmer that isn't listening to you very well at all.
> But actively letting it code (at least with GPT-4.1 or GPT-4o)
It's funny: GitHub Copilot puts these models in the 'bargain bin' (they are free in 'ask' mode, whereas the other models count against your monthly limit of premium requests), and it's pretty clear why: they seem downright nerfed. They're tolerable for basic questions, but you wouldn't use them if price weren't a concern.
Brandwise, I don't think it does OpenAI any favors to have their models be priced as 'worthless' compared to the other models on premium request limits.
With something like Devin, which integrates directly with your repo and generates documentation based on your project(s), it's much more productive to use it as an agent. I can delegate 4-5 small tasks that would normally take me a full day or two (or three) of context switching and mental preparation, and knock them out in less than a day because it did 50-80% of the work, leaving only a few fixes or a small pivot for me to wrap them up.
This alone is where I get a lot of my value. Otherwise, I'm using Cursor to actively solve smaller problems in whatever files I'm currently focused on. Being able to refactor things with only a couple sentences is remarkably fast.
The more you know about your language's features (and their precise names), and about higher-level programming patterns, the better time you'll have with LLMs, because it matches up with real documentation and examples with more precision.
>I'd expect it to work like a very junior programmer, but it works like a very drunk senior programmer that isn't listening to you very well at all.
Best analogy I've ever heard and it's completely accurate. Now, back to work debugging and finishing a vibe coded application I'm being paid to work on.
I think there are three factors to this: 1. What to code (longer, more specific prompts are better but take longer to write); 2. How to code it (specify languages, libraries, APIs, etc.); and 3. How well the libraries and versions you're targeting are represented in what the model has seen. If you're trying to write code that uses a newer version of a library that works differently from what's most commonly documented, it's a long uphill battle of constantly reminding the LLM of the new changes.
If you're not specific enough, it will definitely spit out a half-baked pseudocode file where it expects you to fill in the rest. If you don't specify certain libraries, it'll use whatever is featured in the most blogspam. And if you're in an ecosystem that isn't publicly well-documented, it's near useless.
Two other observations I've found working with ChatGPT and Copilot:
First, until I can re-learn boundaries, they are a fiasco for work-life balance. It's way too easy to have a "hmm what if X" thought late at night or first thing in the morning, pop off a quick ticket from my phone, assign to Copilot, and then twenty minutes later I'm lying in bed reviewing a PR instead of having a shower, a proper breakfast, and fully entering into work headspace.
And on a similar thread, Copilot's willingness to tolerate infinite bikeshedding and refactoring is a hazard for actually getting stuff merged. Unlike a human colleague who loses patience after a round or two of review, Copilot is happy to keep changing things up and endlessly iterating on minutiae. Copilot code reviews are exhausting to read through because it's just so much text, so much back and forth, every little change with big explanations, acknowledgments, replies, etc.
If I want to throw a shuriken that obeys some artificial, magic Magnus force like in the movie Wanted, both ChatGPT and Claude let me down using pygame. And what if I wanted C-level performance, or wanted to use Zig?
It works like the average Microsoft employee, like some doped version of an orange-wig wearer who gets votes because his daddies kept the population as dumb as it gets after the dotcom-and-Facebook era. In essence, the ones to be disappointed by are the Chan-Zuckerbergs of our time. There was a chance, but there also was what they were primed for.
To date, I've not been able to effectively use Copilot in any projects.
The suggestions were always unusably bad, and the /fix results were obviously and straight-up wrong unless it was a super silly issue.
Claude Code with Opus model on the other hand was mind-blowing to me and made me change my mind on almost everything wrt my opinion of LLMs for coding.
You still need to grow the skill of how to build the context and formulate the prompt, but the built-in execution loop is a complete game changer, and I didn't realize that until I actually used it effectively on a toy project myself.
MCP in particular was another thing I always thought was massively overhyped, until I actually started to use some in the same toy project.
Frankly, the building blocks already exist at this point to make a vast majority of all jobs redundant (and I'm thinking about all grunt-work office jobs, not coding in particular). The tooling still needs to be created, so I'm not seeing a short-term realization (<2 yrs), but medium term (5+ yrs)?
You should expect most companies to let people go in staggering numbers, with only small numbers of highly skilled people left to administer the agents.
(1) for my day job, it doesn't make me super productive with creation, but it does help with discovery, learning, getting myself unstuck, and writing tedious code.
(2) however, the biggest unlock is it makes working on side projects __immensely__ easier. Before AI I was always too tired to spend significant time on side projects. Now, I can see my ideas come to life (albeit with shittier code), with much less mental effort. I also get to improve my AI engineering skills without the constraints of deadlines, data privacy, tool restrictions, etc.
#2 heavily resonates with me. Simon Willison made the point early on that AI makes him more ambitious with his side projects, and I heavily agree. Suddenly lots of things that seemed more or less un-feasible are now not only do-able, but can actually meet or exceed your own expectations for them.
Being able to sit down after a long day of work and ask an AI model to fix some bug or add a feature to something while you relax and _not_ type code is a major boon. It is able to immediately get context and be productive even when you are not.
> (1) for my day job, it doesn't make me super productive with creation, but it does help with discovery, learning, getting myself unstuck, and writing tedious code
I hear this take a lot but does it really make that much of an improvement over what we already had with search engines, online documentation and online Q&A sites?
If my work involves building a bit of tooling, improving the tests, and documenting them, I find I have much less resistance and I'm rather happy to hand it off to an AI agent.
I haven't begun doing side projects or projects for myself yet. But I did go down the road of finding out what would be needed to build something I wished existed. It was much easier to explore and understand the components, and I might have a decent chance at a prototype.
The alternative to this would have been to ask people around me or formulate extensively researched questions for online forums, where I'd expect to get half-cryptic answers (and a jibe at my ignorance every now and then) at a pace where it would take years before I had something ready.
I see the point of AI as a prototyping and brainstorming tool. But I doubt we are at a point where I would be comfortable pushing changes to a production environment without putting 3x the effort into reviewing. Since there's a chance of the system hallucinating, I have a genuine fear that the output would seem accurate while actually doing something really, really stupid.
#2 is the reason I keep paying for Claude Code Pro.
For $20 a month I can get my stupid tool and utility ideas from "it would be cool if I could..." to actual "works well enough for me" tools in an evening - while I watch my shows at the same time.
After a day at work I don't have the energy to start digging through, say, OpenWeather's latest 3.0 API and its nuances and how I can refactor my old code to use the new API.
Claude did it in maybe one episode of What We Do in the Shadows :D I have a hook that makes my computer beep when Claude is done or pauses for a question, so I can get back, check what it did and poke it forward.
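In case anyone wants to copy the setup: the beep side of that hook can be as small as the sketch below (a hypothetical helper script; the actual registration in Claude Code's hook configuration is per its hooks docs, and the sound-player commands are platform-specific assumptions).

    #!/usr/bin/env python3
    """Tiny notification helper, meant to be registered as a Claude Code hook
    (e.g. on stop/attention events - see the hooks docs for the exact config)."""
    import shutil
    import subprocess
    import sys

    def beep(message: str) -> None:
        # Use a system sound player if one is available, else the terminal bell.
        if shutil.which("afplay"):  # macOS ships this plus stock .aiff sounds
            subprocess.run(["afplay", "/System/Library/Sounds/Glass.aiff"], check=False)
        elif shutil.which("paplay"):  # common on Linux desktops with PulseAudio
            subprocess.run(["paplay", "/usr/share/sounds/freedesktop/stereo/complete.oga"], check=False)
        else:
            sys.stdout.write("\a")  # plain ASCII bell as a fallback
            sys.stdout.flush()
        print(message)

    if __name__ == "__main__":
        beep(" ".join(sys.argv[1:]) or "Claude is done or waiting for input")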
#2 I expect to wind up as a huge win professionally as well. It lowers the investment for creating an MVP or experimental/exploratory project from weeks to hours or days. That ability to try things that might have been judged too risky for a team previously will be pretty amazing.
I do also believe that those who are often looked at or referred to as 10x engineers will maybe only see a marginal productivity increase.
The smartest programmer I know is so impressive mainly for two reasons: first, he seems to have just an otherworldly memory and seems to kind of have absolutely every little feature and detail of the programming languages he uses memorized. Second, his real power is really in cognitive ability, or the ability to always quickly and creatively come up with the smartest and most efficient yet elegant and clean solution to any given problem. Of course somewhat opinionated but in a good way. Funnily he often wouldn't know the academic/common name for some algorithm he arrived at but it just happened to be what made sense to him and he arrived at it independently. Like a talented musician with perfect pitch who can't read notation or doesn't know theory yet is 10x more talented than someone who has studied it all.
When I pair program with him, it's evident that the current iteration of AI tools is not as quick or as sharp. You could arrive at similar solutions but you would have to iterate for a very long time. It would actually slow that person down significantly.
However, there is such a big spectrum of ability in this field that I could actually see this increasing for example my productivity by 10x. My background/profession is not in software engineering but when I do it in my free time the perfectionist tendencies make me work very slowly. So for me these AI tools are actually cool for generating the first crappy proof of concepts for my side projects/ideas, just to get something working quickly.
I like the quip that AI raises the floor not the ceiling. I think it helps the bottom 20% perform more like the middle 50% but doesn't do much for people at the top.
AI is strong in different places, and if it keeps on being strong in certain ways then people very soon won't be able to keep up. For example, extreme horizontal knowledge and the ability to digest new information almost instantly. That's not something anyone can do. We don't try to compete against computers on raw calculation, and soon we won't compete on this one either. We simply won't even think to compare.
People keep focusing on general-intelligence-style capabilities, but that is the holy grail. The world could go through multiple revolutions before finding that holy grail, but even before then everything would have changed beyond recognition.
So write an integration over the API docs I just copy-pasted.
Thanks for the comment Simon! This is honestly the first one I've read where it feels like someone actually read the article. I'm totally open to the idea that some people, especially those working on the languages/tools that LLMs are good at, are indeed getting a 2x improvement in certain parts of their job.
Something I have realized about Hacker News is that most of the comments on any given article are from people who are responding to the headline without actually clicking through and reading it!
This is particularly true for headlines like this one which stand alone as statements.
I think even claims of 2-5x are highly suspect. It would imply that if your team is using AI then all else equal they accomplish 2-5 times as much in a quarter. I don't know about you but I'm certainly not seeing this and most people on my team use AI.
[And to those saying we're using it wrong... well I can't argue with something that's not falsifiable]
My company is all in on LLMs, and honestly the improvement seems to be something like 0.9x to 1.2x depending on the project. None of them are moving at breakneck speed, and many projects are just as bogged down by complexity as ever. Pretty big (3,000+ person) company with a large, mature codebase in multiple languages - and for God knows how much money spent on it.
10x sounds nice, which is probably why it stuck, but it came from actual research which found the difference was larger than 10x - and also they were measuring best against worst, not best against average as the term is used nowadays.
It highly depends on the circumstances. In over 30 years in the industry I met 3 people that were many times more productive than everyone else around them, even more than 10 times. What does this translate to? Well, there are some extraordinary people around, very rare and you cannot count on finding some and, when you find them, it is almost impossible to retain them because management and HR never agree to pay them enough to stay around.
I’ve found I do get small bursts of 10x productivity when trying to prototype an idea - much of the research on frameworks and such just goes away. Of course that’s usually followed by struggling to make a seemingly small change for an hour or two. It seems like the 10x number is just classic engineers underestimating tasks - making estimates based on peak productivity that never materializes.
I have found for myself it helps motivate me, resulting in net productivity gain from that alone. Even when it generates bad ideas, it can get me out of a rut and give me a bias towards action. It also keeps me from procrastinating on icky legacy codebases.
> engineers that really know how to use this stuff effectively
I guess this is still the "caveat" that can keep the hype hopes going. But at the team level, with everyone on our teams actively using agentic coding tools like Claude Code daily, we actually haven't seen an increase in team velocity yet.
I'm curious to hear anecdotes from other teams: has your team seen velocity increase since it adopted agentic AI?
Same here. I have a colleague who is completely enamored with these agents and uses them for everything he can, not just coding: commit messages, opening PRs, Linear tickets, etc. But the productivity gain is just not there. He's about as fast, or rather as slow, as he was before. And to a degree I think this goes for the whole team. It's the oxymoron of AI: more code, more documentation, more text, more of everything generated than ever, but the effect is more complexity, more PRs to review, more bugs, more stuff to know and understand...

We are all still learning how to use these agents effectively. And the particular developer's habits multiply, as everything else does with GenAI: was he a bit sloppy before, not covering various edge cases and taking quick-and-dirty shortcuts? Then that remains true for the code he produces using agents. And to those who claim that "by using more agents I will gain 10x productivity" I say: please read a certain book about how just adding developers to a project makes it even more delayed. The resemblance to the team/project-leadership -> developers dynamic is truly uncanny.
I agree. I'm a big fan/proponent of AI assisted development (though nowhere near your amount of experience with it). And I think that 2x-10x speed up can be true, depending on what you mean exactly and what your task is exactly.
This article thinks that most people who say 10x productivity are claiming 10x speedup on end-to-end delivering features. If that's indeed what someone is saying, they're most of the time quite simply wrong (or lying).
But I think some people (like me) aren't claiming that. Of course the end to end product process includes a lot more work than just the pure coding aspect, and indeed none of those other parts are getting a 10x speedup right now.
That said, there are a few cases where this 10x end-to-end is possible. E.g. when working alone, especially on new things but not only - you're skipping a lot of this overhead. That's why smaller teams, even solo teams, are suddenly super interesting - because they are getting a bigger speedup comparatively speaking, and possibly enough of one to be able to rival larger teams.
Programmers are notoriously bad about making estimates. Sure it sped something up 10x, but did you consider those 10 tries using AI that didn't pan out? You're not even breaking even, you are losing time.
My experience with GenAI is that it's a significant improvement to Stack Overflow, and generally as capable as someone hired right out of college.
If I'm using it to remember the syntax or library for something I used to know how to do, it's great.
If I'm using it to explore something I haven't done before, it makes me faster, but sometimes it lies to me. Which was also true of Stack Overflow.
But when I ask it to do something fairly complex on its own, it usually tips over. I've tried a bunch of tests with a bunch of models, and it never quite gets it right. Sometimes it's minor stuff that I can fix if I bang on it long enough, and sometimes it's a steaming pile that I end up tossing in the garbage.
For example, I've asked it to code me a web-based calculator, or a 3D model of the solar system using WebGL, and none of the models I've tried have been able to do either.
I wonder if a better metric would be developer happiness? Instead of being 2x or 5x more productive, what if we looked at what a developer enjoyed doing and figured out how to use AI for everything else?
> I've estimated that LLMs make me 2-5x more productive on the parts of my job which involve typing code into a computer, which is itself a small portion of what I do as a software engineer.
I think that the key realization is that there are tasks where LLMs excel and might even buy you 10x productivity, whereas some tasks their contribution might even be net negative.
LLMs are largely excellent at writing and refactoring unit tests, mainly because the required context is very limited (i.e., write a method in a class that calls this specific method of this specific class in a specific way and check the output) and the output is very repetitive (i.e., isolated methods in standalone classes, with no return values, that are not called anywhere else). They also seem helpful when prompted to add logging. LLMs are also effective at creating greenfield projects, serving as glorified template engines. But when pressed even lightly on specific tasks like implementing a cross-domain feature, their output starts to be, at best, a big ball of mud.
What will happen is over time this will become the new baseline for developing software.
It will mean we can deliver software faster. Maybe more so than other advances, but it won't fundamentally change the fact that software takes real effort and that effort will not go away, since that effort is much more than just coding this or that function.
I could create a huge list of things that have made developing and deploying quality software easier: linters, static type checkers, code formatters, hot reload, intelligent code completion, distributed version control (i.e., Git), unit testing frameworks, schema inference tools, code generation from schemas, etc. I'm sure others can add dozens of items to that list. And yet there seems to be an unending amount of software to be built, limited only by the people available to build it and an organization's funding to hire those people.
In my personal work, I've found AI-assisted development to make me faster (though I'm not sure I have a good estimate for how much faster). What I've also found is that it makes it much easier to tackle novel problems within an existing solution base. And I believe this is likely to be a big part of the dev productivity gain.
Just as an example, let's say we want to use the strangler pattern as part of our modernization approach for a legacy enterprise app that has seen better days. Unless you have some senior devs who are experienced both with that pattern AND with your code base, it can take a lot of trial and error to figure out how to make it work. (As you said, most of our work isn't actually typing code.)
This is where an AI/LLM tool can go to work on understanding the code base and the pattern, to create a reference implementation approach and tests. That can save a team of devs many weeks of trial & error (and stress), not to mention provide guidance on where they will run into roadblocks deep in the code base.
And, in my opinion, this is where a huge portion of the AI-assisted dev savings will come from - not so much writing the code (although that's helpful) but helping devs get to the details of a solution much faster.
Googling has always gotten us to generic references; AI gets us those references fitted to our solution.
If 10x could be believed, we're far enough into having AI coding assistance that any such company that had gone all in would be head and shoulders above its competitors by now.
And we're not seeing that at all. The companies whose software I use that did announce big AI initiatives 6 months ago, if they really had gotten 10x productivity gain, that'd be 60 months—5 years—worth of "productivity". And yet somehow all of their software has gotten worse.
The other thing is that I don't believe software developers actually "do their best" when writing the code itself, that is, optimize the speed of writing code. Nor do they need to; they know writing the code isn't what takes up the time - waiting for CI, code review, and that iteration cycle does.
And does an AI agent doing a code review actually reduce that time too? I have doubts. Caveat, I haven't seen it in practice yet.
> I've estimated that LLMs make me 2-5x more productive on the parts of my job which involve typing code into a computer, which is itself a small portion of what I do as a software engineer.
This feels exactly right and is what I’ve thought since this all began.
But it also makes me think maybe there are those whom AI helps 10x, but more because code input is actually a very large part of their job. Some coders aren't doing much design or engineering, just assembly.
Yeah, I hadn't thought about that. If you really are a programmer who gets all of their work assigned to them as detailed specifications maybe you are seeing a 10x boost.
I don't think I've encountered a programmer like that in my own career, but I guess they might exist somewhere!
I've basically come to the same 2x to 5x conclusion as you. Problem is that "5x productivity" is really only a small portion of my actual job.
The hardest part of my job is actually understanding the problem space and making sure we're applying the correct solution. Actual coding is probably about 30% of my job.
That means I'm only looking at something like a 30% overall productivity gain from being 5x as effective at coding.
The thing that I keep wondering about: If the coding part is 2-5x more productive for you, but the stuff around the coding doesn't change... at some point, it'll have to, right? The cost/benefit of a lot of practices (this article talks about code review, which is a big one) changes a lot if coding becomes significantly easier relative to other tasks.
Yes, absolutely. Code used to be more expensive to write, which meant that a lot of features weren't sensible to build - the incremental value they provided wasn't worth the implementation effort.
Now when I'm designing software there are all sorts of things where I'm much less likely to think "nah, that will take too long to type the code for".
Speed of shipping software and pace of writing code are different things. Shipping software like iOS has a <50% programming component, so Amdahl's law caps the end-to-end improvement rather low, assuming the other parts of the process stay the same.
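A back-of-the-envelope sketch of that cap, with made-up numbers (a minimal illustration, not a measurement):

    # Amdahl's-law-style estimate: only the coding fraction of the job gets faster.
    def end_to_end_speedup(coding_fraction: float, coding_speedup: float) -> float:
        """Overall speedup when only `coding_fraction` of the work speeds up."""
        return 1.0 / ((1.0 - coding_fraction) + coding_fraction / coding_speedup)

    # If coding is 30% of the job and an LLM makes that part 5x faster:
    print(end_to_end_speedup(0.30, 5.0))            # ~1.32x overall, i.e. ~30% faster
    # Even an infinitely fast coding step caps out at 1 / 0.70:
    print(end_to_end_speedup(0.30, float("inf")))   # ~1.43x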
At first I thought becoming “10x” meant outputting 10x as much code.
Now that I’m using Claude more as an expensive rubber duck, I’m hoping that I spend more time defining the fundamentals correctly that will lead to a large improvement in outcomes in the long run.
It lets me try things I couldn't commit the time to in the past, like quickly cobbling together a keystroke macro. I can also put together the outline of a plan in a few minutes. So much more can be 'touched' upon usefully.
I completely agree. I saw the claims about 30% increase in dev productivity a while ago and thought how is that possible when most of my job consists of meetings, SARs, threat modeling, etc.
Looking forward to those 20x most productive days out of an LLMs. And what are those most productive days? The ones when you can simplify and delete hundreds of lines of code... :-)
There was a YC video just a few months ago where a bunch of jerkoffs sat in a circle and talked about engineers being 10 to 100x as effective as before. I'm sure Google will bring it up.
I don't doubt that some people are mistaken or dishonest in their self-reports as the article asserts, but my personal experience at least is a firm counterexample.
I've been heavily leaning on AI for an engagement that would otherwise have been impossible for me to deliver to the same parameters and under the same constraints. Without AI, I simply wouldn't have been able to fit the project into my schedule, and would have turned it down. Instead, not only did I accept and fit it into my schedule, I was able to deliver on all stretch goals, put in much more polish and automated testing than originally planned, and accommodate a reasonable amount of scope creep. With AI, I'm now finding myself evaluating other projects to fit into my schedule going forward that I couldn't have considered otherwise.
I'm not going to specifically claim that I'm an "AI 10x engineer", because I don't have hard metrics to back that up, but I'd guesstimate that I've experienced a ballpark 10x speedup for the first 80% of the project and maybe 3 - 5x+ thereafter depending on the specific task. That being said, there was one instance where I realized halfway through typing a short prompt that it would have been faster to make those particular changes by hand, so I also understand where some people's skepticism is coming from if their impression is shaped by experiences like that.
I believe the discrepancy we're seeing across the industry is that prompt-based engineering and traditional software engineering are overlapping but distinct skill sets. Speaking for myself, prompt-based engineering has come naturally due to strong written communication skills (e.g. experience drafting/editing/reviewing legal docs), strong code review skills (e.g. participating in security audits), and otherwise being what I'd describe as a strong "jack of all trades, master of some" in software development across the stack. On the other hand, for example, I could easily see someone who's super 1337 at programming high-performance algorithms and mid at most everything else finding that AI insufficiently enhances their core competency while also being difficult to effectively manage for anything outside of that.
As to how I actually approach this:
* Gemini Pro is essentially my senior engineer. I use Gemini to perform codebase-wide analyses, write documentation, and prepare detailed sprint plans with granular todo lists. Particularly for early stages of the project or major new features, I'll spend several hours at a time meta-prompting and meta-meta-prompting with Gemini just to get a collection of prompts, documents, and JSON todo lists that encapsulate all of my technical requirements and feedback loops (a rough sketch of what one of those todo lists can look like follows this list). This is actually harder than manual programming because I don't get the "break" of performing all the trivial and boilerplate parts of coding; my prompts here are much more information-dense than code.
* Claude Sonnet is my coding agent. For Gemini-assisted sprints, I'll fire Claude off with a series of pre-programmed prompts and let it run for hours overnight. For smaller things, I'll pair program with Claude directly and multitask while it codes, or if I really need a break I'll take breaks in between prompting.
* More recently, Grok 4 through the Grok chat service is my Stack Overflow. I can't rave enough about it. Asking it questions and/or pasting in code diffs for feedback gets incredible results. Sometimes I'll just act as a middleman pasting things back and forth between Grok and Claude/Gemini while multitasking on other things, and find that they've collaboratively resolved the issue. Occasionally, I've landed on the correct solution on my own within the 2 - 3 minutes it took for Grok to respond, but even then the second opinion was useful validation. o3 is good at this too, but Grok 4 has been on another level in my experience; its information is usually up to date, and its answers are usually either correct or at least on the right track.
* I've heard from other comments here (possibly from you, Simon, though I'm not sure) that o3 is great at calling out anti-patterns in Claude output, e.g. its obnoxious tendency to default to keeping old internal APIs and marking them as "legacy" or "for backwards compatibility" instead of just removing them and fixing the resulting build errors. I'll be giving this a shot during tech debt cleanup.
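To make the todo-list part concrete, here is a rough sketch of the kind of structure I mean and a loop that feeds it to a coding agent one task at a time. The schema, the task contents, and the agent invocation are illustrative placeholders, not a prescription - adapt them to whatever agent CLI you use.

    import subprocess

    # Hypothetical shape of one sprint's todo list (normally generated during the
    # Gemini planning sessions and saved as JSON).
    SPRINT_TASKS = [
        {
            "id": "T-1",
            "title": "Add pagination to the projects listing endpoint",
            "constraints": ["keep the response shape backwards compatible",
                            "add an integration test for page boundaries"],
            "done_when": "requesting page 2 returns the second page and all tests pass",
        },
        {
            "id": "T-2",
            "title": "Extract the duplicated date-formatting code into one helper",
            "constraints": ["no behavior change", "update every call site"],
            "done_when": "the duplication is gone and the build is green",
        },
    ]

    def task_to_prompt(task: dict) -> str:
        """Render one todo item as a self-contained prompt for the coding agent."""
        constraints = "\n".join(f"- {c}" for c in task["constraints"])
        return (
            f"Task {task['id']}: {task['title']}\n"
            f"Constraints:\n{constraints}\n"
            f"Definition of done: {task['done_when']}\n"
            "Run the test suite before declaring the task complete."
        )

    if __name__ == "__main__":
        for task in SPRINT_TASKS:
            # Placeholder invocation: Claude Code's non-interactive mode, or any
            # other agent CLI that accepts a prompt and runs unattended.
            subprocess.run(["claude", "-p", task_to_prompt(task)], check=False)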
As you can see, my process is very different from vibe coding. Vibe coding is fine for prototyping, or for non-engineers with no other options, but it's not how I would advise anyone to build a serious product for critical use cases.
One neat thing I was able to do, with a couple days' notice, was add a script to generate a super polished product walkthrough slide deck with a total of roughly 80 pages of screenshots and captions covering different user stories, with each story getting its own zoomed-out overview diagram of thumbnails linking to the actual slides. It looked way better than any other product overview deck I've put together by hand in the past, with the bonus that we've regenerated it on demand any time an up-to-date deck showing the latest iteration of the product was needed. This honestly could be a pretty useful product in itself. Without AI, we would've been stuck putting together a much worse deck by hand, and it would've gotten stale immediately. (I've been in the position of having to give disclaimers about product materials being outdated when sharing them, and it's not fun.)
Anyway, I don't know if any of this will convince anyone to take my word for it, but hopefully some of my techniques can at least be helpful to someone. The only real metric I have to share offhand is that the project has over 4000 (largely non-trivial) commits made substantially solo across 2.5 months on a part-time schedule juggled with other commitments, two vacations, and time spent on aspects of the engagement other than development. I realize that's a bit vague, but I promise that it's a fairly complex project which I feel pretty confident I wouldn't have been capable of delivering in the same form on the same schedule without AI. The founders and other stakeholders have been extremely satisfied with the end result. I'd post it here for you all to judge, but unfortunately it's currently in a soft launch status that we don't want a lot of attention on just yet.
There’s something ironic here. For decades, we dreamed of semi-automating software development. CASE tools, UML, and IDEs all promised higher-level abstractions that would "let us focus on the real logic."
Now that LLMs have actually fulfilled that dream — albeit by totally different means — many devs feel anxious, even threatened. Why? Because LLMs don’t just autocomplete. They generate. And in doing so, they challenge our identity, not just our workflows.
I think Colton’s article nails the emotional side of this: imposter syndrome isn’t about the actual 10x productivity (which mostly isn't real), it’s about the perception that you’re falling behind. Meanwhile, this perception is fueled by a shift in what “software engineering” looks like.
LLMs are effectively the ultimate CASE tools — but they arrived faster, messier, and more disruptively than expected. They don’t require formal models or diagrams. They leap straight from natural language to executable code. That’s exciting and unnerving. It collapses the old rites of passage. It gives power to people who don’t speak the “sacred language” of software. And it forces a lot of engineers to ask: What am I actually doing now?
I now understand what artists felt when seeing stable diffusion images - AI code is often just wrong - not in the moral sense, but it contains tons of bugs, weirdness, excess and peculiarities you'd never be happy to see in a real code base.
Often, getting rid of all of this takes a comparable amount of time to doing the job in the first place.
Now I can always switch to a different model, increase the context, prompt better, etc., but I still feel that actually good-quality AI code is just out of arm's reach - or, when something clicks and the AI magically starts producing exactly what I want, that magic doesn't last.
Like with stable diffusion, people who don't care as much or aren't knowledgeable enough to know better, just don't get what's wrong with this.
A week ago, I received a bug ticket claiming one of the internal libs I wrote didn't work. I checked out the reporter's code, which was full of weird issues (like the debugger not working and the TypeScript being full of red squiggles), and my lib crashed somewhere in the middle, in some esoteric minified JS.
When I asked the guy who wrote it what's going on, he admitted he vibe coded the entire project.
The comparison to art is apt. Generated art gets the job done for most people. It's good enough. Maybe it's derivative, maybe there are small inaccuracies, but it is available instantly for free and that's what matters most. Same with code, to many people.
And the knock-on effect is that there is less menial work. Artists are commissioned less for the local fair, their friend's D&D character portrait, etc. Programmers find less work building websites for small businesses, fixing broken widgets, etc.
I wonder if this will result in fewer experts, or less capable ones. As we lose the jobs that were previously used to hone our skills will people go out of their way to train themselves for free or will we just regress?
> When I asked the guy who wrote it what's going on, he admitted he vibe coded the entire project.
This really irritates me. I’ve had the same experience with teammates’ pull requests they ask me to review. They can’t be bothered to understand the thing, but then expect you to do it for them. Really disrespectful.
At the same time, there's also a huge crowd of annoying tech bros constantly shouting at artists something like, 'Your work was never valuable to begin with; why can't I copy your style? You're nothing but another matrix.'
You miss the fundamental constraint. The bottleneck in software development was never typing speed or generation, but verification and understanding.
Even if LLMs worked perfectly without hallucinations (they don't and might never), a conscientious developer must still comprehend every line before shipping it. You can't review and understand code 10x faster just because an LLM generated it.
In fact, reviewing generated code often takes longer because you're reverse-engineering implicit assumptions rather than implementing explicit intentions.
The "10x productivity" narrative only works if you either:
- Are not actually reviewing the output properly
or
- Are working on trivial code where correctness doesn't matter.
Real software engineering, where bugs have consequences, remains bottlenecked by human cognitive bandwidth, not code generation speed. LLMs shifted the work from writing to reviewing, and that's often a net negative for productivity.
> Even if LLMs worked perfectly without hallucinations (they don't and might never), a conscientious developer must still comprehend every line before shipping it.
This seems excessive to me. Do you comprehend the machine code output of a compiler?
There are many jobs that could be eliminated with software but haven't been, because managers don't want to hire SWEs without proven value. I don't think HN realizes how big that market is.
With AI, the managers will replace their employees with a bunch of code they don't understand, watch that code fail in 3 years, and have to hire SWEs to fix it.
I'd bet those jobs will outnumber the ones initially eliminated by having non-technical people deliver the first iteration.
Many of those jobs will be high-skill/impact because they are necessarily focused on fixing stuff AI can't understand.
I try using an LLM for coding now and then, and tried again today, giving a model dedicated to coding a rather straightforward prompt and task.
The names all looked right, the comments were descriptive, and it had test cases demonstrating that the code worked. It looked like something I'd expect a skilled junior or a senior to write.
The thing is, the code didn't work right, and the reasons it didn't work were quite subtle. Nobody would have fixed it without knowing how to have done it in the first place, and it took me nearly as long to figure out why as if I'd just written it myself in the first place.
I could see it being useful to a junior who hasn't solved a particular problem before and wanted to get a starting point, but I can't imagine using it as-is.
Nor do they produce those (do they?). That is what I would like to see. Formal models and diagrams are not needed to produce code. Their point is that they allow us to understand code and to formalize what we want it to do. That's what I'm hoping AI could do for me.
And while I don't categorically object to AI tools, I think you're selling objections to them short.
It's completely legitimate to want an explainable/comprehendable/limited-and-defined tool rather than a "it just works" tool. Ideally, this puts one in an "I know its right" position rather than a "I scanned it and it looks generally right and seems to work" position.
I think if you're paying any attention to the state of the world, you can see labor is getting destroyed by capital - bad wages, worse working conditions including more surveillance, metrics everywhere, immoral companies, short contracts and unstable companies/career paths, increasing monopolization and consolidation of power. We were so insulated from this for so long that it's easy to not really grasp how bad things are for most workers. Now the precarity of our situation is dawning on us.
It kills the magic of coding for sure. The thing is. Now with everyone doing it, you get a ton of slop. Computing’s become saturated as hell. We don’t even need more code as it is. Before LLMs you could pretty much find what you needed on github… Now it’s even worse.
In many ways this feels like average software engineers telling on themselves. If you know the tech you're building, and you're good at splitting up your work, then you know ahead of time where the complexity is and you can tell the AI what level of granularity to build at. AI isn't magic; there is an upper limit to the complexity of a program that e.g. Sonnet 4 can write at once. If you can grok that limit, and you can grok the tech of your project, then you can tell the AI to build individual components that stay below that threshold. That works really well.
This is tautological. If you keep instructions dumbed-down enough for AI to work well, it will work well.
The problem is that AI needs to be spoon-fed overly detailed dos and donts, and even then the output can't be trusted without carefully checking it. It's easy to reach a point where breaking down the problem into pieces small enough for AI to understand takes more work than just writing the code.
AI may save time when it generates the right thing on the first try, but that's a gamble. The code may need multiple rounds of fixups, or end up needing a manual rewrite anyway, after wasting time and effort on instructing the AI. The ceiling of AI capabilities is very uneven and unpredictable.
Even worse, the AI can confidently generate code that looks superficially correct, but has subtle bugs/omissions/misinterpretations that end up costing way more time and effort than the AI saved. It has uncanny ability to write nicely structured, well-commented code that is just wrong.
I made an STT tool (guess who wrote it for me) and have a bluetooth mic. I spend 10 minutes pacing and telling the AI what I need it to build, and how to build it. Then it goes off and builds it, and meanwhile I go to the next Claude Code instance on a different project, and do the same thing there. Then do the same for a third, and maybe by that time the first is ready for more direction. Depending on how good you are with context switching and quickly designing complex systems and communicating those designs, you can get a whole lot done in parallel. The problems you're describing can be solved, if you're careful and detailed.
It's a brave, weird and crazy new world. "The future is now, old man."
The point is that good software engineers are good at doing the "hard part". So good that they have a backlog of "trivial" typing tasks. In a well functioning organization they would hand off the backlog of trivial parts to less experienced engineers, who might be herded by a manager. Now we don't need the less experienced engineers or the manager to herd them.
It can be, but if you're familiar with what you're working with and have experience with other systems that have transferrable knowledge, again, it can be an advantage.
I was surprised that with Claude Code I was able to get a few complex things done that I had anticipated would take a few weeks to uncover, stitch together, and get moving.
Instead, I pushed Claude to consistently present the correct understanding of the problem, structure, and approach to solving things, and only after that was OK was it allowed to propose changes.
True to its shiny-things corpus, it will overcomplicate things because it hasn't learned that less is more. Maybe that reflects the average code in its corpus.
Looking at how folks are setting up their claude.md and agents can go a long way if you haven't had a chance yet.
The implication is that I'm finding it straightforward to handle the issues that other people here seem to consider insurmountable. And they seem to be directly related to how good one is at building software.
Might it not be the other way round? For all we know it's mediocre devs who are relishing the prospect of doing jack shit all day and still being able to submit some auto-generated PRs - being "amazed" at what it produces where someone with higher standards might be less than amazed.
I find it impossible to work out who to trust on the subject, given that I'm not working directly with them, so remain entirely on the fence.
Sure, absolutely. But that's the hard part of software engineering. The dream is composing software from small components and adding new features by just writing more small components.
But nobody has ever managed to get there despite decades of research and work done in this area. Look at the work of Gerald Sussman (of SICP fame), for example.
So all you're saying is it makes the easy bit easier if you've already done, and continue to do, the hard bit. This is one of the points made in TFA. You might be able to go 200mph in a straight line, but you always need to slow down for the corners.
Of course there is an upper limit for AI. There's an upper limit for humans too.
What you need is just boring project management. Have a proper spec, architecture and tasks split into manageable chunks with enough information to implement them.
Then you just start watching TV and say "implement github issue #42" to Claude and it'll get on with it.
But if you say "build me facebook" and expect a shippable product, you'll have a bad time.
I agree, and the fact that their list of scenarios that supposedly explain these claims never actually mentions that some people actually are 10x definitely points to a lack of self-awareness.
I tried doing all of my work for ~1 month with copilot/claude. It didn’t cause a ton of atrophy, because it didn’t work - I couldn’t avoid actually getting involved in rewriting code
I thought this would be another AI hate article, but it made some great points.
One thing that AI has helped me with is finding pesky bugs. I mainly work on numerical simulations. At one point I was stuck for almost a week trying to figure out why my simulation was acting so strange. Finally I pulled up chatgpt, put some of my files into the context and wrote a prompt explaining the strange behavior and what I thought might be happening. In a few seconds it figured out that I had improperly scaled one of my equations. It came down to a couple missing parentheses, and once I fixed it the simulation ran perfectly.
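To give a sense of that class of bug (a toy illustration, not my actual equations): a pair of missing parentheses quietly changes which terms get scaled by the time step.

    dt = 0.01          # time step
    k, c = 3.0, 0.5    # made-up spring and damping coefficients
    x, v = 1.0, 2.0    # position, velocity

    # Intended update: the whole force term is scaled by dt.
    v_correct = v + dt * (-k * x - c * v)

    # Buggy version: only the spring term is scaled; the damping term is
    # applied un-scaled on every step, so the "simulation" drifts badly.
    v_buggy = v + dt * (-k * x) - c * v

    print(v_correct, v_buggy)   # 1.96 vs 0.97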
This has happened a few times where AI was easily able to see something I was overlooking. Am I a 10x developer now that I use AI? No... but when used well, AI can have a hugely positive impact on what I am able to get done.
I don't consider myself a 10x engineer. The number one thing I've realized makes me more productive than other engineers at my company is thinking through system design and business needs, using patterns rather than taking badly written product tickets literally.
What I've seen with AI is that it does not save my coworkers from the pain of overcomplicating simple things that they don't really think through clearly. AI does not seem to solve this.
I don't consider myself a 2x engineer; my company tells me as much by not paying me 2x what my colleagues make, even though I know (and others believe it too) that I deliver more than 2x their output.
Counter: you are looking at it wrong. You can get work done in 1/2 of the time it used to. Now you got 1/2 of the day to just mess around. Socialize or network. It’s not necessarily that you’re producing 2x.
The first red flag there is "2x their output". You can find many an anecdote where a good engineer produced better solution in fewer lines of code (or sometimes, by removing code — the holy grail).
So always aim for outcomes, not output :)
At my company, we did promote people quickly enough that they are now at close to double the salaries they started with a year or so ago, due to their added value as engineers on the team. It gets tougher as they get into senior roles, but even there, there's quite a bit of room for differentiation.
Additionally, since this is a market, you should not even expect to be paid twice for 2x value provided — then it makes no difference to a company if they get two 1x engineers instead, and you are really not that special if you are double the cost. So really, the "fair" value is somewhere in between: 1.5x to equally reward both parties, or leaning one way or the other :)
> What I've seen with AI is that it does not save my coworkers from the pain of overcomplicating simple things that they don't really think through clearly. AI does not seem to solve this.
100%. The biggest challenge with software is not that it’s too hard to write, but that it’s too easy to write.
I'm skeptical of the 10x claims for different reasons than the author focuses on. The productivity gains might be real for individual tasks, but they're being measured wrong.
Most of the AI productivity stories I hear sound like they're optimizing for the wrong metric. Writing code faster doesn't necessarily mean shipping better products faster. In my experience, the bottleneck is rarely "how quickly can we type characters into an editor" - it's usually clarity around requirements, decision-making overhead, or technical debt from the last time someone optimized for speed over maintainability.
The author mentions that real 10x engineers prevent unnecessary work rather than just code faster. That rings true to me. I've seen more productivity gains from saying "no" to features or talking teams out of premature microservices(or adopting Kafka :D) than from any coding tool.
What worries me more is the team dynamic this creates. When half your engineers feel like they're supposed to be 10x more productive and aren't, that's a morale problem that compounds. The engineers who are getting solid 20-30% gains from AI (which seems realistic) start questioning if they're doing it wrong.
Has anyone actually measured this stuff properly in a production environment with consistent teams over 6+ months? Most of the data I see is either anecdotal or from artificial coding challenges.
Olympic athletes don't exist because no one at my gym runs that fast.
You are right that typing speed isn't the bottleneck, but wrong about what AI actually accelerates. The 10x engineers aren't typing faster; they're exploring 10 different architectural approaches in the time it used to take to try one, validating ideas through rapid prototyping, and automating the boring parts to focus on the hard decisions.
You can't evaluate a small sample size of people who are not exploiting the benefits well and come to an accurate assessment of the utility of a new technology.
This article sets a ludicrous bar ("10x"), then documents the author's own attempt over some indeterminate time to clear that bar. As a result, the author has classified all the AI-supporters in the industry into three categories: (1) people who are wrong in good faith, (2) people who are selling AI tools, and (3) evil bosses trying to find leverage in programmer anxiety.
That aside: I still think complaining about "hallucination" is a pretty big "tell".
> I still think complaining about "hallucination" is a pretty big "tell".
The conversation around LLMs is so polarized. Either they’re dismissed as entirely useless, or they’re framed as an imminent replacement for software developers altogether.
Hallucinations are worth talking about! Just yesterday, for example, Claude 4 Sonnet confidently told me Godbolt was wrong wrt how clang would compile something (it wasn’t). That doesn’t mean I didn’t benefit heavily from the session, just that it’s not a replacement for your own critical thinking.
Like any transformative tool, LLMs can offer a major productivity boost but only if the user can be realistic about the outcome. Hallucinations are real and a reason to be skeptical about what you get back; they don’t make LLMs useless.
To be clear, I’m not suggesting you specifically are blind to this fact. But sometimes it’s warranted to complain about hallucinations!
That's not what people mean when they bring up "hallucinations". What the author apparently meant was that they had an agent generating Terraform for them, and that Terraform was broken. That's not surprising to me! I'm sure LLMs are helpful for writing Terraform, but I wouldn't expect agents to be at the point of reliably handing off Terraform that actually does anything, because I can't imagine an agent being given permission to iterate on Terraform. Now have an agent write Java for you. That problem goes away: you aren't going to be handed code with API calls that literally don't exist (this is what people mean by "hallucination"), because that code wouldn't pass a compile or linter pass.
Hi there! I appreciate your comment, and I remember reading your article about AI and some of the counterarguments to it helped me get over the imposter syndrome I was feeling.
To be clear, I did not classify "all the AI-supporters" as being in those three categories, I specifically said the people posting that they are getting 10x improvements thanks to AI.
Can you tell me about what you've done to no longer have any hallucinations? I notice them particularly in a language like Terraform, the LLMs add properties that do not exist. They are less common in languages like Javascript but still happen when you import libraries that are less common (e.g. DrizzleORM).
Can you help me understand which articles you're referring to? A link to the biggest "AI made me a 10x developer" article you've read would certainly clear this up.
The expectations are higher than reality, but LLMs are quite useful in many circumstances. You can characterize their use by "level of zoom", from "vibe coding" on the high end, to "write this function given its arguments and what it should return" at the low end. The more 'zoomed in' you are, the better it works, in my experience.
Plus there are use-cases for LLMs that go beyond augmenting your ability to produce code, especially for learning new technologies. The yield depends on the distribution of tasks you have in your role. For example, if you are in lots of meetings, or have lots of administrative overhead to push code, LLMs will help less. (Although I think applying LLMs to pull request workflow, commit cleanup and reordering, will come soon).
This - a 2-5x boost on the typing-code part of the job, much less end to end - seems to be the current consensus.
A very similar quote from another recent AI article:
One host compares AI chatbots to “a very smart assistant who has a dozen Ph.D.s but is also high on ketamine like 30 percent of the time.”
https://lithub.com/what-happened-when-i-tried-to-replace-mys...
It's funny, Github Copilot puts these models in the 'bargin bin' (they are free in 'ask' mode, whereas the other models count against your monthly limit of premium requests) and it's pretty clear why, they seem downright nerfed. They're tolerable for basic questions but you wouldn't use them if price weren't a concern.
Brandwise, I don't think it does OpenAI any favors to have their models be priced as 'worthless' compared to the other models on premium request limits.
This alone is where I get a lot of my value. Otherwise, I'm using Cursor to actively solve smaller problems in whatever files I'm currently focused on. Being able to refactor things with only a couple sentences is remarkably fast.
The more you know about your language's features (and their precise names), and about higher-level programming patterns, the better time you'll have with LLMs, because it matches up with real documentation and examples with more precision.
Best analogy I've ever heard and it's completely accurate. Now, back to work debugging and finishing a vibe coded application I'm being paid to work on.
If you're not specific enough, it will definitely spit out a half-baked pseudocode file where it expects you to fill in the rest. If you don't specify certain libraries, it'll use whatever is featured in the most blogspam. And if you're in an ecosystem that isn't publicly well-documented, it's near useless.
First, until I can re-learn boundaries, they are a fiasco for work-life balance. It's way too easy to have a "hmm what if X" thought late at night or first thing in the morning, pop off a quick ticket from my phone, assign to Copilot, and then twenty minutes later I'm lying in bed reviewing a PR instead of having a shower, a proper breakfast, and fully entering into work headspace.
And on a similar thread, Copilot's willingness to tolerate infinite bikeshedding and refactoring is a hazard for actually getting stuff merged. Unlike a human colleague who loses patience after a round or two of review, Copilot is happy to keep changing things up and endlessly iterating on minutiae. Copilot code reviews are exhausting to read through because it's just so much text, so much back and forth, every little change with big explanations, acknowledgments, replies, etc.
If I want to throw a shuriken that obeys some artificial, magic Magnus force like in the movie Wanted, both ChatGPT and Claude let me down using pygame. And what if I wanted C-level performance, or wanted to use Zig?
It works like the average Microsoft employee, like some doped version of an orange-wig wearer who gets votes because his daddies kept the population as dumb as it gets after the dotcom-and-Facebook era. In essence, the ones to be disappointed by are the Chan-Zuckerbergs of our time. There was a chance, but there also was what they were primed for.
The suggestions were always unusably bad. The /fix results were always obviously and straight-up wrong unless it was a super silly issue.
Claude Code with Opus model on the other hand was mind-blowing to me and made me change my mind on almost everything wrt my opinion of LLMs for coding.
You still need to grow the skill of how to build the context and formulate the prompt, but the built-in execution loop is a complete game changer, and I didn't realize that until I actually used it effectively on a toy project myself.
MCP in particular was another thing I always thought was massively overhyped, until I actually started to use some in the same toy project.
Frankly, the building blocks already exist at this point to make a vast majority of all jobs redundant (and I'm thinking about all grunt-work office jobs, not coding in particular). The tooling still needs to be created, so I'm not seeing a short-term realization (<2 yrs), but medium term (5+ yrs)?
You should expect most companies to let people go in staggering numbers, with only small numbers of highly skilled people left to administer the agents.
(1) For my day job, it doesn't make me super productive with creation, but it does help with discovery, learning, getting myself unstuck, and writing tedious code.
(2) However, the biggest unlock is that it makes working on side projects __immensely__ easier. Before AI I was always too tired to spend significant time on side projects. Now I can see my ideas come to life (albeit with shittier code), with much less mental effort. I also get to improve my AI engineering skills without the constraints of deadlines, data privacy, tooling, etc.
Being able to sit down after a long day of work and ask an AI model to fix a bug or implement a feature while you relax and _not_ type code is a major boon. It is able to immediately get context and be productive even when you are not.
I hear this take a lot but does it really make that much of an improvement over what we already had with search engines, online documentation and online Q&A sites?
I haven't begun doing side projects or projects for myself yet. But I did go down the road of finding out what would be needed to do something I wished existed. It was much easier to explore and understand the components, and I might have a decent chance at a prototype.
The alternative would have been to ask people around me or formulate extensively researched questions for online forums, where I'd expect half-cryptic answers (and a jibe at my ignorance every now and then) at a pace where it would take years before I had something ready.
I see the point of AI as a prototyping and brainstorming tool. But I doubt we are at a point where I would be comfortable pushing changes to a production environment without giving 3x the effort in reviewing. Since there's a chance of the system hallucinating, I have a genuine fear that the output would seem accurate, but what it actually does would be something really, really stupid.
For $20 a month I can get my stupid tool and utility ideas from "it would be cool if I could..." to actual "works well enough for me" tools in an evening - while I watch my shows at the same time.
After a day at work I don't have the energy to start digging through, say, OpenWeather's latest 3.0 API and its nuances and how I can refactor my old code to use the new API.
Claude did it in maybe one episode of What We Do in the Shadows :D I have a hook that makes my computer beep when Claude is done or pauses for a question, so I can get back, check what it did and poke it forward.
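The hook side of that is tiny, by the way. Here's a minimal sketch of the beep script I'd point a Claude Code stop/notification hook at (how the hook gets registered is per Claude Code's settings docs; the script below is only the command it runs, and the sound-file path is macOS-specific):

    # beep.py - notification command for a "Claude is done / waiting" hook.
    import platform
    import subprocess
    import sys

    def beep() -> None:
        if platform.system() == "Darwin":
            # macOS: play a built-in system sound
            subprocess.run(["afplay", "/System/Library/Sounds/Glass.aiff"], check=False)
        else:
            # fallback: terminal bell
            sys.stdout.write("\a")
            sys.stdout.flush()

    if __name__ == "__main__":
        beep()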
The smartest programmer I know is so impressive mainly for two reasons: first, he seems to have just an otherworldly memory and seems to kind of have absolutely every little feature and detail of the programming languages he uses memorized. Second, his real power is really in cognitive ability, or the ability to always quickly and creatively come up with the smartest and most efficient yet elegant and clean solution to any given problem. Of course somewhat opinionated but in a good way. Funnily he often wouldn't know the academic/common name for some algorithm he arrived at but it just happened to be what made sense to him and he arrived at it independently. Like a talented musician with perfect pitch who can't read notation or doesn't know theory yet is 10x more talented than someone who has studied it all.
When I pair program with him, it's evident that the current iteration of AI tools is not as quick or as sharp. You could arrive at similar solutions but you would have to iterate for a very long time. It would actually slow that person down significantly.
However, there is such a big spectrum of ability in this field that I could actually see this increasing for example my productivity by 10x. My background/profession is not in software engineering but when I do it in my free time the perfectionist tendencies make me work very slowly. So for me these AI tools are actually cool for generating the first crappy proof of concepts for my side projects/ideas, just to get something working quickly.
People keep focusing on general-intelligence-style capabilities, but that is the holy grail. The world could go through multiple revolutions before finding that holy grail, but even before then everything would have changed beyond recognition.
So write an integration over the API docs I just copy-pasted.
This is particularly true for headlines like this one which stand alone as statements.
[And to those saying we're using it wrong... well I can't argue with something that's not falsifiable]
https://www.construx.com/blog/productivity-variations-among-...
I have found for myself it helps motivate me, resulting in net productivity gain from that alone. Even when it generates bad ideas, it can get me out of a rut and give me a bias towards action. It also keeps me from procrastinating on icky legacy codebases.
I guess this is still the "caveat" that can keep the hype hopes going. But at the team-velocity level, with our teams, where everyone is actively using agentic coding like Claude Code daily, we actually haven't seen an increase in velocity yet.
I'm curious to hear anecdotes from other teams: has your team's velocity increased since it adopted agentic AI?
This article thinks that most people who say 10x productivity are claiming 10x speedup on end-to-end delivering features. If that's indeed what someone is saying, they're most of the time quite simply wrong (or lying).
But I think some people (like me) aren't claiming that. Of course the end to end product process includes a lot more work than just the pure coding aspect, and indeed none of those other parts are getting a 10x speedup right now.
That said, there are a few cases where this 10x end-to-end is possible. E.g. when working alone, especially on new things but not only - you're skipping a lot of this overhead. That's why smaller teams, even solo teams, are suddenly super interesting - because they are getting a bigger speedup comparatively speaking, and possibly enough of one to be able to rival larger teams.
If I'm using it to remember the syntax or library for something I used to know how to do, it's great.
If I'm using it to explore something I haven't done before, it makes me faster, but sometimes it lies to me. Which was also true of Stack Overflow.
But when I ask it to do something fairly complex on its own, it usually tips over. I've tried a bunch of tests with a bunch of models, and it never quite gets it right. Sometimes it's minor stuff that I can fix if I bang on it long enough, and sometimes it's a steaming pile that I end up tossing in the garbage.
For example, I've asked it to code me a web-based calculator, or a 3D model of the solar system using WebGL, and none of the models I've tried have been able to do either.
I think that the key realization is that there are tasks where LLMs excel and might even buy you 10x productivity, whereas some tasks their contribution might even be net negative.
LLMs are largely excellent at writing and refactoring unit tests, mainly because the context is very limited (i.e., write a method in a class that calls this specific method of this specific class in a specific way and checks the output) and the output is very repetitive (i.e., isolated methods in standalone classes that are not called anywhere else). They also seem helpful when prompted to add logging. LLMs are also effective at creating greenfield projects, serving as glorified template engines. But when lightly pressed on specific tasks like implementing a cross-domain feature... their output starts to be, at best, a big ball of mud.
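To make the unit-test point concrete, the kind of test LLMs nail is narrow and mechanical: one method under test, explicit inputs, explicit expected outputs. A hypothetical example (names invented for illustration):

    import unittest

    class PriceCalculator:
        def apply_discount(self, price: float, percent: float) -> float:
            if not 0 <= percent <= 100:
                raise ValueError("percent must be between 0 and 100")
            return round(price * (1 - percent / 100), 2)

    class TestPriceCalculator(unittest.TestCase):
        def test_apply_discount_normal_case(self):
            # 25% off 200.00 -> 150.00
            self.assertEqual(PriceCalculator().apply_discount(200.0, 25), 150.0)

        def test_apply_discount_rejects_invalid_percent(self):
            with self.assertRaises(ValueError):
                PriceCalculator().apply_discount(100.0, 150)

    if __name__ == "__main__":
        unittest.main()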
What will happen is over time this will become the new baseline for developing software.
It will mean we can deliver software faster. Maybe more so than other advances, but it won't fundamentally change the fact that software takes real effort and that effort will not go away, since that effort is much more than just coding this or that function.
I could create a huge list of things that have made developing and deploying quality software easier: linters, static type checkers, code formatters, hot reload, intelligent code completion, distributed version control (i.e., Git), unit testing frameworks, inference schema tools, code from schema, etc. I'm sure others can add dozens of items to that list. And yet there seems to be an unending amount of software to be built, limited only by the people available to build it and an organization's funding to hire those people.
In my personal work, I've found AI-assisted development to make me faster (not sure I have a good estimate for how much faster.) What I've also found is that it makes it much easier to tackle novel problems within an existing solution base. And I believe this is likely to be a big part of the dev productivity gain.
Just an example: let's say we want to use the strangler pattern as part of our modernization approach for a legacy enterprise app that has seen better days. Unless you have some senior devs who are both experienced with that pattern AND experienced with your code base, it can take a lot of trial and error to figure out how to make it work. (As you said, most of our work isn't actually typing code.)
This is where an AI/LLM tool can go to work on understanding the code base and understanding the pattern to create a reference implementation approach and tests. That can save a team of devs many weeks of trial & error (and stress) not to mention guidance on where they will run into roadblocks deep into the code base.
And, in my opinion, this is where a huge portion of the AI-assisted dev savings will come from - not so much writing the code (although that's helpful) but helping devs get to the details of a solution much faster.
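For anyone who hasn't used the strangler pattern, the reference implementation usually amounts to a routing facade in front of the legacy code, migrated route by route. A minimal sketch (the class names and the boolean flag here are hypothetical, not from any real codebase):

    class LegacyInvoiceService:
        def total(self, order: dict) -> float:
            # old, battle-tested (and crufty) logic
            return sum(item["price"] * item["qty"] for item in order["items"])

    class NewInvoiceService:
        def total(self, order: dict) -> float:
            # new logic: supports per-item discounts the legacy code never had
            return sum(
                item["price"] * item["qty"] * (1 - item.get("discount", 0.0))
                for item in order["items"]
            )

    class InvoiceFacade:
        """Callers only talk to this; the legacy service is 'strangled' one route at a time."""

        def __init__(self, use_new_totals: bool = False):
            self._legacy = LegacyInvoiceService()
            self._new = NewInvoiceService()
            self._use_new_totals = use_new_totals

        def total(self, order: dict) -> float:
            service = self._new if self._use_new_totals else self._legacy
            return service.total(order)

The sketch is the easy part; the hard part, and where AI-assisted analysis of the code base earns its keep, is figuring out which seams in a real legacy app can safely sit behind a facade like this.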
It's that googling has always gotten us to generic references and AI gets us those references fit for our solution.
And we're not seeing that at all. The companies whose software I use that did announce big AI initiatives 6 months ago, if they really had gotten 10x productivity gain, that'd be 60 months—5 years—worth of "productivity". And yet somehow all of their software has gotten worse.
And does an AI agent doing a code review actually reduce that time too? I have doubts. Caveat, I haven't seen it in practice yet.
This feels exactly right and is what I’ve thought since this all began.
But it also makes me think maybe there are those that A.I. helps 10x, but more because that code input is actually a very large part of their job. Some coders aren’t doing much design or engineering, just assembly.
I don't think I've encountered programmer like that in my own career, but I guess they might exist somewhere!
The hardest part of my job is actually understanding the problem space and making sure we're applying the correct solution. Actual coding is probably about 30% of my job.
That means, I'm only looking at something like 30% productivity gain by being 5x as effective at coding.
Now when I'm designing software there are all sorts of things where I'm much less likely to think "nah, that will take too long to type the code for".
But of course that’s ridiculous.
10x is intended to symbolize a multiplier. Since Microsoft fired that guy, 10 × 0 is still 0.
I'm not sure it is and I'll take it a step further:
Over the course of development, efficiency gains trend towards zero.
AI has a better case for increasing surface area (what an engineer is capable of working on) and effectiveness, but efficiency is a mirage.
Who's making these claims?
I've been heavily leaning on AI for an engagement that would otherwise have been impossible for me to deliver to the same parameters and under the same constraints. Without AI, I simply wouldn't have been able to fit the project into my schedule, and would have turned it down. Instead, not only did I accept and fit it into my schedule, I was able to deliver on all stretch goals, put in much more polish and automated testing than originally planned, and accommodate a reasonable amount of scope creep. With AI, I'm now finding myself evaluating other projects to fit into my schedule going forward that I couldn't have considered otherwise.
I'm not going to specifically claim that I'm an "AI 10x engineer", because I don't have hard metrics to back that up, but I'd guesstimate that I've experienced a ballpark 10x speedup for the first 80% of the project and maybe 3 - 5x+ thereafter depending on the specific task. That being said, there was one instance where I realized halfway through typing a short prompt that it would have been faster to make those particular changes by hand, so I also understand where some people's skepticism is coming from if their impression is shaped by experiences like that.
I believe the discrepancy we're seeing across the industry is that prompt-based engineering and traditional software engineering are overlapping but distinct skill sets. Speaking for myself, prompt-based engineering has come naturally due to strong written communication skills (e.g. experience drafting/editing/reviewing legal docs), strong code review skills (e.g. participating in security audits), and otherwise being what I'd describe as a strong "jack of all trades, master of some" in software development across the stack. On the other hand, for example, I could easily see someone who's super 1337 at programming high-performance algorithms and mid at most everything else finding that AI insufficiently enhances their core competency while also being difficult to effectively manage for anything outside of that.
As to how I actually approach this:
* Gemini Pro is essentially my senior engineer. I use Gemini to perform codebase-wide analyses, write documentation, and prepare detailed sprint plans with granular todo lists. Particularly for early stages of the project or major new features, I'll spend several hours at a time meta-prompting and meta-meta-prompting with Gemini just to get a collection of prompts, documents, and JSON todo lists that encapsulate all of my technical requirements and feedback loops. This is actually harder than manual programming because I don't get the "break" of performing all the trivial and boilerplate parts of coding; my prompts here are much more information-dense than code.
* Claude Sonnet is my coding agent. For Gemini-assisted sprints, I'll fire Claude off with a series of pre-programmed prompts and let it run for hours overnight (there's a rough sketch of that loop just after this list). For smaller things, I'll pair program with Claude directly and multitask while it codes, or if I really need a break I'll take breaks in between prompting.
* More recently, Grok 4 through the Grok chat service is my Stack Overflow. I can't rave enough about it. Asking it questions and/or pasting in code diffs for feedback gets incredible results. Sometimes I'll just act as a middleman pasting things back and forth between Grok and Claude/Gemini while multitasking on other things, and find that they've collaboratively resolved the issue. Occasionally, I've landed on the correct solution on my own within the 2 - 3 minutes it took for Grok to respond, but even then the second opinion was useful validation. o3 is good at this too, but Grok 4 has been on another level in my experience; its information is usually up to date, and its answers are usually either correct or at least on the right track.
* I've heard from other comments here (possibly from you, Simon, though I'm not sure) that o3 is great at calling out anti-patterns in Claude output, e.g. its obnoxious tendency to default to keeping old internal APIs and marking them as "legacy" or "for backwards compatibility" instead of just removing them and fixing the resulting build errors. I'll be giving this a shot during tech debt cleanup.
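To give a flavor of the overnight step mentioned above: it's not much more than a loop over prepared prompt files driving the CLI in non-interactive mode. A rough sketch, assuming Claude Code's print mode (`claude -p`); the prompts/ and logs/ layout is just my own convention:

    # overnight.py - run a queue of prepared prompts, one after another.
    import pathlib
    import subprocess

    PROMPT_DIR = pathlib.Path("prompts")   # 01-schema.md, 02-endpoints.md, ...
    LOG_DIR = pathlib.Path("logs")
    LOG_DIR.mkdir(exist_ok=True)

    for prompt_file in sorted(PROMPT_DIR.glob("*.md")):
        print(f"running {prompt_file.name} ...")
        result = subprocess.run(
            ["claude", "-p", prompt_file.read_text()],
            capture_output=True,
            text=True,
        )
        (LOG_DIR / f"{prompt_file.stem}.log").write_text(result.stdout + result.stderr)
        if result.returncode != 0:
            print(f"stopping: {prompt_file.name} failed")
            break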
As you can see, my process is very different from vibe coding. Vibe coding is fine for prototyping, or for non-engineers with no other options, but it's not how I would advise anyone to build a serious product for critical use cases.
One neat thing I was able to do, with a couple days' notice, was add a script to generate a super polished product walkthrough slide deck with a total of like 80 pages of screenshots and captions covering different user stories, with each story having its own zoomed out overview of a diagram of thumbnails linking to the actual slides. It looked way better than any other product overview deck I've put together by hand in the past, with the bonus that we've regenerated it on demand any time an up-to-date deck showing the latest iteration of the product was needed. This honestly could be a pretty useful product in itself. Without AI, we would've been stuck putting together a much worse deck by hand, and it would've gotten stale immediately. (I've been in the position of having to give disclaimers about product materials being outdated when sharing them, and it's not fun.)
Anyway, I don't know if any of this will convince anyone to take my word for it, but hopefully some of my techniques can at least be helpful to someone. The only real metric I have to share offhand is that the project has over 4000 (largely non-trivial) commits made substantially solo across 2.5 months on a part-time schedule juggled with other commitments, two vacations, and time spent on aspects of the engagement other than development. I realize that's a bit vague, but I promise that it's a fairly complex project which I feel pretty confident I wouldn't have been capable of delivering in the same form on the same schedule without AI. The founders and other stakeholders have been extremely satisfied with the end result. I'd post it here for you all to judge, but unfortunately it's currently in a soft launch status that we don't want a lot of attention on just yet.
1.2x increase
Now that LLMs have actually fulfilled that dream — albeit by totally different means — many devs feel anxious, even threatened. Why? Because LLMs don’t just autocomplete. They generate. And in doing so, they challenge our identity, not just our workflows.
I think Colton’s article nails the emotional side of this: imposter syndrome isn’t about the actual 10x productivity (which mostly isn't real), it’s about the perception that you’re falling behind. Meanwhile, this perception is fueled by a shift in what “software engineering” looks like.
LLMs are effectively the ultimate CASE tools — but they arrived faster, messier, and more disruptively than expected. They don’t require formal models or diagrams. They leap straight from natural language to executable code. That’s exciting and unnerving. It collapses the old rites of passage. It gives power to people who don’t speak the “sacred language” of software. And it forces a lot of engineers to ask: What am I actually doing now?
Now, I can always switch to a different model, increase the context, prompt better, etc., but I still feel that genuinely good-quality AI code is just out of arm's reach; or, when something clicks and the AI magically starts producing exactly what I want, that magic doesn't last.
Like with stable diffusion, people who don't care as much or aren't knowledgeable enough to know better, just don't get what's wrong with this.
A week ago, I received a bug ticket claiming one of the internal libs I wrote didn't work. I checked out the reporter's code, which was full of weird issues (like the debugger not working and the TypeScript being full of red squiggles), and my lib crashed somewhere in the middle, in some esoteric minified JS.
When I asked the guy who wrote it what's going on, he admitted he vibe coded the entire project.
And the knock-on effect is that there is less menial work. Artists are commissioned less for the local fair, their friend's D&D character portrait, etc. Programmers find less work building websites for small businesses, fixing broken widgets, etc.
I wonder if this will result in fewer experts, or less capable ones. As we lose the jobs that were previously used to hone our skills will people go out of their way to train themselves for free or will we just regress?
This really irritates me. I’ve had the same experience with teammates’ pull requests they ask me to review. They can’t be bothered to understand the thing, but then expect you to do it for them. Really disrespectful.
Even if LLMs worked perfectly without hallucinations (they don't and might never), a conscientious developer must still comprehend every line before shipping it. You can't review and understand code 10x faster just because an LLM generated it.
In fact, reviewing generated code often takes longer because you're reverse-engineering implicit assumptions rather than implementing explicit intentions.
The "10x productivity" narrative only works if you either:
- aren't actually reviewing the output properly, or
- are working on trivial code where correctness doesn't matter.
Real software engineering, where bugs have consequences, remains bottlenecked by human cognitive bandwidth, not code generation speed. LLMs shifted the work from writing to reviewing, and that's often a net negative for productivity.
This seems excessive to me. Do you comprehend the machine code output of a compiler?
There are many jobs that could be eliminated with software but haven't been, because managers don't want to hire SWEs without proven value. I don't think HN realizes how big that market is.
With AI, the managers will replace their employees with a bunch of code they don't understand, watch that code fail in 3 years, and have to hire SWEs to fix it.
I'd bet those jobs will outnumber the ones initially eliminated by having non-technical people deliver the first iteration.
Many of those jobs will be high-skill/impact because they are necessarily focused on fixing stuff AI can't understand.
The names all looked right, the comments were descriptive, and it had test cases demonstrating the code worked. It looked like something I'd expect a skilled junior or a senior to write.
The thing is, the code didn't work right, and the reasons it didn't work were quite subtle. Nobody would have fixed it without knowing how to have done it in the first place, and it took me nearly as long to figure out why as if I'd just written it myself in the first place.
I could see it being useful to a junior who hasn't solved a particular problem before and wanted to get a starting point, but I can't imagine using it as-is.
Nor do they produce those (do they?). That is what I would like to see. Formal models and diagrams are not needed to produce code. Their point is that they allow us to understand code and to formalize what we want it to do. That's what I'm hoping AI could do for me.
And while I don't categorically object to AI tools, I think you're selling the objections to them short.
It's completely legitimate to want an explainable/comprehendable/limited-and-defined tool rather than a "it just works" tool. Ideally, this puts one in an "I know its right" position rather than a "I scanned it and it looks generally right and seems to work" position.
- it’s not just X, it’s Y
- emdashes everywhere
The problem is that AI needs to be spoon-fed overly detailed dos and donts, and even then the output can't be trusted without carefully checking it. It's easy to reach a point where breaking down the problem into pieces small enough for AI to understand takes more work than just writing the code.
AI may save time when it generates the right thing on the first try, but that's a gamble. The code may need multiple rounds of fixups, or end up needing a manual rewrite anyway, after wasting time and effort on instructing the AI. The ceiling of AI capabilities is very uneven and unpredictable.
Even worse, the AI can confidently generate code that looks superficially correct, but has subtle bugs/omissions/misinterpretations that end up costing way more time and effort than the AI saved. It has uncanny ability to write nicely structured, well-commented code that is just wrong.
It's a brave, weird and crazy new world. "The future is now, old man."
I was surprised that with Claude Code I was able to get a few complex things done that I had anticipated would take a few weeks to uncover, stitch together, and get moving.
Instead, I pushed Claude to consistently present a correct understanding of the problem, the structure, and the approach to solving things, and only after that was OK was it allowed to propose changes.
True to its shiny-things corpus, it will overcomplicate things because it hasn't learned that less is more. Maybe that reflects the corpus of the average codebase.
Looking at how folks are setting up their claude.md and agents can go a long way if you haven't had a chance yet.
I find it impossible to work out who to trust on the subject, given that I'm not working directly with them, so remain entirely on the fence.
But nobody has ever managed to get there despite decades of research and work done in this area. Look at the work of Gerald Sussman (of SICP fame), for example.
So all you're saying is it makes the easy bit easier if you've already done, and continue to do, the hard bit. This is one of the points made in TFA. You might be able to go 200mph in a straight line, but you always need to slow down for the corners.
What you need is just boring project management. Have a proper spec, architecture and tasks split into manageable chunks with enough information to implement them.
Then you just start watching TV and say "implement github issue #42" to Claude and it'll get on with it.
But if you say "build me facebook" and expect a shippable product, you'll have a bad time.
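In practice that "implement github issue #42" step can be a thin wrapper around the two CLIs. A rough sketch, assuming the GitHub CLI (`gh issue view`) and Claude Code's non-interactive `-p` mode; the wrapper itself and its prompt wording are just placeholders:

    # issue_to_claude.py - hand a GitHub issue to Claude Code as the task spec.
    import subprocess
    import sys

    issue_number = sys.argv[1] if len(sys.argv) > 1 else "42"

    # Pull the issue title and body as plain text.
    issue = subprocess.run(
        ["gh", "issue", "view", issue_number],
        capture_output=True, text=True, check=True,
    ).stdout

    prompt = (
        "Implement the following GitHub issue. Follow the repo's existing "
        "conventions and add tests.\n\n" + issue
    )

    subprocess.run(["claude", "-p", prompt], check=True)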
One thing that AI has helped me with is finding pesky bugs. I mainly work on numerical simulations. At one point I was stuck for almost a week trying to figure out why my simulation was acting so strange. Finally I pulled up chatgpt, put some of my files into the context and wrote a prompt explaining the strange behavior and what I thought might be happening. In a few seconds it figured out that I had improperly scaled one of my equations. It came down to a couple missing parentheses, and once I fixed it the simulation ran perfectly.
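Purely as a hypothetical illustration of that flavor of bug (mine was in a different model, but it came down to the same thing): a couple of missing parentheses silently change the scaling of the whole term.

    # Intended term: dT = -(T - T_env) / (R * C)
    R, C = 2.0, 5.0
    T, T_env = 350.0, 300.0

    # Buggy: only T_env gets divided by (R * C),
    # so this evaluates to -(350 - 30) = -320 instead of -5.
    dT_buggy = -(T - T_env / (R * C))

    # Correct: the whole temperature difference is scaled by 1 / (R * C).
    dT_correct = -(T - T_env) / (R * C)

    print(dT_buggy)    # -320.0
    print(dT_correct)  # -5.0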
This has happened a few times where AI was easily able to see something I was overlooking. Am I a 10x developer now that I use AI? No... but when used well, AI can have a hugely positive impact on what I am able to get done.
It’s a rubber duck that’s pretty educated and talks back.
What I've seen with AI is that it does not save my coworkers from the pain of overcomplicating simple things that they don't really think through clearly. AI does not seem to solve this.
Using AI will change nothing in this context.
So always aim for outcomes, not output :)
At my company, we did promote people quickly enough that they are now at close to double the salaries they started with a year or so ago, due to their added value as engineers on the team. It gets tougher as they get into senior roles, but even there, there's quite a bit of room for differentiation.
Additionally, since this is a market, you should not even expect to be paid twice for 2x value provided — then it makes no difference to a company if they get two 1x engineers instead, and you are really not that special if you are double the cost. So really, the "fair" value is somewhere in between: 1.5x to equally reward both parties, or leaning one way or the other :)
100%. The biggest challenge with software is not that it’s too hard to write, but that it’s too easy to write.
Most of the AI productivity stories I hear sound like they're optimizing for the wrong metric. Writing code faster doesn't necessarily mean shipping better products faster. In my experience, the bottleneck is rarely "how quickly can we type characters into an editor" - it's usually clarity around requirements, decision-making overhead, or technical debt from the last time someone optimized for speed over maintainability.
The author mentions that real 10x engineers prevent unnecessary work rather than just code faster. That rings true to me. I've seen more productivity gains from saying "no" to features or talking teams out of premature microservices(or adopting Kafka :D) than from any coding tool.
What worries me more is the team dynamic this creates. When half your engineers feel like they're supposed to be 10x more productive and aren't, that's a morale problem that compounds. The engineers who are getting solid 20-30% gains from AI (which seems realistic) start questioning if they're doing it wrong.
Has anyone actually measured this stuff properly in a production environment with consistent teams over 6+ months? Most of the data I see is either anecdotal or from artificial coding challenges.
You are right that typing speed isn't the bottleneck, but wrong about what AI actually accelerates. The 10x engineers aren't typing faster; they're exploring ten different architectural approaches in the time it used to take to try one, validating ideas through rapid prototyping, and automating the boring parts to focus on the hard decisions.
You can't evaluate a small sample size of people who are not exploiting the benefits well and come to an accurate assessment of the utility of a new technology.
Skill is always a factor.
That aside: I still think complaining about "hallucination" is a pretty big "tell".
The conversation around LLMs is so polarized. Either they’re dismissed as entirely useless, or they’re framed as an imminent replacement for software developers altogether.
Hallucinations are worth talking about! Just yesterday, for example, Claude 4 Sonnet confidently told me Godbolt was wrong wrt how clang would compile something (it wasn’t). That doesn’t mean I didn’t benefit heavily from the session, just that it’s not a replacement for your own critical thinking.
Like any transformative tool, LLMs can offer a major productivity boost but only if the user can be realistic about the outcome. Hallucinations are real and a reason to be skeptical about what you get back; they don’t make LLMs useless.
To be clear, I’m not suggesting you specifically are blind to this fact. But sometimes it’s warranted to complain about hallucinations!
To be clear, I did not classify "all the AI-supporters" as being in those three categories, I specifically said the people posting that they are getting 10x improvements thanks to AI.
And I think that sentence is a pretty big tell, so ...
https://www.windowscentral.com/software-apps/sam-altman-ai-w...
https://brianchristner.io/how-cursor-ai-can-make-developers-...
https://thenewstack.io/the-future-belongs-to-ai-augmented-10...