One thing I find frustrating is that management where I work has heard of 10x productivity gains. Some of those claims even come from early adopters at my work.
But that sets expectations way too high. Partly it is due to Amdahl's law: I spend only a portion of my time coding, and far more time thinking and communicating with others who are customers of my code. Even if it does make the coding 10x faster (and it doesn't most of the time), overall my productivity is 10-15% better. That is nothing to sneeze at, but it isn't 10x.
Maybe it's due to a more R&D-ish nature of my current work, but for me, LLMs are delivering just as much gains in the "thinking" part as in "coding" part (I handle the "communicating" thing myself just fine for now). Using LLMs for "thinking" tasks feels similar to how mastering web search 2+ decades ago felt. Search engines enabled access to information provided you know what you're looking for; now LLMs boost that by helping you figure out what you're looking for in the first place (and then conveniently searching it for you, too). This makes trivial some tasks I previously classified as hard due to effort and uncertainty involved.
At this point I'd say about 1/3 of my web searches are done through ChatGPT o3, and I can't imagine giving it up now.
(There's also a whole psychological angle in how having an LLM help sort and rubber-duck your half-baked thoughts makes many tasks seem much less daunting, and that alone makes a big difference.)
This, plus a voice mode (e.g. ChatGPT's Advanced Voice Mode), makes it perfect for brainstorming.
Once I decide I want to "think a problem through with an LLM", I often start with just the voice mode. This forces me to say things out loud — which is remarkably effective (see: rubber duck debugging) — and it also gives me a fundamentally different way of consuming the information the LLM provides. Instead of being delivered a massive wall of text, some of which could be wrong, I get a sequential conversation where I can stop, pause, or redirect the LLM as soon as something makes me curious or as I find problems with what it said.
You would think this way of interacting would be limiting, since having a fast LLM output large chunks of text would let you skim it and commit it to memory faster. Yet, for me, the combination of hearing things and, most of all, not having to consume so much potentially wrong info (what good is skimming pointless stuff?) makes ChatGPT's Advanced Voice mode a great way to initially approach a problem.
After the first round with the voice mode is done, I often move to written-form brainstorming.
From time to time I use an LLM to pretend to research a topic that I had researched recently, to check how much time it would have saved me.
So far, most of the time, my impression was "I would have been so badly misled and wouldn't even have known it until too late". It would have saved me negative time.
The only thing LLMs can consistently help me with so far is typing out mindless boilerplate, and yet it still sometimes requires manual fixing (but I do admit that it still does save effort). Anything else is hit or miss. The kind of stuff it does help researching with is usually the stuff that's easy to research without it anyway. It can sometimes shine with a gold nugget among all the mud it produces, but it's rare. The best thing is being able to describe something and ask what it's called, so you can then search for it in traditional ways.
That said, search engines have gotten significantly worse for research in the last decade or so, so the bar is lower for LLMs to be useful.
> One thing I find frustrating is that management where I work has heard of 10x productivity gains. Some of those claims even come from early adopters at my work.
Similar situation at my work, but all of the productivity claims from internal early adopters I've seen so far are based on very narrow ways of measuring productivity, and very sketchy math, to put it mildly.
> One thing I find frustrating is that management where I work has heard of 10x productivity gains.
That may also be in part because LLMs are not as big of an accelerant for junior devs as they are for seniors (juniors don't know what's good and bad as well).
So if you give 1 senior dev a souped-up LLM workflow, I wouldn't be too surprised if they are as productive as 10 pre-LLM juniors. Maybe even more, because a bad dev can actually produce negative productivity (stealing time from the senior), in which case it's infinity-x.
Even a decent junior is mostly limited to doing the low-level grunt work, which LLMs can already do better.
Point is, I can see how jobs could be lost, legitimately.
The thing lost in all of this, though, is the pipeline of talent.
Precision machining is going through an absolute nightmare where the journeymen or master machinists are aging out of the work force. These were people who originally learned on manual machines, and upgraded to CNC over the years. The pipeline collapsed about 1997.
Now there are no apprentice machinists to replace the skills of the retiring workforce.
This will happen to software developers. Probably faster because they tend to be financially independent WAY sooner than machinists.
> overall my productivity is 10-15% better. That is nothing to sneeze at, but it isn't 10x.
It is something to sneeze at if you are 10-15% more expensive to employ due to the cost of the LLM tools. The total cost of production should always be considered, not just throughput.
It's just another tech hype wave. Reality will be somewhere between total doom and boundless utopia. But probably neither of those.
The AI thing kind of reminds me of the big push to outsource software engineers in the early 2000's. There was a ton of hype among executives about it, and it all seemed plausible on paper. But most of those initiatives ended up being huge failures, and nearly all of those jobs came back to the US.
People tend to ignore a lot of the little things that glue it all together that software engineers do. AI lacks a lot of this. Foreigners don't necessarily lack it, but language barriers, time zone differences, cultural differences, and all sorts of other things led to similar issues. Code quality and maintainability took a nosedive and a lot of the stuff produced by those outsourced shops had to be thrown in the trash.
I can already see the AI slop accumulating in the codebases I work in. It's super hard to spot a lot of these things that manage to slip through code review, because they tend to look reasonable when you're looking at a diff. The problem is all the redundant code that you're not seeing, and the weird abstractions that make no sense at all when you look at it from a higher level.
This was what I was saying to a friend the other day. I think anyone vaguely competent that is using LLMs will make the technology look far better than it is.
Management thinks the LLM is doing most of the work. Work is off shored. Oh, the quality sucks when someone without a clue is driving. We need to hire again.
On my personal projects it's easily 10x faster if not more in some circumstances.
At work, where things are planned out months in advance and I'm working with 5 different teams to figure out the right way to do things for requirements that change 8 times during development? Even just stuff like PR review and making sure other people understand it and can access it. idk, sometimes it's probably break even, or that 10-15%.
It just doesn't work well in some environments and what really makes it flourish (having super high quality architectural planning/designs/standardized patterns etc.) is basically just not viable at anything but the smallest startups and solo projects.
Frankly, even just getting engineers to agree on those super-specific standardized patterns is asking a ton, especially since lots of the things that help AI out are not what they are used to. As soon as you have stuff that starts deviating, it can confuse the AI and makes that 10x no longer accessible. Also, no one would want to review the PRs I'd make for the changes I do on my "10x" local project... Maintaining those standards is already hard enough on my side projects: AI will naturally deviate and create noise, and the challenge is constructing systems to guide it so that nothing deviates (since noise leads to more noise).
I think it's mostly a rebalancing thing: if you have 1 or a couple of like-minded engineers who intend to do it, they can get that 10x. I do not see that EVER existing in any actual corporate environment, or even once you get more than like 4 people, tbh.
AI for middle management and project planning, on the other hand...
I don't disagree with your assessment of the world today, but just 12 months ago (before the current crop of base models and coding agents like Claude Code), even that 10X improvement of writing some-of-the-code wouldn't have been true.
> just 12 months ago (before the current crop of base models and coding agents like Claude Code), even that 10X improvement of writing some-of-the-code wouldn't have been true.
You had to paste more into your prompts back then to make the output work with the rest of your codebase, because there weren't good IDEs/"agents" for it, but you've been able to get really, really good code for 90% of "most" day-to-day SWE work since at least when OpenAI released the GPT-4 API, which was a couple of years ago.
Today it's a lot easier to demo low-effort "make a whole new feature or prototype" things than doing the work to make the right API calls back then, but most day to day work isn't "one shot a new prototype web app" and probably won't ever be.
I'm personally more productive now than 1 or 2 years ago, because back then building the prompts took longer than just writing the code myself for a lot of things in my domain - but hardly 10x. It usually one-shots stuff wrong, and then there's a good chance that it'll take longer to chase down the errors than it would've taken to just write the thing - or only use it as "better autocomplete" - in the first place.
> I don't disagree with your assessment of the world today, but just 12 months ago (before the current crop of base models and coding agents like Claude Code), even that 10X improvement of writing some-of-the-code wouldn't have been true.
So? It sounds like you're prodding us to make an extrapolation fallacy (I don't even grant the "10x in 12 months" point, but let's just accept the premise for the sake of argument).
Honestly, 12 months ago the base models weren't substantially worse than they are right now. Some people will argue with me endlessly on this point, and maybe they're a bit better on the margin, but I think it's pretty much true. When I look at the improvements of the last year with a cold, rational eye, they've been in two major areas:
* cost & efficiency
* UI & integration
So how do we improve from here? Cost & efficiency are the obvious lever with historical precedent: GPUs kinda suck for inference, and costs are (currently) rapidly dropping. But maybe this won't continue -- algorithmic complexity is what it is, and barring some revolutionary change in the architecture, transformer inference still scales quadratically with context length.
UI and integration is where most of the rest of the recent improvement has come from, and honestly, this is pretty close to saturation. All of the various AI products already look the same, and I'm certain that they'll continue to converge on a well-accepted local maximum. After that, huge gains in productivity from UX alone will not be possible. This will happen quickly -- probably in the next year or two.
Basically, unless we see a Moore's law of GPUs, I wouldn't bet on indefinite exponential improvement in AI. My bet is that, from here out, this looks like the adoption curve of any prior technology shift (e.g. mainframe -> PC, PC -> laptop, mobile, etc.) where there's a big boom, then a long, slow adoption for the masses.
It's great when they use AI to write a small app “without coding at all” over the weekend and then come in on Monday to brag about it and act baffled that tasks take engineers any time at all.
The reports from analyses of open source projects are that it's something in the range of 10%-15% productivity gains... so it sounds like you're spot on.
How much of the communication and meetings exist because traditionally code was very expensive and slow to create? How many of those meetings might be streamlined or disappear entirely in the future? In my experience there is a lot of process around making sure that software stays on schedule and that it's doing what it is supposed to do. I think the software lifecycle is about to be reinvented.
AI is the new uplift. Embrace and adapt, as a rift is forming in what employers seek in terms of skills from employees (see my talk at https://ghuntley.com/six-month-recap/).
I'm happy to answer any questions folks may have. Currently AFK [2] vibecoding a brand new programming language [1].
I’m a tech lead and I have maybe 5x the output now compared to everybody else under me, quantified by scoring tickets at a team level. I also have more responsibilities outside of IC work compared to the people under me. At this point I’m asking my manager to fire people who still think LLMs are just toys, because I’m tired of working with people with this poor mindset. A pragmatic engineer continually reevaluates what they think they know. We are at a tipping point now. I’m done arguing with people who have a poor model of reality. The rest of us are trying to compete and get shit done. This isn’t an opinion or a game. It’s business, with real-life consequences if you fall behind. I’ve offered to share my workflows, prompts, and setup. Guess how many of these engineers have taken me up on my offer: 1-2, and the juniors or the ones who are very far behind have not.
It’s funny. We fired someone with this attitude Thursday. And by this attitude I mean yours.
Not necessarily because of their attitude, but because it turned out the software they were shipping was rife with security issues. Security managed to quickly detect and handle the resulting incident. I can’t say his team were sad to see him go.
Are you the one at Ableton responsible for it ignoring the renaming of parameter names during the setState part of a Live program? Some of us are already jumping through ridiculous hoops to cover for your… mindset. There's stuff coming up that used to work and doesn't now, like in Live 12. From your response I would guess this is a trend that will hold.
We should not be having to code special 'host is Ableton Live' cases in JUCE just to get your host to work like the others.
Can you please not fire any people who are still holding your operation together?
I have to say I’m in the exact camp the author is complaining about. I’ve shipped non-trivial greenfield products which I started back when it was only ChatGPT, and it was shitty. I started using Claude, copying and pasting back and forth between the web chat and Xcode. Then I discovered Cursor. It left me with a lot of annoying build errors, but my productivity was still at least 3x. Now that agents are better and Claude 4 is out, I barely ever write code, and I don’t mind. I’ve leaned into the Architect/Manager role and direct the agent with my specialized knowledge when I need to.
I started a job at a demanding startup and it’s been several months and I have still not written a single line of code by hand. I audit everything myself before making PRs and test rigorously, but Cursor + Sonnet is just insane with their codebase. I’m convinced I’m their most productive employee, and that’s not by measuring lines of code, which don’t matter; people who are experts in the codebase ask me for help with niche bugs I can narrow in on in 5-30 minutes as someone who’s fresh to their domain. I had to lay off taking work away from the front-end dev (which I’ve avoided my whole career) because I was stepping on his toes, fixing little problems as I saw them thanks to Claude. It’s not vibe coding - there’s a process of research and planning and proceeding in careful steps, and I set the agent up for success. Domain knowledge is necessary. But I’m just so floored how anyone could not be extracting the same utility from it. It feels like there are two articles like this every week now.
Look, the person who wrote that comment doesn't need to prove anything to you just because you're hopped up after reading a blog post that has clearly given you a temporary dopamine bump.
People who understand their domains well and are excellent written communicators can craft prompts that will do what we used to spend a week spinning up. It's self-evident to anyone in that situation, and the only thing we see when people demand "evidence" is that you aren't using the tools properly.
We don't need to prove anything because if you are working on interesting problems, even the most skeptical person will prove it to themselves in a few hours.
Same experience here, probably in a slightly different way of work (PhD student). Was extremely skeptical of LLMs, Claude Code has completely transformed the way I work.
It doesn't take away the requirements of _curation_ - that remains firmly in my camp (partially what a PhD is supposed to teach you! to be precise and reflective about why you are doing X, what do you hope to show with Y, etc -- breakdown every single step, explain those steps to someone else -- this is a tremendous soft skill, and it's even more important now because these agents do not have persistent world models / immediately forget the goal of a sequence of interactions, even with clever compaction).
If I'm on my game with precise communication, I can use CC to organize computation in a way which has never been possible before.
It's not easier than programming (if you care about quality!), but it is different, and it comes with different idioms.
I find that the code quality LLMs output is pretty bad. I end up going through so many iterations that it ends up being faster to do it myself. What I find agents actually useful for is doing large-scale mechanical refactors. Instead of trying to figure out the perfect vim macro or AST rewrite script, I'll throw an agent at it.
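For the curious, the kind of AST rewrite script an agent replaces here is short but fiddly to get right by hand. A minimal sketch using Python's stdlib `ast` (the function names are invented for illustration):

```python
import ast

class RenameCall(ast.NodeTransformer):
    """Rewrite every call to `old_name(...)` as `new_name(...)`."""
    def __init__(self, old_name, new_name):
        self.old_name = old_name
        self.new_name = new_name

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls too
        if isinstance(node.func, ast.Name) and node.func.id == self.old_name:
            node.func = ast.Name(id=self.new_name, ctx=ast.Load())
        return node

source = "total = fetch_rows(db) + fetch_rows(cache)"
tree = RenameCall("fetch_rows", "fetch_rows_v2").visit(ast.parse(source))
print(ast.unparse(tree))  # total = fetch_rows_v2(db) + fetch_rows_v2(cache)
```

The trade-off the commenter describes: the script above is precise and repeatable, but an agent gets you roughly there without having to recall the `NodeTransformer` API at all.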
I disagree strongly at this point. The code is generally good if the prompt was reasonable, but also every test possible is now being written, every UI element has all the required traits, every function has the correct documentation attached, the million little refactors to improve the codebase are being done, etc.
Someone told me ‘AI makes all the little things trivial to do’ and I agree strongly with that. Those many little things together make a strong statement about quality. Our codebase has gone up in quality significantly with AI, whereas we’d let the little things slide due to understaffing before.
The auditing is not quick. I prefer Cursor to Claude Code because I can more easily review its changes while it’s going, and stop and redirect it if it starts to veer off course (which is often, but that’s the cost of doing business). Over time I still gain an understanding of the codebase that I can use to inform my prompts or redirection, so it’s not like I’m blindly asking it to do things. Yes, I do ask it to write unit tests a lot of the time. But I don’t have it spin off and just iterate until the unit tests pass — that’s a recipe for it to do whatever it needs to do to pass them, and is counterproductive. I plan what I want the set of tests to look like and have it write functions in isolation without mentioning tests, and if tests fail I go through a process of auditing the failing code and then the tests themselves to make sure nothing was missed. It’s exactly how I would treat a coworker’s code that I review. My prompts range from a few sentences to a few paragraphs, and nowadays I construct a large .md file with a checklist that we iterate on for larger refactors and projects to manage context.
Please re-read the article. Especially the first list of things we don't know about you, your projects etc.
Your specific experience cannot be generalized. And I'm speaking as the author, who is (as written in the article) literally using these tools every day.
> But I’m just so floored how anyone could not be extracting the same utility from it. It feels like there’s two articles like this every week now.
This is where we learn that you haven't actually read the article. Because it is very clearly stating, with links, that I am extracting value from these tools.
And the article is also very clearly not about extracting or not extracting value.
I did read the entire article before commenting, and I acknowledge that you are using them to some effect, but the line about ‘50% of the time it works 50% of the time’ is where I lost faith in the claims you’re making. I agree it’s very context dependent, but, in the same way, you did not outline your approaches and practices in how you use AI in your workflow. The same lack of context exists on the other side of the argument.
It’s not. It’s like I used to play baseball professionally and now I’m a coach or GM building teams and yielding results. It’s a different set of skills. I’m working mostly in idea space and seeing my ideas come to life with a faster feedback loop and the toil is mostly gone
Otherwise, 99% of my code these days is LLM generated, there's a fair amount of visible commits from my opensource on my profile https://github.com/wesen .
A lot of it is more on the system side of things, although there are a fair amount of one-off webapps, now that I can do frontends that don't suck.
I’d like to, but purposefully am using a throwaway account. It’s an iOS app rated 4.5 stars on the app store and has a nice community. Mild userbase, in the hundreds.
Mean time to shipping features of various estimated difficulty. It’s subjective and not perfect, but generally speaking I need to work way less. I’ll be honest, one thing I think I could have done faster without AI was implementing CRDT-based cloud sync for a project I have going. I think I tried to utilize AI too much for this. It’s good at implementing vector clocks, but not at preventing race conditions.
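For context on why that split happens: the vector-clock half of CRDT sync really is the mechanical part. A minimal sketch (my own illustration, not the commenter's code):

```python
def vc_merge(a, b):
    """Pointwise max of two vector clocks, given as {replica_id: counter}."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def vc_compare(a, b):
    """Return 'equal', 'before', 'after', or 'concurrent'."""
    keys = a.keys() | b.keys()
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither dominates: this is where the hard part starts

# Two replicas that advanced independently are concurrent:
print(vc_compare({"a": 2, "b": 1}, {"a": 1, "b": 2}))  # concurrent
```

Everything above is textbook. The race conditions live in how you apply merges atomically while new local edits keep arriving, which matches the commenter's experience of where the LLM stopped being helpful.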
> there’s a process of research and planning and perusing in careful steps, and I set the agent up for success
Are there any good articles you can share or maybe your process? I’m really trying to get good at this but I don’t find myself great at using agents and I honestly don’t know where to start. I’ve tried the memory bank in cline, tried using more thinking directives, but I find I can’t get it to do complex things and it ends up being a time sink for me.
More anecdata: +1 for “LLMs write all my production code now”. 25+ years in industry, as expert as it’s possible to be in my domain. 100% agree LLMs fail hilariously badly, often, and dangerously. And still, write ~all my code.
No agenda here, not selling anything. Just sitting here towards the later part of my career, no need to prove anything to anyone, stating the view from a grey beard.
Crypto hype was shills from grifters pumping whatever bag-holding scam they could, which was precisely what the behavioral economic incentives drove. GenAI dev is something else. I’ve watched many people working with it; your mileage will vary. But in my opinion (and it’s mine, you do you), hand coding is becoming an anachronistic skill. The only part I wonder about is how far up and down the system/design/architecture stack the power-tooling is going to go. My intuition and empirical findings incline towards a direction I think would fuel a flame war. But I’m just a grey-beard Internet random, and hey look, no evidence, just more baseless claims. Nothing to see here.
Disclosure: I hold no direct shares in Mag 7, nor do I work for one.
_So much_ work in the 'services' industries globally comes down to really a human transposing data from one Excel sheet to another (or from a CRM/emails to Excel), manually. Every (or nearly every) enterprise scale company will have hundreds if not thousands of FTEs doing this kind of work day in day out - often with a lot of it outsourced. I would guess that for every 1 software engineer there are 100 people doing this kind of 'manual data pipelining'.
So really, for giant value to be created out of LLMs you do not need them to be incredible at OCaml. They just need to ~outperform humans on Excel. Where I do think MCP really helps is that you can connect all these systems together easily; a lot of the errors in this kind of work came from trying to pass the entire 'task' in context. If you can take an email via MCP, extract some data out, and put it into a CRM (again via MCP) a row at a time, the hallucination rate is very low IME. I'd say at least at the level of an overworked junior human.
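That row-at-a-time pattern can be sketched roughly like this (a hypothetical illustration; `llm_extract` and `crm_insert` are stand-ins for whatever MCP tools you wire up, not real APIs):

```python
# Hypothetical stand-ins: `llm_extract` wraps a model call on ONE email,
# `crm_insert` writes ONE row (e.g. via an MCP tool). Keeping the unit of
# work this small is the point of the pattern.
REQUIRED_FIELDS = {"name", "email", "amount"}

def pipe_one(email_text, llm_extract, crm_insert):
    """Extract one record from one email, validate it, write one CRM row."""
    row = llm_extract(email_text)  # context = a single email, not the whole task
    missing = REQUIRED_FIELDS - row.keys()
    if missing:
        # Fail loudly instead of writing a partial row someone has to untangle.
        raise ValueError(f"refusing to write partial row; missing {sorted(missing)}")
    crm_insert(row)
    return row
```

The quality-control processes mentioned below then attach naturally at the validation boundary, much as they would for a human operator.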
Perhaps this was the point of the article, but non-determinism is not an issue for these kind of use cases, given all the humans involved are not deterministic either. We can build systems and processes to help enforce quality on non deterministic (eg: human) systems.
Finally, I've followed crypto closely and also LLMs closely. They do not seem to be similar in terms of utility and adoption. The closest thing I can recall is smartphone adoption. A lot of my non technical friends didn't think/want a smartphone when the iPhone first came out. Within a few years, all of them have them. Similar with LLMs. Virtually all of my non technical friends use it now for incredibly varied use cases.
Making a comparison to crypto is lazy criticism. It’s not even worth validating. It’s people who want to take the negative vibe from crypto and repurpose it. The two technologies have nothing to do with each other, and therefore there’s clearly no reason to make comparative technical assessments between them.
That said, the social response is a trend of tech worship that I suspect many engineers who have been around the block are weary of. It’s easy to find unrealistic claims, the worst coming from the CEOs of AI companies.
At the same time, a LOT of people are practically computer illiterate. I can only imagine how exciting it must seem to people who have very limited exposure to even basic automation. And the whole “talking computer” we’ve all become accustomed to seeing in science fiction is pretty much becoming reality.
There’s a world of takes in there. It’s wild.
I worked in ML and NLP for several years before the current AI wave. What’s most striking to me is that this is way more mainstream than anything that has ever happened in the field. And with that comes a lot of inexperience in designing with statistical inference. It’s going to be the Wild West for a while — in opinions, in successful implementation, in learning how to form realistic project ideas.
Look at it this way: now your friend with a novel app idea can be told to do it themselves. That’s at least a win for everyone.
> Look at it this way: now your friend with a novel app idea can be told to do it themselves. That’s at least a win for everyone.
For now, anyways. Thing is, that friend now also has a reasonable shot at succeeding in doing it themselves. It'll take some more time for people to fully internalize it. But let's not forget that there's a chunk of this industry that's basically building apps for people with "novel app ideas" that have some money but run out of friends to pester. LLMs are going to eat a chunk out of that business quite soon.
Each FTE doing that manual data pipelining work is also validating that work, and they have a quasi-legal responsibility to do their job correctly and on time. They may have substantial emotional investment in the company, whether survival instinct to not be fired, or ambition to overperform, or ethics and sense to report a rogue manager through alternate channels.
An LLM won't call other nodes in the organization to check when it sees that a value is unreasonable for some out-of-context reason, like yesterday was a one-time-only bank holiday and so the value should be 0. *It can absolutely be worth an FTE salary to make sure these numbers are accurate.* And for there to be a person to blame/fire/imprison if they aren't.
People are also incredibly accurate at doing this kind of manual data piping all day.
There is also a reason these jobs are not already automated. For many of them you don't need language models; we could have automated them already, but it wasn't worth it for anyone to sign off on. I have been in this situation at a bank: I could have automated a process rather easily, but the upside for me was a smaller team and no real gain, while the downside was getting fired for a massive automated mistake if something went wrong.
> An LLM won't call other nodes in the organization to check when it sees that the value is unreasonable for some out-of-context reason, like yesterday was a one-time-only bank holiday and so the value should be 0.
Why not? LLMs are the first kind of technology that can take this kind of global view. We're not making much use of it in this way just yet, but considering "out-of-context reasons" and taking a wider perspective is pretty much the defining aspect of LLMs as general-purpose AI tools. In time, I expect them to match humans on this (at least humans that care; it's not hard to match those who don't).
I do agree on the liability angle. This increasingly seems to be the main value a human brings to the table. It's not a new trend, though. See e.g. medicine, architecture, civil engineering - licensed professionals aren't doing the bulk of the work, but they're in the loop and well-compensated for verifying and signing off on the work done by less-paid technicians.
You are correct that review and validation should still be manual. But the actual "translation" from one format to another should be automated with LLMs.
>I would guess that for every 1 software engineer there are 100 people doing this kind of 'manual data pipelining'.
For what type of company is this true? I really would like someone to just do a census of 500 white-collar jobs and categorize them all. Anything that is truly automatic has already been automated away.
I do think AI will cause a lot of disruption, but very skeptical of the view that most people with white collar jobs are just "email jobs" or data entry. That doesn't fit my experience at all, and I've worked at some large bureaucratic companies that people here would claim are stuck in the past.
I'm a retired programmer. I can't imagine trusting code generated by probabilities for anything mission critical. If it were close and just needed minor tweaks, I could understand that. But I don't have experience with it.
My comment is mainly to say LLMs are amazing in areas that are not coding, like brainstorming, blue sky thinking, filling in research details, asking questions that make me reflect. I treat the LLM like a thinking partner. It does make mistakes, but those can be caught easily by checking other sources, or even having another LLM review the conclusions.
Well; I can't speak to your specific experience (current or past) but I'm telling you that while I'm skeptical as hell about EVERYTHING, it's blowing my expectations away in every conceivable way.
I built something in less than 24h that I'm sure would have taken us MONTHS to just get off the ground, let alone to the polished version it's at right now. It's impressive that it can do all of the things that I absolutely can do, just faster. But the most impressive thing is that it can do all the things I cannot possibly do and would have had to hire up/contract out to accomplish--for far less money, less time, and with faster iterations than if I had to communicate with another human being.
It's not perfect and it's incredibly frustrating at times (hardcoding values into the code when I have explicitly told it not to; outright lying that it made a particular fix, when it actually changed something else entirely unrelated), but it is a game changer IMO.
> I built something in less than 24h that I'm sure would have taken us MONTHS to just get off the ground, let alone to the polished version it's at right now
See, your comment is a good example of what's going wrong. The OP specifically mentioned "mission critical things" - my interpretation of that would be things that are not allowed to break, because otherwise people might die, in the worst case - and you were talking about just SOMETHING that got "done" faster. No mention of anything critical.
Of course, I was playing around with Claude Code too, and I was fascinated by how fun it can be, and yes, you can get stuff done. But I have absolutely no clue what the code is doing and whether there are some nasty mistakes. So it kinda worked, but I would not use that for anything "mission critical" (whatever this means).
I tried the "thinking partner" approach for a while and for a moment I thought it worked well, but at some point the cracks started to show and I called the bluff. LLMs are extremely good at creating an illusion that they know things and are capable of reasoning, but they really don't do a good job of cultivating intellectual conversation.
I think it's dangerously easy to get misled when trying to prod LLMs for knowledge, especially if it's a field you're new to. If you were using a regular search engine, you could look at the source website to determine the trustworthiness of its contents, but LLMs don't have that. The output can really be whatever, and I don't agree it's necessarily that easy to catch the mistakes.
This is very model-dependent. If you use something heavy on sycophancy and low on brain cells (like GPT-4o, the default paid ChatGPT model), you'll get lots and lots of cracks because these models are optimised for engagement.
That said, don't use model output directly. Use it to extract "shibboleth" keywords and acronyms in that domain, then search those up yourself with a classical search engine (or in a follow-up LLM query). You'll access a lot of new information that way, simply because you know how to surface it now.
You don't say what LLM you are using. I'm using ChatGPT 4o. I'm getting great results, but I review the output with a skeptical eye similar to how I read Wikipedia articles. Like Wikipedia, GPT 4o is great for surfacing new topics for research and does it quickly, which makes stream of thought easier.
Mostly agree. What works better is seeing the AI as a "high level to low level converter", in the same way Java is converted to machine code when it's run. You describe exactly what you want it to report or do, and steer it whenever there are ambiguities. It does "grunt work" for you, with the bar of what grunt work means being moved up. Grunt work used to be doing the dishes or calculating numbers by hand on a paper spreadsheet. Decades ago we automated those. Now we've automated searching for information, summarization, implementing fully specified technical designs for software; the list goes on.
I've been programming for 40 years and started using LLMs a few months ago, and it has really changed the way I work. I let it write pieces of code (pasting error messages from logs mostly results in a fix in less than a minute), but I also brainstorm with it about architecture or new solutions. Of course I check the code it writes, but I'm still almost daily amazed at the intelligence and accuracy. (Very much unlike crypto.)
All code, including stuff that we experienced coders write is inherently probabilistic. That’s why we have code reviews, unit tests, pair programming, guidelines and guardrails in any critical project. If you’re using LLM output uncritically, you’re doing it wrong, but if you’re using _human_ output uncritically you’re doing it wrong too.
That said, they are not magic, and my fear is that people use copilots and agentic models and all the rest to hide poor engineering practice, building more and more boilerplate instead of refactoring or redesigning for efficiency or safety or any of the things that matter in the long run.
There's one thing I find LLMs extremely good at: data science. Since the IO is well defined, you can easily verify that the output is correct. You can even ask it to write tests for you, given that you know certain properties of the data.
The problem is that the LLM needs context about what you are doing, context that you won't (or are too lazy to) give in a chat with it à la ChatGPT. This is where Claude Code changes the game.
For example, say you have a PCAP file where each UDP packet contains multiple messages.
How do you filter the IP/port/protocol/time? Use LLM, check the output
How do you find the number of packets that have patterns A, AB, AAB, ABB.... Use LLM, check the output
How to create PCAPs that only contain those packets for testing? Use LLM, check the output
Etc etc
Since it can read your code, it is able to infer (because let's be honest, your work ain't special) what you are trying to do at a much better rate. In any case, the fact that you can simply ask "Please write a unit test for all of the above functions" means that you can help it verify itself.
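As a concrete (and entirely hypothetical) sketch of the pattern-counting step above: suppose each packet has already been decoded into a list of message types. The function name and encoding are illustrative, not from the original comment, but this is the kind of easily verifiable helper you'd ask the LLM to write and then test.

```python
# Hypothetical sketch: count packets whose message sequence matches a pattern.
from collections import Counter

def count_patterns(packets, patterns):
    """Count packets whose full message sequence equals one of the patterns."""
    counts = Counter()
    for messages in packets:
        sequence = "".join(messages)  # e.g. ["A", "B"] -> "AB"
        if sequence in patterns:
            counts[sequence] += 1
    return dict(counts)

# A packet with messages A then B matches the "AB" pattern, and so on.
packets = [["A"], ["A", "B"], ["A", "B"], ["A", "A", "B"], ["B"]]
print(count_patterns(packets, {"A", "AB", "AAB", "ABB"}))
# {'A': 1, 'AB': 2, 'AAB': 1}
```

Because the IO is this well defined, checking the output (or having the LLM write the unit test) is trivial.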
Treating it like a thinking partner is good. When programming, I don't treat it like a person, but rather as a very high-level programming language. Imagine exactly how you want the code to be, then find a way to express that unambiguously in natural language; the idea is that you'll still have a bit of work to do writing things out, but it will be a lot quicker than typing out all the code by hand. Combine that with iterations of feedback, having the AI build and run your program (at least as a sanity check), and asking the AI to check the program's behaviour the same way you would, and it gets you quite far.
A limitation is the lack of memory. If you steer it from style A to style B with multiple rounds of feedback and none of that is written down, you'll have to re-explain it all over again in the next AI session.
Deepseek is about 1TB in weights; maybe that is why LLMs don't remember things across sessions yet. I think everybody could have their own personal AI (hosted remotely unless you own lots of compute); it should remember what happened yesterday, in particular the feedback it was given during development. As an AI layman, I do think this is the next step.
You don't trust the code coming out of the probabilistic machine. You build a validation cage around it with hard interfaces and you also review the output.
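A minimal sketch of what such a "validation cage" can look like. Everything here is an assumed example: the inner function stands in for LLM-generated code, and the outer wrapper is the hard interface you actually trust.

```python
# Minimal sketch of a validation cage. The inner function stands in for
# LLM-generated code; the outer wrapper enforces the contract regardless.
def untrusted_parse_percentage(text):
    # Pretend this body was generated by an LLM and merely reviewed.
    return float(text.strip().rstrip("%"))

def parse_percentage(text):
    """Hard interface: always returns a float in [0, 100] or raises."""
    value = untrusted_parse_percentage(text)
    if not isinstance(value, float) or not 0.0 <= value <= 100.0:
        raise ValueError(f"out-of-contract result: {value!r}")
    return value

print(parse_percentage("42%"))  # 42.0
```

The point is that the cage holds no matter how the inner code was produced; review catches bugs, and the interface catches out-of-contract results at runtime.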
> Like most skeptics and critics, I use these tools daily. And 50% of the time they work 50% of the time.
I use LLMs nearly every day for my job as of about a year ago and they solve my issues about 90% of the time. I have a very hard time deciphering if these types of complaints about AI/LLMs should be taken seriously, or written off as irrational use patterns by some users. For example, I have never fed an LLM a codebase and expected it to work magic. I ask direct, specific questions at the edge of my understanding (not beyond it) and apply the solutions in a deliberate and testable manner.
If you're taking a different approach and complaining about LLMs, I'm inclined to think you're doing it wrong. And missing out on the actual magic, which is small, useful, and fairly consistent.
Hmm. OK, so you're basically quoting the line from Anchorman: "60% of the time, it works every time."
I also use GPT and Claude daily via Cursor.
GPT o3 is kinda good for general-knowledge searches. Claude falls down all the time, but I've noticed that while it's spending tokens to jerk itself off, quite often it happens upon the actual issue without recognizing it.
Models are dumb, more idiot than idiot savant, but sometimes they hit on relevant items. As long as you personally have an idea of what needs to happen and treat LLMs like rat terriers in a farm field, you can utilize them properly.
I just went through the last 10 chat titles and all of them were spot on for me. Maybe the person you’re responding to has a different experience than you do and calling their perspective “suspect” is somewhat uncharitable.
(There are times I do other kinds of work and it fails terribly. My main point stands.)
It either helps me find a solution or it doesn't. About 90% of the time, or less formally I would just say "almost all of the time", it does. Keep in mind that I, the user, decide which questions to ask in the first place. If my batting average seems unbelievably high, perhaps my skill is in knowing when to use an LLM and when not to.
This reads like the author is mad about imprecision in the discourse, which is real, but to be quite frank is more rampant amongst detractors than promoters, who often have to deal with the flaws and limitations on a day-to-day basis.
The conclusion that everything around LLMs is magical thinking seems fairly hubristic to me, given that in the last 5 years a set of previously borderline-intractable problems have become completely or near-completely solved: translation, transcription, and code generation (up to some scale), for instance.
> but to be quite frank more rampant amongst detractors than promoters, who often have to deal with the flaws and limitations on a day to day basis.
"detractors" usually point to actual flaws. "promoters" usually uncritically hail LLMs as miracles capable of solving any problem in one go, without giving any specific details.
Google Translate just spits out nonsense for distant language pairs (English<->Korean etc.) and doesn't compare to SOTA LLMs; Whisper is a Transformer (the architecture used for LLMs); and classical code generators have nothing on LLMs.
Crypto is a lifeline for me, as I cannot open a bank account in the country I live in, for reasons I can neither control nor fix. So I am happy if crypto is useless for you. For me and for millions like me, it is a matter of life and death.
As for LLMs — once again, magic for some, a reliable deterministic instrument for others (and also magic). I just classified and sorted a few hundred invoices. Yes, magic.
This is basically the only use case for crypto, and one for which it was explicitly designed: censorship resistance. This is why people have so much trouble finding useful things for it to do in the legal economy, it was explicitly designed to facilitate transactions the government doesn't want or can't facilitate. In some cases, there are humanitarian applications, there are also a lot of illicit applications.
I am a Russian immigrant in Switzerland. As of right now, all Swiss banks block all Russian bank accounts until their owners can provide a valid physical residence permit card, due to sweeping sanctions (meanwhile, Russian-owned companies continue to freely trade crude oil from here, as they use Swiss nominal directors — the hypocrisy is through the roof). My residence permit is on renewal now, and the case has been dragging on for 7 months already — so, no bank account.
I don't think you actually disagree with the author's quip. You seem to want to use crypto as a currency, while the OP was most likely referring to the grifting around crypto as an investment. If you're using it as a currency, then the people trying to pump and dump coins and use it as a money-making vehicle are your adversaries. You are best served if it's stable instead of a rollercoaster of booms and busts.
Stablecoins are a thing. But yes, I hate the current state of affairs, with "memecoins" and whatnot. Particularly the government push from one particular country. We created crypto to be independent from governments, not to enable them.
Said this in another thread and I'll repeat it here:
It's the same problem that crypto experiences. Almost everyone is propagating lies about the technology, even if a majority of those doing so don't understand enough to realize they're lies (naivety vs malice).
I'd argue there's more intentional lying in crypto and less value to be gained, but in both cases people who might derive real benefit from the hard truth of the matter are turning away before they enter the door due to dishonesty/misrepresentation- and in both cases there are examples of people deriving real value today.
> I'd argue there's more intentional lying in crypto
I disagree. Crypto sounds more like intentional lying because it's primarily hyped in contexts typical for scams/gambling. Yes, there are businesses involved (anybody can start one), but they're mostly new businesses or a tiny tack-on to an existing business.
AI is largely being hyped within the existing major corporate structures, therefore its lies just get tagged as "business as usual". That doesn't make them any less of a lie though.
Loosely related, but I find the use of AGI (and sometimes even AI) as terms annoying lately, especially in scientific papers, where I would expect everything to be well defined, at least in how it is used in that paper.
So, why can't we just come up with some definition for what AGI is? We could then, say, logically prove that some AI fits that definition. Even if this doesn't seem practically useful, it's theoretically much more useful than just using that term with no meaning.
Instead it kind of feels like it's an escape hatch. On Wikipedia we have "a type of AI that would match or surpass human capabilities across virtually all cognitive tasks". How could we measure that? What good is this if we can't prove that a system has this property?
Bit of a rant but I hope it's somewhat legible still.
You don't need consensus on the meaning across the board. I maintain my own, more permissive milestone for what constitutes "AGI", but I have no expectations that others will share it. Much like "crypto" to me is still cryptography, not cryptocurrency - sometimes the mainstream will just have a different opinion.
My point is, it's not about the mainstream or marketing. In science, some rigor is expected. There doesn't need to be consensus if a definition is established within some context. It's perfectly fine to redefine something as needed for the research but only if that definition is declared.
Once I decide I want to "think a problem through with an LLM", I often start with just the voice mode. This forces me to say things out loud — which is remarkably effective (see: rubber-duck debugging) — and it also gives me a fundamentally different way of consuming the information the LLM provides. Instead of being delivered a massive amount of text, where some information could be wrong, I get a sequential system where I can stop, pause, or redirect the LLM as soon as something makes me curious or I find problems with what it said.
You would think that having this way of interacting would be limiting, as having a fast LLM output large chunks of information would let you skim through it and commit it to memory faster. Yet, for me, the combination of hearing things and, most of all, not having to consume so much potentially wrong info (what good is it to skim pointless stuff), ensures that ChatGPT's Advanced Voice mode is a great way to initially approach a problem.
After the first round with the voice mode is done, I often move to written-form brainstorming.
So far, most of the time, my impression was "I would have been so badly misled and wouldn't even have known it until too late". It would have saved me some negative time.
The only thing LLMs can consistently help me with so far is typing out mindless boilerplate, and yet it still sometimes requires manual fixing (but I do admit that it still does save effort). Anything else is hit or miss. The kind of stuff it does help researching with is usually the stuff that's easy to research without it anyway. It can sometimes shine with a gold nugget among all the mud it produces, but it's rare. The best thing is being able to describe something and ask what it's called, so you can then search for it in traditional ways.
That said, search engines have gotten significantly worse for research in the last decade or so, so the bar is lower for LLMs to be useful.
Similar situation at my work, but all of the productivity claims from internal early adopters I've seen so far are based on very narrow ways of measuring productivity, and very sketchy math, to put it mildly.
That may also be in part because LLMs are not as big of an accelerant for junior devs as they are for seniors (juniors don't know what is good and bad as well).
So if you give 1 senior dev a souped-up LLM workflow, I wouldn't be too surprised if they are as productive as 10 pre-LLM juniors. Maybe even more, because a bad dev can actually produce negative productivity (stealing time from the senior), in which case the multiplier is infinite.
Even a decent junior is mostly limited to doing the low-level grunt work, which LLMs can already do better.
Point is, I can see how jobs could be lost, legitimately.
Precision machining is going through an absolute nightmare where the journeymen or master machinists are aging out of the work force. These were people who originally learned on manual machines, and upgraded to CNC over the years. The pipeline collapsed about 1997.
Now there are no apprentice machinists to replace the skills of the retiring workforce.
This will happen to software developers. Probably faster because they tend to be financially independent WAY sooner than machinists.
It is something to sneeze at if you are 10-15% more expensive to employ due to the cost of the LLM tools. The total cost of production should always be considered, not just throughput.
Claude Max is $200/month, or ~2% of the salary of an average software engineer.
How is one spending anywhere close to 10% of total compensation on LLMs?
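A back-of-envelope check of the ~2% figure above, with the salary as an assumed average rather than a sourced number:

```python
# Quick sanity check of the ~2% claim; the salary is an assumed average.
monthly_tool_cost = 200        # Claude Max, USD per month
annual_salary = 120_000        # assumed average SWE salary, USD per year

share = (monthly_tool_cost * 12) / annual_salary
print(f"{share:.0%}")  # 2%
```

Even at $500/month of tooling, the share stays around 5%, nowhere near 10% of total compensation.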
The AI thing kind of reminds me of the big push to outsource software engineers in the early 2000's. There was a ton of hype among executives about it, and it all seemed plausible on paper. But most of those initiatives ended up being huge failures, and nearly all of those jobs came back to the US.
People tend to ignore a lot of the little things that glue it all together that software engineers do. AI lacks a lot of this. Foreigners don't necessarily lack it, but language barriers, time zone differences, cultural differences, and all sorts of other things led to similar issues. Code quality and maintainability took a nosedive and a lot of the stuff produced by those outsourced shops had to be thrown in the trash.
I can already see the AI slop accumulating in the codebases I work in. It's super hard to spot a lot of these things that manage to slip through code review, because they tend to look reasonable when you're looking at a diff. The problem is all the redundant code that you're not seeing, and the weird abstractions that make no sense at all when you look at it from a higher level.
Management thinks the LLM is doing most of the work. Work is off shored. Oh, the quality sucks when someone without a clue is driving. We need to hire again.
Frankly, even just getting engineers to agree upon those super-specific standardized patterns is asking a ton, especially since a lot of the things that help AI out are not what they are used to. As soon as you have stuff that starts deviating, it can confuse the AI and make that 10x no longer accessible. Also, no one would want to review the PRs I'd make for the changes I do on my "10x" local project. Maintaining those standards is already hard enough on my side projects; AI will naturally deviate and create noise, and the challenge is constructing systems to guide it and make sure nothing deviates (since noise would lead to more noise).
I think it's mostly a rebalancing thing: if you have 1 or a couple of like-minded engineers who intend to do it, they can get that 10x. I do not see that EVER existing in any actual corporate environment, or even once you get more than like 4 people, tbh.
AI for middle management and project planning, on the other hand...
You had to paste more into your prompts back then to make the output work with the rest of your codebase, because there weren't good IDEs/"agents" for it, but you've been able to get really, really good code for 90% of "most" day-to-day SWE work since at least OpenAI releasing the GPT-4 API, which was a couple of years ago.
Today it's a lot easier to demo low-effort "make a whole new feature or prototype" things than doing the work to make the right API calls back then, but most day to day work isn't "one shot a new prototype web app" and probably won't ever be.
I'm personally more productive than 1 or 2 years ago now because the time required to build the prompts was slower than my personal rate of writing code for a lot of things in my domain, but hardly 10x. It usually one-shots stuff wrong, and then there's a good chance that it'll take longer to chase down the errors than it would've to just write the thing - or only use it as "better autocomplete" - in the first place.
So? It sounds like you're prodding us to make an extrapolation fallacy (I don't even grant the "10x in 12 months" point, but let's just accept the premise for the sake of argument).
Honestly, 12 months ago the base models weren't substantially worse than they are right now. Some people will argue with me endlessly on this point, and maybe they're a bit better on the margin, but I think it's pretty much true. When I look at the improvements of the last year with a cold, rational eye, they've been in two major areas:
So how do we improve from here? Cost and efficiency are the obvious lever with historical precedent: GPUs kinda suck for inference, and costs are (currently) rapidly dropping. But maybe this won't continue -- algorithmic complexity is what it is, and barring some revolutionary change in the architecture, LLMs are exponential algorithms.

UI and integration is where most of the rest of the recent improvement has come from, and honestly, this is pretty close to saturation. All of the various AI products already look the same, and I'm certain that they'll continue to converge to a well-accepted local maximum. After that, huge gains in productivity from UX alone will not be possible. This will happen quickly -- probably in the next year or two.
Basically, unless we see a Moore's law of GPUs, I wouldn't bet on indefinite exponential improvement in AI. My bet is that, from here out, this looks like the adoption curve of any prior technology shift (e.g. mainframe -> PC, PC -> laptop, mobile, etc.) where there's a big boom, then a long, slow adoption for the masses.
Your developers still push a mouse around to get work done? Fire them.
AI is the new uplift. Embrace and adapt, as a rift is forming in what employers seek in terms of skills from employees (see my talk at https://ghuntley.com/six-month-recap/).
I'm happy to answer any questions folks may have. Currently AFK [2] vibecoding a brand new programming language [1].
[1] https://x.com/GeoffreyHuntley/status/1940964118565212606 [2] https://youtu.be/e7i4JEi_8sk?t=29722
That would be a 70% descent?
Not necessarily because of their attitude but because it turns out the software they were shipping was rife with security issues. Security managed to quickly detect and handle the resulting incident. I can't say his team were sad to see him go.
We should not be having to code special 'host is Ableton Live' cases in JUCE just to get your host to work like the others.
Can you please not fire any people who are still holding your operation together?
Everyone else who raises any doubts about LLMs is an idiot and you're 10,000x better than everyone else and all your co-workers should be fired.
But what's absent from all your comments is what you make. Can you tell us what you actually do in your >500k job?
Are you, by any chance, a front-end developer?
Also, a team-lead that can't fire their subordinates isn't a team-lead, they're a number two.
isn't this the entire LLM experience?
I started a job at a demanding startup, and several months in I have still not written a single line of code by hand. I audit everything myself before making PRs and test rigorously, but Cursor + Sonnet is just insane with their codebase. I'm convinced I'm their most productive employee, and that's not by measuring lines of code, which don't matter; people who are experts in the codebase ask me for help with niche bugs I can narrow in on in 5-30 minutes as someone who's fresh to their domain. I had to stop taking work away from the front-end dev (front end being something I've avoided my whole career) because I was stepping on his toes, fixing little problems as I saw them thanks to Claude. It's not vibe coding - there's a process of research and planning and careful review in measured steps, and I set the agent up for success. Domain knowledge is necessary. But I'm just so floored how anyone could not be extracting the same utility from it. It feels like there are two articles like this every week now.
You didn't share any evidence with us even though you claim unbelievable things.
You even went as far as registering a throwaway account to hide your identity and make verifying any of your claims impossible.
Your comment feels more like a joke to me.
Look, the person who wrote that comment doesn't need to prove anything to you just because you're hopped up after reading a blog post that has clearly given you a temporary dopamine bump.
People who understand their domains well and are excellent written communicators can craft prompts that will do what we used to spend a week spinning up. It's self-evident to anyone in that situation, and the only thing we see when people demand "evidence" is that you aren't using the tools properly.
We don't need to prove anything because if you are working on interesting problems, even the most skeptical person will prove it to themselves in a few hours.
It doesn't take away the requirements of _curation_ - that remains firmly in my camp (partially what a PhD is supposed to teach you! to be precise and reflective about why you are doing X, what do you hope to show with Y, etc -- breakdown every single step, explain those steps to someone else -- this is a tremendous soft skill, and it's even more important now because these agents do not have persistent world models / immediately forget the goal of a sequence of interactions, even with clever compaction).
If I'm on my game with precise communication, I can use CC to organize computation in a way which has never been possible before.
It's not easier than programming (if you care about quality!), but it is different, and it comes with different idioms.
Someone told me "AI makes all the little things trivial to do", and I agree strongly with that. Those many little things together make a strong statement about quality. Our codebase has gone up in quality significantly with AI, whereas we'd let the little things slide due to understaffing before.
That was my experience with Cursor, but Claude Code is a different world. What specific product/models brought you to this generalization?
How do you audit code from an untrusted source that quickly? LLMs do not have the whole project in their heads and are prone to hallucinating.
On average how long are your prompts and does the LLM also write the unit tests?
I personally think you're sugar-coating the experience.
The person you're responding to literally said, "I audit everything myself before making PRs and test rigorously".
Your specific experience cannot be generalized. And I'm speaking as the author, who is (as written in the article) literally using these tools every day.
> But I’m just so floored how anyone could not be extracting the same utility from it. It feels like there’s two articles like this every week now.
This is where we learn that you haven't actually read the article. Because it is very clearly stating, with links, that I am extracting value from these tools.
And the article is also very clearly not about extracting or not extracting value.
Damn, this sounds pretty boring.
Links please
This is _far_ from web crud.
Otherwise, 99% of my code these days is LLM generated; there are a fair number of visible commits from my open-source work on my profile https://github.com/wesen .
A lot of it is more on the system side of things, although there are a fair amount of one-off webapps, now that I can do frontends that don't suck.
How do you measure this?
Are there any good articles you can share or maybe your process? I’m really trying to get good at this but I don’t find myself great at using agents and I honestly don’t know where to start. I’ve tried the memory bank in cline, tried using more thinking directives, but I find I can’t get it to do complex things and it ends up being a time sink for me.
A bit suspicious, wouldn’t you agree?
No agenda here, not selling anything. Just sitting here towards the later part of my career, no need to prove anything to anyone, stating the view from a grey beard.
Crypto hype was shilled by grifters pumping whatever bag-holding scam they could, which was precisely what the behavioral-economic incentives drove. GenAI dev is something else. I've watched many people working with it; your mileage will vary. But in my opinion (and it's mine, you do you), hand coding is becoming an anachronistic skill. The only part I wonder about is how far up and down the system/design/architecture stack the power tooling is going to go. My intuition and empirical findings incline towards a direction I think would fuel a flame war. But I'm just a grey-beard Internet random, and hey look, no evidence, just more baseless claims. Nothing to see here.
Disclosure: I hold no direct shares in Mag 7, nor do I work for one.
_So much_ work in the 'services' industries globally comes down to really a human transposing data from one Excel sheet to another (or from a CRM/emails to Excel), manually. Every (or nearly every) enterprise scale company will have hundreds if not thousands of FTEs doing this kind of work day in day out - often with a lot of it outsourced. I would guess that for every 1 software engineer there are 100 people doing this kind of 'manual data pipelining'.
So really, for giant value to be created out of LLMs you do not need them to be incredible at OCaml. They just need to ~outperform humans at Excel. Where I do think MCP really helps is that you can connect all these systems together easily; a lot of the errors in this kind of work came from trying to pass the entire "task" in context. If you can take an email via MCP, extract some data out, and put it into a CRM (again via MCP) a row at a time, the hallucination rate is very low IME. I would say at least on par with an overworked junior human.
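To make the "row at a time" idea concrete, here is a hedged sketch. The field, regex, and function names are all illustrative; in practice the extraction step would be the LLM/MCP call, with the validation around it doing the quality enforcement.

```python
# Illustrative sketch of row-at-a-time extraction with validation.
# In practice the extract step would be an LLM call via MCP; a regex
# stands in here so the validation logic around it stays visible.
import re

def extract_invoice_total(email_body):
    """Pull a single 'Total: $1,234.56'-style amount, or None if absent."""
    match = re.search(r"Total:\s*\$([\d,]+\.\d{2})", email_body)
    return float(match.group(1).replace(",", "")) if match else None

def append_crm_row(crm_rows, email_body):
    # One row per call: a failed extraction stops here instead of
    # corrupting a whole batch, and can be routed to a human.
    total = extract_invoice_total(email_body)
    if total is None or total < 0:
        raise ValueError("extraction failed; route to a human reviewer")
    crm_rows.append({"invoice_total": total})

rows = []
append_crm_row(rows, "Hi team, invoice attached. Total: $1,234.56 due Friday.")
print(rows)  # [{'invoice_total': 1234.56}]
```

Processing one row per call keeps each context small, which is exactly why the hallucination rate drops compared to passing the entire task at once.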
Perhaps this was the point of the article, but non-determinism is not an issue for these kind of use cases, given all the humans involved are not deterministic either. We can build systems and processes to help enforce quality on non deterministic (eg: human) systems.
Finally, I've followed crypto closely and also LLMs closely. They do not seem to be similar in terms of utility and adoption. The closest thing I can recall is smartphone adoption. A lot of my non technical friends didn't think/want a smartphone when the iPhone first came out. Within a few years, all of them have them. Similar with LLMs. Virtually all of my non technical friends use it now for incredibly varied use cases.
That said, the social response is a trend of tech worship that I suspect many engineers who have been around the block are weary of. It’s easy to find unrealistic claims, the worst coming from the CEOs of AI companies.
At the same time, a LOT of people are practically computer illiterate. I can only imagine how exciting it must seem to people who have very limited exposure to even basic automation. And the whole “talking computer” we’ve all become accustomed to seeing in science fiction is pretty much becoming reality.
There’s a world of takes in there. It’s wild.
I worked in ML and NLP several years before AI. What’s most striking to me is that this is way more mainstream than anything that has ever happened in the field. And with that comes a lot of inexperience in designing with statistical inference. It’s going to be the Wild West for a while — in opinions, in successful implementation, in learning how to form realistic project ideas.
Look at it this way: now your friend with a novel app idea can be told to do it themselves. That’s at least a win for everyone.
For now, anyways. Thing is, that friend now also has a reasonable shot at succeeding in doing it themselves. It'll take some more time for people to fully internalize it. But let's not forget that there's a chunk of this industry that's basically building apps for people with "novel app ideas" that have some money but run out of friends to pester. LLMs are going to eat a chunk out of that business quite soon.
ultimately, crypto is information science. mathematically, cryptography, compression, and so on (data transmission) are all the "same" problem.
LLMs compress knowledge, not just data, and they do it in a lossy way.
traditional information science work is all about dealing with lossless data in a highly lossy world.
An LLM won't call other nodes in the organization to check when it sees that the value is unreasonable for some out-of-context reason, like yesterday was a one-time-only bank holiday and so the value should be 0. *It can absolutely be worth an FTE salary to make sure these numbers are accurate.* And for there to be a person to blame/fire/imprison if they aren't accurate.
There is also a reason these jobs haven't been automated already. Many of them don't need language models; we could have automated them long ago, but it was never worth it for someone to sign off on. I have been in this situation at a bank. I could have automated a process rather easily, but the upside for me was a smaller team and no real gain, while the downside was getting fired for a massive automated mistake if something went wrong.
Why not? LLMs are the first kind of technology that can take this kind of global view. We're not making much use of it in this way just yet, but considering "out-of-context reasons" and taking a wider perspective is pretty much the defining aspect of LLMs as general-purpose AI tools. In time, I expect them to match humans on this (at least humans that care; it's not hard to match those who don't).
I do agree on the liability angle. This increasingly seems to be the main value a human brings to the table. It's not a new trend, though. See e.g. medicine, architecture, civil engineering - licensed professionals aren't doing the bulk of the work, but they're in the loop and well-compensated for verifying and signing off on the work done by less-paid technicians.
For what type of company is this true? I really would like someone to just do a census of 500 white-collar jobs and categorize them all. Anything that is truly automatic has already been automated away.
I do think AI will cause a lot of disruption, but very skeptical of the view that most people with white collar jobs are just "email jobs" or data entry. That doesn't fit my experience at all, and I've worked at some large bureaucratic companies that people here would claim are stuck in the past.
My comment is mainly to say LLMs are amazing in areas that are not coding, like brainstorming, blue sky thinking, filling in research details, asking questions that make me reflect. I treat the LLM like a thinking partner. It does make mistakes, but those can be caught easily by checking other sources, or even having another LLM review the conclusions.
I built something in less than 24h that I'm sure would have taken us MONTHS to just get off the ground, let alone reach the polished version it's at right now. Impressive enough is that it can do all of the things I absolutely can do, just faster. But the most impressive thing is that it can do all the things I cannot possibly do and would have had to hire or contract out to accomplish, for far less money and time, and with faster iterations than if I had to communicate with another human being.
It's not perfect and it's incredibly frustrating at times (hardcoding values into the code when I have explicitly told it not to; outright lying that it made a particular fix, when it actually changed something else entirely unrelated), but it is a game changer IMO.
Would love to see it!
Of course, I was playing around with Claude Code too, and I was fascinated by how fun it can be, and yes, you can get stuff done. But I have absolutely no clue what the code is doing and whether there are some nasty mistakes. So it kinda worked, but I would not use that for anything "mission critical" (whatever that means).
I think it's dangerously easy to get misled when trying to prod LLMs for knowledge, especially if it's a field you're new to. If you were using a regular search engine, you could look at the source website to determine the trustworthiness of its contents, but LLMs don't have that. The output can really be whatever, and I don't agree it's necessarily that easy to catch the mistakes.
That said, don't use model output directly. Use it to extract "shibboleth" keywords and acronyms in that domain, then search those up yourself with a classical search engine (or in a follow-up LLM query). You'll access a lot of new information that way, simply because you know how to surface it now.
All code, including stuff that we experienced coders write is inherently probabilistic. That’s why we have code reviews, unit tests, pair programming, guidelines and guardrails in any critical project. If you’re using LLM output uncritically, you’re doing it wrong, but if you’re using _human_ output uncritically you’re doing it wrong too.
That said, they are not magic, and my fear is that people use copilots and agentic models and all the rest to hide poor engineering practice, building more and more boilerplate instead of refactoring or redesigning for efficiency or safety or any of the things that matter in the long run.
The problem is that the LLM needs context for what you are doing, context that you won't (or are too lazy to) give in a chat with it à la ChatGPT. This is where Claude Code changes the game.
For example, say you have a PCAP file where each UDP packet contains multiple messages.
How do you filter on IP/port/protocol/time? Use LLM, check the output
How do you find the number of packets that match patterns A, AB, AAB, ABB...? Use LLM, check the output
How do you create PCAPs that only contain those packets for testing? Use LLM, check the output
Etc etc
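The first task above (filter UDP packets by port) is the kind of thing an LLM typically generates, and also the kind of output you can check. A minimal stdlib-only sketch, assuming a classic pcap with Ethernet II + IPv4 framing and no VLAN tags (real captures would more likely use scapy or dpkt); the synthetic capture here exists only so the filter can be exercised end to end:

```python
# Filter UDP packets in a classic pcap by destination port, stdlib only.
# Offsets assume Ethernet II + IPv4; checksums are left at 0 since this
# is a parsing demo, not a packet generator for the wire.
import struct

# pcap global header: magic, v2.4, tz 0, sigfigs 0, snaplen 65535, linktype 1 (Ethernet)
PCAP_GLOBAL_HDR = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)

def make_udp_packet(src_port, dst_port, payload):
    """Build a bare-bones Ethernet/IPv4/UDP frame."""
    udp = struct.pack("!HHHH", src_port, dst_port, 8 + len(payload), 0) + payload
    ip = struct.pack("!BBHHHBBH4s4s",
                     0x45, 0, 20 + len(udp), 0, 0, 64, 17, 0,  # proto 17 = UDP
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    eth = b"\x00" * 12 + struct.pack("!H", 0x0800)  # EtherType IPv4
    return eth + ip + udp

def pcap_bytes(packets):
    """Serialize (timestamp, frame) pairs into a pcap byte string."""
    out = bytearray(PCAP_GLOBAL_HDR)
    for ts, frame in packets:
        out += struct.pack("<IIII", ts, 0, len(frame), len(frame)) + frame
    return bytes(out)

def filter_udp_dst_port(pcap, port):
    """Yield (timestamp, frame) for UDP packets addressed to `port`."""
    off = 24  # skip the global header
    while off < len(pcap):
        ts, _, incl, _ = struct.unpack_from("<IIII", pcap, off)
        frame = pcap[off + 16 : off + 16 + incl]
        off += 16 + incl
        if frame[12:14] != b"\x08\x00":   # not IPv4
            continue
        ihl = (frame[14] & 0x0F) * 4      # IP header length in bytes
        if frame[23] != 17:               # IP protocol byte: not UDP
            continue
        dst = struct.unpack_from("!H", frame, 14 + ihl + 2)[0]
        if dst == port:
            yield ts, frame

cap = pcap_bytes([(100, make_udp_packet(5000, 53, b"A")),
                  (101, make_udp_packet(5000, 8080, b"B")),
                  (102, make_udp_packet(5001, 53, b"AB"))])
matches = list(filter_udp_dst_port(cap, 53))
print(len(matches))  # 2
```

This is exactly the "use LLM, check the output" loop: the filter is a page of code you'd never want to hand-write for a one-off, but verifying it against a few known packets takes minutes.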
Since it can read your code, it is able to infer what you are trying to do at a much better rate (because let's be honest, your work ain't special). In any case, the fact that you can simply ask "Please write a unit test for all of the above functions" means that you can help it verify itself.
A limitation is the lack of memory. If you steer it from style A to style B over multiple rounds of feedback and none of it is written down, you'll have to re-explain it all in the next session.
DeepSeek is about 1 TB of weights; maybe that is why LLMs don't remember things across sessions yet. I think everybody could have their own personal AI (hosted remotely unless you own lots of compute); it should remember what happened yesterday, in particular the feedback it was given during development. As an AI layman, I do think this is the next step.
I use LLMs nearly every day for my job as of about a year ago and they solve my issues about 90% of the time. I have a very hard time deciphering if these types of complaints about AI/LLMs should be taken seriously, or written off as irrational use patterns by some users. For example, I have never fed an LLM a codebase and expected it to work magic. I ask direct, specific questions at the edge of my understanding (not beyond it) and apply the solutions in a deliberate and testable manner.
If you're taking a different approach and complaining about LLMs, I'm inclined to think you're doing it wrong, and missing out on the actual magic, which is small, useful, and fairly consistent.
I also use GPT and Claude daily via Cursor.
GPT o3 is kinda good for general knowledge searches. Claude falls down all the time, but I've noticed that while it's spending tokens to jerk itself off, it quite often happens upon the actual issue without recognizing it.
Models are dumb and more idiot than idiot savant, but sometimes they hit on relevant items. As long as you personally have an idea of what you need to happen and treat LLMs like rat terriers in a farm field, you can utilize them properly
"90%" also seems a bit suspect.
(There are times I do other kinds of work and it fails terribly. My main point stands.)
The conclusion that everything around LLMs is magical thinking seems fairly hubristic to me, given that in the last 5 years a set of previously borderline-intractable problems has become completely or near-completely solved: translation, transcription, and code generation (up to some scale), for instance.
"detractors" usually point to actual flaws. "promoters" usually uncritically hail LLMs as miracles capable of solving any problem in one go, without giving any specific details.
Google Translate, Whisper and Code Generators (up to some scale) have existed for quite some time without using LLMs.
Crypto is a lifeline for me, as I cannot open a bank account in the country I live in, for reasons I can neither control nor fix. So I am happy if crypto is useless for you. For me and for millions like me, it is a matter of life and death.
As for LLMs — once again, magic for some, a reliable deterministic instrument for others (and also magic). I just classified and sorted a few hundred invoices. Yes, magic.
"You had to be there to believe it" https://x.com/0xbags/status/1940774543553146956
The AI craze is currently going through a similar period: any criticism is brushed away as coming from morons who know nothing.
It's the same problem that crypto experiences. Almost everyone is propagating lies about the technology, even if a majority of those doing so don't understand enough to realize they're lies (naivety vs malice).
I'd argue there's more intentional lying in crypto and less value to be gained, but in both cases people who might derive real benefit from the hard truth of the matter are turning away before they enter the door due to dishonesty/misrepresentation, and in both cases there are examples of people deriving real value today.
I disagree. Crypto sounds more like intentional lying because it's primarily hyped in contexts typical for scams/gambling. Yes, there are businesses involved (anybody can start one), but they're mostly new businesses or a tiny tack-on to an existing business.
AI is largely being hyped within the existing major corporate structures, so its lies just get tagged as "business as usual". That doesn't make them any less of a lie, though.
So, why can't we just come up with some definition for what AGI is? We could then, say, logically prove that some AI fits that definition. Even if this doesn't seem practically useful, it's theoretically much more useful than just using that term with no meaning.
Instead it kind of feels like an escape hatch. On Wikipedia we have "a type of AI that would match or surpass human capabilities across virtually all cognitive tasks". How could we measure that? What good is it if we can't prove that a system has this property?
Bit of a rant but I hope it's somewhat legible still.
"AI is whatever hasn't been done yet."[1]
1. https://en.wikipedia.org/wiki/AI_effect