It's not a bad paper, but it's also turning into a fantastic illustration of how much thirst there is out there for anything that shows that AI productivity doesn't work.
> It's not a bad paper, but it's also turning into a fantastic illustration of how much thirst there is out there for anything that shows that AI productivity doesn't work.
Maybe. I think it's a fantastic illustration of the appetite for anyone doing anything to provide something other than hype around the subject. An actual RCT? Too good to be true! The thirst is for fact vs. speculation, influencer blogposts and self-promotion.
That this RCT provided evidence opposing the hype is, of course, irresistible.
To be fair that thirst probably comes from people who aren’t seeing the gains the hype would lead you to believe and are reaching into the void to not feel like they’re taking crazy pills.
It’s also probably not coming from a place of “I’m scared of AI so I want it to fail” but more like “my complex use case doesn’t work with AI and I’m really wondering why that is”.
There’s this desire, it seems, to think of people who aren’t on the hype train as “against” AI, but people need to remember that these are most likely devs with a decade of experience who have been evaluating the usefulness of the tools they use for a long time.
Personal take, but I think some people also understand that the hype machine around AI is coming from the rich and C-level people. Meanwhile, companies are widely and openly axing jobs and/or not paying artists, citing AI as the source of their new fortune. Personally, my use of AI in my job has so far not been that fruitful, and for something that has so far dramatically underdelivered on its utopian promises, we are instead actively seeing it used to undercut the 99% - and that’s not even getting into the environmental impact or the hellscape it’s made of the Internet.
> To be fair that thirst probably comes from people who aren’t seeing the gains the hype would lead you to believe and are reaching into the void to not feel like they’re taking crazy pills.
Yes, exactly!
I've spent way too much time trying to get an LLM to write anything remotely close to useful code. Yeah, I'm sure it can speed up writing code that I can write in my sleep, but I want it to write code I can learn from, and so far, my success rate is ~0 (although the documentation alongside the bogus code is sometimes a good starting point).
Having my timelines filled by people who basically claim that I'm just an idiot for failing to achieve that? Yeah, it's craze-inducing.
Every time I see research that appears to confirm the hype, I see a huge hole in the protocol.
Now finally, some research confirming my observations? It feels so good!
The time is the early 2000s, and the Segway™ is being suggested as the archetype of almost all future personal transportation in cities and suburbs. I don't hate the product, there's neat technology there, they're fun to mess with, but... My bullshit sensor is still going off.
I become tired of being told that I'm just not using enough imagination, or that I would understand if only I were plugged into the correct social groups of visionaries who've given arguments I already don't find compelling.
Then when somebody does a proper analysis of start/stop distance, road throughput, cargo capacity, etc, that's awesome! Finally, some glimmer of knowledge to push back the fog of speculation.
Sure, there's a nonzero amount of confirmation bias going on, but goshdangit at least I'm getting mine from studies with math, rather than the folks getting it from artistic renderings of self-balancing vehicles filling a street in the year 2025.
Well, there aren’t any studies showing AI agents boost productivity, so it’s all we’ve got. It seems like a well-conducted study, so I’m inclined to trust its conclusions.
One of the articles linked from the OP includes links to such studies: https://theconversation.com/does-ai-actually-boost-productiv... - scroll down to the "AI and individual productivity" section, there are two papers there on the "increases productivity" side followed by two others that didn't.
Open source packages are the biggest productivity boost of my entire career, at no point did I think "wow, I wish these didn't exist, they're a threat to my livelihood".
There are literally billions of dollars on the “pro AI side”.
What you’re seeing is a thirst for objective reporting. The average person only has the ability to provide anecdotes - many of which are in stark contrast to the narrative pushed by the billionaires pumping AI.
I don’t think anyone serious thinks AI isn’t useful in some capacity - but it’s more like a bloom filter than a new branch of mathematics. Magically powerful in specific use cases, but not a paradigm shift.
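To make the bloom filter analogy concrete, here's a rough sketch (sizes, hash choice, and names are purely illustrative, not from any particular library): it answers membership queries fast with very little memory, but a "yes" can be a confident false positive - extremely useful in its niche, not a new paradigm.

    # Minimal Bloom filter sketch (illustrative only; parameters are arbitrary).
    # Great at one narrow job (fast approximate membership), but it can be confidently wrong.
    import hashlib

    class BloomFilter:
        def __init__(self, num_bits=1024, num_hashes=3):
            self.num_bits = num_bits
            self.num_hashes = num_hashes
            self.bits = [False] * num_bits

        def _positions(self, item):
            # Derive k bit positions from salted SHA-256 digests of the item.
            for i in range(self.num_hashes):
                digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
                yield int(digest, 16) % self.num_bits

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos] = True

        def might_contain(self, item):
            # True can be a false positive; False is always correct.
            return all(self.bits[pos] for pos in self._positions(item))

    bf = BloomFilter()
    bf.add("cursor")
    print(bf.might_contain("cursor"))   # True
    print(bf.might_contain("copilot"))  # probably False, but false positives are possible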
Personally, I think this is a disingenuous take. The thirst is for tangible data; the issue is that we've never been able to measure any form of productivity/quality in software development.
My team does two-person PR reviews, for example. We'd go a lot faster if we didn't, or even if we just allowed a single reviewer. Similarly, we have no idea what the quality impact would be if we stopped, or what we actually gain by doing it. Why are we not having a three-reviewer rule, for example? Why not? Two is an arbitrary number.
Unit tests... We'd surely go a lot faster if we didn't bother with them. Teams used to have some dedicated QA members and you'd rely entirely on manual testing. You can push a lot more code out that way. Was software in the 90s, when unit tests and integration tests weren't used, buggier than today's software?
Now take AI: what is the impact of its use? It's not even obvious if it reduced the time it takes to launch a feature; my team isn't suddenly ahead of schedule on all our projects, even though we all use agentic tools actively now. Ask any one of us and "I think it makes us faster" will be the answer. But ask us why we have a two-person review rule and we'd similarly say: "I think it prevents bugs and improves the code quality".
The difference with AI now is that you pay for it; it's not free. Having unit tests or doing a two-person review is just a process change, but AI is something you pay for, so there's more desire to know for sure. It's also something people would like to know so they can tell whether they can lower their headcount without impacting their competitive edge and their ability to deliver fast and with good enough quality. Nobody wants to lower the headcount and find out the hard way.
> the issue is that we've never been able to measure any form of productivity/quality in software development.
Yep. It's been a "problem" for decades at this point. Business types constantly trying, and failing, to find some way to measure dev productivity, like they can with other types of office drone work.
We've been through Lines of Code, Function Points, various agile metrics, etc. None of these have given business types their holy grail of a perfectly objective measure of productivity. But no one wants to accept an answer of "You just can't effectively measure productivity in software development" because we now live in a data-driven business culture where every little thing must be measured and quantified.
I don't think AI lives up to the current hype but this article is garbage.
They're obviously talking about the METR paper, but the main takeaway, according to the authors themselves, was that self-reported productivity estimates are unreliable, not that you should cancel your subscription.
Nothing in that paper said that AI can't speed up software engineering.
> Nothing in that paper said that AI can't speed up software engineering
I mean, the paper did provide tangible data that at least in their experiments, AI slowed down software engineering.
What they said is that it's not proof that there isn't a scenario or a mechanism where AI could speed up software engineering. For that, more research would be needed, measuring the productivity impact of AI in more varied contexts.
For me at least, their experiment seems to describe the average developer's use of AI. So it's probably telling you that, currently, on average, AI might be slowing things down.
Now the question is: can we find good data on the outliers, and is it simply a matter of figuring out how to use AI effectively, so we can upskill people and get the average to be faster? Or will the outliers turn out to be conditional: only for newbies, only for prototypes, only for the first X weeks on a greenfield code base, etc.?
Edit: That said, the most fascinating data point of that study is how software engineers are not able to determine if AI makes them faster or slower, because they all thought they were 20% faster but were 19% slower in reality. So now you have to become really skeptical of anyone who claims they found a methodology or a workflow where their use of AI makes them faster. We need better measurement than just "I feel faster".
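A trivial sketch of what better measurement might look like, with made-up numbers rather than anything from the study: log how long comparable tasks actually take with and without the tool, then compare the measured change in time spent against the self-reported feeling.

    # Illustrative sketch only: hypothetical task times, not data from the METR study.
    # Compare a self-reported speedup estimate against one computed from recorded task times.

    baseline_minutes = [100, 80, 120, 90]   # comparable tasks completed without AI
    with_ai_minutes = [118, 96, 140, 110]   # comparable tasks completed with AI

    measured_change = sum(with_ai_minutes) / sum(baseline_minutes) - 1.0
    self_reported_change = -0.20  # "I feel about 20% faster"

    print(f"self-reported change in time spent: {self_reported_change:+.0%}")  # feels ~20% faster
    print(f"measured change in time spent:      {measured_change:+.0%}")       # ~19% slower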
I'm happy to let people think that AI does not yield productivity gains. There is no point engaging on this topic, so I will just outwork/outperform them.
I now have the pleasure of giving exercises to candidates where they are explicitly allowed to use any AI or autocomplete that they want, but it's one of those tricky real-world problems where you'll only get yourself into trouble if you only follow the model's suggestions. It really separates the builders from the bureaucrats far more effectively than seeing who can whiteboard or leetcode.
It's kind of a trap. We allow people in interviews to do the same, and some of them waste more time accepting wrong LLM completions and then changing them than if they'd just written the code themselves.
I've been doing this inadvertently for years by making tasks that were as realistic as possible - explicitly based upon the code the candidate will be working on.
As it happens, this meant that when candidates started throwing AI at the task, instead of performing the magic it usually can when you make it build a todo app or solve some done-to-death, irrelevant leetcode problem, it flailed and left the candidate feeling embarrassed.
I really hope AI signals the death knell of fucking stupid interview problems like leetcode. Alas, many companies are knee-jerking and "banning" AI from interview use instead (even Claude, hilariously).
I rolled out a migration to 60+ backends by using Claude code to manage it in the background. Simultaneously, I worked on other features while keeping my usual meeting load. I have more commits and releases per week than I have had in my whole career, which is objectively more productive.
The issue I have with comments like this one is the one-dimensional notion of value described as "productivity gains" for a single person.
There are many things in this world that could be fairly described as "more productive" or "faster" than the norm, yet few people would argue that it makes those things a net benefit. You can lie and cheat your way to success, and that tends to be successful too. There are good reasons society frowns on this.
To me, focusing only on "I'm more productive" while ignoring the systemic and societal factors impacted by that "productivity" is completely missing the forest for the trees.
The fact that you further feel that there isn't even a point in engaging on the topic is disturbing considering those ignored factors.
Gosh, I was conflicted, then you pulled out that sentence and I was convinced. :)
Alternatively: When faced with a contradiction, first, check your premises.
I don't want to belabor the point too much; there's little common ground if we're stuck in all-or-nothing thinking - "the study proved AI is net-negative because of this pull quote" isn't discussion.
I've watched a lot of people code with Cursor, etc., and I noticed that they seem to get a rush when it occasionally does something amazing that more than offsets their disappointment when it (more often) screws up.
The psychological effect reminds me a bit of slot machines, which provide you with enough intermittent wins to make you feel like you're winning while you're losing.
I think this might be linked to that study that found experienced OSS devs thought they were faster when they were in actual fact 19% slower.
All AI impact studies and research papers need to be taken with a pinch of salt. The field is moving so fast that by the time you get peer reviewed, you’re already outdated
I’ve watched coding change from Cursor-esque IDEs to terminal-based agentic tools within months.
I'm building things at a level of complexity I wouldn't have even attempted without AI.
This piece, however, only focuses on time spent on a task that could be done both ways. Even there, it falls short. Let's assume this study is correct and a specific coding task does take me 19% more time with AI. I can still be more productive because the AI doing some of the work allows me to do other tasks during that time.
I do worry about my mind atrophying from outsourcing too many tasks, admittedly. But that's a different issue.
There is really no arguing that AI creates some productivity gains, even if it's just as an improved autocomplete, because autocomplete does create some productivity gains. Pushing farther is murkier, though. When it comes to the bulk of the type of work that AI is proving useful for, one of the main questions is why there is a need for speed. It's not in the sense of fearing job automation; it's just that, generally, we are reaching the bottlenecks more quickly and potentially causing more, but different, problems than we are solving.
It's similar to the story of the development of vehicles and how, even though we move much faster, we spend a greater amount of time in transit. My mom used to lament how annoying it was to have to drive to the grocery store, because when she was younger and not everyone had cars, the store came to you. Twice a day, in the morning and the evening, the "rolling store" would drive through the neighborhood, and if they didn't have what you needed right then, they would bring it on the next trip. We are finally coming back full circle with things like Instacart, but it's taken a solid ~60 years of largely wasted, inefficient travel time.
I think AI greatly reduces the starting cost for mundane tasks/boilerplate, in exchange for a reduction in implementation velocity. So the feeling of being more productive may be an illusion. It could be that RPE (rate of perceived exertion) is lower when using AI for tasks, but raw throughput may be higher if programmers just do the jobs themselves and get into a productive/flow state.
> It could be that RPE (rate of perceived exertion) is lower when using AI for tasks, but raw throughput may be higher if programmers just do the jobs themselves and get into a productive/flow state.
I think you're onto something with this take, based on my own experience. I definitely agree that my RPE seems lower when I'm using AI for things; whether it's actually making me more productive over the long term remains to be seen, but things certainly do "feel" easier/less cognitively demanding. Which, tbh, is still a benefit even if it doesn't result in large gains in output. Putting in less cognitive load at work just conserves my energy for things that matter - everything else outside of $dayjob.
That won't happen - this money was stolen from the middle class via currency devaluation. Spending it on any economic activity, no matter how pointless, is actually better than just gobbling up assets.
I just learned there was a 4 minute TV news segment about it on CNBC! https://www.youtube.com/watch?v=WP4Ird7jZoA
Why should we be eager to find out that some new tech is going to undercut us and replace us, devaluing us even more than we already are?
Should we be looking for ways to work slower? I can go back to just one monitor.
Why are we responding to hype with nonsense?
What's the goal of this? What are you looking for?
This sounds like there will be a race between these kinds of booby-trap tests and AIs learning them.
How exactly did you outperform? Show, don't talk.
vs.
--- start quote ---
In a randomised controlled trial – the first of its kind – experienced computer programmers could use AI tools to help them write code.
--- end quote ---
Your quote is very representative of the magical wishful thinking most people have about AI: https://dmitriid.com/everything-around-llms-is-still-magical...
Your comment here is very representative of how quickly people who are AI skeptics will jump on anything that supports their skepticism.
There is nothing in it for me, if I am more productive but earn the same and don't get any more time off
Why should I bother at that point?
Actually, based on your own admission, this is not what you're doing...
People who boast about AI-enhanced productivity always seem to forget to mention this.
At the game of producing garbage slop? Probably yeah.
I still suspect the vast silent majority of professional software devs haven’t integrated any AI tools, even Cursor-style ones, into their main gig.
And I reckon that’s completely rational, for those that have made this choice explicitly.
Early adopters of AI tools are making a speculative bet, but so far most of them seem happy with the return.