It's not a bad paper, but it's also turning into a fantastic illustration of how much thirst there is out there for anything that shows that AI productivity doesn't work.
> It's not a bad paper, but it's also turning into a fantastic illustration of how much thirst there is out there for anything that shows that AI productivity doesn't work.
Maybe. I think it's a fantastic illustration of the appetite for anyone doing anything to provide something other than hype around the subject. An actual RCT? Too good to be true! The thirst is for fact vs. speculation, influencer blogposts and self-promotion.
That this RCT provided evidence opposing the hype is, of course, irresistible.
To be fair that thirst probably comes from people who aren’t seeing the gains the hype would lead you to believe and are reaching into the void to not feel like they’re taking crazy pills.
It’s also probably not coming from a place of “I’m scared of AI so I want it to fail” but more like “my complex use case doesn’t work with AI and I’m really wondering why that is”.
There’s this desire, it seems, to think of people who aren’t on the hype train as “against” AI, but people need to remember that these are most likely devs with a decade of experience who have been evaluating the usefulness of the tools they use for a long time.
Personal take, but I think some people also understand that the hype machine around AI is coming from the rich and C-level people. Meanwhile, companies are widely and openly axing jobs and/or not paying artists, citing AI as the source of their new fortune. Personally, my use of AI in my job has so far not been that fruitful, and for something that has so far dramatically underdelivered on its utopian promises, we are instead actively seeing it used to undercut the 99% - and that’s not even getting into the environmental impact or the hellscape it’s made of the Internet.
> To be fair that thirst probably comes from people who aren’t seeing the gains the hype would lead you to believe and are reaching into the void to not feel like they’re taking crazy pills.
Yes, exactly!
I've spent way too much time trying to get an LLM to write anything remotely close to useful code. Yeah, I'm sure it can speed up writing code that I can write in my sleep, but I want it to write code I can learn from, and so far, my success rate is ~0 (although the documentation alongside the bogus code is sometimes a good starting point).
Having my timelines filled by people who basically claim that I'm just an idiot for failing to achieve that? Yeah, it's craze-inducing.
Every time I see research that appears to confirm the hype, I see a huge hole in the protocol.
Now finally, some research confirming my observations? It feels so good!
The time is the early 2000s, and the Segway™ is being suggested as the archetype of almost all future personal transportation in cities and suburbs. I don't hate the product, there's neat technology there, they're fun to mess with, but... My bullshit sensor is still going off.
I become tired of being told that I'm just not using enough imagination, or that I would understand if only I were plugged into the correct social groups of visionaries who've given arguments I already don't find compelling.
Then when somebody does a proper analysis of start/stop distance, road throughput, cargo capacity, etc, that's awesome! Finally, some glimmer of knowledge to push back the fog of speculation.
Sure, there's a nonzero amount of confirmation bias going on, but goshdangit at least I'm getting mine from studies with math, rather than the folks getting it from artistic renderings of self-balancing vehicles filling a street in the year 2025.
Well, there aren’t any studies showing AI agents boost productivity, so it’s all we’ve got. It seems like a well-conducted study, so I’m inclined to trust its conclusions.
One of the articles linked from the OP includes links to such studies: https://theconversation.com/does-ai-actually-boost-productiv... - scroll down to the "AI and individual productivity" section, there are two papers there on the "increases productivity" side followed by two others that didn't.
Open source packages are the biggest productivity boost of my entire career, at no point did I think "wow, I wish these didn't exist, they're a threat to my livelihood".
There are literally billions of dollars on the “pro AI side”.
What you’re seeing is a thirst for objective reporting. The average person only has the ability to provide anecdotes - many of which are in stark contrast to the narrative pushed by the billionaires pumping AI.
I don’t think anyone serious thinks AI isn’t useful in some capacity - but it’s more like a bloom filter than a new branch of mathematics. Magically powerful in specific use cases, but not a paradigm shift.
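To make the bloom filter analogy concrete, here's a rough sketch (sizes, hash choice, and names are purely illustrative, not from any particular library): it answers membership queries fast with very little memory, but a "yes" can be a confident false positive - extremely useful in its niche, not a new paradigm.

    # Minimal Bloom filter sketch (illustrative only; parameters are arbitrary).
    # Great at one narrow job (fast approximate membership), but it can be confidently wrong.
    import hashlib

    class BloomFilter:
        def __init__(self, num_bits=1024, num_hashes=3):
            self.num_bits = num_bits
            self.num_hashes = num_hashes
            self.bits = [False] * num_bits

        def _positions(self, item):
            # Derive k bit positions from salted SHA-256 digests of the item.
            for i in range(self.num_hashes):
                digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
                yield int(digest, 16) % self.num_bits

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos] = True

        def might_contain(self, item):
            # True can be a false positive; False is always correct.
            return all(self.bits[pos] for pos in self._positions(item))

    bf = BloomFilter()
    bf.add("cursor")
    print(bf.might_contain("cursor"))   # True
    print(bf.might_contain("copilot"))  # probably False, but false positives are possible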
Personally, I think this is a disingenuous take. The thirst is for tangible data; the issue is that we've never been able to measure any form of productivity/quality in software development.
My team does two-person PR reviews, for example. We'd go a lot faster if we didn't, or even if we just allowed a single reviewer. Similarly, we have no idea what the quality impact would be if we stopped, or what we actually gain by doing it. Why are we not having a three-reviewer rule, for example? Why not? Two is an arbitrary number.
Unit tests... We'd surely go a lot faster if we didn't bother with them. Teams used to have some dedicated QA members and you'd rely entirely on manual testing. You can push a lot more code out that way. Was software in the 90s, when unit tests and integration tests weren't used, buggier than today's software?
Now take AI: what is the impact of its use? It's not even obvious if it reduced the time it takes to launch a feature; my team isn't suddenly ahead of schedule on all our projects, even though we all use agentic tools actively now. Ask any one of us and "I think it makes us faster" will be the answer. But ask us why we have a two-person review rule and we'd similarly say: "I think it prevents bugs and improves the code quality".
The difference with AI now is that you pay for it; it's not free. Having unit tests or doing a two-person review is just a process change, but AI is something you pay for, so there's more desire to know for sure. It's also something people would like to know so they can tell whether they can lower their headcount without impacting their competitive edge and their ability to deliver fast and with good enough quality. Nobody wants to lower the headcount and find out the hard way.
> the issue is that we've never been able to measure any form of productivity/quality in software development.
Yep. It's been a "problem" for decades at this point. Business types constantly trying, and failing, to find some way to measure dev productivity, like they can with other types of office drone work.
We've been through Lines of Code, Function Points, various agile metrics, etc. None of these have given business types their holy grail of a perfectly objective measure of productivity. But no one wants to accept an answer of "You just can't effectively measure productivity in software development" because we now live in a data-driven business culture where every little thing must be measured and quantified.
I don't think AI lives up to the current hype but this article is garbage.
They're obviously talking about the METR paper, but the main takeaway, according to the authors themselves, was that self-reported productivity estimates are unreliable, not that you should cancel your subscription.
Nothing in that paper said that AI can't speed up software engineering.
> Nothing in that paper said that AI can't speed up software engineering
I mean, the paper did provide tangible data that at least in their experiments, AI slowed down software engineering.
What they said is that it's not proof that there isn't a scenario or a mechanism where AI could speed up software engineering. For that, more research would be needed, measuring the productivity impact of AI in more varied contexts.
For me at least, their experiment seems to describe the average developer's use of AI. So it's probably telling you that, currently, on average, AI might be slowing things down.
Now the question is: can we find good data on the outliers, and is it simply a matter of figuring out how to use AI effectively, so we can upskill people and get the average to be faster? Or will the outliers turn out to be conditional: only for newbies, only for prototypes, only for the first X weeks on a greenfield code base, etc.?
Edit: That said, the most fascinating data point of that study is how software engineers are not able to determine if AI makes them faster or slower, because they all thought they were 20% faster but were 19% slower in reality. So now you have to become really skeptical of anyone who claims they found a methodology or a workflow where their use of AI makes them faster. We need better measurement than just "I feel faster".
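A trivial sketch of what better measurement might look like, with made-up numbers rather than anything from the study: log how long comparable tasks actually take with and without the tool, then compare the measured change in time spent against the self-reported feeling.

    # Illustrative sketch only: hypothetical task times, not data from the METR study.
    # Compare a self-reported speedup estimate against one computed from recorded task times.

    baseline_minutes = [100, 80, 120, 90]   # comparable tasks completed without AI
    with_ai_minutes = [118, 96, 140, 110]   # comparable tasks completed with AI

    measured_change = sum(with_ai_minutes) / sum(baseline_minutes) - 1.0
    self_reported_change = -0.20  # "I feel about 20% faster"

    print(f"self-reported change in time spent: {self_reported_change:+.0%}")  # feels ~20% faster
    print(f"measured change in time spent:      {measured_change:+.0%}")       # ~19% slower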
I'm happy to let people think that AI does not yield productivity gains. There is no point engaging on this topic, so I will just outwork/outperform them.
I now have the pleasure of giving exercises to candidates where they are explicitly allowed to use any AI or autocomplete that they want, but it's one of those tricky real-world problems where you'll only get yourself into trouble if you only follow the model's suggestions. It really separates the builders from the bureaucrats far more effectively than seeing who can whiteboard or leetcode.
It's kind of a trap. We allow people in interviews to do the same, and some of them waste more time accepting wrong LLM completions and then changing them than if they'd just written the code themselves.
I've been doing this inadvertently for years by making tasks that were as realistic as possible - explicitly based upon the code the candidate will be working on.
As it happens, this meant that when candidates started throwing AI at the task, instead of performing the magic it usually can when you make it build a todo app or solve some done-to-death, irrelevant leetcode problem, it flailed and left the candidate feeling embarrassed.
I really hope AI signals the death knell of fucking stupid interview problems like leetcode. Alas, many companies are knee-jerking and "banning" AI from interview use instead (even Claude, hilariously).
I rolled out a migration to 60+ backends by using Claude code to manage it in the background. Simultaneously, I worked on other features while keeping my usual meeting load. I have more commits and releases per week than I have had in my whole career, which is objectively more productive.
The issue I have with comments like this one is the one-dimensional notion of value described as "productivity gains" for a single person.
There are many things in this world that could be fairly described as "more productive" or "faster" than the norm, yet few people would argue that it makes those things a net benefit. You can lie and cheat your way to success, and that tends to be successful too. There are good reasons society frowns on this.
To me, focusing only on "I'm more productive" while ignoring the systemic and societal factors impacted by that "productivity" is completely missing the forest for the trees.
The fact that you further feel that there isn't even a point in engaging on the topic is disturbing considering those ignored factors.
Gosh, I was conflicted, then you pulled out that sentence and I was convinced. :)
Alternatively: When faced with a contradiction, first, check your premises.
I don't want to belabor the point too much; there's little common ground if we're stuck in all-or-nothing thinking - "the study proved AI is net-negative because of this pull quote" isn't discussion.
I've watched a lot of people code with Cursor, etc., and I noticed that they seem to get a rush when it occasionally does something amazing that more than offsets their disappointment when it (more often) screws up.
The psychological effect reminds me a bit of slot machines, which provide you with enough intermittent wins to make you feel like you're winning while you're losing.
I think this might be linked to that study that found experienced OSS devs thought they were faster when they were in actual fact 19% slower.
All AI impact studies and research papers need to be taken with a pinch of salt. The field is moving so fast that by the time you get peer reviewed, you’re already outdated
I’ve watched coding change from Cursor-esque IDEs to terminal-based agentic tools within months.
I'm building things at a level of complexity I wouldn't have even attempted without AI.
This piece, however, only focuses on time spent on a task that could be done both ways. Even there, it falls short. Let's assume this study is correct and a specific coding task does take me 19% more time with AI. I can still be more productive because the AI doing some of the work allows me to do other tasks during that time.
I do worry about my mind atrophying from outsourcing too many tasks, admittedly. But that's a different issue.
There is really no arguing that AI creates some productivity gains, even if it's just as an improved autocomplete, because autocomplete does create some productivity gains. Pushing farther is murkier, though. When it comes to the bulk of the type of work that AI is proving useful for, one of the main questions is why there is a need for speed. It's not in the sense of fearing job automation; it's just that, generally, we are reaching the bottlenecks more quickly and potentially causing more, but different, problems than we are solving.
It's similar to the story of the development of vehicles and how, even though we move much faster, we spend a greater amount of time in transit. My mom used to lament how annoying it was to have to drive to the grocery store, because when she was younger and not everyone had cars, the store came to you. Twice a day, in the morning and the evening, the "rolling store" would drive through the neighborhood, and if they didn't have what you needed right then, they would bring it on the next trip. We are finally coming back full circle with things like Instacart, but it's taken a solid ~60 years of largely wasted, inefficient travel time.
I think AI greatly reduces the starting cost for mundane tasks/boilerplate, in exchange for a reduction in implementation velocity. So the feeling of being more productive may be an illusion. It could be that RPE (rate of perceived exertion) is lower when using AI for tasks, but raw throughput may be higher if programmers just do the jobs themselves and get into a productive/flow state.
> It could be that RPE (rate of perceived exertion) is lower when using AI for tasks, but raw throughput may be higher if programmers just do the jobs themselves and get into a productive/flow state.
I think you're onto something with this take, based on my own experience. I definitely agree that my RPE seems lower when I'm using AI for things; whether it's actually making me more productive over the long term remains to be seen, but things certainly do "feel" easier/less cognitively demanding. Which, tbh, is still a benefit even if it doesn't result in large gains in output. Putting in less cognitive load at work just conserves my energy for things that matter - everything else outside of $dayjob.
That won't happen - this money was stolen from the middle class via currency devaluation. Spending it on any economic activity, no matter how pointless, is actually better than just gobbling up assets.
I just learned there was a 4 minute TV news segment about it on CNBC! https://www.youtube.com/watch?v=WP4Ird7jZoA
Why should we be eager to find out that some new tech is going to undercut us and replace us, devaluing us even more than we already are?
Should we be looking for ways to work slower? I can go back to just one monitor.
Why are we responding to hype with nonsense?
What's the goal of this? What are you looking for?
This sounds like there will be a race between these kinds of booby-trap tests and AIs learning them.
How exactly did you outperform? Show, don't talk.
vs.
--- start quote ---
In a randomised controlled trial – the first of its kind – experienced computer programmers could use AI tools to help them write code.
--- end quote ---
Your quote is very representative of the magical wishful thinking most people have about AI: https://dmitriid.com/everything-around-llms-is-still-magical...
Your comment here is very representative of how quickly people who are AI skeptics will jump on anything that supports their skepticism.
There is nothing in it for me, if I am more productive but earn the same and don't get any more time off
Why should I bother at that point?
Actually, based on your own admission, this is not what you're doing...
People who boast about AI-enhanced productivity always seem to forget to mention this.
At the game of producing garbage slop? Probably yeah.
I still suspect the vast silent majority of professional software devs haven’t integrated any AI tools, even Cursor-style ones, into their main gig.
And I reckon that’s completely rational, for those that have made this choice explicitly.
Early adopters of AI tools are making a speculative bet, but so far most of them seem happy with the return.