Ask HN: Why are there no actual studies that show AI is more productive?

Dora released a report last year: https://dora.dev/research/2025/dora-report/

The gains are a ~17% increase in individual effectiveness, but with ~9% extra instability.

In my experience using AI-assisted coding for a bit longer than 2 years, the benefit is close to what Dora reported (maybe a bit higher, around 25%). Nothing close to an average of 2x, 5x, 10x. There's a 10x in some very specific tasks, but also a negative factor in others, as seemingly trivial but high-impact bugs get to production that would normally have been caught very early in development or in code reviews.

Obviously it depends on what one does. Using AI to build a UI to share cat pictures has a different risk appetite than building a payments backend.

lucasluitjes · 5 days ago

The full report can be found here: https://services.google.com/fh/files/misc/2025_state_of_ai_a...

That 17% increase is in self-reported effectiveness. The software delivery throughput only went up 3%, at a cost of that 9% extra instability. So you can build 3% faster with 9% more bugs, if I'm reading those numbers right.

yorwba · 5 days ago

Those aren't even percentage increases, but standardized effect sizes. So if you take an individual survey respondent and all you know is that they self-reported higher AI usage, you can guess their answers to the self-reported individual effectiveness slightly more accurately, but most of the variation will be due to unrelated factors.

The question that people are actually interested in, "After adopting this specific AI tool, will there be a noticeable impact on measures we care about?" is not addressed by this model at all, since they do not compare individual respondents' answers over time, nor is there any attempt to establish causality.
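To make the difference between a standardized effect size and a percentage increase concrete, here is a minimal sketch on synthetic data. It assumes the 0.17 figure can be read as a standardized coefficient between two z-scored survey variables; the variable names and the simulated data are illustrative assumptions, not taken from the report.

```python
import random
import statistics

def simulate(beta=0.17, n=20000, seed=0):
    """Simulate two z-scored variables linked by a standardized coefficient beta."""
    rng = random.Random(seed)
    noise_sd = (1 - beta ** 2) ** 0.5  # keeps the outcome at unit variance
    usage = [rng.gauss(0, 1) for _ in range(n)]
    effectiveness = [beta * u + rng.gauss(0, noise_sd) for u in usage]
    return usage, effectiveness

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Hypothetical survey: self-reported AI usage vs. self-reported effectiveness.
usage, effectiveness = simulate()
r = pearson(usage, effectiveness)
r_squared = r * r  # share of variance in effectiveness predictable from usage
print(f"correlation: {r:.3f}, share of variance explained: {r_squared:.3f}")
```

Under these assumptions the expected share of variance explained is 0.17² ≈ 0.029, so roughly 97% of the variation in the outcome comes from everything other than AI usage. That is a statement about how well one survey answer predicts another, not about anyone working 17% faster.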

PunchyHamster · 5 days ago

And a 3% difference is at the "the new coffee in the office is kinda shit and developers are annoyed" level

orwin · 4 days ago

I think for myself, it's close to 25% if I only take my role as a dev. If I take my 'senior' role it's less, because I spend way more time in reviews or in prod incident meetings.

Three months ago, with Opus 4.5, I would have said that the productivity improvement was ~10% for my whole team.

I now have to contradict myself: juniors and even experienced new hires with little domain knowledge don't improve as fast as they used to. After 8 months, I still have to write new tasks/issues the way I would for someone we just hired. I still catch the same issues we caught in reviews three months ago.

Basically, experience doesn't improve productivity as fast as it used to. On easy stuff it doesn't matter (for frontend changes, say, the productivity gains are extremely high, probably 10x), and on specific subjects like red teaming, where a quantity of small tools is better than an integrated solution, I think it can be better than that.

But I'm in a netsec tooling team, we do hard automation work to solve hard engineering issues, and that is starting to be a problem if juniors don't level up fast.

unsupp0rted · 5 days ago

For me it is a 2x or 5x or something, "but high impact bugs get to production that would normally have been caught very early in development or in code reviews" is what takes it back down to a 1.5x.

There are genuinely weeks where I go 5x though, and others where I go 0.5x.

duncanfwalker · 5 days ago

It's not so valuable to assess the current state, that is, what the impact of using AI is today. From personal experience it feels like the overall impact on productivity was not positive a couple of years ago, might be positive now, and will be positive in a couple of years. That means that by assessing the current impact on productivity we're just finding where we are on that change curve. If we accept that the trend is happening, then we know that at some point it will pass (or has passed) the threshold where our companies will fall behind if they're not using it. We also know it takes a while to get up to speed and make sure we're making the most of it, so the earlier we start the better. There's the counterargument that we could wait for a later wave to jump on, but that's risky and the only potential reward is a small-percentage short-term productivity gain.

AnimalMuppet · 4 days ago

Of course, if stability is part of what you're supposed to be delivering, then you can't be 17% more effective.

Because we are incapable of measuring developer productivity.

arzke · 5 days ago

We're incapable of putting an accurate, standardized value on developer productivity, yet there often seems to be consensus between senior engineers on who are the high performers and the low performers. I certainly can tell this about the people I work with.

kqr · 5 days ago

We are definitely not. Point at a problem, and measure the cost of solving it. That's developer productivity.

We only avoid doing it at scale because it's expensive. In particular if we want the measurement to generalise out of sample.

(In particular in this case, where once we're done, proponents will claim our data is too old to be a useful guide to tomorrow.)

xigoi · 5 days ago

> Point at a problem, and measure the cost of solving it.

The problem with this is that AI will create worse code that is going to cause more problems in the future, but the measurements won’t take that into account.

actionfromafar · 5 days ago

Yes.

If only we could even measure teams against themselves, others, and some kind of baseline. But we don't, AFAIK.

blitzar · 5 days ago

Lines of code pushed ... obviously /s

Unironically, AI evaluating the impact of those lines might be getting close to a metric that would measure output better than having everyone print out their last 6 months of work for the new boss to look at.

PunchyHamster · 5 days ago

Or it might be horribly bad at it, as with nearly every other problem where people claim "AI might be good at it"

AugustoCAS · 5 days ago

chrysoprace · 5 days ago

Self-reported productivity does not equate to actual productivity. People have all sorts of biases that make such assessments fairly pointless. They only gauge how you feel about your productivity, which is not necessarily a bad thing, but it doesn't mean you're actually more productive.

esperent · 5 days ago

To extend on this, the measures of productivity before LLMs were difficult for any kind of complex work, so there's no reason to think we would have better measures now.

You need broad economic measurements, not individual or company specific. And that takes a long time plus there's a lot of noise in the data right now (war, for example).

lysecret · 5 days ago

aragilar · 5 days ago

How do you know you're more productive? Humans are excellent at fooling themselves, and absent a metric (or multiple metrics) by which you can measure your productivity, you can't be sure you're actually being more productive.

mikkupikku · 5 days ago

I don't know if it's made me more productive, but I do know that for the past ten years I've been thinking about making an immediate-mode GUI toolkit for MPV user scripts, rendered with ASS subtitles and with a full suite of composable widgets. For ten years I kept putting it off because it seemed like it would be a big quagmire of difficult-to-diagnose rendering errors (based on far more modest forays into making one-off GUIs in this way). And I know that yesterday I decided to explain my idea to Claude and now it just fucking works after just a few hours of easy, casual back and forth.

I don't know man, could just be in my head. I better defer judgement, put aside all my own opinions about what happened and let some researchers with god knows what axe to grind make that decision for me.

make_it_sure · 5 days ago

I'm very, very sure. Based on my last 15 years of coding experience, I can assess fairly accurately how long a task takes. With AI I can finish the task 2x-4x faster (this includes testing, edge case handling, etc.).

hennell · 5 days ago

What's the best car? If you're trying to go fast it's one answer, if you're trying to carry as much load as possible it's another, if you're buying for your just-qualified teen it's another. But best is obviously subjective, so what about safest? I don't know specifics there, but if you're in the EU the "safest" car would be very different to the "safest" in the US, because their safety studies measure very different things.

Which is the issue with almost all studies and statistics: what they mean depends entirely on what you're measuring.

I can program very, very fast if I only consider the happy path, hard-code everything, and don't bother with things like writing tests, defining types, or worrying about performance under expected scale. It's all much faster right up until the point it isn't, and then it's much slower. AI isn't quite so obviously bad, but it can still hide long-term problems behind short-term gains, and the long term is what studies tend to focus on, as the short term doesn't usually require a study to observe.

I think AI is similar to outsourcing staff to cheaper countries, replacing ingredients with cheaper alternatives, and other MBA-style ideas. It's almost always instantly beneficial, but the long-term issues are harder to predict, and can have far more varied outcomes dependent on weird specifics of the business.

felipeerias · 5 days ago

Most people seem to be expecting some kind of quantitative analysis: N developers undertook M tasks with and without access to a given AI tool, here is the statistical evidence that shows (or fails to show) the effect, and this result is valid across other projects and tools.

In practice, arriving at this ideal scenario can be very challenging. Actually feasible experiments will be necessarily narrow, with the expectation that their results can be (roughly) extrapolated outside of their specific experimental setup.
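As a rough illustration of what the "statistical evidence" in such a narrow experiment could look like, here is a minimal sketch of a two-sided permutation test on task completion times. The function name, the numbers, and the choice of test are all assumptions made for illustration; a real study would also have to handle repeated measures per developer, task difficulty, and learning effects, which this sketch ignores.

```python
import random
import statistics

def permutation_test(with_ai, without_ai, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in mean completion time."""
    rng = random.Random(seed)
    observed = statistics.fmean(with_ai) - statistics.fmean(without_ai)
    pooled = list(with_ai) + list(without_ai)
    k = len(with_ai)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the group labels and recompute the difference.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:k]) - statistics.fmean(pooled[k:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical completion times in hours, invented purely for illustration.
hours_with_ai = [3.1, 4.0, 2.8, 5.2, 3.6, 4.4]
hours_without_ai = [3.9, 4.6, 3.0, 5.9, 4.1, 4.3]

diff, p = permutation_test(hours_with_ai, hours_without_ai)
print(f"mean difference: {diff:+.2f} h, p-value: {p:.3f}")
```

With only a handful of tasks per group, even a sizeable difference in means can be indistinguishable from noise, which is one reason narrow experiments are hard to extrapolate from.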

Another valid approach would be to carry out qualitative research, for example a case study. This typically requires the study of one (or a few) developers and their specific contexts in great detail. The idea is that a deep understanding of how one person navigates their work and their tools would provide us with insights that might be related to our specific situation.

Personally, in this particular area, I tend to prefer detailed qualitative accounts of how other developers are working on similar projects and with similar tools as me.

But in any case, both approaches are valid and complementary.

Nevermark · 5 days ago

I think most major efficiency improvements involve more adaptation costs than expected.

Those that can “see” the potential push through the adaptation period, even when longer than expected.

Depending on how forward looking a group is, the adaptation costs are a problem, a dilemma, or a completely obvious win.

Yet, external measurements don't distinguish between accumulating, accelerating, flat or fading intermediate value.

Avoidance of necessary adaptation, even with no immediate impact, becomes the dual. Technical, strategic, or capability debt.

Does that hidden anti-productivity ever get accounted for? When maladaptive firms take their anti-productivity into a hole as they fade/demise?

A company can operate with high margins while its sales fall off a cliff. Is that just "decreasing quantities" of uniformly "high productivity"?

IshKebab · 5 days ago

These sorts of things are really hard to study. Combine that with the fact that the AI landscape is so varied and fast moving, and it's easy to see why there aren't many studies on it.

There is a mountain of things that we reasonably know to be true but haven't done studies on. Is it beneficial for programming languages to support comments? Are regexes error-prone? Does static typing improve productivity on large projects? Is distributed version control better than centralised (lock based)? Etc.

Also you can't just say "AI improves productivity". What kind of AI? What are you using it for? If you're making static landing pages... yeah obviously it's going to help. Writing device drivers in Ada? Not so much.

teew · 5 days ago

I think these comparisons are unfairly picked. A good chunk of the world's economy is not currently jacked up on the promise that comments in code will lead to unimaginably high value (in pretty much every field from medicine to the media industry) in the span of a couple of years. Given the claims and market valuations around AI, wouldn't you agree a bit more hard evidence would be reassuring?