narush · 2 months ago
Hey HN -- study author here! (See previous thread on the paper here [1].)

I think this blog post is an interesting take on one specific factor that is likely contributing to slowdown. We discuss this in the paper [2] in the section "Implicit repository context (C.1.5)" -- check it out if you want to see some developer quotes about this factor.

> This is why AI coding tools, as they exist today, will generally slow someone down if they know what they are doing, and are working on a project that they understand.

I made this point in the other thread discussing the study, but in general, these results being surprising makes it easy to read the paper, find one factor that resonates, and conclude "ah, this one factor probably just explains slowdown." My guess: there is no one factor -- there's a bunch of factors that contribute to this result -- at least 5 seem likely, and at least 9 we can't rule out (see the full factors table on page 11).

> If there are no takers then I might try experimenting on myself.

This sounds super cool! I'd be very excited to see how you set this up + how it turns out... please do shoot me an email (in the paper) if you do this!

> AI slows down open source developers. Peter Naur can teach us why

Nit: I appreciate how hard it is to write short titles summarizing the paper (the graph title is the best I was able to do after a lot of trying) -- but I might have written this as "Early-2025 AI slows down experienced open-source developers. Peter Naur can give us more context about one specific factor." It's admittedly less of a catchy title, but I think getting the qualifications right is really important!

Thanks again for the sweet write-up! I'll hang around in the comments today as well.

[1] https://news.ycombinator.com/item?id=44522772

[2] https://metr.org/Early_2025_AI_Experienced_OS_Devs_Study.pdf

seanwilson · 2 months ago
If this makes sense, how is the study able to give a reasonable measure of how long an issue/task should have taken, vs how long it took with AI to determine that using AI was slower?

Or it's comparing how long the dev thought it should take with AI vs how long it actually took, which now includes the dev's guess of how AI impacts their productivity?

When it's hard to estimate how difficult an issue should be to complete, how does the study account for this? What percent speed up or slow down would be noise due to estimates being difficult?

I do appreciate that this stuff is very hard to measure.

krona · 2 months ago
An easier way to think about it might be if you timed how long it took each ticket in your backlog. You also recorded whether you were drunk or not when you worked on it, and the ticket was selected at random from your backlog. The assumption (null-hypothesis) is that being drunk has no effect on ticket completion time.

Using the magic of statistics, if you have completed enough tickets, we can determine whether the null hypothesis holds (at a given level of statistical certainty) and, if it doesn't, how large the difference is (with a margin of error).

That's not to say there couldn't be other causes for the difference (if there is one), but that's how science proceeds, generally.
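The drunk/sober setup above can be made concrete. Here is a minimal sketch (the ticket times are made up for illustration, and all names are mine) that computes Welch's t-statistic for the two groups; a value far from zero is evidence against the null hypothesis that drunkenness has no effect:

```rust
// Sample mean.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

// Sample variance with Bessel's correction (divide by n - 1).
fn var(xs: &[f64]) -> f64 {
    let m = mean(xs);
    xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (xs.len() as f64 - 1.0)
}

// Welch's t-statistic for two independent samples with unequal variances.
fn welch_t(a: &[f64], b: &[f64]) -> f64 {
    (mean(a) - mean(b)) / (var(a) / a.len() as f64 + var(b) / b.len() as f64).sqrt()
}

fn main() {
    // Made-up ticket completion times, in hours.
    let sober = [3.8, 4.1, 4.0, 3.9, 4.2, 4.0];
    let drunk = [5.9, 6.2, 6.1, 5.8, 6.0, 6.0];
    let t = welch_t(&drunk, &sober);
    // |t| far above ~2 means the observed difference is very unlikely
    // under the null hypothesis; for this data t is about 24.5.
    assert!(t > 2.0);
    println!("t = {:.1}", t);
}
```

With a real backlog you would convert the statistic into a p-value via the t-distribution, but the statistic alone shows the shape of the test: a big mean difference relative to the within-group noise rejects "no effect".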

calf · 2 months ago
Slowing down isn't necessarily bad, maybe slow programming (literate/Knuth comes to mind as another early argument) encourages better theory formation. Maybe programming today is like fast food, and proper theory and abstraction (and language design) requires a good measure of slow and deliberate work that has not been the norm in industry.
jwhiles · 2 months ago
Thanks for the response, and apologies for misrepresenting your results somewhat! I'm probably not going to change the title since I am at heart a polemicist and a sloppy thinker, but I'll update the article to call out this misrepresentation.

That said, I think that what I wrote more or less encompasses three of the factors you call out as being likely to contribute: "High developer familiarity with repositories", "Large and complex repositories", and "Implicit repository context".

I thought more about experimenting on myself, and while I hope to do it - I think it will be very hard to create a controlled environment whilst also responding to the demands the job puts on me. I also don't have the luxury of a list of well-scoped tasks that could feasibly be completed in a few hours.

karmakaze · 2 months ago
I would expect any change to an optimized workflow (developing own well understood project) to initially be slower. What I'd like to see is how these same developers do 6 months or a year from now after using AI has become the natural workflow on these same projects. The article mentions that these results don't extrapolate to other devs, but it's important to note that it may not extrapolate over time to these same devs.

I myself am just getting started and I can see how so many things can be scripted with AI that would be very difficult to (semi-)automate without. You gotta ask yourself "Is it worth the time?"[0]

[0] https://xkcd.com/1205/

antonvs · 2 months ago
> Early-2025 AI slows down experienced open-source developers.

Even that's too general, because it'll depend on what the task is. It's not as if open source developers in general never work on tasks where AI could save time.

narush · 2 months ago
We call this over-generalization out specifically in the "We do not provide evidence that:" table in the blog post and paper - I agree there are tasks these developers are likely sped up on with early-2025 tools.

munificent · 2 months ago
> The inability of developers to tell if a tool sped them up or slowed them down is fascinating in itself, probably applies to many other forms of human endeavour, and explains things as varied as why so many people think that AI has made them 10 times more productive, why I continue to use Vim, why people drive in London etc.

In boating, there's a notion of a "set and drift" which describes how wind and current pushes a boat off course. If a mariner isn't careful, they'll end up far from their destination because of it.

This is because when you're sitting in a boat, your perception of motion is relative and local. You feel the breeze on your face, and you see how the boat cuts through the surrounding water. You interpret that as motion towards your destination, but it can equally consist of wind and current where the medium itself is moving.

I think a similar effect explains all of these. Our perception of "making progress" is mostly a sense of motion and "stuff happening" in our immediate vicinity. It's not based on a perception of the goal getting closer, which is much harder to measure and develop an intuition for.

So people tend to choose strategies that make them feel like they're making progress even if it's not the most effective strategy. I think this is why people often take "shortcuts" when driving that are actually longer. All of the twists and turns keep them busy and make them feel like they're making more progress than zoning out on a boring interstate does.

wrsh07 · 2 months ago
Something I noticed early on when using AI tools was that it was great because I didn't get blocked. Somehow, I always wanted to keep going and always felt like I could keep going.

The problem, of course, is that one might thoughtlessly invoke the AI tool when it would be faster to make the one-line change directly

Edit

This could make sense with the driving analogy. If the road I was planning to take is closed, gps will happily tell me to try something else. But if that fails too, it might go back to the original suggestion.

thinkingemote · 2 months ago
Exactly! Waze, the navigation app, tends to route users on longer routes that feel faster. When driving, we perceive our journey as fast or slow not by its actual length but by our memories of what happened. Waze knows human drivers are happier driving a route that may be longer in time and distance if they feel like they are making progress through the twists and turns.

AI tools make programming feel easier. That they might actually be less productive is interesting, but we humans prefer the easier shortcuts. Our memory of coding with AI tells us that we didn't struggle, and therefore that we made progress.

tjr · 2 months ago
That sounds like a navigation tool that I absolutely do not want! Occasionally I do enjoy meandering around, but usually fastest / shortest path would be preferred.

And I'm not sure about the other either. In my 20+ year career in aerospace software, the most memorable times were solving interesting problems, not days with no struggle just churning out code.

PicassoCTs · 2 months ago
I also think that AI-written code is just not read. People hate code reviews, and actively refuse to read code, because that is hard work: reading into other people's thoughts and ideas.

This is why pushing for new code, rewrites, new frameworks is so popular. https://www.joelonsoftware.com/2000/04/06/things-you-should-...

So a ton of AI-generated code is just that: never read. It's generated, tested against test functions - and that's it. I wouldn't wonder if some of these devs themselves have only a marginal idea of what's in their codebases and why.

tjr · 2 months ago
I have mostly worked in aerospace software, and find this rather horrifying. I suppose, if your tests are in fact good and comprehensive enough, there could be a logical argument for not needing to understand the code, but if we're talking people's safety in the hands of your software, I don't know if there is any number of tests I would accept in exchange for willingly giving up understanding of the code.
jiggawatts · 2 months ago
> The inability of developers to tell if a tool sped them up or slowed them down is fascinating in itself

Linux/UNIX users are convinced of the superiority of keyboard control and CLI tools, but studies have shown that the mouse is faster for almost all common tasks.

Keyboard input feels faster because there are more actions per second.

mhuffman · 2 months ago
>but studies have shown that the mouse is faster for almost all common tasks.

Do you think that daily CLI Linux/UNIX users might have a different list of what they consider "common tasks"?

Alex_L_Wood · 2 months ago
We all as humans are hardwired to prefer greedy algorithms, basically.
blake1 · 2 months ago
I think a reasonable summary of the study referenced is that: "AI creates the perception of productivity enhancements far beyond the reality."

Even within the study, there were some participants who saw mild improvements to productivity, but most had a significant drop in productivity. This thread is now full of people telling their story about huge productivity gains they made with AI, but none of the comments contend with the central insight of this study: that these productivity gains are illusions. AI is a product designed to make you value the product.

In matters of personal value, perception is reality, no question. Anyone relying heavily on AI should really be worried that it is mostly a tool for warping their self-perception, one that creates dependency and a false sense of accomplishment. After all, it speaks a highly optimized stream of tokens at you, and you really have to wonder what the optimization goal was.

thinkingemote · 2 months ago
It's like the difference between being fast and quick. AI tools make the developer feel quick but they may not be fast. It's less cognitive effort in some ways. It's an interesting illusion, one that is based on changing emotions from different feedback loops and the effects of how memory forms.
asadotzler · 2 months ago
Quickness is a burst; speed is a flow.

Or, "slow is smooth, and smooth is fast"

BriggyDwiggs42 · 2 months ago
I’ve noticed that you can definitely use them to help you learn something, but that your understanding tends to be more abstract and LLM-like that way. You definitely want to mix it up when learning too.
daxfohl · 2 months ago
I've also had bad results with hallucinations there. I was trying to learn more about multi-dimensional qubit algorithms, and spent a whole day learning a bunch of stuff that was fascinating but plain wrong. I only figured out it was wrong at the end of the day when I tried to do a simulation and the results weren't consistent.

Early in the chat it substituted a `-1` for an `i`, and everything that followed was garbage. There were also some errors that I spotted real-time and got it to correct itself.

But yeah, IDK, it presents itself so confidently and "knows" so much and is so easy to use, that it's hard not to try to use as a reference / teacher. But it's also quite dangerous if you're not confirming things; it can send you down incorrect paths and waste a ton of time. I haven't decided whether the cost is worth the benefit or not.

Presumably they'll get better at this over time, so in the long run (probably no more than a year) it'll likely easily exceed the ROI breakeven point, but for now, you do have to remain vigilant.

tonyedgecombe · 2 months ago
I keep wondering whether the best way to use these tools is to do the work yourself then ask the AI to critique it, to find the bugs, optimisations or missing features.
nico · 2 months ago
> They are experienced open source developers, working on their own projects

I just started working on a 3-month old codebase written by someone else, in a framework and architecture I had never used before

Within a couple hours, with the help of Claude Code, I had already created a really nice system to replicate data from staging to local development. Something I had built before in other projects, and I knew that manually it would take me a full day or two, especially without experience in the architecture

That immediately sped up my development even more, as now I had better data to test things locally

Then a couple hours later, I had already pushed my first PR. All code following the proper coding style and practices of the existing project and the framework. That PR would have taken me at least a couple of days, and up to two weeks, to fully manually write out and test

So sure, AI won’t speed everyone or everything up. But at least in this one case, it gave me a huge boost

As I keep going, I expect things to slow down a bit, as the complexity of the project grows. However, it’s also given me the chance to get an amazing jumpstart

Vegenoid · 2 months ago
I have had similar experiences as you, but this is not the kind of work that the study is talking about:

“When open source developers working in codebases that they are deeply familiar with use AI tools to complete a task, they take longer to complete that task”

I have anecdotally found this to be true as well, that an LLM greatly accelerates my ramp up time in a new codebase, but then actually leads me astray once I am familiar with the project.

Navarr · 2 months ago
> I have anecdotally found this to be true as well, that an LLM greatly accelerates my ramp up time in a new codebase, but then actually leads me astray once I am familiar with the project.

If you are unfamiliar with the project, how do you determine that it wasn't leading you astray in the first place? Do you ever revisit what you had done with AI previously to make sure that, once you know your way around, it was doing it the right way?

Gormo · 2 months ago
> I have anecdotally found this to be true as well, that an LLM greatly accelerates my ramp up time in a new codebase, but then actually leads me astray once I am familiar with the project.

How does using AI impact the amount of time it takes you to become sufficiently familiar with the project to recognize when you are being led astray?

One of the worries I have with the fast ramp-up is that a lot of that ramp-up time isn't just grunt work to be optimized away, it's active learning, and bypassing too much of it can leave you with an incomplete understanding of the problem domain that slows you down perpetually.

Sometimes, there are real efficiencies to be gained; other times those perceived efficiencies are actually incurring heavy technical debt, and I suspect that overuse of AI is usually the latter.

pragma_x · 2 months ago
Not just new code-bases. I recently used an LLM to accelerate my learning of Rust.

Coming from other programming languages, I had a lot of questions that would be tough to nail down in a Google search, or combing through docs and/or tutorials. In retrospect, it's super fast at finding answers to things that _don't exist_ explicitly, or are implied through the lack of documentation, or exist at the intersection of wildly different resources:

- Can I get compile-time type information of Enum values?

- Can I specialize a generic function/type based on Enum values?

- How can I use macros to reflect on struct fields?

- Can I use an enum without its enclosing namespace, as I can in C++?

- Does rust have a 'with' clause?

- How do I avoid declaring lifetimes on my types?

- What is an idiomatic way to implement the Strategy pattern?

- What is an idiomatic way to return a closure from a function?

...and so on. This "conversation" happened here and there over a period of two weeks. Not only was ChatGPT up to the task, but it was also able to suggest which technologies would get me close to the mark when Rust wasn't built to do what I had in mind. I'm now much more comfortable and competent in the language, and miles ahead of where I would have been without it.
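For a flavor of what those answers look like, here is a small sketch covering two of the questions above (returning a closure, and an idiomatic Strategy pattern); the names and types are hypothetical, not from the original conversation:

```rust
// Returning a closure from a function: `impl Fn` is the idiomatic,
// allocation-free way; `move` captures `n` by value.
fn make_adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

// An idiomatic Strategy pattern: a trait defines the interchangeable
// behavior, and a trait object lets callers swap it at runtime.
trait Pricing {
    fn price(&self, base: f64) -> f64;
}

struct Regular;
struct Discounted;

impl Pricing for Regular {
    fn price(&self, base: f64) -> f64 {
        base
    }
}

impl Pricing for Discounted {
    fn price(&self, base: f64) -> f64 {
        base * 0.9
    }
}

fn checkout(strategy: &dyn Pricing, base: f64) -> f64 {
    strategy.price(base)
}

fn main() {
    let add_five = make_adder(5);
    assert_eq!(add_five(10), 15);
    assert_eq!(checkout(&Regular, 100.0), 100.0);
    assert_eq!(checkout(&Discounted, 100.0), 90.0);
    println!("ok");
}
```

When the set of strategies is known at compile time, a generic bound (`fn checkout<P: Pricing>(strategy: &P, base: f64)`) avoids the dynamic dispatch of `&dyn Pricing`.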

davidclark · 2 months ago
> That PR, would have taken me at least a couple of days and up to 2 weeks to fully manually write out and test

What is your accuracy on software development estimates? I always see these productivity claims matched against "it would've taken me" timelines.

But, it’s never examined if we’re good at estimating. I know I am not good at estimates.

It’s also never examined if the quality of the PR is the same as it would’ve been. Are you skipping steps and system understanding which let you go faster, but with a higher % chance of bugs? You can do that without AI and get the same speed up.

OptionOfT · 2 months ago
Now the question is: did you gain the same knowledge and proficiency in the codebase that you would've gained organically?

I find that when working with an LLM the difference in knowledge is the same as with learning a new language. Learning to understand another language is easier than learning to speak it.

It's like my knowledge of C++. I can read it, and I can make modifications of existing files. But writing something from scratch without a template? That's a lot harder.

nico · 2 months ago
Some additional notes given the comments in the thread

* I wasn’t trying to be dismissive of the article or the study, just wanted to present a different context in which AI tools do help a lot

* It’s not just code. It also helps with a lot of tasks. For example, Claude Code figured out how to “manually” connect to the AWS cluster that hosted the source db, tested different commands via docker inside the project containers and overall helped immensely with discovery of the overall structure and infrastructure of the project

* My professional experience as a developer has been that 80-90% of the time, results trump code quality. That's just the projects and companies I've been personally involved with. Mostly saas products in which business goals are usually considered more important than the specifics of the tech stack used. This doesn't mean that 80-90% of code is garbage, it just means that most of the time readability, maintainability and shipping are more important than DRY, clever solutions or optimizations

* I don’t know how helpful AI is or could be for things that require super clever algorithms or special data structures, or where code quality is incredibly important

* Having said that, the AI tools I’ve used can write pretty good quality code, as long as they are provided with good examples and references, and the developer is on top of properly managing the context

* Additionally, these tools are improving almost on a weekly or monthly basis. My experience with them has drastically changed even in the last 3 months

At the end of the day, AI is not magic, it’s a tool, and I as the developer, am still accountable for the code and results I’m expected to deliver

PaulDavisThe1st · 2 months ago
TFA was specifically about people very familiar with the project and codebase they are working on. Your anecdote is precisely the opposite of the situation it was about, and it acknowledged the sort of process you describe.
kevmo314 · 2 months ago
You've missed the point of the article, which in fact agrees with your anecdote.

> It's equally common for developers to work in environments where little value is placed on understanding systems, but a lot of value is placed on quickly delivering changes that mostly work. In this context, I think that AI tools have more of an advantage. They can ingest the unfamiliar codebase faster than any human can, and can often generate changes that will essentially work.

moogleii · 2 months ago
That would be an aside, or a comment, not the point of the article.
antonvs · 2 months ago
> You've missed the point of the article

Sadly clickbait headlines like the OP, "AI slows down open source developers," spread this misinformation, ensuring that a majority of people will have the same misapprehension.

samtp · 2 months ago
Well, that's exactly what it does well at the moment: boilerplate starter templates, landing pages, throwaway apps, etc. But for projects that need precision, like data pipelines or security, the code it generates has many subtle flaws that can/will cause giant headaches in your project unless you dig through every line produced
quantumHazer · 2 months ago
You clearly have not read the study. The problem is that developers thought they were 20% faster, but they were actually slower. Anyway, from a quick review of your profile, you have a conflict of interest regarding vibe coding, so I will definitely take your opinion with a grain of salt.
floren · 2 months ago
> Anyway from a fast review about your profile you're in conflict of interest about vibe coding

Seems to happen every time, doesn't it?

xoralkindi · 2 months ago
How are you confident in the code, coding style and practices simply because the LLM says so? How do you know it is not hallucinating, since you don't understand the codebase?

bko · 2 months ago
When anecdote and data don't align, it's usually the data that's wrong.

Not always the case, but whenever I read about these strained studies or arguments about how AI is actually making people less productive, I can't help but wonder why nearly every programmer I know, myself included, finds value in these tools. I wonder if the same thing happened with higher-level programming languages, where people argued: you may THINK not managing your own memory will lead to more productivity, but actually...

Even if we weren't more "productive", millions prefer to use these tools, so it has to count for something. And I don't need a "study" to tell me that

adrian_b · 2 months ago
TFA says clearly that it is likely that AI will make more productive anyone working on an unfamiliar code base, but make less productive those working on a project they understand well, and it gives reasonable arguments for why this is likely to happen.

Moreover, it acknowledges that for programmers working in most companies the first case is much more frequent.

rsynnott · 2 months ago
> I can't help but wonder why nearly every programmer I know, myself included, finds value in these tools.

One of the more interesting findings of the study mentioned was that the LLM users, even where use of an LLM had apparently degraded their performance, tended to believe it had enhanced it. Anecdote is a _really_ bad argument against data that shows a _perception_ problem.

> Even if we weren't more "productive", millions prefer to use these tools, so it has to count for something.

I mean, on that basis, so does homeopathy.

Like, it's just one study. It's not the last word. But "my anecdotes disprove it" probably isn't a _terribly_ helpful approach.

markstos · 2 months ago
I had a similar experience with AI and open source. AI allowed me to implement features in a language and stack I didn't know well. I had wanted these features for months and no one else was volunteering to implement them. I had tried to study the stack directly myself, but found the total picture to be complex and under-documented for people getting started.

Using Warp terminal (which used Claude), I was able to get past those barriers and achieve results that weren't happening at all before.

tomasz_fm · 2 months ago
Only one developer in this study had more than 50h of Cursor experience, including time spent using Cursor during the study. That one developer saw a 25% speed improvement.

Everyone else was an absolute Cursor beginner with barely any Cursor experience. I don't find it surprising that using tools they're unfamiliar with slows software engineers down.

I don't think this study can be used to reach any sort of conclusion on use of AI and development speed.

narush · 2 months ago
Hey, thanks for digging into the details here! Copying a relevant comment (https://news.ycombinator.com/item?id=44523638) from the other thread on the paper, in case it's help on this point.

1. Some prior studies that find speedup do so with developers that have similar (or less!) experience with the tools they use. In other words, the "steep learning curve" theory doesn't differentially explain our results vs. other results.

2. Prior to the study, 90+% of developers had reasonable experience prompting LLMs. Before we found slowdown, prompting was the only experience-related concern that most external reviewers raised -- as prompting was considered the primary skill. In general, the standard wisdom was/is that Cursor is very easy to pick up if you're used to VSCode, which most developers used prior to the study.

3. Imagine all these developers had a TON of AI experience. One thing this might do is make them worse programmers when not using AI (relatable, at least for me), which in turn would raise the speedup we find (but not because AI was better, but just because with AI is much worse). In other words, we're sorta in between a rock and a hard place here -- it's just plain hard to figure out what the right baseline should be!

4. We shared information on developer prior experience with expert forecasters. Even with this information, forecasters were still dramatically over-optimistic about speedup.

5. As you say, it's totally possible that there is a long-tail of skills to using these tools -- things you only pick up and realize after hundreds of hours of usage. Our study doesn't really speak to this. I'd be excited for future literature to explore this more.

In general, these results being surprising makes it easy to read the paper, find one factor that resonates, and conclude "ah, this one factor probably just explains slowdown." My guess: there is no one factor -- there's a bunch of factors that contribute to this result -- at least 5 seem likely, and at least 9 we can't rule out (see the factors table on page 11).

I'll also note that one really important takeaway -- that developer self-reports after using AI are overoptimistic to the point of being on the wrong side of speedup/slowdown -- isn't a function of which tool they use. The need for robust, on-the-ground measurements to accurately judge productivity gains is a key takeaway here for me!

(You can see a lot more detail in section C.2.7 of the paper ("Below-average use of AI tools") -- where we explore the points here in more detail.)

brulard · 2 months ago
1. That does not support these results in any way.

2. Having experience prompting is only a small part of being able to use agentic IDE tools. It's like equating cutting onions with being a good cook.

I think we should all focus on how effectiveness is going to change in the long term. We all know AI tooling is not going to disappear but will become better and better. I wouldn't be afraid to lose some productivity for months if I could acquire new skills for the future.

WhyNotHugo · 2 months ago
An interesting little detail. Any seasoned developer is likely going to take substantially longer if they have to use any IDE except their everyday one.

I've been using Vim/Neovim for over a decade. I'm sure if I wanted to use something like Cursor, it would take me at least a month before I could be productive at even a fraction of my usual level.

bagacrap · 2 months ago
I recently switched from vim (16 years) to vscode and perceived my productivity to be about the same after one week.

No objective measurements here; it might have even increased. But either way, "a month to regain a fraction of productivity" is extreme hyperbole, for me at least.

Art9681 · 2 months ago
This is exactly my same take. Any tool an engineer is inexperienced with will slow them down. AI is no different.
bluefirebrand · 2 months ago
This runs counter to the starry eyed promises of AI letting people with no experience accomplish things

yomismoaqui · 2 months ago
Someone on X said that these agentic AI tools (Claude Code, Amp, Gemini CLI) are to programming what the table saw was to hand-made woodworking.

It can make some things faster and better than a human with a saw, but you have to learn how to use it right (or you will lose some fingers).

I personally find that agentic AI tools make me more ambitious in my projects; I can tackle things I wouldn't have thought about doing before. And I also delegate work that I don't like to them, because they are going to do it better and quicker than me. So my mind is free to think about the real problems, like architecture and the technical-debt balance of my code...

Problem is that there is the temptation of letting the AI agent do everything and just commit the result without understanding YOUR code (yes, it was generated by an AI but if you sign the commit YOU are responsible for that code).

So as with any tool try to take the time to understand how to better use it and see if it works for you.

candiddevmike · 2 months ago
> to programming like the table saw was to hand-made woodworking

This is a ridiculous comparison because the table saw is a precision tool (compared to manual woodworking) when agentic AI is anything but IMO.

marcellus23 · 2 months ago
The nature of the comparison is in the second paragraph. It's nothing to do with how precise it is.
bgwalter · 2 months ago
"You are using it wrong!"

This is insulting to all pre-2023 open source developers, who produced the entire stack that the "AI" robber barons use in their companies.

It is even more insulting because no actual software of value has been demonstrably produced using "AI".

yomismoaqui · 2 months ago
> It is even more insulting because no actual software of value has been demonstrably produced using "AI".

Claude Code and Amp (equivalent from Sourcegraph) are created by humans using these same tools to add new features and fix bugs.

Having used both tools for some weeks, I can tell you that they provide great value to me, enough that I see paying $100 monthly as a bargain relative to that value.

Edit: typo

antimora · 2 months ago
I'm one of the regular code reviewers for Burn (a deep learning framework in Rust). I recently had to close a PR because the submitter's bug fix was clearly written entirely by an AI agent. The "fix" simply muted an error instead of addressing the root cause. This is exactly what AI tends to do when it can't identify the actual problem. The code was unnecessarily verbose and even included tests for muting the error. Based on the person's profile, I suspect their motivation was just to get a commit on their record. This is becoming a troubling trend with AI tools.
dawnerd · 2 months ago
That's what I love about LLMs. You can spot it doesn't know the answer, tell it that it's wrong and it'll go, "You're absolutely right. Let me actually fix it"

It scares me how much code is being produced by people without enough experience to spot issues or people that just gave up caring. We're going to be in for wild ride when all the exploits start flowing.

cogman10 · 2 months ago
My favorite LLM moment: I wrote some code, asked the LLM to "find any bugs or problems with this code," and of course what it did was hyperfocus on an out-of-date comment (that I didn't write). Since the problem identified in the comment no longer existed, the LLM just spat out about 100 lines of garbage to refactor the code.
rectang · 2 months ago
> "You're absolutely right."

I admit a tendency to anthropomorphize the LLM and get irritated by this quirk of language, although it's not bad enough to prevent me from leveraging the LLM to its fullest.

The key when acknowledging fault is to show your sincerity through actual effort. For technical problems, that means demonstrating that you have worked to analyze the issue, take corrective action, and verify the solution.

But of course current LLMs are weak at understanding, so they can't pull that off. I wish the LLM could say "I don't know," but apparently the current tech can't know that it doesn't know.

And so, as the LLM flails over and over, it shamelessly kisses ass and bullshits you about the work it's doing.

I figure that this quirk of LLMs will be minimized in the near future by tweaking the language to be slightly less obsequious. Improved modeling and acknowledging uncertainty will be a heavier lift.

colechristensen · 2 months ago
I also get things like this from very experienced engineers working outside their area of expertise. The suggestions are obviously less completely boneheaded, but it's still a case of doing exactly the wrong thing an AI suggested, requiring a person to step in and correct it.
daxfohl · 2 months ago
It'd be nice if github had a feature that updated the issue with this context automatically too, so that if this agent gives up and closes the PR, the next agent doesn't go and do the exact same thing.
candiddevmike · 2 months ago
> tell it that it's wrong and it'll go, "You're absolutely right. Let me actually fix it"

...and then it still doesn't actually fix it

Macha · 2 months ago
I recently reviewed an MR from a coworker. There was a test that was clearly written by AI, except I guess however he prompted it, it gave some rather poor variable names like "thing1", "thing2", etc. in test cases. Basically, these were multiple permutations of data that all needed to be represented in the result set. So I asked for them to be named distinctively, maybe by what makes them special.

It's clear he just took that feedback and asked the AI to make the change, and it came up with very long, very unique names that just listed all the distinguishing properties of each test case, to the point that they sort of became noise.

It's clear writing the PR was very fast for that developer, I'm sure they felt they were X times faster than writing it themselves. But this isn't a good outcome for the tool either. And I'm sure if they'd reviewed it to the extent I did, a lot of that gained time would have dissipated.

meindnoch · 2 months ago
>a deep learning framework in Rust [...] This is becoming a troubling trend with AI tools.

The serpent is devouring its own tail.

TeMPOraL · 2 months ago
OTOH when they'll start getting good AI contributions, then... it'll be too late for us all.
LoganDark · 2 months ago
Deep learning can be incredibly cool and not just used for AI slop.
jampa · 2 months ago
> I suspect their motivation was just to get a commit on their record. This is becoming a troubling trend with AI tools.

It has been for a while, AI just makes SPAM more effective:

https://news.ycombinator.com/item?id=24643894

pennomi · 2 months ago
This is the most frustrating thing LLMs do. They put wide try:catch structures around the code making it impossible to actually track down the source of a problem. I want my code to fail fast and HARD during development so I can solve every problem immediately.
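The wide try/catch complaint above can be shown in a few lines. A hypothetical Python sketch contrasting a broad exception handler (which erases the failure site) with failing fast:

```python
def parse_port(cfg: dict) -> int:
    # Raises KeyError if "port" is missing, ValueError if it isn't numeric.
    return int(cfg["port"])

def start_swallowed(cfg: dict) -> str:
    # Wide catch: the real failure site and the traceback are discarded,
    # replaced with a vague message. Debugging now means re-running with
    # print statements to rediscover what the traceback already knew.
    try:
        return f"listening on {parse_port(cfg)}"
    except Exception:
        return "startup failed"  # which line? which key? unknown

def start_fail_fast(cfg: dict) -> str:
    # Fail fast: let the exception propagate so the traceback points
    # at the exact line and names the missing key.
    return f"listening on {parse_port(cfg)}"
```

With a bad config, the first variant quietly returns "startup failed" while the second raises a `KeyError('port')` that pinpoints the problem immediately.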
daxfohl · 2 months ago
Seems like there's a need for GitHub to create a separate flow for AI-created PRs. Project maintainers should be able to stipulate rules like this in English, and an AI "pre-reviewer" would check that the AI has followed all these rules before the PR is created, and chat with the AI submitter to resolve any violations. For exceptional cases, a human submitter is required.

Granted, the compute required is probably more expensive than GitHub would offer for free, and IDK whether it'd be within budget for many open-source projects.

Also granted, something like this may be useful for human-sourced PRs as well, though perhaps post-submission so that maintainers can see and provide some manual assistance if desired. (And also granted, in some cases maybe maintainers would want to provide manual assistance to AI submissions, but I expect the initial triaging based on whether it's a human or AI would be what makes sense in most cases).

kfajdsl · 2 months ago
This is my number one complaint with LLM produced code too. The worst thing is when it swallows an error to print its own error message with far less info and no traceback.

In my rules I tell it that try/catches are completely banned unless I explicitly ask for one (an okay tradeoff, since usually my error boundaries are pretty wide and I know where I want them). I know the context length is getting too long when it starts ignoring that.

0xbadcafebee · 2 months ago
> The "fix" simply muted an error instead of addressing the root cause.

FWIW, I have seen human developers do this countless times. In fact there are many people in engineering that will argue for these kinds of "fixes" by default. Usually it's in closed-source projects where the shittiness is hidden from the world, but trust me, it's common.

> I suspect their motivation was just to get a commit on their record. This is becoming a troubling trend with AI tools.

There was already a problem (pre-AI) with shitty PRs on GitHub made to try to game a system. Regardless of how they made the change, the underlying problem is a policy one: how to deal with people making shitty changes for ulterior motives. I expect the solution is actually more AI to detect shitty changes from suspicious submitters.

Another solution (that I know nobody's going to go for): stop using GitHub. Back in the "olden times", we just had CVS, mailing lists and patches. You had to perform some effort in order to get to the point of getting the change done and merged, and it was not necessarily obvious afterward that you had contributed. This would probably stop 99% of people who are hoping for a quick change to boost their profile.

nerdjon · 2 months ago
I will never forget being in a code review for an upcoming release: there was a method that was... different. Massively different, with no good reason for changing it that much for such a small addition.

We asked the person why they made the change, and "silence". They had no reason. It became painfully clear that all they did was copy and paste the method into an LLM and say "add this thing" and it spit out a completely redone method.

So now we had a change that no one in the company actually knew just because the developer took a shortcut. (this change was rejected and reverted).

The scariest thing to me is that no one actually knows what code is running anymore, with these models having a tendency to make changes for the sake of making changes (and likely not actually addressing the root issue, but taking a shortcut like you mentioned).

tomrod · 2 months ago
As a side question: I work in AI, but mostly python and theory work. How can I best jump into Burn? Rust has been intriguing to me for a long time
lvl155 · 2 months ago
This is a real problem that’s only going to get worse. With the major model providers basically keeping all the data themselves, I frankly don’t like this trend long term.
doug_durham · 2 months ago
You should be rejecting the PR because the fix was insufficient, not because it was AI agent written. Bad code is bad code regardless of the source. I think the fixation on how the code was generated is not productive.
glitchc · 2 months ago
No, that's not how code review works. Getting inside the mind of the developer, understanding how they thought about the fix, is critical to the review process.

If an actual developer wrote this code and submitted it willingly, it would either constitute malice, an attempt to sabotage the codebase or inject a trojan, or stupidity, for failing to understand the purpose of the error message. With an LLM we mostly have stupidity. Flagging it as such reveals the source of the stupidity, as LLMs do not actually understand anything.

RobinL · 2 months ago
The problem is that code often takes as long to review as to write, and AI potentially lowers the quality bar for pull requests. So maintainers face lots of low-quality PRs that take time to reject.
rustyminnow · 2 months ago
> You should be rejecting the PR because the fix was insufficient

I mean, they probably could've articulated it your way, but I think that's basically what they did... they point out the insufficient "fix" later, but the root cause of the "fix" was blind trust in AI output, so that's the part of the story they lead with.

andix · 2 months ago
What I noticed: AI development constantly breaks my flow. It makes me more tired, and I work for shorter time periods on coding.

It's a myth that you can code a whole day long. I usually do intervals of 1-3 hours for coding, with some breaks in between. Procrastination can even happen on work-related things, like reading other project members' code/changes for an hour. It has a benefit to some extent, but during this time I don't get my work done.

Agentic AI works best for me. Small refactoring tasks on a selected code snippet can be helpful, but aren't a huge time saver. The worst are AI code completions (first-version Copilot style); they are much more noise than help.

rightbyte · 2 months ago
It would be interesting to record what one does in a day at the desk. Probably quite depressing to watch.

Like, I think 1h would be stretching it for mature codebases.

andix · 2 months ago
The 1h I'm talking about is not all the time I might spend reading on code. It's the time I might procrastinate on my tasks with reading unrelated code.

Like doom scrolling on social media: Let's see what the fancy new guy got done this week. I need to feel better, I'm just going to look at the commits of the guy in the other team that always breaks production. Let's see how close he got to that recently, ...