rsynnott · 5 months ago
This idea that you can get good results from a bad process as long as you have good quality control seems… dubious, to say the least. “Sure, it’ll produce endless broken nonsense, but as long as someone is checking, it’s fine.” This, generally, doesn’t really work. You see people _try_ it in industry a bit; have a process which produces a high rate of failures, catch them in QA, rework (the US car industry used to be notorious for this). I don’t know of any case where it has really worked out.

Imagine that your boss came to you, the tech lead of a small team, and said “okay, instead of having five competent people, your team will now have 25 complete idiots. We expect that their random flailing will sometimes produce stuff that kinda works, and it will be your job to review it all.” Now, you would, of course, think that your boss had gone crazy. No-one would expect this to produce good results. But somehow, stick ‘AI’ on this scenario, and a lot of people start to think “hey, maybe that could work.”

Manfred · 5 months ago
Reviewing code from less experienced or unmotivated people is also very taxing, both in a cognitive and emotional sense. It will never approach a really good level of quality because you just give up after 4 rounds of reviews on the same feature.
EdwardDiego · 5 months ago
Except humans learn from your PR comments and in other interactions with more experienced people, and so inexperienced devs become experienced devs eventually. LLMs are not so trainable.
btown · 5 months ago
Here’s the thing about AI though - you don’t need to worry about its confidence or impact on professional development if you’re overly critical, and it will do a turn within seconds. That gives a tremendous amount of flexibility and leverage to the code reviewer. Works better on some types of problems than others, but it’s worth exploring!
hakfoo · 5 months ago
With human co-workers, you can generally assume things you can't with AI.

My human co-workers generally have good faith. Even the developer who was clearly on the verge of getting a role elsewhere without his heart in it-- he tried to solve the problems assigned to him, not some random delusion that the words happened to echo. I don't have that level of trust with AI.

If there's a misunderstanding of the problem or the context, it's probably still the product of a recognizable logic flow that you can use to discuss what went wrong. I can ask Claude "Why are you converting this amount from Serbian Dinars to Poppyseed Bagels in line 476?" but will its answer be meaningful?

Human code review often involves a bit of a shared background. We've been working with the same codebases for several years, so we're going to use existing conventions. In this situation, the "AI knows all and sees all" becomes an anti-feature-- it may optimize for "this is how most people solve this task from a blank slate" rather than "it's less of a cognitive burden for the overall process if your single change is consistent with 500 other similar structures which have been in place since the Clinton administration."

There may be ways to try to force-feed AI this behaviour, but the more effort you devote to priming and pre-configuring the machine, the less you're actually saving over doing the actual work in the first place.

HarHarVeryFunny · 5 months ago
Right, this is the exact opposite of the best practices that W. Edwards Deming helped develop in Japan, then brought to the west.

Quality needs to come from the process, not the people.

Choosing to use a process known to be flawed, then hoping that people will catch the mistakes, doesn't seem like a great idea if the goal is quality.

The trouble is that LLMs can be used in many ways, but only some of those ways play to their strengths. Management have fantasies of using AI for everything, having either failed to understand what it is good for, or failed to learn the lessons of Japan/Deming.

thunky · 5 months ago
> Choosing to use a process known to be flawed, then hoping that people will catch the mistakes, doesn't seem like a great idea if the goal is quality.

You're also describing the software development process prior to LLMs. Otherwise code reviews wouldn't exist.

GarnetFloride · 5 months ago
Oh man, that's what I've been smelling with all this. It's the Red Bead Experiment, all over again. https://www.youtube.com/watch?v=ckBfbvOXDvU
giovannibonetti · 5 months ago
> Quality needs to come from the process, not the people.

Not sure which Japanese school of management you're following, but I think Toyota-style goes against that. The process gives more autonomy to workers than, say, Ford-style, where each tiny part of the process is pre-defined.

I got the impression that Toyota-style was considered to bring better quality to the product, even though it gives people more autonomy.

overfeed · 5 months ago
> Management have fantasies of using AI for everything, having either failed to understand what it is good for, or failed to learn the lessons of Japan/Deming.

Third option: they want to automate all jobs before the competition does. Think of it as AWS, but for labor.

stockresearcher · 5 months ago
> Deming helped develop in Japan

Deming’s process was about how to operate a business in a capital-intensive industry when you don’t have a lot of capital (with market-acceptable speed and quality). That you could continue to push it and raise quality as you increased the amount of capital you had was a side-effect, and the various Japanese automakers demonstrated widely different commitments to it.

And I’m sure you know that he started formulating his ideas during the Great Depression and refined them while working on defense manufacturing in the US during WWII.

jvanderbot · 5 months ago
What happens is a kind of feeling of having developed a meta skill. It's tempting to believe the scope of what you can solve has expanded when you assess yourself as "good" with AI.

It's the same with any "general" tech. I've seen it since genetic algorithms were all the rage. Everyone reaches for the most general tool, then assumes everything that tool might be used for is now a problem or domain they are an expert in, with zero context in that domain. AI is this times 100, plus one layer more meta, as you can optimize over approaches with zero context.

CuriouslyC · 5 months ago
That's an oversimplification. AI can genuinely expand the scope of things you can do. How it does this is a bit particular though, and bears paying attention to.

Normally, if you want to achieve some goal, there is a whole pile of tasks you need to be able to complete to achieve it. If you don't have the ability to complete any one of those tasks, you will be unable to achieve the goal, even if you're easily able to accomplish all the other tasks involved.

AI raises your capability floor. It isn't very effective at letting you accomplish things that are meaningfully outside your capability/comprehension, but if there are straightforward knowledge/process blockers that don't involve deeper intuition it smooths those right out.

monkeyelite · 5 months ago
Yep. All the process in the world won’t teach you to make a system that works.

The pattern I see over and over is a team aimlessly puttering along through tickets in sprints until an engineer who knows how to solve the problem gets it on track personally.

keeda · 5 months ago
1. The flaw in this premise is that the process is bad. Aside from the countless anecdotal reports about how AI and agents are improving productivity, there are actual studies showing 25 - 55% boosts. Yes, RCTs larger than the METR one that keeps getting bandied about: https://news.ycombinator.com/item?id=44860577 and many more on Google Scholar: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&as_ylo...

2. Quality control is key to good processes as well. Code review is literally a best practice in the software industry. Especially in BigTech and high-performing organizations. That is, even for humans, including those that could be considered the cream of the industry, code review is a standard step of the delivery process.

3. People have posted their GitHub profiles and projects (including on this very forum) to show how AI is working out for them. Browse through some of them and see how much "endless broken nonsense" you find. And if that seems unscientific, well go back to point 1.

dingnuts · 5 months ago
I picked one of the studies in the search (!) you linked. First of all, it's a bullshit debate tactic to try to overwhelm your opponents with vague studies -- a search is complete bullshit because it puts the onus on the other person to discredit the gargantuan amount of data you've flooded them with. Many of the studies in that search don't have anything to do with programming at all.

So right off the bat, I don't trust you. Anyway, I picked one study from the search to give you the benefit of the doubt. It compared leetcode in the browser to LLM generation. This tells us absolutely nothing about real world development.

What made the METR paper interesting was that they studied real projects, in the real world. We all know LLMs can solve well bounded problems in their data sets.

As for 3 I've seen a lot of broken nonsense. Let me know when someone vibe codes up a new mobile operating system or a competitor to KDE and Gnome lol

xyzzy123 · 5 months ago
I have a play project which hits these constraints a lot.

I have been messing around with getting AI to implement novel (to me) data structures from papers. They're not rocket science or anything, but there's a lot of detail. Often I do not understand the complex edge cases in the algorithms myself, so I can't even "review my way out of it". I'm also working in Go, which is usually not a very good fit for implementing these things because it doesn't have sum types; lack of sum types often adds so much interface{} bloat it would render the data structure pointless. Am working around that with codegen for now.

What I've had to do is demote "human review" a bit; it's a critical control, but it's expensive. Rather, I think more holistically about which "guard rails" to put where, and what the acceptance criteria should be. This means that when I'm reviewing the code I am reasonably confident it's functionally correct, leaving me to focus on whether I like how that is being achieved. This won't work for every domain, but if it's possible to automate controls, it feels like this is the way to go wherever possible.

The "principled" way to do this would be to use provers etc, but being more of an engineer I have resorted to ruthless guard rails. Bench tests that automatically fail if the runtime doesn't meet requirements (e.g. is O(n) instead of O(log n)) or overall memory efficiency is too low - and enforcing 100% code coverage from both unit tests AND fuzzing. Sometimes the cli agent is running for hours chasing indexes or weird bugs; the two main tasks are preventing it from giving up, and stopping it from "punting" (wait, this isn't working, let me first create a 100% correct O(n) version...) or cheating. Also reminding it to check AGAIN for slice sharing bugs which crop up a surprising % of the time.

The other "interesting" part of my workflow right now is that I have to manually shuffle a lot between "deep research" (which goes and reads all the papers and blogs about the data structure) and the cli agent which finds the practical bugs etc but often doesn't have the "firepower" to recognise when it's stuck in a local maximum or going around in circles. Have been thinking about an MCP that lets the cli agent call out to "deep research" when it gets really stuck.

roenxi · 5 months ago
The issue with the hypothetical is if you give a team lead 25 competent people they'd also get bad results. Or at least, the "team lead" isn't really leading their team on technical matters apart from fighting off the odd attempt to migrate to MongoDB and hoping that their people are doing the right thing. The sweet spot for teams is 3-6 people and someone more interested in empire building than technical excellence can handle maybe around 9 people and still do a competent job. It doesn't depend much on the quality of the people.

The way team leads seem to get used is that people who are good at code get a little more productive as more people are told to report to them. What is happening now is that senior-level engineers all automatically get the same option: a team of 1-2 mid-level engineers on the cheap, thanks to AI, which is entirely manageable. And anyone less capable gets a small team, a rubber duck or a mentor, depending on where they fall vs LLM use.

Of course, the real question is what will happen as the AIs get into the territory traditionally associated with 130+ IQ ranges and the engineers start to sort out how to give them a bit more object permanence.

bitwize · 5 months ago
> Imagine that your boss came to you, the tech lead of a small team, and said “okay, instead of having five competent people, your team will now have 25 complete idiots. We expect that their random flailing will sometimes produce stuff that kinda works, and it will be your job to review it all.”

This is exactly the point of corporate Agile. Management believes that the locus of competence in an organization should reside within management. Depending on competent programmers is thus a risk, and what is sought is a process that can simulate a highly competent programmer's output with a gang of mediocre programmers. Kinda like the myth that you can build one good speaker out of many crappy ones, or the principle of RAID which is to use many cheap, failure-prone drives to provide the reliability guarantees of one expensive, reliable drive (which also kinda doesn't work if the drives came from the same lot and are prone to fail at about the same time). Every team could use some sort of process, but usually if you want to retain good people, this takes the form of "disciplines regarding branching, merging, code review/approval, testing, CI, etc." Something as stifling as Scrum risks scaring your good people away, or driving them nuts.

So yes, people do expect it to work, all the time. And with AI in the mix, it now gains very nice "labor is more fungible with capital" properties. We're going to see some very nice, spectacular failures in the next few years as a result, a veritable Perseid meteor shower of critical systems going boom; and those companies that wish to remain going concerns will call in human programmers to clean up the mess (but probably lowball on pay and/or try to get away with outsourcing to places with dirt-cheap COL). But it'll still be a rough few years for us while management in many orgs gets high off their own farts.

cyphar · 5 months ago
It also assumes that people who are "good" at the standard code review process (which is tuned for reviewing code written by humans with some level of domain experience and thus finding human-looking mistakes) will be able to translate their skills perfectly to reviewing code written by AI. There have been plenty of examples where this review process was shown to be woefully insufficient for things outside of this scope (for instance, malicious patches like the bad patches scandal with Linux a few years ago or the xz backdoor were only discovered after the fact).

I haven't had to review too much AI code yet, but from what I've seen it tends to be the kind of code review that really requires you to think hard and so seems likely to lead to mistakes even with decent code reviewers. (I wouldn't say that I'm a brilliant code reviewer, but I have been doing open source maintenance full-time for around a decade at this point so I would say I have some experience with code reviews.)

steelblueskies · 5 months ago
Evolution via random mutation and selection.

Or more broadly, the existence of complex or any life.

Sure, it's not the way I would pick to do most things, but when your buzzword magical thinking runs so deep that all you have is a hammer, you will force your wage slaves to hammer away at it until it works, even if it doesn't look like a nail.

As to your other cases: injection-molded plastic parts for things like the spinning T-bar spray arm in some dishwashers. Crap molds, then pass the parts to low-wage or temp workers to razorblade-fix by hand and box up. I've personally worked such a temp job before, among others, so yes, that bad-output-plus-manual-QC-and-fix-up approach still abounds.

And if we are talking high failure rates... see also chip binning and foundry yields in semiconductors.

You just have to look around to see that the dubious-seeming approach is more the norm.

nurettin · 5 months ago
I went from ";" to fully working, production-grade C++ code with good test coverage. By my estimation, 90% of the work was done in an agent prompt. It was a side project; now it will be my job. The process is like they described.

For the core parts you cannot let go of the reins. You have to keep steering it. You have to take short breaks and reload the code into the agent as it starts acting confused. But once you get the hang of it, things that would take you months of convincing yourself and picking yourself back up to continue become a day's work.

Once you have a decent amount of work done, you can have the agent read your code as documentation and use it to develop further.

Terr_ · 5 months ago
> but as long as someone is checking

I predict many disastrous "AI" failures because the designers somehow believed that "some humans capable of constant vigilant attention to detail" was an easy thing they could have.

altspace · 5 months ago
What I took away from the article was that being good at code review makes the person better at guiding the agent to do the job, giving the right context and constraints at the right time… and not that the code reviewer has to fix whatever agent generated… this is also pretty close to my personal experience… LLM models are a bull which can be guided and definitely not a complete idiot…

In a strange kind of analogy, flowing water can cause a lot of damage.. but a dam built to the right specification and turbines can harness that for something very useful… the art is to learn how to build that dam

gus_massa · 5 months ago
I'm not sure about the current state of the art, but microprocessor production is (was?) very bad. You make a lot of them on a single silicon wafer, and then test them thoroughly until you find the few that are good. You drop all the defective ones because they are very cheap pieces of sand, and charge a lot for the ones that work correctly to cover all the costs.

I'm not sure how this translates to programming, code review is too expensive, but for short code you can try https://en.wikipedia.org/wiki/Superoptimization

CorrectHorseBat · 5 months ago
Design for test is still a major part of (high volume) chip design. Anything that can't be tested in seconds on wafer is basically worthless for mass production.
rsynnott · 5 months ago
In that case, tho, no-one’s saying “let’s be sloppy with production and make up for it in the QA” (which really used to be a US car industry strategy until the Japanese wiped the floor with them); the process is as good as it reasonably can be, there are just physical limits. Chip manufacturers spend vast amounts on reducing the error rate.


moffkalast · 5 months ago
> Sure, it’ll produce endless broken nonsense, but as long as someone is checking, it’s fine

Well you've just described an EKF on a noisy sensor.

esafak · 5 months ago
I do not think anybody is going to get that reference. https://xkcd.com/2501/
estimator7292 · 5 months ago
Imagine a factory making injection molded plastic toys but instead of pumping out perfect parts 99.999% of the time, the machine gives you 50% and you have to pay people to pull out the bad ones from a full speed assembly line and hope no bad ones get through.
yen223 · 5 months ago
Is this not how microchips are made?
tempodox · 5 months ago
> … good results from a bad process …

Even if the process weren’t technically bad, it would still be shit. Doing code review with a human has meaning in that the human will probably learn something, and it’s an investment in the future. Baby-sitting an LLM, however, is utterly meaningless.

ben_w · 5 months ago
> I don’t know of any case where it has really worked out.

Supermarket vegetables.

HarHarVeryFunny · 5 months ago
Are you saying that supermarket vegetables/produce are good?

Quite a bit of it, like tomatoes and strawberries, is just crap. Form over substance. Nice color and zero flavor. Selected for delivery/shelf-life/appearance rather than actually being any good.

nkmnz · 5 months ago
> This idea that you can get good results from a bad process

This idea is called "evolution"...

> as long as you have good quality control

...and its QA is death on every single level of the system: cell, organism, species, and ecosystem. You must consider that those devs or companies with not-good-enough QA will end up dead (from a business perspective).

dwattttt · 5 months ago
Evolution is extremely inefficient at producing good designs. Given enough time it'll explore more, because it's driven randomly, but most mutations either don't help, or downright hurt an organism's survival.
rsynnott · 5 months ago
I look forward to software which takes several million years to produce and tends to die of Software Cancer.

Like, evolution is not _good_ at ‘designing’ things.

bluefirebrand · 5 months ago
So we're software evolvers now, not engineers?

Sounds like a stupid path forward to me

ChrisMarshallNY · 5 months ago
That depends.

If the engineer doing the implementation is top-shelf, you can get very good results from a “flawed” process (in quotes, because it’s not actually “bad.” It’s just a process that depends on the engineer being that particular one).

Silicon Valley is obsessed with process over people, manifesting “magical thinking” that a “perfect” process eliminates the need for good people.

I have found the truth to be in-between. I worked for a company that had overwhelming Process, but that process depended on good people, so it hired top graduates, and invested huge amounts of money and time into training and retention.

marklubi · 5 months ago
Said a little more crass/simply: A people hire A people. B people hire C people.

The first is phenomenal until someone makes a mistake and brings in a manager or supervisor from the C category that talks the talk but doesn't walk the walk.

If you accidentally end up in one that turns out to be the latter, it's maddening trying to get anything accomplished if the task involves anyone else.

Hire slow, fire fast.

rhetocj23 · 5 months ago
Steve Jobs said this decades ago.

It's the content that matters, not the process.

TrinaryWorksToo · 5 months ago
Bayesian reasoning would lead me to think that with a high rate of failures, even if QA is 99.9% amazing and the AI dev output is only 80% good, you still ship more poor features and bugs (yield of 99.9% * 80% = 79.92%) than if both stages are merely mediocre (90% * 90% = 81%).
swaptr · 5 months ago
AI-generated code can be useful in the early stages of a project, but it raises concerns in mature ones. Recently, a 280kloc+ Postgres parser was merged into Multigres (https://github.com/multigres/multigres/pull/109) with no public code review. In open source, this is worrying. Many people rely on these projects for learning and reference. Without proper review, AI-generated code weakens their value as teaching tools, and more importantly the trust in pulling as dependencies. Code review isn’t just about bugs, it’s how contributors learn, understand design choices, and build shared knowledge. The issue isn’t speed of building software (although corporations may seem to disagree), but how knowledge is passed on.

Edit: Reference to the time it took to open the PR: https://www.linkedin.com/posts/sougou_the-largest-multigres-...

sougou · 5 months ago
I oversaw this work, and I'm open to feedback on how things can be improved. There are some factors that make this particular situation different:

This was an LLM assisted translation of the C parser from Postgres, not something from the ground up.

For work of this magnitude, you cannot review line by line. The only thing we could do was to establish a process to ensure correctness.

We did control the process carefully. It was a daily toil. This is why it took two months.

We've ported most of the tests from Postgres. Enough to be confident that it works correctly.

Also, we are in the early stages for Multigres. We intend to do more bulk copies and bulk translations like this from other projects, especially Vitess. We'll incorporate any possible improvements here.

The author is working on a blog post explaining the entire process and its pitfalls. Please be on the lookout.

I was personally amazed at how much we could achieve using LLM. Of course, this wouldn't have been possible without a certain level of skill. This person exceeds all expectations listed here: https://github.com/multigres/multigres/discussions/78.

wg002 · 5 months ago
"We intend to do more bulk copies and bulk translations like this from other projects"

Supabase’s playbook is to replicate existing products and open source projects, release them under open source, and monetize the adoption. They’ve repeated this approach across multiple offerings. With AI, the replication process becomes even faster, though it risks producing low-quality imitations that alienate the broader community and people will resent the stealing of their work.

brap · 5 months ago
My process is basically

1. Give it requirements

2. Tell it to ask me clarifying questions

3. When no more questions, ask it to explain the requirements back to me in a formal PRD

4. I criticize it

5. Tell it to come up with 2 alternative high level designs

6. I pick one and criticize it

7. Tell it to come up with 2 alternative detailed TODO lists

8. I pick one and criticize it

9. Tell it to come up with 2 alternative implementations of one of the TODOs

10. I pick one and criticize it

11. Back to 9

I usually “snapshot” outputs along the way and return to them to reduce useless context.

This is what produces the most decent results for me, which aren’t spectacular but at the very least can be a baseline for my own implementation.

It’s very time consuming and 80% of the time I end up wondering if it would’ve been quicker to just do it all by myself right from the start.

codingdave · 5 months ago
Definitely sounds slower than doing it yourself.

I am falling into a pattern of treating AI coding like a drunk mid-level dev: "I saw those few paragraphs of notes you wrote up on a napkin, and stayed up late Saturday night while drinking and spat out this implementation. you like?"

So I can say to myself, "No, do not like. But the overall gist at least started in the right direction, so I can revise it from here and still be faster than had I done it myself on Monday morning."

jvanderbot · 5 months ago
The most useful thing I've found is "I need to do X, show me 3 different popular libraries that do it". I've really limited my AI use to "Young Lady's Illustrated Primer" territory, especially after some bad experiences with AI code from devs who should know better.
rco8786 · 5 months ago
> It’s very time consuming and 80% of the time I end up wondering if it would’ve been quicker to just do it all by myself right from the start.

Yes, this. Every time I read these sort of step by step guides to getting the best results with coding agents it all just sounds like boatloads of work that erase the efficiency margins that AI is supposed to bring in the first place. And anecdotally, I've found that to be true in practice as well.

Not to say that AI isn't useful. But I think knowing when and where AI will be useful is a skill in and of itself.

Leynos · 5 months ago
At least for me, I can have five of these processes running at once. I can also use Deepresearch for generating the designs with a survey of literature. I can use NotebookLM to analyse the designs. And I use Sourcery, CodeRabbit, Codex and Codescene together to do code review.

It took me a long time to get there with custom cli tools and browser userscripts. The out of the box tooling is very limited unless you are willing to pay big £££s for Devin or Blitzy.

jwrallie · 5 months ago
I think I’m working at lower levels, but usually my flow is:

- I start to build or refactor the code structure by myself creating the basic interfaces or skip to the next step when they already exist. I’ll use LLMs as autocomplete here.

- I write down the requirements and tell which files are the entry point for the changes.

- I do not tell the agent my final objective, only one step that gets me closer to it, and one at a time.

- I watch carefully and interrupt the agent as soon as I see something going wrong. At this point I either start over if my requirement assumptions were wrong or just correct the course of action of the agent if it was wrong.

Most of the issues I had in the past came from writing down a broad objective that requires too many steps right at the beginning. Agents cannot judge correctly when they've finished something.

stavros · 5 months ago
I have a similar, though not as detailed, process. I do the same as you up to the PRD, then give it the PRD and tell it the high level architecture, and ask it to implement components how I want them.

It's still time-consuming, and it probably would be faster for me to do it myself, but I can't be bothered manually writing lines of code any more. I maybe should switch to writing code with the LLM function by function, though.

bluefirebrand · 5 months ago
> but I can't be bothered manually writing lines of code any more. I maybe should switch to writing code with the LLM function by function, though.

Maybe you should consider a change of career :/

scuff3d · 5 months ago
That's like a chef saying they can't be bothered to cook...
scuff3d · 5 months ago
Yeah, sounds like it would have been far quicker to use the AI to give you a general overview of approaches/libraries/language features/etc, and then do the work yourself.
lapcat · 5 months ago
If you are good at code review, you will also be good at not using AI agents.
fhd2 · 5 months ago
This. Having had the pleasure to review the work and fix the bugs of agent jockeys (generally capable developers that fell in love with Claude Code et al), I'm rather sceptical. The code often looks as if they were on mushrooms. They cannot reason about it whatsoever, like they weren't even involved, when I know they weren't completely hands off.

I really believe there are people out there that produce good code with these things, but all I've seen so far has been tragic.

Luckily, I've witnessed a few snap out of it and care again. Literally looks to me as if they had a substance abuse problem for a couple of months.

If you take a critical look at what comes out of contemporary agentic workflows, I think the conclusion must be that it's not there. So yeah, if you're a good reviewer, you would perhaps come to that conclusion much sooner.

coffeefirst · 5 months ago
Yeah.

I'm not even anti-LLM. Little things—research, "write TS types for this object", search my codebase, go figure out exactly what line in the Django rest framework is causing this weird behavior—are working great and saving me an hour here and 15m there.

It's really obvious when people lean on it, because they don't act like a beginner (trying things that might not work) or like someone just being sloppy (where there's a logic to it but there's no attention to detail); it's like they copy-pasted from Stackoverflow search results at random, and there are pieces that might belong but the totality is incoherent.

bluefirebrand · 5 months ago
> I really believe there are people out there that produce good code with these things, but all I've seen so far has been tragic

I don't believe this at all, because all I've seen so far is tragic

I would need to see any evidence of good quality work coming from AI assisted devs before I start to entertain the idea myself. So far all I see is low effort low quality code that the dev themself is unable to reason about

carlmr · 5 months ago
>The code often looks as if they were on mushrooms. They cannot reason about it whatsoever

Interesting comparison, why not weed or alcohol?

penguin_booze · 5 months ago
If I'm good at code review, I want to get better at it.
sublinear · 5 months ago
> If you’re a nitpicky code reviewer, I think you will struggle to use AI tooling effectively. [...] Likewise, if you’re a rubber-stamp code reviewer, you’re probably going to put too much trust in the AI tooling.

So in other words, if you are good at code review you are also good enough at writing code that you will be better off writing it yourself for projects you will be responsible for maintaining long term. This is true for almost all of them if you work at a sane place or actually care about your personal projects. Writing code for you is not a chore and you can write it as fluently and quickly as anything else.

Your time "using AI" is much better spent filling in the blanks when you're unfamiliar with a certain tool or need to discover a new one. In short, you just need a few google searches a day... just like it ever was.

I will admit that modern LLMs have made life easier here. AI summaries on search engines have indeed improved to the point where I almost always get my answer and I no longer get hung up meat-parsing poorly written docs or get nerd-sniped pondering irrelevant information.

its-kostya · 5 months ago
Code review is part of the job, but one of the least enjoyable parts. Developers like _writing_ code, and that gives the most job satisfaction. AI tools are helpful, but they inherently increase the amount of code we have to review, and with more scrutiny than code from my colleagues gets, because of how unpredictable - yet convincing - AI can be. Why did we create tools that do the fun part and increase the non-fun part? Where are the "code-review" agents at?
jmcodes · 5 months ago
Maybe I'm weird but I don't actually enjoy the act of _writing_ code. I enjoy problem solving and creating something. I enjoy decomposing systems and putting them back together in a better state, but actually manually typing out code isn't something I enjoy.

When I use an LLM to code I feel like I can go from idea to something I can work with in much less time than I would have normally.

Our codebase is more type-safe, better documented, and it's much easier to refactor messy code into the intended architecture.

Maybe I just have lower expectations of what these things can do but I don't expect it to problem solve. I expect it to be decent at gathering relevant context for me, at taking existing patterns and re-applying them to a different situation, and at letting me talk shit to it while I figure out what actually needs to be done.

I especially expect it to allow me to be lazy and not have to manually type out all of that code across different files when it can just generate it all in a few seconds and I can review each change as it happens.

kiitos · 5 months ago
the time spent literally typing code into an editor is never the bottleneck in any competently-run project

if the act of writing code is something you consider a burden rather than a joy then my friend you are in the wrong profession

legacynl · 5 months ago
If natural language was an efficient way to write software we would have done it already. Fact is that it's faster to write class X { etc }; than it is to write "create a class named X with behavior etc". If you want to think and solve problems yourself, it doesn't make sense to then increase your workload by putting your thoughts in natural language, which will be more verbose.

I therefore think it makes the most sense to just feed it requirements and issues, and tell it to provide a solution.

Also, unless you're starting a new project or a big feature with a lot of boilerplate, in my experience it's almost never necessary to make a lot of files with a lot of text in them at once.

skydhash · 5 months ago
Code is the ultimate fact checker, where what you write is what gets done. Specs are well written wishes.
simonw · 5 months ago
> Where are the "code-review" agents at?

OpenAI's Codex Cloud just added a new feature for code review, and their new GPT-5-Codex model has been specifically trained for code review: https://openai.com/index/introducing-upgrades-to-codex/

Gemini and Claude both have code review features that work via GitHub Actions: https://developers.google.com/gemini-code-assist/docs/review... and https://docs.claude.com/en/docs/claude-code/github-actions

GitHub have their own version of this pattern too: https://github.blog/changelog/2025-04-04-copilot-code-review...

There are also a whole lot of dedicated code review startups like https://coderabbit.ai/ and https://www.greptile.com/ and https://www.qodo.ai/products/qodo-merge/

vrighter · 5 months ago
you can't use a system with the exact same hallucination problem to check the work of another one just like it. Snake oil
aleph_minus_one · 5 months ago
> Code review is part of the job, but one of the least enjoyable parts. Developers like _writing_ and that gives the most job satisfaction.

At least for me, what gives the most satisfaction (even though this kind of satisfaction happens very rarely) is when I discover some very elegant structure behind whatever has to be implemented that changes the whole way you've thought about programming (or often even about life) for decades.

marklubi · 5 months ago
> what gives the most satisfaction (even though this kind of satisfaction happens very rarely) if I discover some very elegant structure behind whatever has to be implemented that changes the whole way how you thought about programming

A number of years ago, I wrote a caching/lookup library that is probably some of the favorite code I've ever created.

After the initial configuration, the use was elegant and there was really no reason not to use it if you needed to query anything that could be cached on the server side. Super easy to wrap just about any code with it as long as the response is serializable.

CachingCore.Instance.Get(key, cacheDuration, () => { /* expensive lookup code here */ });

Under the hood, it would check the preferred caching solution (e.g., Redis/Memcache/etc), followed by less preferred options if the preferred wasn't available, followed by the expensive lookup if it wasn't found anywhere. Defaulted to in-memory if nothing else was available.

If the data was returned from cache, it would then compare the expiration to the specified duration... If it was getting close to various configurable tolerances, it would start a new lookup in the background and update the cache (some of our lookups could take several minutes*, others just a handful of seconds).

The hardest part was making sure that we didn't cause a thundering herd type problem with looking up stuff multiple times... in-memory cache flags indicating lookups in progress, so we could hold up other requests if the cache fell through and then let them know once the data is available. While not the absolute worst case scenario, you might end up making the expensive lookups once from each of the servers that use it if the shared cache isn't available.

* most of these have a separate service running on a schedule to pre-cache the data, but things have a backup with this method.
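
For the curious, the core of that in-flight-flag idea looks roughly like this (a minimal Go-flavoured sketch rather than the original implementation; Store, entry and the method shape are made up here, and the refresh-in-background-near-expiry part is elided). The inflight map is what stops the herd:

    package cache

    import (
        "sync"
        "time"
    )

    type entry struct {
        val     any
        expires time.Time
    }

    type Store struct {
        mu       sync.Mutex
        items    map[string]entry
        inflight map[string]chan struct{} // keys with a lookup in progress
    }

    func NewStore() *Store {
        return &Store{
            items:    map[string]entry{},
            inflight: map[string]chan struct{}{},
        }
    }

    // Get returns the cached value for key, or runs lookup exactly once
    // across concurrent callers and caches the result for ttl.
    func (s *Store) Get(key string, ttl time.Duration, lookup func() (any, error)) (any, error) {
        s.mu.Lock()
        if e, ok := s.items[key]; ok && time.Now().Before(e.expires) {
            s.mu.Unlock()
            return e.val, nil // fresh hit
        }
        if ch, ok := s.inflight[key]; ok {
            // Another caller is already doing the expensive lookup:
            // wait for it to finish, then retry against the cache.
            s.mu.Unlock()
            <-ch
            return s.Get(key, ttl, lookup)
        }
        ch := make(chan struct{})
        s.inflight[key] = ch
        s.mu.Unlock()

        val, err := lookup() // the expensive part, done by one caller only
        s.mu.Lock()
        if err == nil {
            s.items[key] = entry{val, time.Now().Add(ttl)}
        }
        delete(s.inflight, key)
        s.mu.Unlock()
        close(ch)
        return val, err
    }

Usage matches the call shape above: store.Get(key, cacheDuration, func() (any, error) { /* expensive lookup code here */ }).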

mercutio2 · 5 months ago
Junior developers love writing code.

Senior developers love removing code.

Code review is probably my favorite part of the job, when there isn’t a deadline bearing down on me for my own tasks.

So I don’t really agree with your framing. Code reviews are very fun.

KronisLV · 5 months ago
> Developers like _writing_ and that gives the most job satisfaction.

Is it possible that this is just the majority, and there's plenty of folks that dislike actually starting from nothing and the endless iteration to make something that works, as opposed to having some sort of good/bad baseline to just improve upon?

I’ve seen plenty of people that are okay with picking up a codebase someone else wrote and working with the patterns and architecture in there BUT when it comes to them either needing to create new mechanisms in it or create an entirely new project/repo it’s like they hit a wall - part of it probably being friction, part not being familiar with it, as well as other reasons.

> Why did we create tools that do the fun part and increase the non-fun part? Where are the "code-review" agents at?

Presumably because that’s where the most perceived productivity gain is in. As for code review, there’s CodeRabbit, I think GitLab has their thing (Duo) and more options are popping up. Conceptually, there’s nothing preventing you from feeding a Git diff into RooCode and letting it review stuff, alongside reading whatever surrounding files it needs.

aleph_minus_one · 5 months ago
> I’ve seen plenty of people that are okay with picking up a codebase someone else wrote and working with the patterns and architecture in there BUT when it comes to them either needing to create new mechanisms in it or create an entirely new project/repo it’s like they hit a wall - part of it probably being friction, part not being familiar with it, as well as other reasons.

For me, it's exactly the opposite:

I love to build things from "nothing" (if I had the possibility, I would even like to write my own kernel that is written in a novel programming language developed by me :-) ).

On the other hand, when I pick up someone else's codebase, I nearly always (if it was not written by some insanely smart programmer) immediately find it badly written. In nearly all cases I tend to be right in my judgements (my boss agrees), but I am very sensitive to bad code, and often ask myself how the programmer who wrote the original code did not yet commit seppuku, considering how much of a shame the code is.

Thus: you can in my opinion only enjoy picking up a codebase someone else wrote if you are incredibly tolerant of bad code.

crazygringo · 5 months ago
> Developers like _writing_ and that gives the most job satisfaction.

Not me. I enjoy figuring out the requirements, the high-level design, and the clever approach that will yield high performance, or reuse of existing libraries, or whatever it is that will make it an elegant solution.

Once I've figured all that out, the actual process of writing code is a total slog. Tracking variables, remembering syntax, trying to think through every edge case, avoiding off-by-one errors. I've gone from being an architect (fun) to slapping bricks together with mortar (boring).

I'm infinitely happier if all that can be done for me, everything is broken out into testable units, the code looks plausibly correct, and the unit tests for each function cover all cases and are demonstrably correct.

pmg101 · 5 months ago
You don't really know if the system design you've architected in your mind is any good though, do you, until you've actually tried coding it. Discovering all the little edge cases at that point is hard work ("a total slog") because it's where you find out where the flaws in your thinking were, and how your beautifully imagined abstractions fall down.

Then after going back and forth between thinking about it and trying to build it a few times, after a while you discover the real solution.

Or at least that's how it's worked for me for a few decades, everyone might be different.

skydhash · 5 months ago
> Tracking variables, remembering syntax,

That's why you have short functions, so you don't have to track that many variables. And use symbol completion (a standard feature in many editors).

> trying to think through every edge case, avoiding off-by-one errors.

That is designing, not coding. Sometimes I think of an edge case, but I'm already on a task that I'd like to finish, so I just add a TODO comment. Then, at least before I submit the PR, I ripgrep the project for this keyword and others.

Sometimes the best design is done by doing. The tradeoffs become clearer when you have to actually code the solution (too much abstraction, too verbose, unwieldy,...) instead of relying on your mind (everything seems simpler)

phito · 5 months ago
Because the goal of "AI" is not to have fun, it's to solve problems and increase productivity. I have fun programming too, but you have to realize the world isn't optimizing to make things more fun.
fhd2 · 5 months ago
I hear you, but without any enjoyment in the process, quality and productivity go down the drain real fast.

The Ironies of Automation paper is something I mention a lot; the core thesis is that making humans review / rubber stamp automation reduces their work quality. People just aren't wired to do boring stuff well.

lapcat · 5 months ago
> you have to realize the world isn't optimizing make things more fun.

Serious question: why not?

IMO it should be.

If "progress" is making us all more miserable, then what's the point? Shouldn't progress make us happier?

It feels like the endgame of AI is that the masses slave away for the profit of a few tech overlords.

dearilos · 5 months ago
I'm building something to solve exactly that - automating all the boring and repetitive parts of code review.
cmrdporcupine · 5 months ago
If you have a paid Copilot membership and a Github project you can request a code review from Copilot. And it doesn't do a terrible job, actually.
sublinear · 5 months ago
I will second this. I believe code review agents and search summaries are the way forward for coding with LLMs.

The ability to ignore AI and focus on solving the problems has little to do with "fun". If anything, it leaves a human-auditable trail to review later, and a way to hold accountable devs who have gone off the rails and routinely ignored the sometimes genuinely good advice that comes out of AI.

If humans don't have to helicopter over developers, that's a much bigger productivity boost than letting AI take the wheel. This is a nuance missed by almost everyone who doesn't write code or care about its quality.

egberts1 · 5 months ago
Asking AI to stay true to my requested parameters is hard: THEY ALL DRIFT AWAY, RANDOMLY.

When working on nftables syntax highlighters, I have 230 tokens, 2,500 states, and 50,000+ state transitions.

Some firm guidelines given to AI agents are:

1. Fully-deterministic LL(1) full syntax tree.

2. No use of Vim 'syntax keyword' statement

3. Use long group names in snake_case whose naming starts with the 'nft_' prefix (avoids collisions with other Vim namespaces)

4. For parts of the group names, use only nftables/src/parser_bison.y semantic action and token names as-is.

5. For each traversal down the syntax tree, append that non-terminal node name from parser_bison.y to its group names before using it.
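
(So, to illustrate rules 3-5 with a hypothetical example: a group reached by traversing add_cmd and then table_spec would end up named something like nft_add_cmd_table_spec.)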

With those 5 "simple" user-requested requirements, all AI agents drift away from at least one of the rules at seemingly random intervals.

At the moment, it is dubious to even trust the bit-length of each packet field.

Never mind their inability to construct a simple Vimscript.

I use AI agents mainly as documentation.

On the bright side, they are getting good at breaking down 'rule', 'chain_block stmt', and 'map_stmt_expr' (that '.' period we see when chaining header expressions together): just use the quoted words and paste in one of your nft rule statements.

legacynl · 5 months ago
I'm a dev (who knows nothing about nftables) and I don't understand your instructions. I think maybe you could improve your situation by formulating them as "when creating new group names, use the semantic actions and token names as defined in parser_bison.y", i.e. with if-conditions so that the correct rules apply to the correct situations. Because your rules are written as if to apply to every line of code, it might unnecessarily try to incorporate context even when it's not applicable.
HarHarVeryFunny · 5 months ago
The title of this article seems way too glib.

Code review isn't the same as design review, nor are these the only type of things (coding and design) that someone may be trying to use AI for.

If you are going to use AI, and catch its mistakes, then you need to have expertise in whatever it is you are using the AI for. Even if we limit the discussion just to coding, then being a good code reviewer isn't enough - you'd need to have skill at whatever you are asking the AI to do. One of the valuable things AI can do is help you code using languages and frameworks you are not familiar with, which then of course means you are not going to be competent to review the output, other than in the most generic fashion.

A bit off topic, but it's weird to me to see the term "coding" make a comeback in this AI/LLM era. I guess it is useful as a way to describe what AI is good at - coding vs. more general software development - but how many companies nowadays hire coders as opposed to software developers (I know it used to be a thing with some big companies like IBM)? Rather than compartmentalized roles, the direction nowadays seems to be expecting developers to be able to do everything from business analysis and helping develop requirements, to architecture/design and then full-stack development, and subsequent production support.

scuff3d · 5 months ago
My official title is "Software Engineer". In the last five years I have...

1. Stood up and managed my own Kubernetes clusters for my team

2. Docker, just so so much Docker

3. Developed CI/CD pipelines

4. Done more integration and integration testing than I care to think about

5. Written god knows how many requirements and produced an endless stream of diagrams and graphs for systems engineering teams

6. Done a bunch of random IT crap because our infrastructure team can't be bothered

7. Wrote some code once in a while

karmakaze · 5 months ago
Seems so.

> Using AI agents correctly is a process of reviewing code. [...]

> Why is that? Large language models are good at producing a lot of code, but they don’t yet have the depth of judgement of a competent software engineer. Left unsupervised, they will spend a lot of time committing to bad design decisions.

Obviously you want to make course corrections sooner rather than later. Same as I would do with less experienced devs: talk through the high level operations, then the design/composition. Reviewing a large volume of unguided code is like waiting for 100k tokens to be written only to correct the premise in the first 100 and start over.