rsynnott · 5 months ago
This idea that you can get good results from a bad process as long as you have good quality control seems… dubious, to say the least. “Sure, it’ll produce endless broken nonsense, but as long as someone is checking, it’s fine.” This, generally, doesn’t really work. You see people _try_ it in industry a bit; have a process which produces a high rate of failures, catch them in QA, rework (the US car industry used to be notorious for this). I don’t know of any case where it has really worked out.

Imagine that your boss came to you, the tech lead of a small team, and said “okay, instead of having five competent people, your team will now have 25 complete idiots. We expect that their random flailing will sometimes produce stuff that kinda works, and it will be your job to review it all.” Now, you would, of course, think that your boss had gone crazy. No-one would expect this to produce good results. But somehow, stick ‘AI’ on this scenario, and a lot of people start to think “hey, maybe that could work.”

Manfred · 5 months ago
Reviewing code from less experienced or unmotivated people is also very taxing, both in a cognitive and emotional sense. It will never approach a really good level of quality because you just give up after 4 rounds of reviews on the same feature.
EdwardDiego · 5 months ago
Except humans learn from your PR comments and in other interactions with more experienced people, and so inexperienced devs become experienced devs eventually. LLMs are not so trainable.
btown · 5 months ago
Here’s the thing about AI though - you don’t need to worry about its confidence or impact on professional development if you’re overly critical, and it will do a turn within seconds. That gives a tremendous amount of flexibility and leverage to the code reviewer. Works better on some types of problems than others, but it’s worth exploring!
hakfoo · 5 months ago
With human co-workers, you can generally assume things you can't with AI.

My human co-workers generally have good faith. Even the developer who was clearly on the verge of getting a role elsewhere without his heart in it-- he tried to solve the problems assigned to him, not some random delusion that the words happened to echo. I don't have that level of trust with AI.

If there's a misunderstanding of the problem or the context, it's probably still the product of a recognizable logic flow that you can use to discuss what went wrong. I can ask Claude "Why are you converting this amount from Serbian Dinars to Poppyseed Bagels in line 476?" but will its answer be meaningful?

Human code review often involves a bit of a shared background. We've been working with the same codebases for several years, so we're going to use existing conventions. In this situation, the "AI knows all and sees all" becomes an anti-feature-- it may optimize for "this is how most people solve this task from a blank slate" rather than "it's less of a cognitive burden for the overall process if your single change is consistent with 500 other similar structures which have been in place since the Clinton administration."

There may be ways to try to force-feed AI this behaviour, but the more effort you devote to priming and pre-configuring the machine, the less you're actually saving over doing the actual work in the first place.

HarHarVeryFunny · 5 months ago
Right, this is the exact opposite of the best practices that W. Edwards Deming helped develop in Japan, then brought to the west.

Quality needs to come from the process, not the people.

Choosing to use a process known to be flawed, then hoping that people will catch the mistakes, doesn't seem like a great idea if the goal is quality.

The trouble is that LLMs can be used in many ways, but only some of those ways play to their strengths. Management have fantasies of using AI for everything, having either failed to understand what it is good for, or failed to learn the lessons of Japan/Deming.

thunky · 5 months ago
> Choosing to use a process known to be flawed, then hoping that people will catch the mistakes, doesn't seem like a great idea if the goal is quality.

You're also describing the software development process prior to LLMs. Otherwise code reviews wouldn't exist.

GarnetFloride · 5 months ago
Oh man, that's what I've been smelling with all this. It's the Red Bead Experiment, all over again. https://www.youtube.com/watch?v=ckBfbvOXDvU
giovannibonetti · 5 months ago
> Quality needs to come from the process, not the people.

Not sure which Japanese school of management you're following, but I think Toyota-style goes against that. The process gives more autonomy to workers than, say, Ford-style, where each tiny part of the process is pre-defined.

I got the impression that Toyota-style was considered to bring better quality to the product, even though it gives people more autonomy.

overfeed · 5 months ago
> Management have fantasies of using AI for everything, having either failed to understand what it is good for, or failed to learn the lessons of Japan/Deming.

Third option: they want to automate all jobs before the competition does. Think of it as AWS, but for labor.

stockresearcher · 5 months ago
> Deming helped develop in Japan

Deming’s process was about how to operate a business in a capital-intensive industry when you don’t have a lot of capital (with market-acceptable speed and quality). That you could continue to push it and raise quality as you increased the amount of capital you had was a side-effect, and the various Japanese automakers demonstrated widely different commitments to it.

And I’m sure you know that he started formulating his ideas during the Great Depression and refined them while working on defense manufacturing in the US during WWII.

jvanderbot · 5 months ago
What happens is a kind of feeling of having developed a meta skill. It's tempting to believe the scope of what you can solve has expanded when you assess yourself as "good" with AI.

It's the same with any "general" tech. I've seen it since genetic algorithms were all the rage. Everyone reaches for the most general tool, then assumes everything that tool might be used for is now a problem or domain they are an expert in, with zero context in that domain. AI is this times 100, plus one layer more meta, as you can optimize over approaches with zero context.

CuriouslyC · 5 months ago
That's an oversimplification. AI can genuinely expand the scope of things you can do. How it does this is a bit particular though, and bears paying attention to.

Normally, if you want to achieve some goal, there is a whole pile of tasks you need to be able to complete to achieve it. If you don't have the ability to complete any one of those tasks, you will be unable to achieve the goal, even if you're easily able to accomplish all the other tasks involved.

AI raises your capability floor. It isn't very effective at letting you accomplish things that are meaningfully outside your capability/comprehension, but if there are straightforward knowledge/process blockers that don't involve deeper intuition it smooths those right out.

monkeyelite · 5 months ago
Yep. All the process in the world won’t teach you to make a system that works.

The pattern I see over and over is a team aimlessly puttering along through tickets in sprints until an engineer who knows how to solve the problem gets it on track personally.

keeda · 5 months ago
1. The flaw in this premise is that the process is bad. Aside from the countless anecdotal reports about how AI and agents are improving productivity, there are actual studies showing 25 - 55% boosts. Yes, RCTs larger than the METR one that keeps getting bandied about: https://news.ycombinator.com/item?id=44860577 and many more on Google Scholar: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&as_ylo...

2. Quality control is key to good processes as well. Code review is literally a best practice in the software industry. Especially in BigTech and high-performing organizations. That is, even for humans, including those that could be considered the cream of the industry, code review is a standard step of the delivery process.

3. People have posted their GitHub profiles and projects (including on this very forum) to show how AI is working out for them. Browse through some of them and see how much "endless broken nonsense" you find. And if that seems unscientific, well go back to point 1.

dingnuts · 5 months ago
I picked one of the studies in the search (!) you linked. First of all, it's a bullshit debate tactic to try to overwhelm your opponents with vague studies -- a search is complete bullshit because it puts the onus on the other person to discredit the gargantuan amount of data you've flooded them with. Many of the studies in that search don't have anything to do with programming at all.

So right off the bat, I don't trust you. Anyway, I picked one study from the search to give you the benefit of the doubt. It compared leetcode in the browser to LLM generation. This tells us absolutely nothing about real world development.

What made the METR paper interesting was that they studied real projects, in the real world. We all know LLMs can solve well bounded problems in their data sets.

As for 3 I've seen a lot of broken nonsense. Let me know when someone vibe codes up a new mobile operating system or a competitor to KDE and Gnome lol

xyzzy123 · 5 months ago
I have a play project which hits these constraints a lot.

I have been messing around with getting AI to implement novel (to me) data structures from papers. They're not rocket science or anything, but there's a lot of detail. Often I do not understand the complex edge cases in the algorithms myself, so I can't even "review my way out of it". I'm also working in Go, which is usually not a very good fit for implementing these things because it doesn't have sum types; lack of sum types often adds so much interface{} bloat it would render the data structure pointless. Am working around that with codegen for now.

What I've had to do is demote "human review" a bit; it's a critical control, but it's expensive. Rather, I think more holistically about which "guard rails" to put where, and what the acceptance criteria should be. This means that when I'm reviewing the code I am reasonably confident it's functionally correct, leaving me to focus on whether I like how that is being achieved. This won't work for every domain, but if it's possible to automate controls, it feels like this is the way to go wherever possible.

The "principled" way to do this would be to use provers etc, but being more of an engineer I have resorted to ruthless guard rails. Bench tests that automatically fail if the runtime doesn't meet requirements (e.g. is O(n) instead of O(log n)) or overall memory efficiency is too low - and enforcing 100% code coverage from both unit tests AND fuzzing. Sometimes the cli agent is running for hours chasing indexes or weird bugs; the two main tasks are preventing it from giving up, and stopping it from "punting" (wait, this isn't working, let me first create a 100% correct O(n) version...) or cheating. Also reminding it to check AGAIN for slice sharing bugs which crop up a surprising % of the time.

The other "interesting" part of my workflow right now is that I have to manually shuffle a lot between "deep research" (which goes and reads all the papers and blogs about the data structure) and the cli agent which finds the practical bugs etc but often doesn't have the "firepower" to recognise when it's stuck in a local maximum or going around in circles. Have been thinking about an MCP that lets the cli agent call out to "deep research" when it gets really stuck.

roenxi · 5 months ago
The issue with the hypothetical is if you give a team lead 25 competent people they'd also get bad results. Or at least, the "team lead" isn't really leading their team on technical matters apart from fighting off the odd attempt to migrate to MongoDB and hoping that their people are doing the right thing. The sweet spot for teams is 3-6 people and someone more interested in empire building than technical excellence can handle maybe around 9 people and still do a competent job. It doesn't depend much on the quality of the people.

The way team leads seem to get used is that people who are good at code get a little more productive as more people are told to report to them. What is happening now is that senior-level engineers all automatically get the same option: a team of 1-2 mid-level engineers on the cheap, thanks to AI, which is entirely manageable. And anyone less capable gets a small team, a rubber duck or a mentor, depending on where they fall vs LLM use.

Of course, the real question is what will happen as the AIs get into the territory traditionally associated with 130+ IQ ranges and the engineers start to sort out how to give them a bit more object permanence.

bitwize · 5 months ago
> Imagine that your boss came to you, the tech lead of a small team, and said “okay, instead of having five competent people, your team will now have 25 complete idiots. We expect that their random flailing will sometimes produce stuff that kinda works, and it will be your job to review it all.”

This is exactly the point of corporate Agile. Management believes that the locus of competence in an organization should reside within management. Depending on competent programmers is thus a risk, and what is sought is a process that can simulate a highly competent programmer's output with a gang of mediocre programmers. Kinda like the myth that you can build one good speaker out of many crappy ones, or the principle of RAID which is to use many cheap, failure-prone drives to provide the reliability guarantees of one expensive, reliable drive (which also kinda doesn't work if the drives came from the same lot and are prone to fail at about the same time). Every team could use some sort of process, but usually if you want to retain good people, this takes the form of "disciplines regarding branching, merging, code review/approval, testing, CI, etc." Something as stifling as Scrum risks scaring your good people away, or driving them nuts.

So yes, people do expect it to work, all the time. And with AI in the mix, it now gains very nice "labor is more fungible with capital" properties. We're going to see some very nice, spectacular failures in the next few years as a result, a veritable Perseid meteor shower of critical systems going boom; and those companies that wish to remain going concerns will call in human programmers to clean up the mess (but probably lowball on pay and/or try to get away with outsourcing to places with dirt-cheap COL). But it'll still be a rough few years for us while management in many orgs gets high off their own farts.

cyphar · 5 months ago
It also assumes that people who are "good" at the standard code review process (which is tuned for reviewing code written by humans with some level of domain experience and thus finding human-looking mistakes) will be able to translate their skills perfectly to reviewing code written by AI. There have been plenty of examples where this review process was shown to be woefully insufficient for things outside of this scope (for instance, malicious patches like the bad patches scandal with Linux a few years ago or the xz backdoor were only discovered after the fact).

I haven't had to review too much AI code yet, but from what I've seen it tends to be the kind of code review that really requires you to think hard and so seems likely to lead to mistakes even with decent code reviewers. (I wouldn't say that I'm a brilliant code reviewer, but I have been doing open source maintenance full-time for around a decade at this point so I would say I have some experience with code reviews.)

steelblueskies · 5 months ago
Evolution via random mutation and selection.

Or more broadly, the existence of complex or any life.

Sure, it's not the way I would pick to do most things, but when your buzzword magical thinking runs so deep that all you have is a hammer, you will force your wage slaves to hammer away at it until it works, even if it doesn't look like a nail.

As to your other cases: injection-molded plastic parts for things like the spinning T-bar spray arm in some dishwashers. Crap molds, then pass the parts to low-wage or temp workers to razorblade-fix by hand and box up. I've personally worked such a temp job before, among others, so yes, that bad-output-plus-manual-QC-and-fix-up approach still abounds.

And if we are talking high failure rates... see also chip binning and foundry yields in semiconductors.

You just have to look around to see that the dubious-seeming approach is more the norm.

nurettin · 5 months ago
I went from ";" to fully working, production-grade C++ code with good test coverage. By my estimation, 90% of the work was done in an agent prompt. It was a side project; now it will be my job. The process is like they described.

For the core parts you cannot let go of the reins. You have to keep steering it. You have to take short breaks and reload the code into the agent as it starts acting confused. But once you get the hang of it, things that would take you months of convincing yourself and picking yourself back up to continue become a day's work.

Once you have a decent amount of work done, you can have the agent read your code as documentation and use it to develop further.

Terr_ · 5 months ago
> but as long as someone is checking

I predict many disastrous "AI" failures because the designers somehow believed that "some humans capable of constant vigilant attention to detail" was an easy thing they could have.

altspace · 5 months ago
What I took away from the article was that being good at code review makes the person better at guiding the agent to do the job, giving the right context and constraints at the right time… and not that the code reviewer has to fix whatever agent generated… this is also pretty close to my personal experience… LLM models are a bull which can be guided and definitely not a complete idiot…

In a strange kind of analogy, flowing water can cause a lot of damage.. but a dam built to the right specification and turbines can harness that for something very useful… the art is to learn how to build that dam

gus_massa · 5 months ago
I'm not sure about the current state of the art, but microprocessor production is (was?) very bad. You make a lot of them on a single silicon wafer, and then test them thoroughly until you find the few that are good. You drop all the defective ones because they are very cheap pieces of sand, and charge a lot for the ones that work correctly to cover all the costs.

I'm not sure how this translates to programming, code review is too expensive, but for short code you can try https://en.wikipedia.org/wiki/Superoptimization

CorrectHorseBat · 5 months ago
Design for test is still a major part of (high volume) chip design. Anything that can't be tested in seconds on wafer is basically worthless for mass production.
rsynnott · 5 months ago
In that case, tho, no-one’s saying “let’s be sloppy with production and make up for it in the QA” (which really used to be a US car industry strategy until the Japanese wiped the floor with them); the process is as good as it reasonably can be, there are just physical limits. Chip manufacturers spend vast amounts on reducing the error rate.


moffkalast · 5 months ago
> Sure, it’ll produce endless broken nonsense, but as long as someone is checking, it’s fine

Well you've just described an EKF on a noisy sensor.

esafak · 5 months ago
I do not think anybody is going to get that reference. https://xkcd.com/2501/
estimator7292 · 5 months ago
Imagine a factory making injection molded plastic toys but instead of pumping out perfect parts 99.999% of the time, the machine gives you 50% and you have to pay people to pull out the bad ones from a full speed assembly line and hope no bad ones get through.
yen223 · 5 months ago
Is this not how microchips are made?
tempodox · 5 months ago
> … good results from a bad process …

Even if the process weren’t technically bad, it would still be shit. Doing code review with a human has meaning in that the human will probably learn something, and it’s an investment in the future. Baby-sitting an LLM, however, is utterly meaningless.

ben_w · 5 months ago
> I don’t know of any case where it has really worked out.

Supermarket vegetables.

HarHarVeryFunny · 5 months ago
Are you saying that supermarket vegetables/produce are good?

Quite a bit of it, like tomatoes and strawberries, is just crap. Form over substance. Nice color and zero flavor. Selected for delivery/shelf-life/appearance rather than actually being any good.

nkmnz · 5 months ago
> This idea that you can get good results from a bad process

This idea is called "evolution"...

> as long as you have good quality control

...and its QA is death on every single level of the system: cell, organism, species, and ecosystem. You must consider that those devs or companies with not-good-enough QA will end up dead (from a business perspective).

dwattttt · 5 months ago
Evolution is extremely inefficient at producing good designs. Given enough time it'll explore more, because it's driven randomly, but most mutations either don't help, or downright hurt an organism's survival.
rsynnott · 5 months ago
I look forward to software which takes several million years to produce and tends to die of Software Cancer.

Like, evolution is not _good_ at ‘designing’ things.

bluefirebrand · 5 months ago
So we're software evolvers now, not engineers?

Sounds like a stupid path forward to me

ChrisMarshallNY · 5 months ago
That depends.

If the engineer doing the implementation is top-shelf, you can get very good results from a “flawed” process (in quotes, because it’s not actually “bad.” It’s just a process that depends on the engineer being that particular one).

Silicon Valley is obsessed with process over people, manifesting “magical thinking” that a “perfect” process eliminates the need for good people.

I have found the truth to be in-between. I worked for a company that had overwhelming Process, but that process depended on good people, so it hired top graduates, and invested huge amounts of money and time into training and retention.

marklubi · 5 months ago
Said a little more crass/simply: A people hire A people. B people hire C people.

The first is phenomenal until someone makes a mistake and brings in a manager or supervisor from the C category that talks the talk but doesn't walk the walk.

If you accidentally end up in one that turns out to be the latter, it's maddening trying to get anything accomplished if the task involves anyone else.

Hire slow, fire fast.

rhetocj23 · 5 months ago
Steve Jobs said this decades ago.

It's the content that matters, not the process.

TrinaryWorksToo · 5 months ago
Bayesian reasoning would lead me to think that with a high rate of failures, even if QA is 99.9% amazing and the AI dev output is only 80% good, you still ship more poor features and bugs (yield of 99.9% * 80% = 79.92%) than if both stages are merely mediocre (90% * 90% = 81%).
swaptr · 5 months ago
AI-generated code can be useful in the early stages of a project, but it raises concerns in mature ones. Recently, a 280kloc+ Postgres parser was merged into Multigres (https://github.com/multigres/multigres/pull/109) with no public code review. In open source, this is worrying. Many people rely on these projects for learning and reference. Without proper review, AI-generated code weakens their value as teaching tools, and more importantly the trust in pulling as dependencies. Code review isn’t just about bugs, it’s how contributors learn, understand design choices, and build shared knowledge. The issue isn’t speed of building software (although corporations may seem to disagree), but how knowledge is passed on.

Edit: Reference to the time it took to open the PR: https://www.linkedin.com/posts/sougou_the-largest-multigres-...

sougou · 5 months ago
I oversaw this work, and I'm open to feedback on how things can be improved. There are some factors that make this particular situation different:

This was an LLM assisted translation of the C parser from Postgres, not something from the ground up.

For work of this magnitude, you cannot review line by line. The only thing we could do was to establish a process to ensure correctness.

We did control the process carefully. It was a daily toil. This is why it took two months.

We've ported most of the tests from Postgres. Enough to be confident that it works correctly.

Also, we are in the early stages for Multigres. We intend to do more bulk copies and bulk translations like this from other projects, especially Vitess. We'll incorporate any possible improvements here.

The author is working on a blog post explaining the entire process and its pitfalls. Please be on the lookout.

I was personally amazed at how much we could achieve using LLM. Of course, this wouldn't have been possible without a certain level of skill. This person exceeds all expectations listed here: https://github.com/multigres/multigres/discussions/78.

wg002 · 5 months ago
"We intend to do more bulk copies and bulk translations like this from other projects"

Supabase’s playbook is to replicate existing products and open source projects, release them under open source, and monetize the adoption. They’ve repeated this approach across multiple offerings. With AI, the replication process becomes even faster, though it risks producing low-quality imitations that alienate the broader community and people will resent the stealing of their work.

brap · 5 months ago
My process is basically

1. Give it requirements

2. Tell it to ask me clarifying questions

3. When no more questions, ask it to explain the requirements back to me in a formal PRD

4. I criticize it

5. Tell it to come up with 2 alternative high level designs

6. I pick one and criticize it

7. Tell it to come up with 2 alternative detailed TODO lists

8. I pick one and criticize it

9. Tell it to come up with 2 alternative implementations of one of the TODOs

10. I pick one and criticize it

11. Back to 9

I usually “snapshot” outputs along the way and return to them to reduce useless context.

This is what produces the most decent results for me, which aren’t spectacular but at the very least can be a baseline for my own implementation.

It’s very time consuming and 80% of the time I end up wondering if it would’ve been quicker to just do it all by myself right from the start.

codingdave · 5 months ago
Definitely sounds slower than doing it yourself.

I am falling into a pattern of treating AI coding like a drunk mid-level dev: "I saw those few paragraphs of notes you wrote up on a napkin, and stayed up late Saturday night while drinking and spat out this implementation. you like?"

So I can say to myself, "No, do not like. But the overall gist at least started in the right direction, so I can revise it from here and still be faster than had I done it myself on Monday morning."

jvanderbot · 5 months ago
The most useful thing I've found is "I need to do X, show me 3 different popular libraries that do it". I've really limited my AI use to "Young Lady's Illustrated Primer" territory, especially after some bad experiences with AI code from devs who should know better.
rco8786 · 5 months ago
> It’s very time consuming and 80% of the time I end up wondering if it would’ve been quicker to just do it all by myself right from the start.

Yes, this. Every time I read these sort of step by step guides to getting the best results with coding agents it all just sounds like boatloads of work that erase the efficiency margins that AI is supposed to bring in the first place. And anecdotally, I've found that to be true in practice as well.

Not to say that AI isn't useful. But I think knowing when and where AI will be useful is a skill in and of itself.

Leynos · 5 months ago
At least for me, I can have five of these processes running at once. I can also use Deepresearch for generating the designs with a survey of literature. I can use NotebookLM to analyse the designs. And I use Sourcery, CodeRabbit, Codex and Codescene together to do code review.

It took me a long time to get there with custom cli tools and browser userscripts. The out of the box tooling is very limited unless you are willing to pay big £££s for Devin or Blitzy.

jwrallie · 5 months ago
I think I’m working at lower levels, but usually my flow is:

- I start to build or refactor the code structure by myself creating the basic interfaces or skip to the next step when they already exist. I’ll use LLMs as autocomplete here.

- I write down the requirements and tell which files are the entry point for the changes.

- I do not tell the agent my final objective, only one step that gets me closer to it, and one at a time.

- I watch carefully and interrupt the agent as soon as I see something going wrong. At this point I either start over if my requirement assumptions were wrong or just correct the course of action of the agent if it was wrong.

Most of the issues I had in the past came from writing down a broad objective that requires too many steps right at the beginning. Agents cannot judge correctly when they've finished something.

stavros · 5 months ago
I have a similar, though not as detailed, process. I do the same as you up to the PRD, then give it the PRD and tell it the high level architecture, and ask it to implement components how I want them.

It's still time-consuming, and it probably would be faster for me to do it myself, but I can't be bothered manually writing lines of code any more. I maybe should switch to writing code with the LLM function by function, though.

bluefirebrand · 5 months ago
> but I can't be bothered manually writing lines of code any more. I maybe should switch to writing code with the LLM function by function, though.

Maybe you should consider a change of career :/

scuff3d · 5 months ago
That's like a chef saying they can't be bothered to cook...
scuff3d · 5 months ago
Yeah, sounds like it would have been far quicker to use the AI to give you a general overview of approaches/libraries/language features/etc, and then do the work yourself.
lapcat · 5 months ago
If you are good at code review, you will also be good at not using AI agents.
fhd2 · 5 months ago
This. Having had the pleasure to review the work and fix the bugs of agent jockeys (generally capable developers that fell in love with Claude Code et al), I'm rather sceptical. The code often looks as if they were on mushrooms. They cannot reason about it whatsoever, like they weren't even involved, when I know they weren't completely hands off.

I really believe there are people out there that produce good code with these things, but all I've seen so far has been tragic.

Luckily, I've witnessed a few snap out of it and care again. Literally looks to me as if they had a substance abuse problem for a couple of months.

If you take a critical look at what comes out of contemporary agentic workflows, I think the conclusion must be that it's not there. So yeah, if you're a good reviewer, you would perhaps come to that conclusion much sooner.

coffeefirst · 5 months ago
Yeah.

I'm not even anti-LLM. Little things—research, "write TS types for this object", search my codebase, go figure out exactly what line in the Django rest framework is causing this weird behavior—are working great and saving me an hour here and 15m there.

It's really obvious when people lean on it, because they don't act like a beginner (trying things that might not work) or like someone just being sloppy (where there's a logic to it but there's no attention to detail); it's like they copy-pasted from Stackoverflow search results at random, and there are pieces that might belong but the totality is incoherent.

bluefirebrand · 5 months ago
> I really believe there are people out there that produce good code with these things, but all I've seen so far has been tragic

I don't believe this at all, because all I've seen so far is tragic

I would need to see any evidence of good quality work coming from AI assisted devs before I start to entertain the idea myself. So far all I see is low effort low quality code that the dev themself is unable to reason about

carlmr · 5 months ago
>The code often looks as if they were on mushrooms. They cannot reason about it whatsoever

Interesting comparison, why not weed or alcohol?

penguin_booze · 5 months ago
If I'm good at code review, I want to get better at it.
sublinear · 5 months ago
> If you’re a nitpicky code reviewer, I think you will struggle to use AI tooling effectively. [...] Likewise, if you’re a rubber-stamp code reviewer, you’re probably going to put too much trust in the AI tooling.

So in other words, if you are good at code review you are also good enough at writing code that you will be better off writing it yourself for projects you will be responsible for maintaining long term. This is true for almost all of them if you work at a sane place or actually care about your personal projects. Writing code for you is not a chore and you can write it as fluently and quickly as anything else.

Your time "using AI" is much better spent filling in the blanks when you're unfamiliar with a certain tool or need to discover a new one. In short, you just need a few google searches a day... just like it ever was.

I will admit that modern LLMs have made life easier here. AI summaries on search engines have indeed improved to the point where I almost always get my answer and I no longer get hung up meat-parsing poorly written docs or get nerd-sniped pondering irrelevant information.

its-kostya · 5 months ago
Code review is part of the job, but one of the least enjoyable parts. Developers like _writing_ code, and that gives the most job satisfaction. AI tools are helpful, but they inherently increase the amount of code we have to review, and with more scrutiny than code from my colleagues gets, because of how unpredictable - yet convincing - AI can be. Why did we create tools that do the fun part and increase the non-fun part? Where are the "code-review" agents at?
jmcodes · 5 months ago
Maybe I'm weird but I don't actually enjoy the act of _writing_ code. I enjoy problem solving and creating something. I enjoy decomposing systems and putting them back together in a better state, but actually manually typing out code isn't something I enjoy.

When I use an LLM to code I feel like I can go from idea to something I can work with in much less time than I would have normally.

Our codebase is more type-safe, better documented, and it's much easier to refactor messy code into the intended architecture.

Maybe I just have lower expectations of what these things can do but I don't expect it to problem solve. I expect it to be decent at gathering relevant context for me, at taking existing patterns and re-applying them to a different situation, and at letting me talk shit to it while I figure out what actually needs to be done.

I especially expect it to allow me to be lazy and not have to manually type out all of that code across different files when it can just generate it all in a few seconds and I can review each change as it happens.

kiitos · 5 months ago
the time spent literally typing code into an editor is never the bottleneck in any competently-run project

if the act of writing code is something you consider a burden rather than a joy then my friend you are in the wrong profession

legacynl · 5 months ago
If natural language was an efficient way to write software we would have done it already. Fact is that it's faster to write class X { etc }; than it is to write "create a class named X with behavior etc". If you want to think and solve problems yourself, it doesn't make sense to then increase your workload by putting your thoughts in natural language, which will be more verbose.

I therefore think it makes the most sense to just feed it requirements and issues, and tell it to provide a solution.

Also, unless you're starting a new project or a big feature with a lot of boilerplate, in my experience it's almost never necessary to make a lot of files with a lot of text in them at once.

skydhash · 5 months ago
Code is the ultimate fact checker, where what you write is what gets done. Specs are well written wishes.
simonw · 5 months ago
> Where are the "code-review" agents at?

OpenAI's Codex Cloud just added a new feature for code review, and their new GPT-5-Codex model has been specifically trained for code review: https://openai.com/index/introducing-upgrades-to-codex/

Gemini and Claude both have code review features that work via GitHub Actions: https://developers.google.com/gemini-code-assist/docs/review... and https://docs.claude.com/en/docs/claude-code/github-actions

GitHub have their own version of this pattern too: https://github.blog/changelog/2025-04-04-copilot-code-review...

There are also a whole lot of dedicated code review startups like https://coderabbit.ai/ and https://www.greptile.com/ and https://www.qodo.ai/products/qodo-merge/

vrighter · 5 months ago
you can't use a system with the exact same hallucination problem to check the work of another one just like it. Snake oil
aleph_minus_one · 5 months ago
> Code review is part of the job, but one of the least enjoyable parts. Developers like _writing_ and that gives the most job satisfaction.

At least for me, what gives the most satisfaction (even though this kind of satisfaction happens very rarely) is when I discover some very elegant structure behind whatever has to be implemented that changes the whole way you've thought about programming (or often even about life) for decades.

marklubi · 5 months ago
> what gives the most satisfaction (even though this kind of satisfaction happens very rarely) if I discover some very elegant structure behind whatever has to be implemented that changes the whole way how you thought about programming

A number of years ago, I wrote a caching/lookup library that is probably some of the favorite code I've ever created.

After the initial configuration, the use was elegant and there was really no reason not to use it if you needed to query anything that could be cached on the server side. Super easy to wrap just about any code with it as long as the response is serializable.

CachingCore.Instance.Get(key, cacheDuration, () => { /* expensive lookup code here */ });

Under the hood, it would check the preferred caching solution (e.g., Redis/Memcache/etc), followed by less preferred options if the preferred wasn't available, followed by the expensive lookup if it wasn't found anywhere. Defaulted to in-memory if nothing else was available.

If the data was returned from cache, it would then compare the expiration to the specified duration... If it was getting close to various configurable tolerances, it would start a new lookup in the background and update the cache (some of our lookups could take several minutes*, others just a handful of seconds).

The hardest part was making sure that we didn't cause a thundering herd type problem with looking up stuff multiple times... in-memory cache flags indicating lookups in progress, so we could hold up other requests if the cache fell through and then let them know once the data is available. While not the absolute worst case scenario, you might end up making the expensive lookups once from each of the servers that use it if the shared cache isn't available.

* most of these have a separate service running on a schedule to pre-cache the data, but things have a backup with this method.
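
For the curious, the core of that in-flight-flag idea looks roughly like this (a minimal Go-flavoured sketch rather than the original implementation; Store, entry and the method shape are made up here, and the refresh-in-background-near-expiry part is elided). The inflight map is what stops the herd:

    package cache

    import (
        "sync"
        "time"
    )

    type entry struct {
        val     any
        expires time.Time
    }

    type Store struct {
        mu       sync.Mutex
        items    map[string]entry
        inflight map[string]chan struct{} // keys with a lookup in progress
    }

    func NewStore() *Store {
        return &Store{
            items:    map[string]entry{},
            inflight: map[string]chan struct{}{},
        }
    }

    // Get returns the cached value for key, or runs lookup exactly once
    // across concurrent callers and caches the result for ttl.
    func (s *Store) Get(key string, ttl time.Duration, lookup func() (any, error)) (any, error) {
        s.mu.Lock()
        if e, ok := s.items[key]; ok && time.Now().Before(e.expires) {
            s.mu.Unlock()
            return e.val, nil // fresh hit
        }
        if ch, ok := s.inflight[key]; ok {
            // Another caller is already doing the expensive lookup:
            // wait for it to finish, then retry against the cache.
            s.mu.Unlock()
            <-ch
            return s.Get(key, ttl, lookup)
        }
        ch := make(chan struct{})
        s.inflight[key] = ch
        s.mu.Unlock()

        val, err := lookup() // the expensive part, done by one caller only
        s.mu.Lock()
        if err == nil {
            s.items[key] = entry{val, time.Now().Add(ttl)}
        }
        delete(s.inflight, key)
        s.mu.Unlock()
        close(ch)
        return val, err
    }

Usage matches the call shape above: store.Get(key, cacheDuration, func() (any, error) { /* expensive lookup code here */ }).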

mercutio2 · 5 months ago
Junior developers love writing code.

Senior developers love removing code.

Code review is probably my favorite part of the job, when there isn’t a deadline bearing down on me for my own tasks.

So I don’t really agree with your framing. Code reviews are very fun.

KronisLV · 5 months ago
> Developers like _writing_ and that gives the most job satisfaction.

Is it possible that this is just the majority, and there's plenty of folks that dislike actually starting from nothing and the endless iteration to make something that works, as opposed to having some sort of good/bad baseline to just improve upon?

I’ve seen plenty of people that are okay with picking up a codebase someone else wrote and working with the patterns and architecture in there BUT when it comes to them either needing to create new mechanisms in it or create an entirely new project/repo it’s like they hit a wall - part of it probably being friction, part not being familiar with it, as well as other reasons.

> Why did we create tools that do the fun part and increase the non-fun part? Where are the "code-review" agents at?

Presumably because that’s where the most perceived productivity gain is in. As for code review, there’s CodeRabbit, I think GitLab has their thing (Duo) and more options are popping up. Conceptually, there’s nothing preventing you from feeding a Git diff into RooCode and letting it review stuff, alongside reading whatever surrounding files it needs.

aleph_minus_one · 5 months ago
> I’ve seen plenty of people that are okay with picking up a codebase someone else wrote and working with the patterns and architecture in there BUT when it comes to them either needing to create new mechanisms in it or create an entirely new project/repo it’s like they hit a wall - part of it probably being friction, part not being familiar with it, as well as other reasons.

For me, it's exactly the opposite:

I love to build things from "nothing" (if I had the possibility, I would even like to write my own kernel that is written in a novel programming language developed by me :-) ).

On the other hand, when I pick up someone else's codebase, I nearly always (if it was not written by some insanely smart programmer) immediately find it badly written. In nearly all cases I tend to be right in my judgements (my boss agrees), but I am very sensitive to bad code, and often ask myself how the programmer who wrote the original code did not yet commit seppuku, considering how much of a shame the code is.

Thus: you can in my opinion only enjoy picking up a codebase someone else wrote if you are incredibly tolerant of bad code.

crazygringo · 5 months ago
> Developers like _writing_ and that gives the most job satisfaction.

Not me. I enjoy figuring out the requirements, the high-level design, and the clever approach that will yield high performance, or reuse of existing libraries, or whatever it is that will make it an elegant solution.

Once I've figured all that out, the actual process of writing code is a total slog. Tracking variables, remembering syntax, trying to think through every edge case, avoiding off-by-one errors. I've gone from being an architect (fun) to slapping bricks together with mortar (boring).

I'm infinitely happier if all that can be done for me, everything is broken out into testable units, the code looks plausibly correct, and the unit tests for each function cover all cases and are demonstrably correct.

pmg101 · 5 months ago
You don't really know if the system design you've architected in your mind is any good though, do you, until you've actually tried coding it. Discovering all the little edge cases at that point is hard work ("a total slog") because it's where you find out where the flaws in your thinking were, and how your beautifully imagined abstractions fall down.

Then after going back and forth between thinking about it and trying to build it a few times, after a while you discover the real solution.

Or at least that's how it's worked for me for a few decades, everyone might be different.

skydhash · 5 months ago
> Tracking variables, remembering syntax,

That's why you have short functions, so you don't have to track that many variables. And use symbol completion (a standard feature in many editors).

> trying to think through every edge case, avoiding off-by-one errors.

That is designing, not coding. Sometimes I think of an edge case, but I'm already on a task that I'd like to finish, so I just add a TODO comment. Then, at least before I submit the PR, I ripgrep the project for this keyword and others.

Sometimes the best design is done by doing. The tradeoffs become clearer when you have to actually code the solution (too much abstraction, too verbose, unwieldy,...) instead of relying on your mind (everything seems simpler)

phito · 5 months ago
Because the goal of "AI" is not to have fun, it's to solve problems and increase productivity. I have fun programming too, but you have to realize the world isn't optimizing to make things more fun.
fhd2 · 5 months ago
I hear you, but without any enjoyment in the process, quality and productivity go down the drain real fast.

The Ironies of Automation paper is something I mention a lot; the core thesis is that making humans review / rubber stamp automation reduces their work quality. People just aren't wired to do boring stuff well.

lapcat · 5 months ago
> you have to realize the world isn't optimizing make things more fun.

Serious question: why not?

IMO it should be.

If "progress" is making us all more miserable, then what's the point? Shouldn't progress make us happier?

It feels like the endgame of AI is that the masses slave away for the profit of a few tech overlords.

dearilos · 5 months ago
I'm building something to solve exactly that - automating all the boring and repetitive parts of code review.
cmrdporcupine · 5 months ago
If you have a paid Copilot membership and a Github project you can request a code review from Copilot. And it doesn't do a terrible job, actually.
sublinear · 5 months ago
I will second this. I believe code review agents and search summaries are the way forward for coding with LLMs.

The ability to ignore AI and focus on solving the problems has little to do with "fun". If anything, it leaves a human-auditable trail to review later, and a way to hold accountable devs who have gone off the rails and routinely ignored the sometimes genuinely good advice that comes out of AI.

If humans don't have to helicopter over developers, that's a much bigger productivity boost than letting AI take the wheel. This is a nuance missed by almost everyone who doesn't write code or care about its quality.

egberts1 · 5 months ago
Asking AI to stay true to my requested parameters is hard: THEY ALL DRIFT AWAY, RANDOMLY.

When working on nftables syntax highlighters, I have 230 tokens, 2,500 states, and 50,000+ state transitions.

Some firm guidelines given to AI agents are:

1. Fully-deterministic LL(1) full syntax tree.

2. No use of Vim 'syntax keyword' statement

3. Use long group names in snake_case whose naming starts with the 'nft_' prefix (avoids collisions with other Vim namespaces)

4. For parts of the group names, use only nftables/src/parser_bison.y semantic action and token names as-is.

5. For each traversal down the syntax tree, append that non-terminal node name from parser_bison.y to its group names before using it.
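
(So, to illustrate rules 3-5 with a hypothetical example: a group reached by traversing add_cmd and then table_spec would end up named something like nft_add_cmd_table_spec.)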

With those 5 "simple" user-requested requirements, all AI agents drift away from at least one of the rules at seemingly random intervals.

At the moment, it is dubious to even trust the bit-length of each packet field.

Never mind their inability to construct a simple Vimscript.

I use AI agents mainly as documentation.

On the bright side, they are getting good at breaking down 'rule', 'chain_block stmt', and 'map_stmt_expr' (that '.' period we see when chaining header expressions together): just use the quoted words and paste in one of your nft rule statements.

legacynl · 5 months ago
I'm a dev (who knows nothing about nftables) and I don't understand your instructions. I think maybe you could improve your situation by formulating them as "when creating new group names, use the semantic actions and token names as defined in parser_bison.y", i.e. with if-conditions so that the correct rules apply to the correct situations. Because your rules are written as if to apply to every line of code, it might unnecessarily try to incorporate context even when it's not applicable.
HarHarVeryFunny · 5 months ago
The title of this article seems way too glib.

Code review isn't the same as design review, nor are these the only type of things (coding and design) that someone may be trying to use AI for.

If you are going to use AI, and catch its mistakes, then you need to have expertise in whatever it is you are using the AI for. Even if we limit the discussion just to coding, then being a good code reviewer isn't enough - you'd need to have skill at whatever you are asking the AI to do. One of the valuable things AI can do is help you code using languages and frameworks you are not familiar with, which then of course means you are not going to be competent to review the output, other than in the most generic fashion.

A bit off topic, but it's weird to me to see the term "coding" make a comeback in this AI/LLM era. I guess it is useful as a way to describe what AI is good at - coding vs. more general software development - but how many companies nowadays hire coders as opposed to software developers (I know it used to be a thing with some big companies like IBM)? Rather than compartmentalized roles, the direction nowadays seems to be expecting developers to be able to do everything from business analysis and helping develop requirements, to architecture/design and then full-stack development, and subsequent production support.

scuff3d · 5 months ago
My official title is "Software Engineer". In the last five years I have...

1. Stood up and managed my own Kubernetes clusters for my team

2. Docker, just so so much Docker

3. Developed CI/CD pipelines

4. Done more integration and integration testing than I care to think about

5. Written god knows how many requirements and produced an endless stream of diagrams and graphs for systems engineering teams

6. Done a bunch of random IT crap because our infrastructure team can't be bothered

7. Wrote some code once in a while

karmakaze · 5 months ago
Seems so.

> Using AI agents correctly is a process of reviewing code. [...]

> Why is that? Large language models are good at producing a lot of code, but they don’t yet have the depth of judgement of a competent software engineer. Left unsupervised, they will spend a lot of time committing to bad design decisions.

Obviously you want to make course corrections sooner rather than later. Same as I would do with less experienced devs: talk through the high level operations, then the design/composition. Reviewing a large volume of unguided code is like waiting for 100k tokens to be written only to correct the premise in the first 100 and start over.