Eight more months of agents

> I deeply appreciate hand-tool carpentry and mastery of the art, but people need houses and framing teams should obviously have skillsaws.

Where are all the new houses? I admit I am not a bleeding edge seeker when it comes to software consumption, but surely a 10x increase in the industry output would be noticeable to anyone?

amarble · a day ago

This weekend I tried what I'd call a medium scale agentic coding project[0], following what Anthropic demonstrated last week autonomously building a C-compiler [1]. Bottom line is, it's possible to make demos that look good, but it really doesn't work well enough to build software you would actually use. This naturally lends itself to the "everybody is taking about how great it is but nobody is building anything real with it" construct we're in right now. It is great, but also not really useful.

[0] https://www.marble.onl/posts/this_cost_170.html

[1] https://www.anthropic.com/engineering/building-c-compiler

parliament32 · 11 hours ago

Related, this reminds me of the time Cursor spent millions of dollars worth of tokens to write a new browser with LLMs and ended up with a non-functioning wrapper of existing browser libraries.

https://news.ycombinator.com/item?id=46646777

jofla_net · 12 hours ago

Thank you for doing this.

xyzzy123 · a day ago

Org processes have not changed. Lots of the devs I know are enjoying the speedup on mundane work, consuming it as a temporary lifestyle surplus until everything else catches up.

You can't saw faster than the wood arrives. Also the layout of the whole job site is now wrong and the council approvals were the actual bottleneck to how many houses could be built in the first place... :/

gbuk2013 · a day ago

Basically this. My last several tickets were HITL coding with AI for several hours and then waiting 1-2 days while the code worked its way through PR and CI/CD process.

Coding speed was never really a bottleneck anywhere I have worked - it’s all the processes around it that take the most time and AI doesn’t help that much there.

bunderbunder · 20 hours ago

I’m seeing it slightly differently. So much of our new slowdown is rework because we’ve seen a bunch more API and contract churn. The project I’m on has had more rework than I care to contemplate and most of it stems from everyone’s coding agents failing to stay synced up with each other on the details and their human handlers not noticing the discrepancies until we’re well into systems integration work.

If I may hijack your analogy, it would be like if all the construction crews got really fast at their work, so much so that the city decided to go for an “iterative construction” strategy because, in isolation, the cost of one team trying different designs on-site until they hit on one they liked became very small compared to the cost of getting city planners and civil engineers involved up-front. But what wasn’t considered was the rework multiplier effect that comes into play when the people building the water, sewage, electricity, telephones, roads, etc. are all repeatedly tweaking designs with minimal coordination amongst each other. So then those tweaks keep inducing additional design tweaks and rework on adjacent contractors because none of these design changes happen in a vacuum. Next thing you know all the houses are built but now need to be rewired because the electricity panel is designed for a different mains voltage from the drop and also it’s in the wrong part of the house because of a late change from overhead lines in the alleys to underground lines below the street.

Many have observed that coding agents lack object permanence so keeping them on a coherent plan requires giving them such a thoroughly documented plan up front. It actually has me wondering if optimal coding agent usage at scale resembles something of a return to waterfall (probably in more of a Royce sense than the bogeyman agile evangelists derived from the original idea) where the humans on the team mostly spend their time banging out systems specifications and testing protocols, and iteration on the spec becomes somewhat more removed from implementing it than it is in typical practice nowadays.

arrrg · a day ago

To me the hard problem isn’t building things, it’s knowing what to build (finding the things that provide value) and how to build it (e.g. finding novel approaches to doing something that makes something possible that wasn’t possible before).

I don’t see AI helping with knowing what to build at all and I also don’t see AI finding novel approaches to anything.

Sure, I do think there is some unrealized potential somewhere in terms of relatively low value things nobody built before because it just wasn’t worth the time investment – but those things are necessarily relatively low value (or else it would have been worth it to build it) and as such also relatively limited.

Software has amazing economies of scale. So I don’t think the builder/tool analogy works at all. The economics don’t map. Since you only have to build software once and then it doesn’t matter how often you use it (yeah, a simplification) even pretty low value things have always been worth building. In other words: there is tons of software out there. That’s not the issue. The issue is: what it the right software and can it solve my problems?

shinycode · a day ago

I agree. It’s really easier to build low-impact tools for personal use. I managed to produce tools I would never have had time to build and I use them everyday. But I will never sell them because it’s tailored to my needs and it makes no sense to open source anything nowadays. For work it’s different, product teams still need to decide what to build and what is helpful to the clients. Our bugs are not self-fixed by AI yet. I think Anthropic saying 100% of their code is AI generated is a marketing stunt. They have all reasons to say that to sell their tool that generates code. It sends a strong signal to the industry that if they can do it, it could be easier for smaller companies. We are not there yet from a client perspective asking a feature and the new feature is shipped 2 days later in prod without human interactions

pzo · a day ago

> To me the hard problem isn’t building things, it’s knowing what to build (finding the things that provide value) and how to build it (e.g. finding novel approaches to doing something that makes something possible that wasn’t possible before).

The problem with this that after doing this hard work someone can just copy easily your hard work and UI/UX taste. I think distribution will be very important in the future.

We might end up that in future that you have already in social media where influencers copy someones post/video and not giving credits to original author.

c048 · a day ago

I wonder what happened to the old addage of "only 10% of the time you actually spend coding, the rest of the time is figuring out what is needed".

At the same time I see people claiming 100x increases and how they produce 15k lines of code each day thanks to AI, but all I can wonder is how these people managed to find 100x work that needed to be done.

rtcoms · a day ago

For m, I'm demotivated to work on many ideas thinking that anyone can easily copy it or OpenClaw/Nanobot will easily replicate 90% of that unctionality.

So now need to think of different kind of ideas, something on line of games that may take multiple iteration to get perfected.

meowface · 21 hours ago

In a few decades, AIs will probably be better at those than most humans. Possibly even sooner.

wongarsu · 20 hours ago

At my $work this manifests as more backlog items being ticked off, more one-off internal tooling, features (and tools) getting more bells-and-whistles and much more elaborate UI. Also some long-standing bugs being fixed by claude code.

Headline features aren't much faster. You still need to gather requirements, design a good architecture, talk with stakeholders, test your implementation, gather feedback, etc. Speeding up the actual coding can only move the needle so much.

krinchan · 10 hours ago

I feel like we work at the same place. IT Husbandry/Debt Paying/KTLO whatever you call it is being ground into dust. Especially repetitive stuff that I originally would've needed a week to automate and never could get to the top of the once quarterly DevOps sprint...bam. GitHub Action workflow runs weekly to pull in the latest OS images, update and roll over a smoke test VM, monitor, roll over the rest or rollback and ping me in Slack. Done in half a day.

I've got a couple Claude Code skills set up where I just copy/paste a Slack link into it and it links people relevant docs, gives them relevant troubleshooting from our logs, and a hook on the slack tools appends a Claude signature to make sure they know they weren't worth my time.

That said, there's this weird quicksand people around me get in where they just spend weeks and weeks on their AI tools and don't actually do much of anything? Like bro you burned your 5 hour CC Enterprise limit all week and committed...nothing?

bananaflag · a day ago

I think, if there were to be a noticeable increase in software quantity due to agentic coding, we should test it by looking into indie games.

Cthulhu_ · a day ago

According to SteamDB (and Reddit), 2024 and 2025 both saw about 19.000 games released on Steam - there's a big jump between '23 and '24 of about 5000 games, but oddly it plateaued then.

https://www.reddit.com/r/pcgaming/comments/1pl7kg1/over_1900...

co_king_3 · a day ago

LLMs are unusably bad at generating game code

conartist6 · 21 hours ago

It's AI features and 10x more bugs. Microsoft is leading the way.

Cthulhu_ · a day ago

I'm sure there's plenty of new software being released and built by agents, but the same problem as handcrafted software remains - finding an audience. The easier and quicker it is to build software, or the more developers build software, the more stuff is thrown at a wall to see what sticks, but I don't think there's more capacity for sticktivity, if my analogy hasn't broken down by now.

sdoering · a day ago

Quite a few - and I know I am only speaking for myself - live on my different computers. I created a few CLI tools that make my life and that of my agent smoother sailing for information retrieval. I created, inspired by a blog post, a digital personal assistant, that really enables me to better juggle different work contexts as well as different projects within these work contexts.

I created a platform for a virtual pub quiz for my team at my day job, built multiple pandingpages for events, debugged dark table to recognize my new camera (it was to new to be included in the camera.xml file, but the specs were known). I debugged quite a few parts of a legacy shitshow of an application, did a lot of infrastructure optimization and I also created a massive ton of content as a centaur in dialog with the help of Claude Code.

But I don't do "Show HN" posts. And I don't advertise my builds - because other than those named, most are one off things, that I throw away after this one problem was solved.

To me code became way more ephemeral.

But YMMV - and that is a good thing. I also believe that way less people than the hype bubble implies are actually really into hard core usage like Pete Steinberger or Armin Ronacher and the likes.

overfeed · 19 hours ago

> Quite a few - and I know I am only speaking for myself - live on my different computers

I use AI/agents in quite similar ways, and even rekindled multiple personal projects that had stalled. However, to borrow OPs parlance, these are not "houses" - more like sheds and tree-houses. They are fun and useful, but not moving the needle on housing stock supply, so to speak.

mkvoid · a day ago

If you recall it, would you mind sharing the blog post that inspired the digital personal assistant?

csande17 · a day ago

It's not 10x, but https://www.ft.com/content/5ac2ee5f-f8bd-4f39-a759-3c5c50c8b... has some graphs suggesting a 1.5x increase in metrics like "number of new apps published in the iOS App Store" and "lines of code committed by US GitHub users".

People haven't noticed because the software industry was already mostly unoriginal slop, even prior to LLMs, and people are good at ignoring unoriginal slop.

arghwhat · a day ago

The real outcome is mostly a change in workflow and a reasonable increase in throughput. There might be a 10x or even 100x increase in creation of tiny tools or apps (yay to another 1000 budget assistant/egg timer/etc. apps on the app/play store), but hardly something one would notice.

To be honest, I think the surrounding paragraph lumps together all anti-AI sentiments.

For example, there is a big difference between "all AI output is slop" (which is objectively false) and "AI enables sloppy people to do sloppy work" (which is objectively true), and there's a whole spectrum.

What bugs me personally is not at all my own usage of these tools, but the increase in workload caused by other people using these tools to drown me in nonsensical garbage. In recent months, the extra workload has far exceeded my own productivity gains.

For the non-technical, imagine a hypochondriac using chatgpt to generate hundreds of pages of "health analysis" that they then hand to their doctor and expect a thorough read and opinion of, vs. the doctor using chatgpt for sparring on a particular issue.

pixl97 · 17 hours ago

>people using these tools to drown me in nonsensical garbage

https://en.wikipedia.org/wiki/Brandolini%27s_law

>The amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it.

8note · 15 hours ago

do you have the right expectations?

rather than new stuff for everyone to use, the future could easily be everyone building their own bespoke tools for their own problems.

PlatoIsADisease · a day ago

You aren't notcing it?

Small and mid sized companies are getting custom software now.

Small software is able to be packed with extra features instead of bare minimum.

n4pw01f · 18 hours ago

I am seeing shit tons of chatbots for everything under the sun being onboarded at corporate

nathias · a day ago

yes, there is a very large increase in TUI tools

I don't trust the idea of "not getting", "not understanding", or "being out of touch" with anti-LLM (or pro-LLM) sentiment. There is nothing complicated about this divide. The pros and cons are both as plain as anything has ever been. You can disagree - even strongly - with either side. You can't "not understand".

slfnflctd · 2 days ago

> There is nothing complicated about this divide [...] You can't "not understand"

I beg to differ. There are a whole lot of folks with astonishingly incomplete understanding about all the facts here who are going to continue to make things very, very complicated. Disagreement is meaningless when the relevant parties are not working from the same assumption of basic knowledge.

illusive4080 · 2 days ago

Bolstering your point, check out the comments in this thread: https://www.reddit.com/r/rust/comments/1qy9dcs/who_has_compl...

There’s a lot of unwillingness to even attempt to try the tools.

derefr · 2 days ago

The negative impacts of generative AI are most sharply being felt by "creatives" (artists, writers, musicians, etc), and the consumers in those markets. If the OP here is 1. a programmer 2. who works solely with other programmers and 3. who is "on the grind", mostly just consuming non-fiction blog-post content related to software development these days, rather than paying much attention to what's currently happening to the world of movies/music/literature/etc... then it'd be pretty easy for them to not be exposed very much to anti-LLM sentiment, since that sentiment is entirely occurring in these other fields that might have no relevance to their (professional or personal) life.

"Anti-LLM sentiment" within software development is nearly non-existent. The biggest kind of push-back to LLMs that we see on HN and elsewhere, is effectively just pragmatic skepticism around the effectiveness/utility/ROI of LLMs when employed for specific use-cases. Which isn't "anti-LLM sentiment" any more than skepticism around the ability of junior programmers to complete complex projects is "anti-junior-programmer sentiment."

The difference between the perspectives you find in the creative professions vs in software dev, don't come down to "not getting" or "not understanding"; they really are a question of relative exposure to these pro-LLM vs anti-LLM ideas. Software dev and the creative professions are acting as entirely separate filter-bubbles of conversation here. You can end up entirely on the outside of one or the other of them by accident, and so end up entirely without exposure to one or the other set of ideas/beliefs/memes.

(If you're curious, my own SO actually has this filter-bubble effect from the opposite end, so I can describe what that looks like. She only hears the negative sentiment coming from the creatives she follows, while also having to dodge endless AI slop flooding all the marketplaces and recommendation feeds she previously used to discover new media to consume. And her job is one you do with your hands and specialized domain knowledge; so none of her coworkers use AI for literally anything. [Industry magazines in her field say "AI is revolutionizing her industry" — but they mean ML, not generative AI.] She has no questions that ChatGPT could answer for her. She doesn't have any friends who are productively co-working with AI. She is 100% out-of-touch with pro-LLM sentiment.)

remich · 2 days ago

I think this is an interesting point, my one area of disagreement is that there is no "anti-LLM sentiment" in the programming community. Sure, plenty of folks expressing skepticism or disagreement are doing so from a genuine place, but just in reading this site and a few email newsletters I get I can say that there is a non-trivial percent in the programming world who are adamantly opposed to LLMs/AI. When I see comments from people in that subset, it's quite clear that they aren't approaching it from a place of skepticism, where they could be convinced given appropriate evidence or experiences.

wasmainiac · a day ago

> "Anti-LLM sentiment" within software development is nearly non-existent.

I see it all the time in professional and personal circles. For one, you are shifting the goalpost on what is “anti-llm”, two, people are talking about the negative social, political and environmental impacts.

What is your source here?

overgard · 2 days ago

> "Anti-LLM sentiment" within software development is nearly non-existent.

Strong disagree right there. I remember talking to a (developer) coworker a few months ago who seemed like the biggest AI proponent on our team. When we were one-on-one during a lunch though, he revealed that he really doesn't like AI that much at all, he's just afraid to speak up against it. I'm in a few Discord channels with a lot of highly skilled (senior and principal programmers) who mostly work in game development (or adjacent), and most of them either mock LLMs or have a lot of derision for it. Hacker News is kind of a weird pro-AI bubble, most other places are not nearly as keen on this stuff.

happytoexplain · 2 days ago

>"Anti-LLM sentiment" within software development is nearly non-existent

This is certainly untrue. I want to say "obviously", which means that maybe I am misunderstanding you. Below are some examples of negative sentiments programmers have - can you explain why you are not counting these?

NOTE: I am not presenting these as an "LLMs are bad" argument. My own feelings go both ways. There is a lot that's great about LLMs, and I don't necessarily agree with every word I've written below - some of it is just my paraphrasing of what other people say. I'm only listing examples of what drives existing anti-LLM sentiment in programmers.

1. Job loss, loss of income, or threat thereof

These two are exacerbated by the pace of change, since so many people already spent their lives and money establishing themselves in the career and can't realistically pivot without becoming miserable - this is the same story for every large, fast change - though arguably this one is very large and very fast even by those standards. Lots of tech leadership is focusing even more than they already were on cheap contractors, and/or pushing employees for unrealistic productivity increases. I.e. it's exacerbating the "fast > good" problem, and a lot of leadership is also overestimating how far it reduces the barrier to creating things, as opposed to mostly just speeding up a person's existing capabilities. Some leadership is also using the apparent loss of job security as leverage beyond salary suppression (even less proportion of remote work allowed, more surveillance, worse office conditions, etc).

2. Happiness loss (in regards to the job itself, not all the other stuff in this list)

This is regarding people who enjoy writing/designing programs but don't enjoy directing LLMs; or who don't enjoy debugging the types of mistakes LLMs tend to make, as opposed to the types of mistakes that human devs tend to make. For these people, it's like their job was forcibly changed to a different, almost unrelated job, which can be miserable depending on why you were good at - or why you enjoyed - the old job.

3. Uncertainty/skepticism

I'm pushing back on your dismissal of this one as "not anti-LLM sentiment" - the comparison doesn't make sense. If I was forced to only review junior dev code instead of ever writing my own code or reviewing experienced dev code, I would be unhappy. And I love teaching juniors! And even if we ignore the subset of cases where it doesn't do a good job or assume it will soon be senior-level for every use case, this still overlaps with the above problem: The mistakes it makes are not like the mistakes a human makes. For some people, it's more unnatural/stressful to keep your eyes peeled for the kinds of mistakes it makes. For these people, it's a shift away from objective, detail-oriented, controlled, concrete thinking; away from the feeling of making something with your hands; and toward a more wishy-washy creation experience that can create a feeling of lack of control.

4. Expertise loss

A lot of positive outcomes with LLMs come from being already experienced. Some argue this will be eroded - both for new devs and existing experienced devs.

5. The training data ownership/morality angle

bitwize · a day ago

Facebook's algorithm has picked up on the idea that "I like art". It has subsequently given me more examples of (human-created) art in my feed. Art from comic book artists, art from manga-style creators, "weird art", even a make-up artist who painted a scene from Where the Wild Things Are on her face.

I like this. What's more, while AI-generated art has a characteristic sameyness to it, the human-produced art stands out in its originality. It has character and soul. Even if it's bad! AI slop has made the human-created stuff seem even more striking by comparison. The market for human art isn't going anywhere, just like the audience for human-played chess went nowhere after Deep Blue. I think people will pay a premium for it, just to distinguish themselves from the slop. The same is true of writing and especially music. I know of no one who likes listening to AI-generated music. Even Sabrina Carpenter would raise less objection.

The same, I'm afraid, cannot be said for software—because there is little value for human expression in the code itself. Code is—almost entirely—strictly utilitarian. So we are now at an inflection point where LLMs can generate and validate code that's nearly as good, if not better, than what we can produce on our own. And to not make use of them is about as silly as Mel Kaye still punching in instruction opcodes in hex into the RPC-4000, while his colleagues make use of these fancy new things called "compilers". They're off building unimaginably more complex software than they could before, but hey, he gets his pick of locations on the rotating memory drum!

I'm one of the nonexistent anti-LLMers when it comes to software. I hate talking to a clanker, whose training data set I don't even have access to let alone the ability to understand how my input affects its output, just to do what I do normally with the neural net I've carried around in my skull and trained extensively for this very purpose. I like working directly with code. Code is not just a product for me; it is a medium of thought and expression. It is a formalized notation of a process that I can use to understand and shape that process.

But with the right agentic loops, LLMs can just do more, faster. There's really no point in resisting. The marginal value of what I do has just dropped to zero.

moron4hire · a day ago

Yeah, "not understanding" means they aren't engaging with the issue honestly. They go on to compare to carpentry, which is a classic sign the speaker understands neither carpentry or software development.

The anti-LLM arguments aren't just "hand tools are more pure." I would even say that isn't even a majority argument. There are plenty more arguments to make about environmental and economic sustainability, correctness, safety, intellectual property rights, and whether there are actual productivity gains distinguishable from placebo.

It's one of the reasons why "I am enjoying programming again" is such a frustrating genre of blog post right now. Like, I'm soooo glad we could fire up some old coal plants so you could have a little treat, Brian from Middle Management.

Deleted Comment

entropyneur · a day ago

xyzsparetimexyz · a day ago

> Pay through the nose for Opus or GPT-7.9-xhigh-with-cheese. Don't worry, it's only for a few years.

> You have to turn off the sandbox, which means you have to provide your own sandbox. I have tried just about everything and I highly recommend: use a fresh VM.

> I am extremely out of touch with anti-LLM arguments

'Just pay out the arse and run models without a sandbox or in some annoying VM just to see them fail. Wait, some people are against this?'

skybrian · 8 hours ago

I'm doing web development in a VM (on exe.dev) and it works quite nicely.

CGamesPlay · a day ago

In case you missed the several links, exe.dev is his startup which provides sandboxing for agents. So it makes sense he wants to get people used to paying for agents and in need of a good sandbox.

dude250711 · a day ago

Well, not 'against' per se, just watching LLM-enthusiasts tumble in the mud for now. Though I have heard that if I don't jump into the mud this instance, I will be left behind apparently for some reason. So you either get left behind or get a muddy behind, your choice.

pferde · a day ago

Everybody keeps saying the models are getting better, the tooling is getting better, people are discovering better practices...

So why not just wait out this insane initial phase, and if anything is left standing afterwards and proves itself, just learn that.

0xbadcafebee · a day ago

Local models are decent now. Qwen3 coder is pretty good and decent speed. I use smaller models (qwen2.5:1.5b) with keyboard shortcuts and speech to text to ask for man page entries, and get 'em back faster than my internet connection and a "robust" frontier model does. And web search/RAG hides a multitude of sins.

"Using anything other than the frontier models is actively harmful" - so how come I'm getting solid results from Copilot and Haiku/Flash? Observe, Orient, Decide, Act, Review, Modify, Repeat. Loops with fancy heuristics, optimized prompts, and decent tools, have good results with most models released in the past year.

mattmanser · a day ago

Have you used the frontier models recently? It's hard to communicate the difference the last 6 months has seen.

We're at the point where copilot is irrelevant. Your way of working is irrelevant. Because that's not how you interact with coding AIs anymore, you're chatting with them about the code outside the IDE.

ncruces · 20 hours ago

I have.

Just this month I've burned through 80% of my Copilot quota of Claude Opus 4.6 in a couple of days to get it to help me with a silly hobby project: https://github.com/ncruces/dbldbl

It did help. The project had been sitting for 3 years without trig and hyperbolic trig, and in a couple days of spare time I'm adding it. Some of it through rubber ducking chat and/or algorithmic papers review (give me formulas, I'll do it), some through agent mode (give me code).

But if you review the PR written in agent mode, the model still lies to my face, in trivial but hard to verify ways. Like adding tests that say cosh(1) is this number at that OEIS link, and both the number and the OEIS link are wrong, but obviously tests pass because it's a lie.

I'm not trying to bash the tech. I use it at work in limited but helpful ways, and use hobby stuff like this as a testbed precisely to try to figure out what they're good at in a low stakes setting.

But you trust the plausibly looking output of these things at your own peril.

0xbadcafebee · 19 hours ago

Just to be clear: I mean Copilot CLI. I had used the IDE, and it was terrible; I tried the CLI, and for some reason it was much better. I explain carefully what I want, and it iterates until it's done, quickly, on cheap models.

If you check the docs, smaller, faster, older models are recommended for 'lightweight' coding. There's several reasons for this. 1) a smaller model doesn't have as good deep reasoning, so it works okay for a simple ask. 2) small context, small task, small model can produce better results than big context, big task, big model. The lost-in-the-middle problem is still unsolved, leading to mistakes that get worse with big context, and longer runs exacerbate issues. So small context/task that ends and starts a new loop (with planning & learning) ends up working really well and quickly.

There's a difference between tasks and problem-solving, though. For difficult problems, you want a frontier reasoning model.

gjulianm · a day ago

Honestly, I've been using the frontier models and I'm not sure where people are seeing these massive improvements. It's not that they're bad, it's just that I don't see that much of an improvement the last 6 months. They're so inconsistent that it's hard to have a clear idea of what's happening. I usually switch between models and I don't see either those massive differences either. Not to mention that sometimes models regress in certain aspects (e.g., I've seen later models that tend to "think" more and end up at the same result but taking far more time and tokens).

timr · a day ago

> Have you used the frontier models recently?

Yes.

> It's hard to communicate the difference the last 6 months has seen.

No, it isn't. The hypebeast discovered Claude code, but hasn't yet realized that the "let the model burn tokens with access to a shell" part is the key innovation, not the model itself.

I can (and do) use GH Copilot's "agent" mode with older generation models, and it's fine. There's no step function of improvement from one model to another, though there are always specific situations where one outperforms. My current go-to model for "sit and spin" mode is actually Grok, and I will splurge for tokens when that doesn't work. Tools and skills and blahblahblah are nice to have (and in fact, part of GH Copilot now), but not at all core to the process.

joefourier · 2 days ago

The author is correct in that agents are becoming more and more capable and that you don't need the IDE to the same extent, but I don't see that as good. I find that IDE-based agentic programming actually encourages you to read and understand your codebase as opposed to CLI-based workflows. It's so much easier to flip through files, review the changes it made, or highlight a specific function and give it to the agent, as opposed to through the CLI where you usually just give it an entire file by typing the name, and often you just pray that it manages to find the context by itself. My prompts in Cursor are generally a lot more specific and I get more surgical results than with Claude Code in the terminal purely because of the convenience of the UX.

But secondly, there's an entire field of LLM-assisted coding that's being almost entirely neglected and that's code autocomplete models. Fundamentally they're the same technology as agents and should be doing the same thing: indexing your code in the background, filtering the context, etc, but there's much less attention and it does feel like the models are stagnating.

I find that very unfortunate. Compare the two workflows:

With a normal coding agent, you write your prompt, then you have to at least a full minute for the result (generally more, depending on the task), breaking your flow and forcing you to task-switch. Then it gives you a giant mass of code and of course 99% of the time you just approve and test it because it's a slog to read through what it did. If it doesn't work as intended, you get angry at the model, retry your prompt, spending a larger amount of tokens the longer your chat history.

But with LLM-powered auto-complete, when you want, say, a function to do X, you write your comment describing it first, just like you should if you were writing it yourself. You instantly see a small section of code and if it's not what you want, you can alter your comment. Even if it's not 100% correct, multi-line autocomplete is great because you approve it line by line and can stop when it gets to the incorrect parts, and you're not forced to task switch and you don't lose your concentration, that great sense of "flow".

Fundamentally it's not that different from agentic coding - except instead of prompting in a chatbox, you write comments in the files directly. But I much prefer the quick feedback loop, the ability to ignore outputs you don't want, and the fact that I don't feel like I'm losing track of what my code is doing.

coffeefirst · 2 days ago

The other thing about non-agent workflows is they’re much, much less compute intensive. This is going to matter.

wavemode · a day ago

I agree with you wholeheartedly. It seems like a lot of the work on making AI autocomplete better (better indexing, context management, codebase awareness, etc) has stagnated in favor of full-on agentic development, which simply isn't suited for many kinds of tasks.

dagss · 2 days ago

    But if you try some penny-saving cheap model like Sonnet [..bad things..]. [Better] pay through the nose for Opus.

After blowing $800 of my bootstrap startup funds for Cursor with Opus for myself in a very productive January I figured I had to try to change things up... so this month I'm jumping between Claude Code and Cursor, sometimes writing the plans and having the conversation in Cursor and dump the implementation plan into Claude.

Opus in Cursor is just so much more responsive and easy to talk to, compared to Opus in Claude.

Cursor has this "Auto" mode which feels like it has very liberal limits (amortized cost I guess) that I'm also trying to use more, but -- I don't really like to flip a coin and if it lands up head then waste half hour discovering the LLM made a mess the LLM and try again forcing the model.

Perhaps in March I'll bite the bullet and take this authors advice.

written-beyond · 2 days ago

Just use Codex 5.3 in codex cli, the $20/mo plan is basically limitless at least for me and I keep reasoning efforts high.

You can enjoy it while it lasts, OpenAI is being very liberal with their limits because of CC eating their lunch rn.

rmonvfer · 2 days ago

Yeah, I can’t recommend gpt-5.3-codex enough, it’s great! I’ve been using it with the new macOS app and I’m impressed. I’ve always been a Claude Code guy and I find myself using codex more and more. Opus is still much nicer explaining issues and walking me through implementations but codex is faster (even with xhigh effort) and gets the job done 95% of the time.

I was spending unholy amounts of money and tokens (subsidized cloud credits tho) forcing Opus for everything but I’m very happy with this new setup. I’ve also experimented with OpenCode and their Zen subscription to test Kimi K2.5 an similar models and they also seem like a very good alternative for some tasks.

What I cannot stand tho is using sonnet directly (it’s fine as a subagent), I’ve found it to be hard to control and doesn’t follow detailed instructions.

dagss · a day ago

Thanks a lot, after today I have fully switched to Codex I think.

This vscode extension makes it almost as easy to point codex to something as when doing it in cursor:

https://github.com/suzukenz/vscode-copy-selection-with-line-...

baq · a day ago

+1, codex 5.2 was really good and 5.3 seems to be better at everything; caveat - I had little time to test it.

dakolli · a day ago

I promise you you're just going to continue to light money on fire. Don't fall for this token madness, the bigger your project gets, the less capable the llm will get and the more you spend per request on average. This is literally all marketing tricks by inference providers. Save your money and code it yourself, or use very inexpensive llm methods if you must.

I think we are going to start hearing stories of people going into thousands in CC debt because they were essentially gambling with token usage thinking they would hit some startup jackpot.

Compared to the salary I loose by not taking a consulting gig for half a year, these $800 arent't all that much. (I guess depending on definition of bootstrap, mine might not be, as I support myself with saved consulting income.)

Startup is a gamble with or without the LLM costs.

I have been coding for 20 years, I have a good feel for how much time I would have spent without LLM assistance. And if LLMs vanish from the face of the earth tomorrow, I still saved myself that time.

otabdeveloper4 · a day ago

You can rent a GPU server and run your own Qwen models.

It's 90 percent the same thing as Claude but with flat-rate costs.

symfrog · a day ago

Any sufficiently complicated LLM generated program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of an open source project.

kakacik · a day ago

We had an effort recently where one much more experienced dev from our company ran Claude on our oldish codebase for one system, with the goal of transforming it into newer structure, newer libraries etc. while preserving various built in functionalities. Not the first time this guy did such a thing and he is supposed to be an expert.

I took a look at the result and its maybe half of stuff missing completely, rest is cryptic. I know that codebase by heart since I created it. From my 20+ years of experience correcting all this would take way more effort than manual rewrite from scratch by a senior. Suffice to say thats not what upper management wants to hear, llm adoption often became one of their yearly targets to be evaluated against. So we have a hammer and looking for nails to bend and crook.

Suffice to say this effort led nowhere since we have other high priority goals, for now. Smaller things here & there, why not. Bigger efforts, so far sawed-off 2-barrel shotgun loaded with buckshot right into both feet.

kjksf · a day ago

Not to take away from your experience but to offer a counterpoint.

I used claude code to port rust pdb parsing library to typescript.

My SumatraPDF is a large C++ app and I wanted visibility into where does the size of functions / data go, layout of classes. So I wanted to build a tool to dump info out of a PDB. But I have been diagnosed with extreme case of Rustophobiatis so I just can't touch rust code. Hence, the port to typescript.

With my assistance it did the work in an afternoon and did it well. The code worked. I ran it against large PDB from SumatraPDF and it matched the output of other tools.

In a way porting from one language to another is extreme case of refactoring and Claude did it very well.

I think that in general (your experience notwithstanding) Claude Caude is excellent at refactorings.

Here are 3 refactorings from SumatraPDF where I asked claude code to simplify code written by a human:

https://github.com/sumatrapdfreader/sumatrapdf/commit/a472d3...https://github.com/sumatrapdfreader/sumatrapdf/commit/5624aa...https://github.com/sumatrapdfreader/sumatrapdf/commit/a40bc9...

I hope you agree the code written by Claude is better than the code written by a human.

Granted, those are small changes but I think it generalizes into bigger changes. I have few refactorings in mind I wanted to do for a long time and maybe with Claude they will finally be feasible (they were not feasible before only because I don't have infinite amount of time to do everything I want to do).

cactusplant7374 · a day ago

All software before LLMs had a copious number of bugs, many of which were never fixed.

dmk · 2 days ago

The real insight buried in here is "build what programmers love and everyone will follow." If every user has an agent that can write code against your product, your API docs become your actual product. That's a massive shift.

anthuswilliams · 2 days ago

I'm very much looking forward to this shift. It is SO MUCH more pro-consumer than the existing SaaS model. Right now every app feels like a walled garden, with broken UX, constant redesigns, enormous amounts of telemetry and user manipulation. It feels like every time I ask for programmatic access to SaaS tools in order to simplify a workflow, I get stuck in endless meetings with product managers trying to "understand my use case", even for products explicitly marketed to programmers.

Using agents that interact with APIs represents people being able to own their user experience more. Why not craft a frontend that behaves exactly the the way YOU want it to, tailor made for YOUR work, abstracting the set of products you are using and focusing only on the actual relevant bits of the work you are doing? Maybe a downside might be that there is more explicit metering of use in these products instead of the per-user licensing that is common today. But the upside is there is so much less scope for engagement-hacking, dark patterns, useless upselling, and so on.

pjc50 · a day ago

> Right now every app feels like a walled garden, with broken UX, constant redesigns, enormous amounts of telemetry and user manipulation

OK, but: that's an economic situation.

> so much less scope for engagement-hacking, dark patterns, useless upselling, and so on.

Right, so there's less profit in it.

To me it seems this will make the market more adversarial, not less. Increasing amounts of effort will be expended to prevent LLMs interacting with your software or web pages. Or in some cases exploit the user's agentic LLM to make a bad decision on their behalf.

13pixels · 21 hours ago

This extends further than most people realize. If agents are the primary consumers of your product surface, then the entire discoverability layer shifts too. Right now Google indexes your marketing page -- soon the question is whether Claude or GPT can even find and correctly describe what your product does when a user asks.

We're already seeing this with search. Ask an LLM "what tools do X" and the answer depends heavily on structured data, citation patterns, and how well your docs/content map to the LLM's training. Companies with great API docs but zero presence in the training data just won't exist to these agents.

So it's not just "API docs = product" -- it's more like "machine-legible presence = existence." Which is a weird new SEO-like discipline that barely has a name yet.