Claude Sonnet 4 now supports 1M tokens of context

This is definitely one of my CORE problem as I use these tools for "professional software engineering." I really desperately need LLMs to maintain extremely effective context and it's not actually that interesting to see a new model that's marginally better than the next one (for my day-to-day).

However. Price is king. Allowing me to flood the context window with my code base is great, but given that the price has substantially increased, it makes sense to better manage the context window into the current situation. The value I'm getting here flooding their context window is great for them, but short of evals that look into how effective Sonnet stays on track, it's not clear if the value actually exists here.

ants_everywhere · 6 months ago

> I really desperately need LLMs to maintain extremely effective context

The context is in the repo. An LLM will never have the context you need to solve all problems. Large enough repos don't fit on a single machine.

There's a tradeoff just like in humans where getting a specific task done requires removing distractions. A context window that contains everything makes focus harder.

For a long time context windows were too small, and they probably still are. But they have to get better at understanding the repo by asking the right questions.

onion2k · 6 months ago

Large enough repos don't fit on a single machine.

I don't believe any human can understand a problem if they need to fit the entire problem blem domain in their head, and the scope of a domain that doesn't fit on a computer. You have to break it down into a manageable amount of information to tackle it in chunks.

If a person can do that, so can an LLM prompted to do that by a person.

sdesol · 6 months ago

> But they have to get better at understanding the repo by asking the right questions.

How I am tackling this problem is making it dead simple for users to create analyzers that are designed to enriched text data. You can read more about how it would be used in a search at https://github.com/gitsense/chat/blob/main/packages/chat/wid...

The basic idea is, users would construct analyzers with the help of LLMs to extract the proper metadata that can be semantically searched. So when the user does an AI Assisted search with my tool, I would load all the analyzers (description and schema) into the system prompt and the LLM can determine which analyzers can be used to answer the question.

A very simplistic analyzer would be to make it easy to identify backend and frontend code so you can just use the command `!ask find all frontend files` and the LLM will construct a deterministic search that knows to match for frontend files.

stuartjohnson12 · 6 months ago

> An LLM will never have the context you need to solve all problems.

How often do you need more than 10 million tokens to answer your query?

mock-possum · 6 months ago

> The context is in the repo

Agreed but that’s a bit different from “the context is the repo”

It’s been my experience that usually just picking a couple files out to add to the context is enough - Claude seems capable of following imports and finding what it needs, in most cases.

I’m sure it depends on the task, and the structure of the codebase.

injidup · 6 months ago

All the more reason for good software engineering. Folders of files managing one concept. Files tightly focussed on sub problems of that concept. Keep your code so that you can solve problems in self contained context windows at the right level of abstraction

manmal · 6 months ago

> The context is in the repo

No it’s in the problem at hand. I need to load all related files, documentation, and style guides into the context. This works really well for smaller modules, but currently falls apart after a certain size.

alvis · 6 months ago

Everything in context hurts focus. It's like some people suffering from hyperthymesia. They are easily get distracted when the recall something

rootnod3 · 6 months ago

Flooding the context also means increasing the likelihood of the LLM confusing itself. Mainly because of the longer context. It derails along the way without a reset.

Wowfunhappy · 6 months ago

I keep reading this, but with Claude Code in particular, I consistently find it gets smarter the longer my conversations go on, peaking right at the point where it auto-compacts and everything goes to crap.

This isn't always true--some conversations go poorly and it's better to reset and start over--but it usually is.

aliljet · 6 months ago

How do you know that?

alexchamberlain · 6 months ago

I'm not sure how, and maybe some of the coding agents are doing this, but we need to teach the AI to use abstractions, rather than the whole code base for context. We as humans don't hold the whole codebase in our hear, and we shouldn't expect the AI to either.

LinXitoW · 6 months ago

They already do, or at least Claude Code does. It will search for a method name, then only load a chunk of that file to get the method signature, for example.

It will use the general information you give it to make educated guesses of where things are. If it knows the code is Vue based and it has to do something with "users", it might seach for "src/*/User.vue.

This is also the reason why the quality of your code makes such a large difference. The more consistent the naming of files and classes, the better the AI is at finding them.

sdesol · 6 months ago

LLMs (current implementation) are probabilistic so it really needs the actual code to predict the most likely next tokens. Now loading the whole code base can be a problem in itself, since other files may negatively affect the next token.

anthonypasq · 6 months ago

the fact we cant keep the repo in our working memory is a flaw of our brains. i cant see how you could possibly make the argument that if you were somehow able to keep the entire codebase in your head that it would be a disadvantage.

Deleted Comment

siwatanejo · 6 months ago

I do think AIs are already using abstractions, otherwise you would be submitting all the source code of your dependencies into the context.

F7F7F7 · 6 months ago

There are a billion and one repos that claim to help do this. Let us know when you find one.

throwaway314155 · 6 months ago

/compact in Claude Code is effectively this.

HarHarVeryFunny · 6 months ago

Even 1 MB context is only roughly 20K LOC so pretty limiting, especially if you're also trying to fit API documents or any other lengthy material into the context.

Anthropic also recently said that they think that longer/compressed context can serve as an alternative (not sure what was the exact wording/characterization they used) to continual/incremental learning, so context space is also going to be competing with model interaction history if you want to avoid groundhog day and continually having to tell/correct the model the same things over and over.

It seems we're now firmly in the productization phase of LLM development, as opposed to seeing much fundamental improvement (other than math olympiad etc "benchmark" results, released to give the impression of progress). Yannic Kilcher is right, "AGI is not coming", at least not in the form of an enhanced LLM. Demis Hassabis' very recent estimate was for 50% chance of AGI by 2030 (i.e. still 15 years out).

While we're waiting for AGI, it seems a better approach to needing everything in context would be to lean more heavily on tool use, perhaps more similar to how a human works - we don't memorize the entire code base (at least not in terms of complete line-by-line detail, even though we may have a pretty clear overview of a 10K LOC codebase while we're in the middle of development) but rather rely on tools like grep and ctags to locate relevant parts of source code on an as-needed basis.

km144 · 6 months ago

As you alluded to at the end of your post—I'm not really convinced 20k LOC is very limiting. How many lines of code can you fit in your working mental model of a program? Certainly less than 20k concrete lines of text at any given time.

In your working mental model, you have broad understandings of the broader domain. You have broad understandings of the architecture. You summarize broad sections of the program into simpler ideas. module_a does x, module_b does y, insane file c does z, and so on. Then there is the part of the software you're actively working on, where you need more concrete context.

So as you move towards the central task, the context becomes more specific. But the vague outer context is still crucial to the task at hand. Now, you can certainly find ways to summarize this mental model in an input to an LLM, especially with increasing context windows. But we probably need to understand how we would better present these sorts of things to achieve performance similar to a human brain, because the mechanism is very different.

HarHarVeryFunny · 6 months ago

Just as a self follow-up, another motivation to lean on tool use rather than massive context (cf. short-term memory) is to keep LLM/AI written/modified code understandable to humans ...

At least part of the reason that humans use hierarchical decomposition and divide-and-conquor is presumably because of our own limited short term memory, since hierarchical organization (modules, classes, methods, etc) allows us to work on a problem at different levels of abstraction while only needing to hold that level of the hierarchy in memory.

Imagine what code might look like if written by something with no context limit - just a flat hierarchy of functions, perhaps, at least until it perhaps eventually learned, or was told, the other reasons for hierarchical and modular design/decomposition to assist in debugging and future enhancement, etc!

aorobin · 6 months ago

>"Demis Hassabis' very recent estimate was for 50% chance of AGI by 2030 (i.e. still 15 years out)."

2030 is only 5 years out

brookst · 6 months ago

1M tokens ~= 3.5M characters ~= 58k LOC at an average of 60 chars/line. 88k LOC at 40 chars/line

benterix · 6 months ago

> it's not clear if the value actually exists here.

Having spent a couple of weeks on Claude Code recently, I arrived to the conclusion that the net value for me from agentic AI is actually negative.

I will give it another run in 6-8 months though.

ericmcer · 6 months ago

Agreed, daily Cursor user.

Just got out of a 15m huddle with someone trying to understand what they were doing in a PR before they admitted Claude generated everything and it worked but they weren't sure why... Ended up ripping about 200 LoC out because what Claude "fixed" wasn't even broken.

So never let it generate code, but the autocomplete is absolutely killer. If you understand how to code in 2+ languages you can make assumptions about how to do things in many others and let the AI autofill the syntax in. I have been able to swap to languages I have almost no experience in and work fairly well because memorizing syntax is irrelevant.

cambaceres · 6 months ago

For me it’s meant a huge increase in productivity, at least 3X.

Since so many claim the opposite, I’m curious to what you do more specifically? I guess different roles/technologies benefit more from agents than others.

I build full stack web applications in node/.net/react, more importantly (I think) is that I work on a small startup and manage 3 applications myself.

flowerthoughts · 6 months ago

What type of work do you do? And how do you measure value?

Last week I was using Claude Code for web development. This week, I used it to write ESP32 firmware and a Linux kernel driver. Sure, it made mistakes, but the net was still very positive in terms of efficiency.

greenie_beans · 6 months ago

same. agents are good with easy stuff and debugging but extremely bad with complexity. has no clue about chesterson's fence, and it's hard to parse the results especially when it creates massive diffs. creates a ton of abandoned/cargo code. lots of misdirection with OOP.

chatting witch claude and copy/pasting code between my IDE and claude is still the most effective for more complex stuff, at least for me.

mikepurvis · 6 months ago

For a bit more nuance, I think I would my overall net is about break even. But I don't take that as "it's not worth it at all, abandon ship" but rather that I need to hone my instinct of what is and is not a good task for AI involvement, and what that involvement should look like.

Throwing together a GHA workflow? Sure, make a ticket, assign it to copilot, check in later to give a little feedback and we're golden. Half a day of labour turned into fifteen minutes.

But there are a lot of tasks that are far too nuanced where trying to take that approach just results in frustration and wasted time. There it's better to rely on editor completion or maybe the chat interface, like "hey I want to do X and Y, what approach makes sense for this?" and treat it like a rubber duck session with a junior colleague.

mark_l_watson · 6 months ago

I am sort of with you. I am down to asking Gemini Pro a couple of questions a day, use ChatGPT just a few times a week, and about once a week use gemini-cli (either a short free session, or a longer session where I provide my API key.)

That said I spend (waste?) an absurdly large amount of time each week experimenting with local models (sometimes practical applications, sometimes ‘research’).

9cb14c1ec0 · 6 months ago

The more I use Claude Code, the more aware I become of its limitations. On the whole, it's a useful tool, but the bigger the codebase the less useful. I've noticed a big difference on its performance on projects with 20k lines of code versus 100k. (Yes, I know. A 100k line project is still very small in the big picture)

meowtimemania · 6 months ago

For me it depends on the task. For some tasks (maybe things that don't have good existing examples in my codebase?)

I'll spend 3x the time repeatedly asking claude to do something for me

revskill · 6 months ago

Truth. To some extend, the agent doesn't know what it's doing at all, it lacks real brain, maybe we should just treat them as the hard worker.

jmartrican · 6 months ago

Maybe that is a skills issue.

sorhaindop · 6 months ago

This exact phrase has been said by 3 different users... weird.

wahnfrieden · 6 months ago

Did you try with using Opus exclusively?

TZubiri · 6 months ago

"However. Price is king. Allowing me to flood the context window with my code base is great"

I don't vibe code, but in general having to know all of the codebase to be able to do something is a smell, it's spagghetti, it's lack of encapsulation.

When I program I cannot think about the whole database, I have a couple of files open tops and I think about the code in those files.

This issue of having to understand the whole codebase, complaining about abstractions, microservices, and OOP, and wanting everything to be in a "simple" monorepo, or a monolith; is something that I see juniors do, almost exclusively.

seanmmward · 6 months ago

The primary use case isn't just about shoving more code in context, although depending on the task, there is an irredicible minimum context needed for it to capture all the needed understanding. The 1M context model is a unique beast in terms of how you need to feed it, and its real power is being able to tackle long horizon tasks which require iterative exploration, in context learning, and resynthesis. Ie, some problems are breadth (go fix an api change in 100 files), other however require depth (go learn from trying 15 different ways to solve this problem). 1M Sonnet is unique in its capabilities for the latter in particular.

sdesol · 6 months ago

> I really desperately need LLMs to maintain extremely effective context

I actually built this. I'm still not ready to say "use the tool yet" but you can learn more about it at https://github.com/gitsense/chat.

The demo link is not up yet as I need to finalize an admin tool but you should be able to follow the npm instructions to play around with.

The basic idea is, you should be able to load your entire repo or repos and use the context builder to help you refine it. Or you can can create custom analyzers that you can do 'AI Assisted' searches with like execute `!ask find all frontend code that does [this]` and the because the analyzer knows how to extract the correct metadata to support that query, you'll be able to easily build the context using it.

hirako2000 · 6 months ago

Not clear how it gets around what is, ultimately, a context limit.

I've been fiddling with some process too, would be good if you shared the how. The readme looks like yet another full fledged app.

handfuloflight · 6 months ago

Doesn't Claude Code do all of this automatically?

msikora · 6 months ago

Why not build this as an MCP so that people can plug it into their favorite platform?

kvirani · 6 months ago

Wait that's not how Cursor etc work? (I made assumptions)

hinkley · 6 months ago

Sounds to me like your problem has shifted from how much the AI tool costs per hour to how much it costs per token because resetting a model happens often enough that the price doesn't amortize out per hour. That giant spike every ?? months overshadows the average cost per day.

I wonder if this will become more universal, and if we won't see a 'tick-tock' pattern like Intel used, where they tweak the existing architecture one or more times between major design work. The 'tick' is about keeping you competitive and the 'tock' is about keeping you relevant.

scotty79 · 6 months ago

Maybe use a cheaper model to compose a relevant context for the more expensive one?

Even better, use expensive model to create a general set of guidelines for picking the right context for your project, that the cheaper model will use in the future to pick the right context.

khalic · 6 months ago

This is a major issue with LLMs altogether, it probably has to do with the transformer architecture. We need another breakthrough in the field for this to become reality.

jack_pp · 6 months ago

maybe we need LLMs trained on ASTs or create a new symbolic way to represent software that's faster to grok by LLMs and have a translator so we can verify the code

energy123 · 6 months ago

You could probably build a decent agentic harness that achieves something similar.

Show the LLM a tree and/or call-graph representation of your codebase (e.g. `cargo diagram` and `cargo-depgraph`), which is token efficient.

And give the LLM a tool call to see the contents of the desired subtree. More precise than querying a RAG chunk or a whole file.

You could also have another optional tool call which routes the text content of the subtree through a smaller LLM that summarizes it into a maximum density snippet, which the LLM can use for a token efficient understanding of that subtree during early the planning phase.

But I'd agree that an LLM built natively around AST is a pretty cool idea.

dberge · 6 months ago

> the price has substantially increased

I’m assuming the credits required per use won’t increase in Cursor.

Hopefully this puts pressure on them to lower credits required for gpt-5.

NuclearPM · 6 months ago

Problems

Deleted Comment

fgbarben · 6 months ago

Allow me to flood the fertile plains of its consciousness with my seed... yes, yes, let it take root... this is important to me

fgbarben · 6 months ago

Let me despoil the rich geography of your context window with my corrupted b2b SaaS workflows and code... absorb the pollution, rework it, struggling against the weight... yes, this pleases me, it is essential for the propagation of my germline

1. Build context for the work you're doing. Put lots of your codebase into the context window. 2. Do work, but at each logical stopping point hit double escape to rewind to the context-filled checkpoint. You do not spend those tokens to rewind to that point. 3. Tell Claude your developer finished XYZ, have it read it into context and give high level and low level feedback (Claude will find more problems with your developer's work than with yours).

My experience with the current tools so far:

1. It helps to get me going with new languages, frameworks, utilities or full green field stuff. After that I expend a lot of time parsing the code to understand what it wrote that I kind of "trust" it because it is too tedious but "it works".

2. When working with languages or frameworks that I know, I find it makes me unproductive, the amount of time I spend writing a good enough prompt with the correct context is almost the same or more that if I write the stuff myself and to be honest the solution that it gives me works for this specific case but looks like a junior code with pitfalls that are not that obvious unless you have the experience to know it.

I used it with Typescript, Kotlin, Java and C++, for different scenarios, like websites, ESPHome components (ESP32), backend APIs, node scripts etc.

Botton line: usefull for hobby projects, scripts and to prototypes, but for enterprise level code it is not there.

brulard · 6 months ago

For me it was like this for like a year (using Cline + Sonnet & Gemini) until Claude Code came out and until I learned how to keep context real clean. The key breakthrough was treating AI as an architect/implementer rather than a code generator.

Most recently I ask first CC to create a design document for what we are going to do. He has instructions to look into the relevant parts of the code and docs to reference them. I review it and few back-and-forths we have defined what we want to do. Next step is to chunk it into stages and even those to smaller steps. All this may take few hours, but after this is well defined, I clear the context. I then let him read the docs and implement one stage. This goes mostly well and if it doesn't I either try to steer him to correct it, or if it's too bad, I improve the docs and start this stage over. After stage is complete, we commit, clear context and proceed to next stage.

This way I spend maybe a day creating a feature that would take me maybe 2-3. And at the end we have a document, unit tests, storybook pages, and features that gets overlooked like accessibility, aria-things, etc.

At the very end I like another model to make a code review.

Even if this didn't make me faster now, I would consider it future-proofing myself as a software engineer as these tools are improving quickly

imiric · 6 months ago

This is a common workflow that most advanced users are familiar with.

Yet even following it to a T, and being really careful with how you manage context, the LLM will still hallucinate, generate non-working code, steer you into wrong directions and dead ends, and just waste your time in most scenarios. There's no magical workflow or workaround for avoiding this. These issues are inherent to the technology, and have been since its inception. The tools have certainly gotten more capable, and the ecosystem has matured greatly in the last couple of years, but these issues remain unsolved. The idea that people who experience them are not using the tools correctly is insulting.

I'm not saying that the current generation of this tech isn't useful. I've found it very useful for the same scenarios GP mentioned. But the above issues prevent me from relying on it for anything more sophisticated than that.

aatd86 · 6 months ago

For me it's the opposite. As long as I ask for small tasks, or error checking, it can help. But I'd rather think of the overall design myself because I tend to figure out corner cases or superlinear complexities much better. I develop better mental models than the NNs. That's somewhat of a relief.

Also the longer the conversation goes, the less effective it gets. (saturated context window?)

john-tells-all · 6 months ago

I've seen this referred to as Chain of Thought. I've used it with great success a few times.

https://martinfowler.com/articles/2023-chatgpt-xu-hao.html

ramshanker · 6 months ago

Same here. A small variation: I explicitly use website to manage what context it gets to see.

viccis · 6 months ago

I agree. For me it's a modern version of that good ol "rails new" scaffolding with Ruby on Rails that got you started with a project structure. It makes sense because LLMs are particularly good at tasks that require little more knowledge than just a near perfect knowledge of the documentation of the tooling involved, and creating a well organized scaffold for a greenfield project falls squarely in that area.

For legacy systems, especially ones in which a lot of the things they do are because of requirements from external services (whether that's tech debt or just normal growing complexity in a large connected system), it's less useful.

And for tooling that moves fast and breaks things (looking at you, Databricks), it's basically worthless. People have already brought attention to the fact that it will only be as current as its training data was, and so if a bunch of terminology, features, and syntax have changed since then (ahem, Databricks), you would have to do some kind of prompt engineering with up to date docs for it to have any hope of succeeding.

pvorb · 6 months ago

I'm wondering what exact issue you are referring to with Databricks? I can't remember a time I had to change a line I wrote during the past 2.5 years I've been using it. Or are you talking about non-breaking changes?

jeremywho · 6 months ago

My workflow is to use Claude desktop with the filesystem mcp server.

I give claude the full path to a couple of relevant files related to the task at hand, ie where the new code should hook into or where the current problem is.

Then I ask it to solve the task.

Claude will read the files, determine what should be done and it will edit/add relevant files. There's typically a couple of build errors I will paste back in and have it correct.

Current code patterns & style will be maintained in the new code. It's been quite impressive.

This has been with Typescript and C#.

I don't agree that what it has produced for me is hobby-grade only...

taberiand · 6 months ago

I've been using it the same way. One approach that's worked well for me is to start a project and first ask it to analyse and make a plan with phases for what needs to be done, save that plan into the project, then get it to do each phase in sequence. Once it completes a phase, have it review the code to confirm if the phase is complete. Each phase of work and review is a new chat.

This way helps ensure it works on manageable amounts of code at a time and doesn't overload its context, but also keeps the bigger picture and goal in sight.

hamandcheese · 6 months ago

Any particular reason you prefer that over Claude code?

JyB · 6 months ago

That's exactly how you should do it. You can also plug in an MCP for your CI or mention cli.github.com in your prompt to also make it iterate on CI failures.

Next you use claude code instead and you make several work on their own clone on their own workspace and branches in the background; So you can still iterate yourself on some other topic on your personal clone.

Then you check out its tab from time to time and optionally checkout its branch if you'd rather do some updates yourself. It's so ingrained in my day-to-day flow now it's been super impressive.

nwatson · 6 months ago

One can also integrate with, say, a running PyCharm with the Jetbrains IDE MCP server. Claude Desktop can then interact directly with PyCharm.

alfalfasprout · 6 months ago

The bigger problem I'm seeing is engineers that become over reliant on vibe coding tools are starting to lose context on how systems are designed and work.

As a result, their productivity might go up on simple "ticket like tasks" where it's basically just simple implementation (find the file(s) to edit, modify it, test it) but when they start using it for all their tasks suddenly they don't know how anything works. Or worse, they let the LLM dictate and bad decisions are made.

These same people are also very dogmatic on the use of these tools. They refuse to just code when needed.

Don't get me wrong, this stuff has value. But I just hate seeing how it's made many engineers complacent and accelerated their ability to add to tech debt like never before.

pqs · 6 months ago

I'm not a programmer, but I need to write python and bash programs to do my work. I also have a few websites and other personal projects. Claude Code helps me implement those little projects I've been wanting to do for a very long time, but I couldn't due to the lack of coding experience and time. Now I'm doing them. Also now I can improve my emacs environment, because I can create lisp functions with ease. For me, this is the perfect tool, because now I can do those little projects I couldn't do before, making my life easier.

chamomeal · 6 months ago

LLMs totally kick ass for making bash scripts

zingar · 6 months ago

Big +1 to customizing emacs! Used to feel so out of reach, but now I basically rolled my own cursor.

dekhn · 6 months ago

For context I'm a principal software engineer who has worked in and out of machine learning for decades (along with a bunch of tech infra, high performance scientific computing, and a bunch of hobby projects).

In the few weeks since I've started using Gemini/ChatGPT/Claude, I've

1. had it read my undergrad thesis and the paper it's based on, implementing correct pytorch code for featurization and training, along wiht some aspects of the original paper that I didn't include in my thesis. I had been waiting until retirement until taking on this task.

2. had it write a bunch of different scripts for automating tasks (typically scripting a few cloud APIs) which I then ran, cleaning up a long backlog of activities I had been putting off.

3. had it write a yahtzee game and implement a decent "pick a good move" feature . It took a few tries but then it output a fully functional PyQt5 desktop app that played the game. It beat my top score of all time in the first few plays.

4. tried to convert the yahtzee game to an android app so my son and I could play. This has continually failed on every chat agent I've tried- typically getting stuck with gradle or the android SDK. This matches my own personal experience with android.

5. had it write python and web-based g-code senders that allowed me to replace some tools I didn't like (UGS). Adding real-time vis of the toolpath and objects wasn't that hard either. Took about 10 minutes and it cleaned up a number of issues I saw with my own previous implementations (multithreading). It was stunning how quickly it can create fully capable web applications using javascript and external libraries.

6. had it implement a gcode toolpath generator for basic operations. At first I asked it to write Rust code, which turned out to be an issue (mainly because the opencascade bindings are incomplete), it generated mostly functional code but left it to me to implement the core algorithm. I asked it to switch to C++ and it spit out the correct code the first time. I spent more time getting cmake working on my system than I did writing the prompt and waiting for the code.

7. had it Write a script to extract subtitles from a movie, translate them into my language, and re-mux them back into the video. I was able to watch the movie less than an hour after having the idea- and most of that time was just customizing my prompt to get several refinements.

8. had it write a fully functional chemistry structure variational autoencoder that trains faster and more accurate than any I previously implemented.

9. various other scientific/imaging/photography related codes, like impleemnting multi-camera rectification, so I can view obscured objects head-on from two angled cameras.

With a few caveats (Android projects, Rust-based toolpath generation), I have been absolutely blown away with how effective the tools are (especially used in a agent which has terminal and file read/write capabilities). It's like having a mini-renaissance in my garage, unblocking things that would have taken me a while, or been so frustrating I'd give up.

I've also found that AI summaries in google search are often good enough that I don't click on links to pages (wikipedia, papers, tutorials etc). The more experience I get, the more limitations I see, but many of those limitations are simply due to the extraordinary level of unnecessary complexity required to do nearly anything on a modern computer (see my comments about about Android apps & gradle).

MangoCoffee · 6 months ago

At the end of the day, all tools are made to make their users' lives easier.

I use GitHub Copilot. I recently did a vibe code hobby project for a command line tool that can display my computer's IP, hard drive, hard drive space, CPU, etc. GPT 4.1 did coding and Claude did the bug fixing.

The code it wrote worked, and I even asked it to create a PowerShell script to build the project for release

apimade · 6 months ago

Many who say LLMs produce “enterprise-grade” code haven’t worked in mid-tier or traditional companies, where projects are held together by duct tape, requirements are outdated, and testing barely exists. In those environments, enterprise-ready code is rare even without AI.

For developers deeply familiar with a codebase they’ve worked on for years, LLMs can be a game-changer. But in most other cases, they’re best for brainstorming, creating small tests, or prototyping. When mid-level or junior developers lean heavily on them, the output may look useful.. until a third-party review reveals security flaws, performance issues, and built-in legacy debt.

That might be fine for quick fixes or internal tooling, but it’s a poor fit for enterprise.

bityard · 6 months ago

I work in the enterprise, although not as a programmer, but I get to see how the sausage is made. And describing code as "enterprise grade" would not be a compliment in my book. Very analogous to "contractor grade" when describing home furnishings.

typpilol · 6 months ago

I've found having a ton of linting tools can help the AI write much better and secure code.

My eslint config is a mess but the code it writes comes out pretty good. Although it makes a few iterations after the lint errors pop for it to rewrite it, the code it writes is way better.

Aeolun · 6 months ago

Umm, Claude Code is a lot better than a lot of enterprise grade code I see. And it actually learns from mistakes with a properly crafted instruction xD

hoppp · 6 months ago

I used it with Tyopescript and Go, SQL, Rust

Using it with rust is just horrible imho. Lots and lots of errors, I cant wait to stop this rust project already. But the project itself is quite complex

Go on the other hand is super productive, mainly because the language is already very simple. I can move 2x fast

Typescript is fine, I use it for react components and it will do animations Im lazy to do...

SQL and postgresql is fine, I can do it without it also, I just dont like to write stored functions cuz of the boilerplatey syntax, a little speed up saves me from carpal tunnel

jiggawatts · 6 months ago

Something I’ve discovered is that it may be worthwhile writing the prompt anyway, even for a framework you’re an expert with. Sometimes the AIs will surprise me with a novel approach, but the real value is that the prompt makes for excellent documentation of the requirements! It’s a much better starting point for doc-comments or PR blurbs than after-the-fact ramblings.

epolanski · 6 months ago

I really find your experience strikingly different than mine, I'll share you my flow:

- step A: ask AI to write a featureA-requirements.md file at the root of the project, I give it a general description for the task, then have it ask me as many questions as possible to refine user stories and requirements. It generally comes up with a dozen or more of questions, of which multiples I would've not thought about and found out much later. Time: between 5 and 40 minutes. It's very detailed.

- step B: after we refine the requirements (functional and non functional) we write together a todo plan as featureA-todo.md. I refine the plan again, this is generally shorter than the requirements and I'm generally done in less than 10 minutes.

- step C: implementation phase. Again the AI does most of the job, I correct it at each edit and point flaws. Are there cases where I would've done that faster? Maybe. I can still jump in the editor and do the changes I want. This step in general includes comprehensive tests for all the requirements and edge cases we have found in step A, both functional, integration and E2Es. This really varies but it is generally highly tied to the quality of phase A and B. It can be as little as few minutes (especially true when we indeed come up with the most effective plan) and as much as few hours.

- step D: documentation and PR description. With all of this context (in requirements and todos) at this point updating any relevant documentation and writing the PR description is a very short experiment.

In all of that: I have textual files with precise coding style guidelines, comprehensive readmes to give precise context, etc that get referenced in the context.

Bottom line: you might be doing something profoundly wrong, because in my case, all of this planning, requirements gathering, testing, documenting etc is pushing me to deliver a much higher quality engineering work.

mcintyre1994 · 6 months ago

You’d probably like Kiro, it seems to be built specifically for this sort of spec-driven development.

drums8787 · 6 months ago

My experience is the opposite I guess. I am having a great time using claude to quickly implement little "filler features" that require a good amount of typing and pulling from/editing different sources. Nothing that requires much brainpower beyond remembering the details of some sub system, finding the right files, and typing.

Once the code is written, review, test and done. And on to more fun things.

Maybe what has made it work is that these tasks have all fit comfortably within existing code patterns.

My next step is to break down bigger & more complex changes into claude friendly bites to save me more grunt work.

unlikelytomato · 6 months ago

I wish I shared this experience. There are virtually no filter features for me to work on. When things feel like filler on my team, it's generally a sign of tech debt and we wouldn't want to have it generate all the code it would take. What are some examples of filler features for you?

On the other hand, it does cost me about 8 hours a week debugging issues created by bad autocompletes from my team. The last 6 months have gotten really bad with that. But that is a different issue.

flowerthoughts · 6 months ago

I predict microservices will get a huge push forward. The question then becomes if we're good enough at saying "Claude, this is too big now, you have to split it in two services" or not.

If LLMs maintain the code, the API boundary definitions/documentation and orchestration, it might be manageable.

urbandw311er · 6 months ago

Why not just cleanly separated code in a single execution environment? No need to actually run the services in separate execution environments just for the sake of an LLM being able to parse it, that’s crazy! You can just give it the files or folders it needs for the particular services within the project.

Obviously there’s still other reasons to create micro services if you wish, but this does not need to be another reason.

fsloth · 6 months ago

Why microservices? Monoliths with code-golfed minimal implementation size (but high quality architecture) implemented in strongly typed language would consume far less tokens (and thus would be cheaper to maintain).

arwhatever · 6 months ago

Won’t this cause [insert LLM] to lose context around the semantics of messages passed between microservices?

You could then put all services in 1 repo, or point LLM at X number of folders containing source for all X services, but then it doesn’t seem like you’ll have gained anything, and at the cost of added network calls and more infra management.

stpedgwdgfhgdd · 6 months ago

For enterprise software development CC is definitely there. 100k Go code paas platform, micro services architecture, mono repo is manageable.

The prompt needs to be good, but in plan mode it will iteratively figure it out.

You need to have automated tests. For enterprise software development that actually goes without saying.

dclowd9901 · 6 months ago

It also steps right over easy optimizations. I was doing a query on some github data (tedious work) and rather than preliminarily filter down using the graphql search method, it wanted to comb through all PRs individually. This seems like something it probably should have figured out.

mnky9800n · 6 months ago

Yea that’s right. It’s kind of annoying how useful it is for hobby projects and it is suddenly useless on anything at work. Haha. I love Claude code for some stuff (like generating a notebook to analyse some data). But it really just disconnects you from the problem you are solving without you going through everything it writes. And I’m really bullish on ai coding tools haha, for example:

https://open.substack.com/pub/mnky9800n/p/coding-agents-prov...

johnisgood · 6 months ago

> but for enterprise level code it is not there

It is good for me in Go but I had to tell it what to write and how.

sdesol · 6 months ago

I've been able to create a very advanced search engine for my chat app that is more than enterprise ready. I've spent a decade thinking about search, but in a different language. Like you, I needed to explain what I knew about writing a search engine in Java for the LLM, to write it in JavaScript using libraries I did not know and it got me 95% of the way there.

It is also incredibly important to note that the 5% that I needed to figure out was the difference between throw away code and something useful. You absolutely need domain knowledge but LLMs are more than enterprise ready in my opinion.

Here is some documentation on how my search solution is used in my app to show that it is not a hobby feature.

https://github.com/gitsense/chat/blob/main/packages/chat/wid...

tonyhart7 · 6 months ago

it depends on model but sonnet is more than capable for enterprise code

when you stuck at claude doing dumb shit, you didnt give the model enough context to know better the system

after following spec driven development, works with LLM in large code base make it so much easier than without it like its heaven and hell differences

but also it increase in token cost exponentially, so there's that

fpauser · 6 months ago

Same conclusion here. Also good for analyzing existing codebases and to generate documentation for undocumented projects.

j45 · 6 months ago

It's quite good at this, I have been tying in Gemini Pro with this too.

amelius · 6 months ago

It is very useful for small tasks like fixing network problems, or writing regexp patterns based on a few examples.

MarcelOlsz · 6 months ago

Here's how YOU can save $200/mo!

risyachka · 6 months ago

Pretty much my experience too.

I usually go to option 2 - just write it by myself as it is same time-wise but keeps skills sharp.

fpauser · 6 months ago

Not to degenerate is really challenging these days. There are the bubbles that simulate multiple realities to us and try to untrain us logic thinking. And there are the llms that try to convice us that self thinking is unproductive. I wonder when this digitalophily suddenly turns into digitalophobia.

therealpygon · 6 months ago

I mostly agree, with the caveat that I would say it can certainly be useful when used appropriately as an “assistant”. NOT vibe coding blindly and hoping what I end up with is useful. “Implement x specific thing” (e.g. add an edit button to component x), not “implement a whole new epic feature that includes changes to a significant number of files”. Imagine meeting a house builder and saying “I want a house”, then leaving and expecting to come back to exactly the house you dreamed of.

I get why, it’s a test of just how intuitive the model can be at planning and execution which drives innovation more than 1% differences in benchmarks ever will. I encourage that innovation in the hobby arena or when dogfooding your AI engineer. But as a replacement developer in an enterprise where an uncaught mistake could cost millions? No way. I wouldn’t even want to be the manager of the AI engineering team, when they come looking for the only real person to blame for the mistake not being caught.

For additional checks/tasks as a completely extra set of eyes, building internal tools, and for scripts? Sure. It’s incredibly useful with all sorts of non- application development tasks. I’ve not written a batch or bash script in forever…you just don’t really need to much anymore. The linear flow of most batch/bash/scripts (like you mentioned) couldn’t be a more suitable domain.

Also, with a basic prompt, it can be an incredibly useful rubber duck. For example, I’ll say something like “how do you think I should solve x problem”(with tools for the codebase and such, of course), and then over time having rejected and been adversarial to every suggestion, I end up working through the problem and have a more concrete mental design. Think “over-eager junior know-it-all that tries to be right constantly” without the person attached and you get a better idea of what kind of LLM output you can expect including following false leads to test your ideas. For me it’s less about wanting a plan from the LLM, and more about talking through the problems I think my plan could solve better, when more things are considered outside the LLMs direct knowledge or access.

“We can’t do that, changing X would break Y external process because Z. Summarize that concern into a paragraph to be added to the knowledge base. Then, what other options would you suggest?”