For teams actively using AI coding assistants (Copilot, Cursor, Windsurf, etc.), I'm noticing a frustrating pattern: the more complex your codebase, the more time developers spend preventing AI from breaking things.
Some common scenarios I keep running into:
* AI suggests code that completely ignores existing patterns
* It recreates components we already have
* It modifies core architecture without understanding implications
* It forgets critical context from previous conversations
* It needs constant reminders about our tech stack decisions
For engineering teams using AI tools:
1. How often do you catch AI trying to make breaking changes?
2. How much time do you spend reviewing/correcting AI suggestions?
3. What workflows have you developed to prevent AI from going off track?
Particularly interested in experiences from teams with mature codebases.
I think this is the wrong way to think about the problem. The AI isn't breaking your code; the engineers are. AI is just a tool, and you should always keep that in mind when talking about this sort of thing.
But I do recognize some of the problems you describe. The way I work around them is to never let the AI write more than a few lines at a time; ideally it should just complete the line I've started to type.
I often use a coding assistant to handle boilerplate or to write additional tests after I've given it some correct ones to work off of. When I read accounts like these, I always wonder what's different about our experiences that makes you feel you can give it bigger asks like that.
Teams with CAPS LOCK on their keyboards, HOW DO YOU PREVENT THIS FROM GOING OUT OF CONTROL?
(A common trick is to unbind CAPS LOCK forcing developers to hold down shift while typing uppercase letters.)
In general, approach it expecting to have to explain everything, and evaluate whether it makes more sense to use it or to just do the work yourself. The second option is often much faster thanks to muscle memory and keyboard proficiency.
Even for boilerplate it will not respect your standards unless you throw more files into the context, so you need to be specific. In Cursor you can add more files to the context in all the chats, I believe (except the simple one in the current editor window), and it will do a better job.
I think too many people treat AI like a junior coder who has already been exposed to the business and its practices, someone you can hand a few short sentences and expect to understand the task. It won't: it's only as good as how detailed your input is (and providing that detail is often not worth the hassle).
In Cursor you can save prompt templates in Composer, by the way.
That being said, there are situations where LLMs can severely outperform us. One example is maintaining legacy code.
For example, Cursor with Claude is very good at explaining code to you; the less understandable the code, the more it shines. I don't know if you've noticed, but LLMs can de-obfuscate obfuscated code with ease and can make poorly written code understandable rather quickly.
The other day I fed it an 800-line function that computed the final price of configurations (imagine configuring a car or other items in an e-commerce shop); it would have been impossible to understand without a huge multi-day effort. Too many business features had been glued together into a single giant body, and with the LLM I was able to work through it piece by piece: find a better name for this, document that, explain this, refactor this block into a separate function, suggest improvements, and so on.
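To make that concrete, here is a hypothetical sketch of what the result of that kind of refactor looks like; the rule names and numbers are invented for illustration, not taken from that codebase. Each glued-on business rule becomes a small, named, documented helper:

```python
# Hypothetical sketch of the extraction refactor described above.
# The rules and rates are invented for illustration.

def apply_volume_discount(price: float, quantity: int) -> float:
    """Apply a tiered discount for larger orders."""
    if quantity >= 100:
        return price * 0.90
    if quantity >= 10:
        return price * 0.95
    return price

def apply_regional_tax(price: float, region: str) -> float:
    """Add the tax rate configured for the buyer's region."""
    tax_rates = {"EU": 0.21, "US": 0.07}
    return price * (1 + tax_rates.get(region, 0.0))

def final_price(base_price: float, quantity: int, region: str) -> float:
    """Compose the small, named rules that used to live in one 800-line body."""
    price = apply_volume_discount(base_price * quantity, quantity)
    return apply_regional_tax(price, region)
```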
Another great use case is learning new tools and languages. I hadn't used Python in a decade, yet I quickly set up a Jupyter Notebook for financial simulations without having to really understand the language (you can simply ask it).
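For context, the kind of notebook cell I mean is nothing sophisticated; a made-up example (the amounts and rate are purely illustrative) of what you can simply ask the LLM to write:

```python
# Hypothetical notebook cell: project a portfolio with monthly contributions
# and a fixed annual return. The numbers are illustrative, not from my project.

monthly_contribution = 500.0
annual_return = 0.05          # 5% nominal annual return
years = 20

balance = 0.0
history = []
for month in range(years * 12):
    balance = balance * (1 + annual_return / 12) + monthly_contribution
    history.append(balance)

print(f"Balance after {years} years: {balance:,.2f}")
```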
AI is not limited to what we can keep in mind at the same time (roughly four to six items in our short-term memory); 800 lines of context all at once is nothing, and you can quickly iterate over such code.
Don't misplace your expectations about LLMs. They are a tool: they make experienced engineers shine when used for the right purpose, but you have to understand their limitations, as with any other tool out there.
Some pragmatic tips:
- Work with AI like a junior developer, except with unlimited energy and no problem being corrected repeatedly.
- Provide the AI with guidance about conventions you expect in your code base, overall architecture, etc. [1] You can often use AI tools to help write the first draft of such "conventions" documents.
- Break the work down into self-contained bite sized steps. Don't ask AI to boil the ocean in one iteration. Ask it to make a sequence of changes that each move the code towards the goal.
- Be willing to explore for a few steps with the AI. If it's going sideways, undo/revert. Hopefully your AI tool has good checkpoint/undo/git support [2].
- Lint and test the code after each AI change. Hopefully your AI tool can automatically do that, and fix problems [3]. (A minimal sketch of such a check loop follows this list.)
- If the AI is stuck, just code yourself until you get past the tricky part. Then resume AI coding when the going is easier.
- Build intuition for what AI tools are good at, use them when they're helpful. Code yourself when not.
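On the lint-and-test tip above: a minimal sketch of the kind of check loop I mean, assuming a git repo with ruff and pytest available. The revert-on-failure policy is my own choice here, not any particular tool's built-in behavior.

```python
# Minimal sketch: after each AI edit, lint and test, and revert if either fails.
# Assumes a git repo plus ruff and pytest on PATH; adjust commands to your stack.
import subprocess

def run(cmd: list[str]) -> bool:
    """Run a command and report whether it exited cleanly."""
    return subprocess.run(cmd).returncode == 0

def check_and_maybe_revert() -> bool:
    """Lint and test the working tree; discard the last edit on failure."""
    if run(["ruff", "check", "."]) and run(["python", "-m", "pytest", "-q"]):
        return True
    # Throw away uncommitted changes to tracked files and try again.
    subprocess.run(["git", "checkout", "--", "."])
    return False
```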
Some things AI coding is very helpful for:
- Rough first draft of a change or feature. AI can often speed through a bunch of boilerplate. Then you can polish the final touches.
- Writing tests.
- Fixing fairly simple bugs.
- Efficiently solving problems with packages/libraries you may not have known about.
[0] https://aider.chat/HISTORY.html
[1] https://aider.chat/docs/usage/conventions.html
[2] https://aider.chat/docs/git.html
[3] https://aider.chat/docs/usage/lint-test.html
Trying to figure out what a good workflow looks like.
If the question is really about keeping bad code out of your codebase, the answer seems fairly simple: you install a proper code review process, and you don't allow shit code into your codebase.
1. We never catch AI trying to make breaking changes, but we catch developers who do. Since using AI tools we haven’t seen a huge change in those patterns.
2. Prior to opening a PR, developers now spend more time reviewing code than writing it. During the code review process, we use AI to highlight potential issues faster.
3. Human in the middle
The objective of my solver was to get good solutions using only RAG (no embeddings) and with minimal cost (low token count).
Three techniques, combined, yielded good results. The first was to take a TDD approach: first generate a test, then require the LLM to pass it (without failing others). The solver can also trace the test execution to see exactly which code participates in the feature.
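A rough sketch of that loop, with `ask_llm` and `apply_patch` as hypothetical stand-ins for the solver's actual interfaces:

```python
# Sketch of the test-first loop: generate a failing test for the issue, then
# iterate on patches until that test passes without breaking the rest of the
# suite. `ask_llm` and `apply_patch` are injected stand-ins, not real APIs.
import subprocess

def suite_passes(*pytest_args: str) -> bool:
    """Run pytest and report whether it exited cleanly."""
    return subprocess.run(["python", "-m", "pytest", "-q", *pytest_args]).returncode == 0

def solve(issue: str, ask_llm, apply_patch, max_attempts: int = 3) -> bool:
    # 1. Ask for a test that captures the desired behaviour (it should fail today).
    test_code = ask_llm(f"Write a pytest test that reproduces this issue:\n{issue}")
    with open("tests/test_issue.py", "w") as f:
        f.write(test_code)

    # 2. Ask for patches until the new test passes and the existing suite still does.
    for _ in range(max_attempts):  # small retry budget keeps token cost down
        patch = ask_llm(f"Change the code so tests/test_issue.py passes:\n{issue}")
        apply_patch(patch)
        if suite_passes("tests/test_issue.py") and suite_passes():
            return True
    return False
```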
The second technique was to separate “planning” from “coding”. The planner is freed from implementation details, and can worry more about figuring out which files to change, following existing code conventions, not duplicating code, etc. In the coding phase, the LLM is working from a predefined plan, and has little freedom to deviate. It just needs to create a working, lint-free implementation.
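Sketched the same way (the prompts and the `ask_llm` stand-in are illustrative, not the solver's actual ones), the separation can be as simple as two prompts with different jobs:

```python
# Sketch of separating "planning" from "coding". The planner only decides
# *what* to change; the coder implements exactly that plan and nothing more.
# `ask_llm` is a hypothetical stand-in for whatever LLM call you use.

PLAN_PROMPT = """You are planning a change, not writing code.
Given the issue and the retrieved file snippets, list:
- which files to modify (as few as possible),
- what to change in each, following the existing conventions,
- any existing helpers to reuse instead of duplicating code.
Issue: {issue}
Context: {context}"""

CODE_PROMPT = """Implement exactly the following plan. Do not change anything
not mentioned in the plan. Produce working, lint-clean code.
Plan: {plan}
Files: {files}"""

def plan_then_code(issue: str, context: str, files: str, ask_llm) -> str:
    plan = ask_llm(PLAN_PROMPT.format(issue=issue, context=context))
    return ask_llm(CODE_PROMPT.format(plan=plan, files=files))
```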
The third technique was a gentle pressure on the solver to make small changes in a minimum number of files (ideally, one).
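One way to apply that pressure, with purely illustrative weights, is to rank candidate patches so smaller, more localized ones win:

```python
# Sketch: prefer candidate patches that touch fewer files and fewer lines.
# The weights are arbitrary illustration, not the solver's actual values.

def patch_score(patch: dict[str, list[str]]) -> float:
    """patch maps file path -> changed lines; lower score is better."""
    files_touched = len(patch)
    lines_changed = sum(len(lines) for lines in patch.values())
    return 10.0 * files_touched + lines_changed

def pick_smallest(candidates: list[dict[str, list[str]]]) -> dict[str, list[str]]:
    return min(candidates, key=patch_score)
```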
AI coding tools today generally don’t incorporate any of this. They don’t favor TDD, they don’t have a bias towards making minimal changes, and they don’t work from a pre-approved design.
Good human developers do these things, and this is a pretty wide gap between adept human coders and AI.