For teams actively using AI coding assistants (Copilot, Cursor, Windsurf, etc.), I'm noticing a frustrating pattern: the more complex your codebase, the more time developers spend preventing AI from breaking things.
Some common scenarios I keep running into:
* AI suggests code that completely ignores existing patterns
* It recreates components we already have
* It modifies core architecture without understanding implications
* It forgets critical context from previous conversations
* It needs constant reminders about our tech stack decisions
For engineering teams using AI tools:
1. How often do you catch AI trying to make breaking changes?
2. How much time do you spend reviewing/correcting AI suggestions?
3. What workflows have you developed to prevent AI from going off track?
Particularly interested in experiences from teams with mature codebases.
I think this is the wrong way to think about the problem. The AI isn't breaking your code; the engineers are. AI is just a tool, and you should always keep that in mind when talking about this sort of thing.
But I do recognize some of the problems you describe. The way I work around them is to never let the AI write more than a few lines at a time; ideally it should just complete the line I've started to type.
I often use a coding assistant to handle boilerplate or to write additional tests after I've given it some correct ones to work off of. When I read accounts like these, I always wonder what's different about our experiences that makes you feel you can give it bigger asks like that.
Teams with CAPS LOCK on their keyboards, HOW DO YOU PREVENT THIS FROM GOING OUT OF CONTROL?
(A common trick is to unbind CAPS LOCK forcing developers to hold down shift while typing uppercase letters.)
In general, approach it expecting to have to explain everything, and evaluate whether it makes more sense to use it or to just do the work yourself. The second option is often much faster thanks to muscle memory and keyboard proficiency.
Even for boilerplate it will not respect your standards unless you throw more files into the context, so you need to be specific. In Cursor you can add more files to the context in all the chats, I believe (except the simple one in the current editor window), and it will do a better job.
I think too many people treat AI like a junior coder who has already been exposed to the business and its practices, someone you can hand a few short sentences and expect to understand the task. It won't: it's only as good as how detailed your input is (and providing that detail is often not worth the hassle).
In Cursor you can save prompt templates in Composer, by the way.
That being said, there are situations where LLMs can severely outperform us. One example is maintaining legacy code.
For example, Cursor with Claude is very good at explaining code to you; the less understandable the code, the more it shines. I don't know if you've noticed, but LLMs can de-obfuscate obfuscated code with ease and can make poorly written code understandable rather quickly.
The other day I fed it an 800-line function that computed the final price of configurations (imagine configuring a car or other items in an e-commerce shop); it would have been impossible to understand without a huge multi-day effort. Too many business features had been glued together into a single giant body, and with the LLM I was able to work through it piece by piece: find a better name for this, document that, explain this, refactor this block into a separate function, suggest improvements, and so on.
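To make that concrete, here is a hypothetical sketch of what the result of that kind of refactor looks like; the rule names and numbers are invented for illustration, not taken from that codebase. Each glued-on business rule becomes a small, named, documented helper:

```python
# Hypothetical sketch of the extraction refactor described above.
# The rules and rates are invented for illustration.

def apply_volume_discount(price: float, quantity: int) -> float:
    """Apply a tiered discount for larger orders."""
    if quantity >= 100:
        return price * 0.90
    if quantity >= 10:
        return price * 0.95
    return price

def apply_regional_tax(price: float, region: str) -> float:
    """Add the tax rate configured for the buyer's region."""
    tax_rates = {"EU": 0.21, "US": 0.07}
    return price * (1 + tax_rates.get(region, 0.0))

def final_price(base_price: float, quantity: int, region: str) -> float:
    """Compose the small, named rules that used to live in one 800-line body."""
    price = apply_volume_discount(base_price * quantity, quantity)
    return apply_regional_tax(price, region)
```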
Another great use case is learning new tools and languages. I hadn't used Python in a decade, yet I quickly set up a Jupyter Notebook for financial simulations without having to really understand the language (you can simply ask it).
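For context, the kind of notebook cell I mean is nothing sophisticated; a made-up example (the amounts and rate are purely illustrative) of what you can simply ask the LLM to write:

```python
# Hypothetical notebook cell: project a portfolio with monthly contributions
# and a fixed annual return. The numbers are illustrative, not from my project.

monthly_contribution = 500.0
annual_return = 0.05          # 5% nominal annual return
years = 20

balance = 0.0
history = []
for month in range(years * 12):
    balance = balance * (1 + annual_return / 12) + monthly_contribution
    history.append(balance)

print(f"Balance after {years} years: {balance:,.2f}")
```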
AI is not limited to what we can keep in mind at the same time (roughly four to six items in our short-term memory); 800 lines of context all at once is nothing, and you can quickly iterate over such code.
Don't misplace your expectations about LLMs. They are a tool: they make experienced engineers shine when used for the right purpose, but you have to understand their limitations, as with any other tool out there.
Some pragmatic tips:
- Work with AI like a junior developer, except with unlimited energy and no problem being corrected repeatedly.
- Provide the AI with guidance about conventions you expect in your code base, overall architecture, etc. [1] You can often use AI tools to help write the first draft of such "conventions" documents.
- Break the work down into self-contained bite sized steps. Don't ask AI to boil the ocean in one iteration. Ask it to make a sequence of changes that each move the code towards the goal.
- Be willing to explore for a few steps with the AI. If it's going sideways, undo/revert. Hopefully your AI tool has good checkpoint/undo/git support [2].
- Lint and test the code after each AI change. Hopefully your AI tool can automatically do that, and fix problems [3]. (A minimal sketch of such a check loop follows this list.)
- If the AI is stuck, just code yourself until you get past the tricky part. Then resume AI coding when the going is easier.
- Build intuition for what AI tools are good at, use them when they're helpful. Code yourself when not.
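On the lint-and-test tip above: a minimal sketch of the kind of check loop I mean, assuming a git repo with ruff and pytest available. The revert-on-failure policy is my own choice here, not any particular tool's built-in behavior.

```python
# Minimal sketch: after each AI edit, lint and test, and revert if either fails.
# Assumes a git repo plus ruff and pytest on PATH; adjust commands to your stack.
import subprocess

def run(cmd: list[str]) -> bool:
    """Run a command and report whether it exited cleanly."""
    return subprocess.run(cmd).returncode == 0

def check_and_maybe_revert() -> bool:
    """Lint and test the working tree; discard the last edit on failure."""
    if run(["ruff", "check", "."]) and run(["python", "-m", "pytest", "-q"]):
        return True
    # Throw away uncommitted changes to tracked files and try again.
    subprocess.run(["git", "checkout", "--", "."])
    return False
```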
Some things AI coding is very helpful for:
- Rough first draft of a change or feature. AI can often speed through a bunch of boilerplate. Then you can polish the final touches.
- Writing tests.
- Fixing fairly simple bugs.
- Efficiently solving problems with packages/libraries you may not have known about.
[0] https://aider.chat/HISTORY.html
[1] https://aider.chat/docs/usage/conventions.html
[2] https://aider.chat/docs/git.html
[3] https://aider.chat/docs/usage/lint-test.html
Trying to figure out what a good workflow looks like.
If the question is really about keeping bad code out of your codebase, the answer seems fairly simple: you install a proper code review process, and you don't allow shit code into your codebase.
1. We never catch AI trying to make breaking changes, but we catch developers who do. Since using AI tools we haven’t seen a huge change in those patterns.
2. Prior to opening a PR, developers now spend more time reviewing code than writing it. During the code review process, we use AI to highlight potential issues faster.
3. Human in the middle
The objective of my solver was to get good solutions using only RAG (no embeddings) and with minimal cost (low token count).
Three techniques, combined, yielded good results. The first was to take a TDD approach: first generate a test, then require the LLM to pass it (without failing others). The solver can also trace the test execution to see exactly which code participates in the feature.
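A rough sketch of that loop, with `ask_llm` and `apply_patch` as hypothetical stand-ins for the solver's actual interfaces:

```python
# Sketch of the test-first loop: generate a failing test for the issue, then
# iterate on patches until that test passes without breaking the rest of the
# suite. `ask_llm` and `apply_patch` are injected stand-ins, not real APIs.
import subprocess

def suite_passes(*pytest_args: str) -> bool:
    """Run pytest and report whether it exited cleanly."""
    return subprocess.run(["python", "-m", "pytest", "-q", *pytest_args]).returncode == 0

def solve(issue: str, ask_llm, apply_patch, max_attempts: int = 3) -> bool:
    # 1. Ask for a test that captures the desired behaviour (it should fail today).
    test_code = ask_llm(f"Write a pytest test that reproduces this issue:\n{issue}")
    with open("tests/test_issue.py", "w") as f:
        f.write(test_code)

    # 2. Ask for patches until the new test passes and the existing suite still does.
    for _ in range(max_attempts):  # small retry budget keeps token cost down
        patch = ask_llm(f"Change the code so tests/test_issue.py passes:\n{issue}")
        apply_patch(patch)
        if suite_passes("tests/test_issue.py") and suite_passes():
            return True
    return False
```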
The second technique was to separate “planning” from “coding”. The planner is freed from implementation details, and can worry more about figuring out which files to change, following existing code conventions, not duplicating code, etc. In the coding phase, the LLM is working from a predefined plan, and has little freedom to deviate. It just needs to create a working, lint-free implementation.
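Sketched the same way (the prompts and the `ask_llm` stand-in are illustrative, not the solver's actual ones), the separation can be as simple as two prompts with different jobs:

```python
# Sketch of separating "planning" from "coding". The planner only decides
# *what* to change; the coder implements exactly that plan and nothing more.
# `ask_llm` is a hypothetical stand-in for whatever LLM call you use.

PLAN_PROMPT = """You are planning a change, not writing code.
Given the issue and the retrieved file snippets, list:
- which files to modify (as few as possible),
- what to change in each, following the existing conventions,
- any existing helpers to reuse instead of duplicating code.
Issue: {issue}
Context: {context}"""

CODE_PROMPT = """Implement exactly the following plan. Do not change anything
not mentioned in the plan. Produce working, lint-clean code.
Plan: {plan}
Files: {files}"""

def plan_then_code(issue: str, context: str, files: str, ask_llm) -> str:
    plan = ask_llm(PLAN_PROMPT.format(issue=issue, context=context))
    return ask_llm(CODE_PROMPT.format(plan=plan, files=files))
```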
The third technique was a gentle pressure on the solver to make small changes in a minimum number of files (ideally, one).
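One way to apply that pressure, with purely illustrative weights, is to rank candidate patches so smaller, more localized ones win:

```python
# Sketch: prefer candidate patches that touch fewer files and fewer lines.
# The weights are arbitrary illustration, not the solver's actual values.

def patch_score(patch: dict[str, list[str]]) -> float:
    """patch maps file path -> changed lines; lower score is better."""
    files_touched = len(patch)
    lines_changed = sum(len(lines) for lines in patch.values())
    return 10.0 * files_touched + lines_changed

def pick_smallest(candidates: list[dict[str, list[str]]]) -> dict[str, list[str]]:
    return min(candidates, key=patch_score)
```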
AI coding tools today generally don’t incorporate any of this. They don’t favor TDD, they don’t have a bias towards making minimal changes, and they don’t work from a pre-approved design.
Good human developers do these things, and this is a pretty wide gap between adept human coders and AI.