I have OPA and set policies on each tool I provide at the gateway level. It makes this stuff way easier.
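Roughly like this on the gateway side (the names, URL, and policy path here are made up; it just assumes OPA's standard Data API with a per-tool allow decision):

```typescript
// Minimal sketch (hypothetical names/paths): before dispatching a tool call,
// the gateway asks OPA whether this user may call this tool. Assumes an OPA
// sidecar exposing its standard Data API and a policy package at
// gateway/tools that defines an "allow" rule.

interface ToolCallRequest {
  user: string;
  tool: string;                      // e.g. "search_tickets"
  arguments: Record<string, unknown>;
}

async function isToolCallAllowed(req: ToolCallRequest): Promise<boolean> {
  // POST /v1/data/<package path> is OPA's standard decision endpoint;
  // "gateway/tools/allow" is just an example package/rule name.
  const res = await fetch("http://localhost:8181/v1/data/gateway/tools/allow", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input: req }),
  });
  const body = (await res.json()) as { result?: boolean };
  // OPA omits "result" when the rule is undefined -- treat that as deny.
  return body.result === true;
}
```

The nice part is the deny/allow logic lives in one policy bundle instead of being scattered across every tool implementation.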
I work on internal LLM tooling for a F100 at $DAYJOB and was nodding vigorously while reading this, especially the parts about letting users freely switch between models and the affordances you need to provide good UX around streaming and tool calling. Those seem barely thought out in things like the MCP spec (which at least now has a way to get friendly display names for tools, which it didn't the last time I looked).
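For reference, the tool definition shape now looks roughly like this (the tool and schema below are invented; the `title` field is the newer addition, if I'm reading the spec right):

```typescript
// Rough shape of an MCP tool definition with a display name. "name" is the
// stable identifier the model calls with; "title" is what you'd surface in
// the UI. The tool and schema here are made up for illustration.
interface ToolDefinition {
  name: string;         // machine-facing, e.g. "search_tickets"
  title?: string;       // human-facing, e.g. "Search Tickets"
  description?: string;
  inputSchema: { type: "object"; properties?: Record<string, unknown> };
}

const searchTickets: ToolDefinition = {
  name: "search_tickets",
  title: "Search Tickets",
  description: "Full-text search over our internal ticket tracker.",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string" } },
  },
};
```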
We spend a ton of time looking at the code and blocking merges, and the end result is still full of bugs. AI code review only provides a minor improvement. The only reason we do code review at all is that humans don't trust that the code works. Know another way to tell if code works? Running it. If our code is so incomprehensible that we can't write tests that accurately assess whether it works, then either our design is too complicated or our tests suck.
OTOH, if the reason you're doing code review is to ensure the code "is beautiful" or "is maintainable", again, this is a human concern; the AI doesn't care. In fact, it's becoming apparent that it's easier to replace entire sections of code with new AI-generated code than to edit the existing code.
No amount of telling the LLM to "Dig up! Make no mistakes!" will help with non-designed slop code actively poisoning the context, but you have to admire the attempt when it removes code while adding comments that still refer to the code it just removed.
It's weird to see tickets now effectively go from "ready for PR" to 0% progress, but at least you're helping that person meet whatever the secret AI* usage quota is for their performance review this year.
Somebody needs to explain to my lying eyes where these 100xers are hiding. They seem to live in comments on the internet, but I'm not seeing the teams around me increase their output by two orders of magnitude.