Also, I don't think this tool should be in the developer flow; in my experience, developers are unlikely to run it regularly. It should be something done as part of the QA process before PR acceptance.
I hope this helps and good luck.
On when to run it: fair point. Autofix Bot is currently meant for local use (TUI, Claude Code plugin, MCP). We're integrating this pipeline into DeepSource[2], which will post inline comments on pull requests; that fits the QA/pre-merge flow you're describing.
That said, if you're using AI agents to write code, running it at checkpoints locally keeps feedback tight.
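For example, a git pre-push hook can act as that checkpoint. This is a minimal sketch assuming a hypothetical `review-tool check` command that exits non-zero when it finds issues; it is not Autofix Bot's actual CLI.

    #!/usr/bin/env python3
    # .git/hooks/pre-push -- run a review tool on files changed since main.
    # "review-tool" is a placeholder command, not a real CLI.
    import subprocess
    import sys

    # Collect files changed relative to the main branch.
    changed = subprocess.run(
        ["git", "diff", "--name-only", "origin/main...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()

    if changed:
        # Hand the changed files to the (hypothetical) review tool;
        # a non-zero exit code blocks the push, keeping the feedback local.
        result = subprocess.run(["review-tool", "check", *changed])
        sys.exit(result.returncode)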
Thanks for the feedback!
"Hey claude, I get this error message: <X>", and it'll often find the root cause quicker than I could.
"Hey claude, anything I could do to improve Y?", and it'll struggle beyond the basics that a linter might suggest.
It suggested enthusiastically a library for <work domain> and it was all "Recommended" about it, but when I pointed out that the library had been considered and rejected because <issue>, it understood and wrote up why that library suffered from that issue and why it was therefore unsuitable.
There's a significant blind spot in current LLMs around blue-sky thinking and creative problem solving. They can handle structured problems very well, and they can transform unstructured data very well, but they can't deal with unstructured problems very well.
That may well change, so I don't want to embed that thought too deeply in my own priors; the LLM space evolves rapidly, and I wouldn't want to find myself blind to progress because I'd written LLMs off for a whole class of problems.
But right now, the best way to help an LLM is to have a deep understanding of the problem domain yourself, and to leverage it for the grunt work you'd find boring.
Static analysis has the opposite problem: it's very structured and deterministic, but limited to predefined patterns, and it overwhelms you with false positives.
The sweet spot seems to be giving the LLM structure around what it should look for, rather than letting it roam free on an open-ended "review this" prompt.
We built Autofix Bot[1] around this idea.
[1] https://autofix.bot (disclosure: founder)
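To make the "give it structure" idea concrete, here's a rough sketch of a checklist-scoped review; this is just an illustration of the general approach, not how Autofix Bot implements it. It assumes a generic chat-completion client passed in as a call_llm function, and it drops any model output that doesn't parse or doesn't point back at a checklist rule:

    import json

    # A narrow checklist instead of an open-ended "review this" prompt.
    CHECKLIST = [
        "SQL built via string concatenation with user input",
        "secrets or API keys hard-coded in source",
        "unchecked error returns around I/O calls",
    ]

    PROMPT_TEMPLATE = """You are reviewing a diff. Only report issues matching this checklist:
    {rules}
    Respond with a JSON array; each item must have "rule" (index into the checklist),
    "file", "line", and "why". Report nothing outside the checklist.

    Diff:
    {diff}
    """

    def structured_review(diff: str, call_llm) -> list[dict]:
        """Run a checklist-scoped review and keep only well-formed findings."""
        rules = "\n".join(f"{i}. {r}" for i, r in enumerate(CHECKLIST))
        raw = call_llm(PROMPT_TEMPLATE.format(rules=rules, diff=diff))
        try:
            findings = json.loads(raw)
        except json.JSONDecodeError:
            return []  # unparseable output counts as "no usable findings"
        # Keep only findings that reference a real checklist rule.
        return [
            f for f in findings
            if isinstance(f, dict) and f.get("rule") in range(len(CHECKLIST))
        ]

The checklist carries the structure; the model only fills in locations and explanations, which keeps the output reviewable and cuts down on free-form noise.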
Any chance you could try and share results? Full disclosure: I built Kingfisher.
Our secrets detection benchmark results are published here[1]. Btw, we use Kingfisher's validation system internally to generate request/expected_response pairs for a given secret as the last step of the pipeline. We don't run the validation queries ourselves, due to rate limit issues; instead, we include this information in a structured format as part of the response, so it can be executed on the client side or by the user integrating via the API. Thanks for building it :)
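For illustration, a client-executable validation pair might look roughly like the sketch below. This is a hypothetical shape for, say, a GitHub token, not the exact schema Kingfisher emits or Autofix Bot returns:

    import requests  # third-party: pip install requests

    # Hypothetical request/expected_response pair attached to a finding.
    # The detection service never calls the provider; the client does.
    validation = {
        "request": {
            "method": "GET",
            "url": "https://api.github.com/user",
            "headers": {"Authorization": "token {SECRET}"},
        },
        "expected_response": {"status": 200},  # 200 means the secret is live
    }

    def run_validation(validation: dict, secret: str) -> bool:
        """Execute the validation query client-side; report whether the secret is active."""
        req = validation["request"]
        headers = {k: v.replace("{SECRET}", secret)
                   for k, v in req.get("headers", {}).items()}
        resp = requests.request(req["method"], req["url"], headers=headers, timeout=10)
        return resp.status_code == validation["expected_response"]["status"]

Running it client-side like this keeps the provider's rate limits on the integrator's side rather than on the detection service.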
[1] https://autofix.bot/benchmarks/#benchmarks-secrets-detection
[2] https://deepsource.com/directory