dolftax commented on Show HN: Autofix Bot – Hybrid static analysis and AI code review agent · Posted by u/sanketsaurav
dlahoda · 2 days ago
We use Rust, SQL, and TypeScript. How well are these statically covered?
dolftax · 2 days ago
All three covered — TypeScript, Rust, and SQL[1].

[1] https://deepsource.com/directory

dolftax commented on Show HN: Autofix Bot – Hybrid static analysis and AI code review agent · Posted by u/sanketsaurav
tarun_anand · 2 days ago
Congratulations!! Anchoring is important. What about other parts of code review, like coding guidelines, perf issues, etc.?
dolftax · 2 days ago
We flag performance issues today alongside security and code quality. We're working on respecting AGENTS.md, detecting code complexity (AI generated code tends toward verbose, tangled logic), and letting users/teams define custom coding guidelines.
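
To make "detecting code complexity" concrete, here's a rough sketch of the kind of signal involved: a simple cyclomatic-complexity count over each function, using Python's ast module. Illustrative only, not our actual analyzer.

    import ast, textwrap

    # Rough cyclomatic-complexity count: every branch point
    # (if / for / while / except handler / bool op / ternary) adds a path.
    BRANCHES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)

    def complexity(func: ast.FunctionDef) -> int:
        return 1 + sum(isinstance(n, BRANCHES) for n in ast.walk(func))

    # Made-up sample input, just to have something to measure.
    sample = textwrap.dedent("""
        def handler(x, y):
            if x and y:
                for i in range(x):
                    if i % 2:
                        y += i
            return y
    """)
    for node in ast.walk(ast.parse(sample)):
        if isinstance(node, ast.FunctionDef):
            print(node.name, complexity(node))  # -> handler 5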
dolftax commented on Show HN: Autofix Bot – Hybrid static analysis and AI code review agent · Posted by u/sanketsaurav
_pdp_ · 2 days ago
What is the difference between this and let's say Claude Code using something like semgrep as a tool?

Also, I don't think this tool should be in the developer flow, as in my experience it is unlikely to be run regularly. It should be something that is done as part of the QA process before PR acceptance.

I hope this helps and good luck.

dolftax · 2 days ago
On the OpenSSF CVE Benchmark[1], Semgrep CE hits 56.97% accuracy vs. our 81.21%, and our recall is nearly 3x higher (75.61% vs. 26.83%).
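
For context, here's roughly how accuracy and recall figures like these are computed from true/false positive counts (generic definitions with placeholder numbers, not the benchmark's exact scoring or data):

    # Generic accuracy/recall definitions; counts below are placeholders.
    tp, fp, tn, fn = 10, 3, 12, 5
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of correct judgments
    recall = tp / (tp + fn)                     # share of real vulns flagged
    print(f"accuracy={accuracy:.2%}  recall={recall:.2%}")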

On when to run it, fair point. Autofix Bot is currently meant for local use (TUI, Claude Code plugin, MCP). We're integrating this pipeline into DeepSource[2], which will post inline comments on pull requests; that fits the QA/pre-merge flow you're describing.

That said, if you're using AI agents to write code, running it at checkpoints locally keeps feedback tight.

Thanks for the feedback!

[1] https://github.com/ossf-cve-benchmark/ossf-cve-benchmark

[2] https://deepsource.com/

dolftax commented on The highest quality codebase (gricha.dev/blog/the-highe...) · Posted by u/Gricha
xnorswap · 3 days ago
Claude is really good at specific analysis, but really terrible at open-ended problems.

"Hey claude, I get this error message: <X>", and it'll often find the root cause quicker than I could.

"Hey claude, anything I could do to improve Y?", and it'll struggle beyond the basics that a linter might suggest.

It enthusiastically suggested a library for <work domain> and was all "Recommended" about it, but when I pointed out that the library had been considered and rejected because of <issue>, it understood and wrote up why that library suffered from that issue and why it was therefore unsuitable.

There's a significant blind spot in current LLMs related to blue-sky thinking and creative problem solving. They can do structured problems very well, and they can transform unstructured data very well, but they can't deal with unstructured problems very well.

That may well change, so I don't want to embed that thought too deeply into my own priors, because the LLM space seems to evolve rapidly. I wouldn't want to find myself blind to the progress because I write it off from a class of problems.

But right now, the best way to help an LLM is to have a deep understanding of the problem domain yourself, and just leverage it to do the grunt work that you'd find boring.

dolftax · 3 days ago
The structured vs open-ended distinction here applies to code review too. When you ask an LLM to "find issues in this code", it'll happily find something to say, even if the code is fine. And when there are actual security vulnerabilities, it often gets distracted by style nitpicks and misses the real issues.

Static analysis has the opposite problem: very structured and deterministic, but limited to predefined patterns, and it overwhelms you with false positives.

The sweet spot seems to be to give structure to what the LLM should look for, rather than letting it roam free on an open-ended "review this" prompt.
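
As a rough illustration (made-up finding shape and prompt, not Autofix Bot's actual pipeline), "giving structure" can be as simple as anchoring the LLM to concrete static-analysis findings and asking it to adjudicate them:

    # Sketch: anchor the review to specific findings instead of an
    # open-ended "review this" prompt. Finding format is hypothetical.
    findings = [
        {"rule": "python.sqli.taint", "line": 42,
         "snippet": 'cur.execute(f"SELECT * FROM users WHERE id={uid}")'},
        {"rule": "python.secrets.hardcoded", "line": 7,
         "snippet": 'API_KEY = "sk-live-..."'},
    ]

    def build_review_prompt(findings: list[dict]) -> str:
        header = ("For each finding, decide: true or false positive, "
                  "severity, and a minimal fix. Ignore style.\n\n")
        body = "\n".join(
            f"- [{f['rule']}] line {f['line']}: {f['snippet']}"
            for f in findings)
        return header + body

    print(build_review_prompt(findings))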

We built Autofix Bot[1] around this idea.

[1] https://autofix.bot (disclosure: founder)

dolftax commented on Show HN: Narada – Open-source secrets classification model · Posted by u/sanketsaurav
micksmix · 2 months ago
I'm curious how Kingfisher would do against the proprietary dataset: https://github.com/mongodb/kingfisher

Any chance you could try and share results? Full disclosure, I built Kingfisher

dolftax · a month ago
Jai here, from the Autofix Bot team. We published results of the initial benchmark run[1] comparing Gitleaks, detect-secrets, and TruffleHog ~3 weeks ago. Since then, we've put together a significantly improved dataset and are planning to rerun those benchmarks shortly; we'll add Kingfisher to the list and share the results here.

Btw, we use Kingfisher's validation system internally to generate request/expected_response pairs for a given secret as the last step of the pipeline. We don't run the validation queries ourselves, due to rate-limit issues. Instead, we add this information in a structured format as part of the response, which can be executed on the client side or by the user integrating via the API. Thanks for building it :)
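
For illustration, such a pair could look roughly like this (field names are hypothetical for the sketch, not our actual schema or Kingfisher's):

    # Sketch: an illustrative request/expected_response pair that a client
    # could execute to validate a detected secret. Names are hypothetical.
    import requests

    validation = {
        "request": {
            "method": "GET",
            "url": "https://api.example.com/v1/me",
            "headers": {"Authorization": "Bearer <detected-secret>"},
        },
        "expected_response": {"status": 200},
    }

    def is_live(v: dict) -> bool:
        req = v["request"]
        resp = requests.request(req["method"], req["url"],
                                headers=req["headers"], timeout=10)
        return resp.status_code == v["expected_response"]["status"]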

[1] https://autofix.bot/benchmarks/#benchmarks-secrets-detection

u/dolftax

Karma: 1300 · Cake day: January 12, 2015
About
Building DeepSource & Autofix Bot.

https://x.com/dolftax

View Original