Meh. To be fair to the models, if asked to review I would have flagged at least some of the same things for a closer look myself. What OP can do is task the models with actually breaking into it: run the app in a container, have the agents come up with attack tests, implement them, actually run them, and report only the successful attacks. Things might change.
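A minimal sketch of that harness, assuming the app is already running in a container reachable at some local URL and the agent drops its exploit attempts as standalone scripts into a directory (the target URL, the directory layout, and the exit-code convention are all illustrative assumptions, not anything from the article):

```python
import subprocess
from pathlib import Path

TARGET = "http://localhost:8080"   # containerized app under test (hypothetical)
EXPLOITS = Path("agent_exploits")  # agent-generated attack scripts (hypothetical)

def attack_succeeded(script: Path) -> bool:
    """Run one agent-generated exploit against the container.

    Assumed convention: the script exits 0 only if the attack actually
    worked (e.g. it read data it shouldn't have been able to read).
    """
    try:
        result = subprocess.run(
            ["python", str(script), TARGET],
            capture_output=True,
            timeout=60,  # a hung exploit counts as a failure
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

# Keep only confirmed attacks; speculative findings never get reported.
for script in sorted(EXPLOITS.glob("*.py")):
    if attack_succeeded(script):
        print(f"CONFIRMED: {script.name}")
```

The point of the filter is that the model has to produce a working proof of concept, not just a plausible-sounding finding.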
That may be a sensible approach, but in practice many companies, including GitHub, offer AI reviews today - and they have the same general quality as the author's examples: mostly junk, with occasional completely incorrect code and the very rare actual problem.
(This depends on your field too. I suspect that if you are building Yet Another Social Media app in TypeScript the reviews would be better; but once you step off the beaten path, quality drops rapidly.)