the opus 4.6 breakouts you mentioned - was it known vulns or creative syscall abuse? agents are weirdly systematic about edge cases compared to human red teamers. they don't skip the obvious stuff.
--privileged for buildkit tracks - you gotta build the images somewhere.
* Exploit kernel CVEs * Weaponise gcc, crafting malicious kernel modules; forging arbitrary packets to spoof the source address that bypass tcp/ip * Probing metadata service * Hack bpf & io uring * A lot of mount escape attempts, network, vsock scanning and crafting
As a non security researcher it was mind blown to see what it did, which in the hindsight isn't surprising as Opus 4.6 hits 93% solve rate on Cybench - https://cybench.github.io/
Where do the images come from? What are our options around that and also using custom images etc?
You can also build image with `matchlock build -f Dockerfile -t foo:bar .` - Under the hood it builds the image using buildkit inside the microvm.