on Linux it runs Firecracker: https://github.com/jingkaihe/matchlock/blob/main/pkg/vm/linu...
on macOS it uses Apple's Virtualization.Framework Go wrapper: https://github.com/jingkaihe/matchlock/blob/main/pkg/vm/darw...
The guest-agent (pid 1) spawns commands in a new pid + mount namespace (similar to the firecracker jailer, but at the inner level so it also works on macOS). In non-privileged mode it drops SYS_PTRACE, SYS_ADMIN, etc. from the bounding set, sets `no_new_privs`, then installs a seccomp-BPF filter that EPERMs process_vm_readv/writev, ptrace, and kernel module loading. The microVM is the real isolation boundary; seccomp is defense in depth. That said, there is a `--privileged` flag that lets you skip all of this, which is needed when building images with buildkit.
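A minimal sketch of that sequence in Go, assuming libseccomp-golang and golang.org/x/sys/unix; the real agent lives under pkg/vm and surely differs in detail:

```go
package agent

import (
	"os/exec"
	"syscall"

	seccomp "github.com/seccomp/libseccomp-golang"
	"golang.org/x/sys/unix"
)

// spawnSandboxed is an illustration only, not matchlock's actual code.
// The agent runs as pid 1 (root) inside the guest, so it may create
// namespaces even while tightening what its children can do.
func spawnSandboxed(argv []string) error {
	// Drop dangerous capabilities from the bounding set so children
	// can never reacquire them, even via setuid binaries.
	for _, c := range []int{unix.CAP_SYS_PTRACE, unix.CAP_SYS_ADMIN} {
		if err := unix.Prctl(unix.PR_CAPBSET_DROP, uintptr(c), 0, 0, 0); err != nil {
			return err
		}
	}

	// no_new_privs: also a precondition for loading seccomp unprivileged.
	if err := unix.Prctl(unix.PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); err != nil {
		return err
	}

	// Default-allow filter that EPERMs the introspection/escape syscalls.
	filter, err := seccomp.NewFilter(seccomp.ActAllow)
	if err != nil {
		return err
	}
	eperm := seccomp.ActErrno.SetReturnCode(int16(unix.EPERM))
	for _, name := range []string{
		"process_vm_readv", "process_vm_writev", "ptrace",
		"init_module", "finit_module",
	} {
		sc, err := seccomp.GetSyscallFromName(name)
		if err != nil {
			return err
		}
		if err := filter.AddRule(sc, eperm); err != nil {
			return err
		}
	}
	if err := filter.Load(); err != nil { // inherited across fork/exec
		return err
	}

	// Run the command in fresh pid + mount namespaces.
	cmd := exec.Command(argv[0], argv[1:]...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	return cmd.Run()
}
```

The ordering matters: `no_new_privs` has to be set before `Load()`, since that's what lets a process install a filter without CAP_SYS_ADMIN, and the filter is then inherited by everything the command spawns.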
Whether pip install works is entirely up to the OCI image you pick. If it has a package manager and you've allowed network access, go for it. The whole point is making `claude --dangerously-skip-permissions` style usage safe.
Personally I've had agents attempt red-team style breakouts. From first-hand experience, what the agent (Opus 4.6 with max thinking) will exploit without cap drops and seccomp is genuinely wild.
the opus 4.6 breakouts you mentioned - were they known vulns or creative syscall abuse? agents are weirdly systematic about edge cases compared to human red teamers. they don't skip the obvious stuff.
--privileged for buildkit tracks - you gotta build the images somewhere.
for notifications specifically, the risky bits would be: what happens if an app sends a notification payload that's malformed or huge, how do you handle permission checks if the notification system process restarts mid-filtering, and whether the filtering rules can be bypassed by crafting notifications with weird mime types or encoded text.
if you wrote tests for those edge cases (or even just thought through them), you're already ahead of 90% of shipped code, vibe-coded or not. the scrutiny you're worried about is actually healthy - peer review catches stuff automated tools miss.
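if it helps, here's the shape of a table-driven test for the payload cases, sketched in Go with a stubbed `Filter` since I don't know your actual entry point:

```go
package notify

import (
	"strings"
	"testing"
)

// Filter is a stand-in for the real filtering entry point: return true
// to deliver a notification, false to drop it. Swap in yours.
var Filter = func(payload string) bool { return false }

func TestFilterHostileInputs(t *testing.T) {
	cases := []struct {
		name    string
		payload string
	}{
		{"empty", ""},
		{"huge", strings.Repeat("A", 10<<20)}, // 10 MiB body
		{"weird mime", "Content-Type: text/html;;charset=x\r\n\r\nhi"},
		{"encoded text", "=?UTF-8?B?c2VjcmV0?="}, // RFC 2047 encoded-word
	}
	for _, tc := range cases {
		tc := tc
		t.Run(tc.name, func(t *testing.T) {
			// The property worth asserting: no panic, and the filter
			// fails closed (drops) rather than open (delivers).
			if Filter(tc.payload) {
				t.Fatalf("filter passed hostile payload %q", tc.name)
			}
		})
	}
}
```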
what I'm curious about with matchlock - does it use seccomp-bpf to restrict syscalls, or is it more like a minimal rootfs with carefully chosen binaries? because the landlock LSM stuff is cool but it's mainly for filesystem access control. network access, process spawning, that's where agents get dangerous.
also how do you handle the agent needing to install dependencies at runtime? like if claude decides it needs to pip install something mid-task. do you pre-populate the sandbox or allow package manager access?
To solve this I've built Wardgate [1], which removes the need for agents to see any credentials and enforces access control per API endpoint. So you can say: yes, you can read all Todoist tasks, but you can't delete tasks or see tasks with "secure" in them, or see emails outside the Inbox or containing OTP codes, or whatever.
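To make the shape of it concrete, a rough Go sketch of per-endpoint enforcement with credential injection; the rule fields and names here are my own illustration, not Wardgate's actual config:

```go
package gateway

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// Rule and Policy are guessed shapes for illustration.
type Rule struct {
	Method     string // "GET", "DELETE", ...
	PathPrefix string // e.g. "/rest/v2/tasks"
	Allow      bool
}

type Policy struct{ Rules []Rule }

// Permit is first-match-wins with a default deny.
func (p Policy) Permit(r *http.Request) bool {
	for _, rule := range p.Rules {
		if rule.Method == r.Method && strings.HasPrefix(r.URL.Path, rule.PathPrefix) {
			return rule.Allow
		}
	}
	return false
}

// Proxy checks the policy, then injects the real credential, so the
// agent never holds the token itself. Content rules ("hide tasks
// containing 'secure'") additionally need response-body filtering,
// which this sketch omits.
func Proxy(p Policy, upstream *url.URL, token string) http.Handler {
	rp := httputil.NewSingleHostReverseProxy(upstream)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !p.Permit(r) {
			http.Error(w, "denied by policy", http.StatusForbidden)
			return
		}
		r.Header.Set("Authorization", "Bearer "+token)
		rp.ServeHTTP(w, r)
	})
}
```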
Interested in any comments / suggestions.
and I'm curious about the filtering logic - is it regex on endpoint paths or something more semantic? because the "tasks with secure in them" example makes me think there's some content inspection happening, not just URL filtering.
curious: when you say compatible with OpenClaw's markdown format, does that mean I could point LocalGPT at an existing OpenClaw workspace and it would just work? or is it more 'inspired by' the format?
the local embeddings for semantic search is smart. I've been using similar for code generation and the thing I kept running into was the embedding model choking on code snippets mixed with prose. did you hit that or does FTS5 + local embeddings just handle it?
also - genuinely asking, not criticizing - when the heartbeat runner executes autonomous tasks, how do you keep the model from doing risky stuff? hitting prod APIs, modifying files outside workspace, etc. do you sandbox or rely on the model being careful?
that said, the class restriction feels weird. classes aren't the security boundary. file access, network, imports - that's where the risk is. restricting classes just forces the model to write uglier code for no security gain. would be curious if the restrictions map to an actual threat model or if it's more of a "start minimal and add features" approach.
* Exploit kernel CVEs
* Weaponise gcc: craft malicious kernel modules, forge arbitrary packets with spoofed source addresses that bypass the TCP/IP stack
* Probe the metadata service
* Attack bpf and io_uring
* Lots of mount-escape attempts, plus network and vsock scanning and crafting
As a non-security researcher, I was blown away watching what it did, which in hindsight isn't surprising, as Opus 4.6 hits a 93% solve rate on Cybench - https://cybench.github.io/
the metadata service probing is particularly concerning because that's the classic cloud escape path. if you're running this in aws/gcp and the agent figures out IMDSv1 is reachable, game over. vsock scanning too - that's targeting the host-guest communication channel directly.
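fwiw, that reachability is easy to smoke-test from inside the guest. A trivial probe, assuming the standard AWS metadata address and path (the other big clouds use the same link-local IP):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Probe the cloud metadata endpoint the way an agent does on its first
// enumeration pass. From a properly isolated guest this should fail;
// a 200 without a token means the classic IMDSv1 escape path is open.
func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get("http://169.254.169.254/latest/meta-data/")
	if err != nil {
		fmt.Println("IMDS unreachable (good):", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("IMDS reachable (bad):", resp.Status)
}
```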
93% on cybench is genuinely scary when you think about what it means. it's not just finding known CVEs, it's systematically exploring the attack surface like a skilled pentester would. and unlike humans, it doesn't get tired or skip the boring enumeration steps. did you find it tried timing attacks or side channels at all? or was it mostly direct exploitation?