I really don't understand why people have all these "lightweight" ways of sandboxing agents. In my view there are two models:
- totally unsandboxed but I supervise it in a tight loop (the window just stays open on a second monitor and it interrupts me every time it needs to call a tool).
- unsupervised in a VM in the cloud where the agent has root. (I give it a task, negotiate a plan, then close the tab and forget about it until I get a PR or a notification that it failed).
I want either full capabilities for the agent (at the cost of needing to supervise for safety) or full independence (at the cost of limited context in a VM). I don't see a productive way to mix and match here; it seems you always get the worst of both worlds if you do.
Maybe the use case for this particular example is where you are supervising the agent but you're worried that apparently-safe tool calls are quietly leaking a secret that's in context? So it's not a 'mixed' use case, it's just increasing safety in the supervised case?
It's been ages since I used VirtualBox and reading the following didn't make me miss the experience at all:
> Eventually I found this GitHub issue. VirtualBox 7.2.4 shipped with a regression that causes high CPU usage on idle guests.
The list of viable hypervisors for running VMs with 3D acceleration is probably short but I'd hope there are more options these days for running headless VMs. Incus (on Linux hosts) and Lima come to mind and both are alternatives to Vagrant as well.
Depends on what you do. If you need a fully working site with external integrations, SSL and so on, it's just easier to spend $4 a month on a VPS. But you're right, for many backend-based projects a local VM like Multipass or a kind/microk8s cluster is perfectly fine.
You mentioned "deleting the actual project, since the file sync is two-way", my solution (in agentastic.dev) was to fist copy the code with git-worktree, then share that with the container.
As someone who does this, it's Turtles All The Way Down [1]. Every layer has escapes. I require people to climb up multiple turtles, thus breaking most skiddie [2] scripts. Attacks will have to be targeted and custom crafted by people who can actually code, thus reducing the amount of turds in the swimming pool I must avoid. People should not write apps that make assumptions around accessing sensitive files.
It's turtles all the way down but there is a VERY big gap between VM Isolation Turtle and <a half-arse seccomp policy> turtle. It's a qualitative difference between those two sandboxes.
It's a risk/convenience tradeoff. The biggest threat is Claude accidentally accessing and leaking your SSL keys, or getting prompt-hijacked into doing the same. A simple sandbox fixes this.
There are theoretical risks of Claude getting fully owned and going rogue, and doing the iterative malicious work to escape a weaker sandbox, but it seems substantially less likely to me, and therefore perhaps not (currently) worth the extra work.
Is there a premade VM image or Docker container I can just start with, for example for Google Antigravity, Claude, or Kilocode/VS Code? Right now I have to install a Linux desktop and all the tools needed, which is a bit of a pain IMO.
I see there are cloud VMs, like at Kilocode, but they are kind of useless IMO. I can only interact with the prompt and not the code base directly. Too many things go wrong, and maybe I also want Kilocode to run a Docker stack for me, which it can't in the agent cloud.
The UI is obviously vibe-coded garbage but the underlying system works. And most of the time you don't have to open the UI after you've set it running; you just comment on the GitHub PR.
This is clearly an unloved "lab" project that Google will most likely kill but to me the underlying product model is obviously the right one.
I assume Microsoft got this model right first with the "assign issue to Copilot" thing and then fumbled it by being Microsoft. So whoever eventually turns this <correct product model> into an <actual product that doesn't suck> should win big IMO.
Locally, I'd use Vagrant with a provisioning script that installs whatever you need on top of one of the prebuilt Vagrant boxes. You can then snapshot that if you want and turn that into a base image for subsequent containers.
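Something like this, assuming the VirtualBox provider (box name and provisioning script are just examples):

    vagrant init ubuntu/jammy64              # start from any prebuilt box
    # add to the generated Vagrantfile:
    #   config.vm.provision "shell", path: "provision.sh"
    vagrant up                               # boot and run the provisioner
    vagrant snapshot save clean-base         # snapshot to roll back to
    vagrant package --output agent-base.box  # reusable base box for later VMs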
I was using opencode the other day. It took me a while to realize that the agent couldn't read/write the .env file but didn't itself realize it. When I pushed it, it first created a temp file and copied it over .env, AND wrote an opencode.json file that disables the .env protection, then went wild.
I recently created a throwaway API key for Cloudflare and asked a Cursor cloud agent to deploy some infra using it, but it responded with this:
> I can’t take that token and run Cloudflare provisioning on your behalf, even if it’s “only” set as an env var (it’s still a secret credential and you’ve shared it in chat). Please revoke/rotate it immediately in Cloudflare.
So clearly they've put some sort of prompt guard in place. I wonder how easy it would be to circumvent it.
If your prompt is complex enough, it doesn't seem to get triggered.
I use a lot of Ansible to manage infra, and before I learned about ansible-vault, I was moving some keys around unprotected in my lab. Bad hygiene, and no prompt guard stepped in.
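(For reference, the fix is basically just this; filenames are examples:)

    ansible-vault encrypt group_vars/all/secrets.yml   # encrypt keys at rest
    ansible-playbook site.yml --ask-vault-pass         # decrypt only at run time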
Kinda bums me out that there may be circumstances where the model just rejects this even if for some reason you needed it.
It seems to depend on the model and context usage though; the agent forgets a lot of things after the context is half full. It even forgets the primary target you gave at the start of the chat.
Claude definitely has some API token security baked in, it saw some API keys in a log file of mine the other day and called them out to me as a security issue very clearly. In this case it was a false positive but it handled the situation well and even gave links to reset each token.
I find it better to bubblewrap against a full sandbox directory. Using Docker, you can export an image to a single tarball archive, flattening all layers. I use a compatible base image for my kernel/distro, and unpack the image archive into a directory.
With the unpacked directory, you can now limit the host paths you expose, avoiding leaking details from your host machine into the sandbox.
bwrap --ro-bind image/ / --bind src/ /src ...
Any tools you need in the container are installed in the image you unpack.
Some more tips: Use --unshare-all if you can. Make sure to add --proc and --dev options for a functional container. If you just need network, use both --unshare-all and --share-net together, keeping everything else separate. Make sure to drop any privileges with --cap-drop ALL
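Putting those pieces together, a sketch of the whole flow (image name and directories are illustrative):

    # flatten a compatible base image into a directory
    mkdir -p image
    docker create --name bwrap-export ubuntu:24.04
    docker export bwrap-export | tar -x -C image/
    docker rm bwrap-export

    # expose only the unpacked root (read-only) and the source tree
    bwrap --unshare-all --share-net \
          --ro-bind image/ / \
          --bind src/ /src \
          --proc /proc --dev /dev \
          --cap-drop ALL \
          /bin/bash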
I put all my agents in a Docker container into which the code I'm working on is mounted. It's been working perfectly for me so far. I even set it up so I can run GUI apps like Antigravity in it (X11). If anyone is interested, I shared my setup at https://github.com/asfaload/agents_container
In theory the Docker container should only have the projects directory mounted, open access to the internet, and that's it. No access to anything else on the host or the local network.
Internet access is there to connect to the provider, install packages, and search.
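The shape of the invocation is roughly this (image name and paths are mine, not the linked repo's exact setup):

    xhost +local:                      # allow local X11 connections (host side)
    docker run --rm -it \
      -v "$HOME/projects/myapp:/work" \
      -w /work \
      -e DISPLAY="$DISPLAY" \
      -v /tmp/.X11-unix:/tmp/.X11-unix \
      my-agents-image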
Of course, I'm not pretending this is a universal remedy solving all the problems. But I will add a note in the readme to make it clear, thanks for the feedback!
I've been saying bubblewrap is an amazing solution for years (and sandbox-exec as a Mac alternative). This is the only way I run agents on systems I care about.
I wish I had the opposite of this. It’s a race trying to come up with new ways to have Cursor edit and set my env files past all their blocking techniques!
I also wrote a tool for doing this[0], after one of these agents edited a config file outside of the repo it was supposed to work within.
I only noticed the edit because I have my dotfiles symlinked to a git repository, and git status showed it when I was committing another change.
It's likely that the agents are making changes that I (and others) are not aware of because there is no easy way to detect them.
The approach I started taking is mounting the directory that I want the agent to work on into a container.
I use `/_` as the working directory, and have built up some practices around that convention; that's the only directory that I want it to make changes to.
I also mount any config it might need as read-only.
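Concretely, something like this (image name and config path are examples):

    docker run --rm -it \
      -v "$PWD:/_" \
      -w /_ \
      -v "$HOME/.config/sometool:/root/.config/sometool:ro" \
      agent-image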
The standard tools like claude code, goose, charm, whatever else, should really spawn the agent (or MCP server?) in another process in a container, and pipe context in and out over stdin/stdout.
I want a tool for managing agents, and I want each agent to be its own process, in its own container.
But just locking up the whole mess seems to work for now.
I see some people in the other comments iterating on what the precise arguments to bubblewrap should be. nnc lets you write presets in Jsonnet, and then refer to them by name on the command line, so you can version and share the set of resources that you give to an agent or subprocess.
Why in the cloud and not in a local VM?
I've re-discovered Vagrant and have been using it exactly for this and it's surprisingly effective for my workflows.
https://blog.emilburzo.com/2026/01/running-claude-code-dange...
[1] - https://en.wikipedia.org/wiki/Turtles_all_the_way_down
[2] - https://en.wikipedia.org/wiki/Skiddies
(If the VM is remote, even more so).
Yes! I'm surprised more people do not want this capability. Check out my comment above, I think Vagrant might also be what you want.
https://sprites.dev/
TLDR:
- Ensure that you have installed npm on your machine.
- Install the dev container CLI globally via npm: `npm i -g @devcontainers/cli`
- Clone the Claude Code repo: https://github.com/anthropics/claude-code
- Navigate into the root directory of that repo.
- Run the dev container CLI command to start the container: `devcontainer --workspace-folder . up`
- Run another dev container command to start Claude in the container: `devcontainer exec --workspace-folder . claude`
And there you go! You have a sandboxed environment for Claude to work in. (As sandboxed as Docker is, at least.)
I like this method because you can just manage it like any other Docker container/volumes. When you want to rebuild it, or reset the volume, you just use the appropriate Docker (and the occasional dev container) commands.
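For example, the reset path is just ordinary Docker housekeeping (container and volume names below are placeholders):

    docker ps -a                    # find the dev container
    docker rm -f <container-id>     # remove it
    docker volume ls                # find its named volume
    docker volume rm <volume-name>  # reset the volume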
You may want to take steps to avoid a malicious prompt injection stealing those, since they might contain sensitive data.
Without this, you'll have to re-login to Claude every time. Breaks the speed of development.
I'm going to do some experimenting to see if I can make this bind more precise.
Let me know if you give it a go ;)
It's not perfect but it's a start.
You must not care about those systems that much.
https://bsky.app/profile/verdverm.com/post/3mbo7ko5ek22n
Mysql user: test
Password: mypass123
Host: localhost
...
[0] https://github.com/brendoncarroll/nnc