We are also building InstaVM, which follows the same idea but runs in the cloud: https://instavm.io
Also, the term “remote code execution” in the beginning is misused. Ironically, remote code execution refers to execution of code locally - by a remote attacker. Claude Code does in fact have that, but I’m not sure if that’s what they’re referring to.
I also used the tool to generate a FIDE ranking list of adult chess improvers for all federations around the world. Here are the July 2025 rankings, though the filtering still needs major improvement: https://chess-ranking.pages.dev
------------------
Another idea I have been working on for some time is connecting my Gmail, which is the source of truth for all my financial, travel, and personal matters, to an LLM that can do isolated code execution to generate beautiful infographics, charts, etc. of my travels and spending patterns. The idea is to process my emails locally while a powerful remote LLM generates the actual queries blindly, given only a schema and an email 'fingerprint' file that gives it a sense of which countries, regions, and interests might be involved without actually transmitting personal data. The trade-off between how private the 'fingerprint' is and how good the generated queries are is something I have been struggling with.
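To make the 'fingerprint' idea a bit more concrete, here is a rough sketch of how I picture it, not what is actually implemented. The record fields (`sender`, `subject`, `country`, `category`), the `SCHEMA` string, and the function names `build_fingerprint` and `build_prompt` are all made up for illustration. The assumption is that only coarse counts leave the machine, and that a `min_count` threshold drops rare values that could identify a specific person or event.

```python
import json
from collections import Counter

# Hypothetical, already-parsed local email records (field names are assumptions).
EMAILS = [
    {"sender": "alerts@examplebank.com", "subject": "Card charged 42.10", "country": "IN", "category": "finance"},
    {"sender": "statements@examplebank.com", "subject": "Your monthly statement", "country": "IN", "category": "finance"},
    {"sender": "bookings@exampleair.com", "subject": "Itinerary for your trip", "country": "JP", "category": "travel"},
    {"sender": "friend@example.org", "subject": "Dinner on Friday?", "country": "IN", "category": "personal"},
]

# Hypothetical schema of the local table the generated queries would run against.
SCHEMA = "emails(sender TEXT, subject TEXT, country TEXT, category TEXT, received_at TEXT)"


def build_fingerprint(emails, min_count=2):
    """Summarize emails as aggregate counts only; subjects and bodies are never included."""

    def frequent(values):
        # Keep only values seen at least min_count times, so one-off
        # (and therefore more identifying) entries are dropped.
        counts = Counter(values)
        return {value: n for value, n in counts.items() if n >= min_count}

    return {
        "total_emails": len(emails),
        "sender_domains": frequent(e["sender"].split("@")[-1] for e in emails),
        "countries": frequent(e["country"] for e in emails),
        "categories": frequent(e["category"] for e in emails),
    }


def build_prompt(schema, fingerprint):
    """Build the text that would be sent to the remote LLM: schema plus fingerprint, no raw emails."""
    return (
        "Write SQL queries for travel and spending charts.\n"
        f"Schema: {schema}\n"
        f"Fingerprint: {json.dumps(fingerprint, sort_keys=True)}"
    )


if __name__ == "__main__":
    print(build_prompt(SCHEMA, build_fingerprint(EMAILS)))
```

Raising `min_count` makes the fingerprint more private but gives the remote model less to work with, which is exactly the trade-off I keep going back and forth on. The generated queries would then run locally inside the sandbox against the real data.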
This is also not Gemini-cli specific and you could use the sandbox with any of the popular LLMs or even with your local ones.
We built CodeRunner (https://github.com/BandarLabs/coderunner) on top of Apple Containers recently and have been using it for some time. It works fine but still needs some improvement to handle very arbitrary prompts.
Why should a language model be good at chess or similar numerical/analytical tasks?
In what way does language resemble chess?
Apple containers have been great, especially since each of them maps 1:1 to a dedicated lightweight VM. Except for a bug or two that appeared in the early releases, things seem to be working out well. I believe not many projects are leveraging them yet.
A general code execution sandbox, for AI-generated code or otherwise, that uses Apple containers is https://github.com/instavm/coderunner. It can be hooked up to Claude Code and others.