Programmatic tool use feels like the way it always should have worked, and where agents seem to be headed more broadly: acting within sandboxed VMs with a mix of custom code and programmatic interfaces to external services. This is a clear improvement over the LangChain-style Rube Goldberg machines that we dealt with last year.
It uses their Python sandbox, is available via API, and exposes the tool calls themselves as normal tool calls to the API client - should be really simple to use!
Batch tool calling has been a game-changer for the AI assistant we've built into our product recently, and this sounds like a further evolution of that. Primarily it's about speed: if you can accomplish 2x more tool calls in one turn, your agent is usually now 2x faster.
It works as an MCP proxy of sorts that converts all the child MCP tools into TypeScript annotations, asks your LLM to generate TypeScript, then executes that code in a restricted VM to make the tool calls that way. It allows parallel processing, passing data between tools without coming back to the LLM for a full loop, etc. The agents are pretty good at debugging issues they create, too, and trying again.
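To make that concrete, here's a rough sketch of the kind of TypeScript an LLM might generate inside the sandbox. The tool names (`searchIssues`, `getIssueDetails`) are hypothetical stand-ins for proxied MCP tools, mocked here so the snippet runs on its own:

```typescript
// Hypothetical proxied tool stubs - in the real setup these would forward
// to child MCP servers; they are mocked so the sketch is self-contained.
async function searchIssues(query: string): Promise<{ id: number }[]> {
  return [{ id: 1 }, { id: 2 }, { id: 3 }];
}
async function getIssueDetails(id: number): Promise<{ id: number; title: string }> {
  return { id, title: `Issue ${id}` };
}

// One LLM turn produces this whole plan; intermediate data flows
// tool-to-tool inside the VM instead of through the model's context.
async function main(): Promise<string[]> {
  const issues = await searchIssues("bug");
  // Fan the follow-up calls out in parallel rather than one LLM loop each.
  const details = await Promise.all(issues.map((i) => getIssueDetails(i.id)));
  return details.map((d) => d.title);
}

main().then((titles) => console.log(titles));
```

If one of the inner calls throws, the generated code can catch and retry inside the VM, which is the "debugging issues they create" behavior mentioned above.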
Most LLMs are naturally better at code generation than at tool calling, with code understanding being more foundational to their knowledge and tool calling being pounded into models in later fine-tuning stages. These agent orchestrators can also burn an excessive number of tokens passing data between tools through the LLM. But if you move the tool calling into code rather than having the LLMs call tools directly, and have the LLMs generate that code, it can produce significantly better results for complex cases and reduce the overhead of passing data between tool calls.
This implementation works basically as an MCP server proxy. As an MCP server, it is also an MCP client to your child servers. In the middle it hosts a Node VM that executes code generated by the LLM to make tool calls indirectly. By introspecting the child MCP servers and converting their tool-call interfaces into small, condensed TypeScript API declarations, your LLM can generate code that invokes these tools in the provided Node VM instead of invoking them directly, handling responses and errors in code. This can be really powerful when doing multiple tool calls in parallel or with logic around the processing. And since it's a Node VM, the code also has access to the standard Node modules and built-in standard libraries there.
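As an illustration of that introspection step, here's a minimal sketch (with a made-up, simplified schema shape and tool name, not the actual proxy's code) of turning a child server's tool definition into a condensed TypeScript declaration the LLM can read:

```typescript
// Simplified stand-in for an MCP tool definition; real servers expose a
// fuller JSON Schema, which would be mapped to TS types the same way.
interface ToolDef {
  name: string;
  description: string;
  params: Record<string, string>; // param name -> TypeScript type
}

// Render one tool as a compact declaration instead of a verbose JSON schema,
// shrinking the context the LLM has to read per tool.
function toDeclaration(tool: ToolDef): string {
  const params = Object.entries(tool.params)
    .map(([name, type]) => `${name}: ${type}`)
    .join("; ");
  return `/** ${tool.description} */\n` +
    `declare function ${tool.name}(args: { ${params} }): Promise<unknown>;`;
}

const decl = toDeclaration({
  name: "searchIssues",
  description: "Search the issue tracker.",
  params: { query: "string", limit: "number" },
});
console.log(decl);
```

The LLM then writes ordinary TypeScript against declarations like this, and the VM routes each call back to the matching child server.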
One issue is that if your tool calls are actually simple, like a basic web search or a single tool call, this can add a bit of unnecessary overhead. But the more complex the prompt, the more this approach can significantly improve the quality of the output and lower your inference billing costs.
I would say you are almost always better off buying this + a mini-PC than a Synology at this point, or a Ugreen NAS + TrueNAS if you want to do almost everything a Synology can do.
It's creating a void that is getting filled by Ugreen, Minisforum, Beelink, and Aoostar with innovative platforms from China, and by classic competitors like QNAP, Asustor, TerraMaster, etc. innovating for small to mid-tier needs. 45Drives covers the larger space for folks who want to manage things more on their own but have enterprise-scale needs. Dell and HP have always competed in the high-end enterprise space and are also becoming a better option, even though Synology is so easy as an appliance.
It doesn't solve how you package your wheels specifically; that problem is still pushed onto your downstream users because of boneheaded packaging decisions by PyTorch themselves. But as the consumer, Pixi softens the blow. The conda-forge builds of PyTorch are also a bit more sane.
In the age of AI, I've reduced my reliance on small utility libraries and just keep the bigger ones, where I follow semver: I update major versions when it makes sense, always take small patches, but still read the release notes for what changed.