In this case the "agent" definition they are using is the one from the https://github.com/openai/openai-agents-python Python library, which they are running in the browser via Pyodide and WASM.
That library defines an agent as a system prompt and optional tools - notable because many other common agent definitions treat tools as required, not optional.
> notable because many other common agent definitions treat tools as required, not optional.
This feels weird to me. I would think of an agent with no tools as the trivial case of a "null agent" or "empty agent".
It would be like saying you can't have a list with no elements because that's not a list at all... but an empty list is actually quite useful in many contexts. Which is why almost all implementations in all languages allow empty lists and add something distinct like a NonEmptyList to handle that specific case.
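In code terms, that framing maps to something like the following minimal sketch (a plain dataclass for illustration, not the library's actual classes), where the "null agent" is just the tools-free case:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    # An agent as openai-agents-python frames it: instructions
    # (the system prompt) plus an *optional* list of tools.
    instructions: str
    tools: list = field(default_factory=list)

# The trivial "null agent" / "empty agent" case: no tools at all,
# analogous to an empty list still being a perfectly valid list.
null_agent = Agent(instructions="You are a helpful assistant.")
assert null_agent.tools == []
```

Just as with lists, you could layer a `NonEmptyToolsAgent` on top for the cases that genuinely require tools, rather than forbidding the empty case outright.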
We looked at Pyodide and WASM, along with other options like Firecracker, for our need: multi-step tasks that require running LLM-generated code locally (via Ollama etc.) with some form of isolation, rather than running it directly on our dev machines. We figured it would be too much work with the various external libraries we have to install. The idea was to have a powerful remote LLM generate code for general-purpose stuff like video editing via ffmpeg or beautiful graph generation via JS + Chromium, and execute it locally with all dependencies installed before execution.
We built CodeRunner (https://github.com/BandarLabs/coderunner) on top of Apple Containers recently and have been using it for some time. This works fine but still needs some improvement to work across very arbitrary prompts.
For the Gemini-cli integration, is the only difference between CodeRunner with Gemini-cli and Gemini-cli itself that you are running Gemini-cli in a container?
No, Gemini-cli still runs on your local machine. When it generates some code based on your prompt, CodeRunner runs that code inside a container (which sits inside a new lightweight VM, courtesy of Apple, and provides VM-level isolation), installs the requested libraries, executes the generated code there, and returns the result back to Gemini-cli.
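A rough sketch of that loop (a hypothetical helper, not CodeRunner's actual code; in the real setup the runner command would launch the code inside the Apple Containers VM, whereas this sketch defaults to the local interpreter purely for illustration):

```python
import os
import subprocess
import sys
import tempfile

def run_generated_code(code, runner=None):
    """Execute LLM-generated Python and return its stdout.

    `runner` is the command used to execute the script. CodeRunner
    would point this at an isolated container/VM; the local
    interpreter default here is only for demonstration.
    """
    runner = runner or [sys.executable]
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            runner + [path], capture_output=True, text=True, timeout=60
        )
        return result.stdout
    finally:
        os.unlink(path)
```

The returned string is what would be handed back to Gemini-cli as the tool's output for the next step of the conversation.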
This is also not Gemini-cli specific and you could use the sandbox with any of the popular LLMs or even with your local ones.
This is trying to use the word "agent" to make it sound cool, but it doesn't make a case for why this is particularly about agents and not just basic-level AI stuff.
> The agent code is nothing more than a Python script that relies on the openai-agents-python library to run an AI agent backed by an LLM served via an OpenAI-compatible API.
The openai-agents-python code is useful for writing agents but it is possible to use it to write code that isn't very agentic. None of the examples are very agentic.
When I saw the title, I thought this was running models in the browser. IMO that's way more interesting and you can do it with transformers.js and onnx runtime. You don't even need a gpu.
From a quick gander: WASM is not there to talk to the servers. Rather, WASM can be used to run AI agents that talk to local LLMs from a sandboxed environment in the browser.
For example, in the next few years operating system companies and PC makers may make small local models a stock standard to improve OS functions and other services. That local LLM engine layer could then be used by browser applications too, through WASM, without having to write JavaScript - using the WASM sandbox to safely expose this system LLM engine layer.
They're using some Python libraries like openai-agents, presumably to save on the development effort of calling/prompting/managing the HTTP endpoints. But yes, this could just be done in regular JS in the browser; they'd have to rewrite a lot of boilerplate from an ecosystem which is mainly Python.
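For a sense of that boilerplate, here is a sketch of the request shape an OpenAI-compatible chat endpoint expects - the part such libraries wrap for you (payload construction only; nothing is sent, and the localhost URL in the usage example is just a placeholder for a local server like Ollama):

```python
import json

def build_chat_request(base_url, model, user_prompt,
                       system_prompt="You are a helpful assistant."):
    """Assemble a /chat/completions request for any OpenAI-compatible
    server. Managing this by hand - plus streaming, retries, and
    tool-call parsing - is the boilerplate the libraries save you."""
    return {
        "url": f"{base_url.rstrip('/')}/chat/completions",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({
            "model": model,
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
        }),
    }

# Example: a request aimed at a hypothetical local Ollama endpoint.
req = build_chat_request("http://localhost:11434/v1", "llama3.1", "Hello!")
```

And this covers only a single turn; an agent loop also has to append each response to `messages` and re-send, which is exactly the plumbing the libraries manage.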
You never need WASM (or any other language, bytecode format, etc) to talk to LLMs. But WASM provides things people might like for agents, eg. strict sandboxing by default.
I recently wrote some Javascript to automate clicking coupons. The website checks for non-human clicks using event.isTrusted. Firefox allowed me to bypass this by rewriting the JS to replace s/isTrusted/true, while Chrome Manifest V3 doesn't allow it. Anyway, Firefox might be the future of agents, due to its extensibility.
Mildly interesting article - I mean, you can already run a ton of libraries that talk to an inference backend. The only difference here is that the client-side code is in Python, which by itself doesn't make creating agents any simpler - I would argue that it complicates things a ton.
Also, connecting a model to a bunch of tools and dropping it into some kind of workflow is maybe 5% of the actual work. The rest is spent on observability, background tasks, queueing systems, multi-channel support for agents, user experience, etc., etc., etc.
Nobody talks about that part, because most of the content out there is just chasing trends - without much real-world experience running these systems or putting them in front of actual customers with real needs.
Agreed. Regarding the other parts of the "LLM" stack, have a look at what is, IMO, the best LLM coordination/observability platform TS library:
https://mastra.ai/
That explains why their "hello world" demo just runs a single prompt: https://github.com/mozilla-ai/wasm-agents-blueprint/blob/mai...
Can't do that without tools!
https://huggingface.co/spaces/webml-community/llama-3.2-webg...
I can't run it on Linux since WebGPU is not working for me...
Why would you need WASM for this?
Surely that's a prime use for AI?
https://galqiwi.github.io/aqlm-rs