The decision to pass all params as a JSON string to --params makes it unfriendly for humans to experiment with, although Claude Code managed to one-shot the right command for me, so I guess this is fine. This is an intentional design per https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-ag...
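To illustrate why a single JSON `--params` argument is awkward to type by hand but trivial to generate, here is a minimal Python sketch. Only the `--params` flag comes from the comment above; the command name `some-cli`, the `items list` subcommand, and the parameter keys are placeholders I made up, not the real tool's interface.

```python
import json
import shlex


def build_command(params: dict) -> str:
    """Return a shell-safe command line that passes `params` as one JSON string."""
    # `some-cli items list` is a hypothetical placeholder, not a real tool.
    argv = ["some-cli", "items", "list", "--params", json.dumps(params)]
    # shlex.join quotes each argument so the JSON survives the shell intact.
    return shlex.join(argv)


# Hypothetical parameters, chosen only to show nested quoting.
command = build_command({"query": "name contains 'report'", "pageSize": 5})
print(command)
```

The quoting that `shlex.join` adds is exactly the part a human has to get right manually, which is presumably why an agent finds this form easier than a person does.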
Claude 4.6 Opus and Gemini 3.1 Pro can to some degree, although the 3D models they produce are often deficient in some way that my eval didn't capture.
My eval used OpenSCAD simply due to familiarity and not having time to experiment with build123d/CadQuery. There is an academic paper where they were successful at fine-tuning a small VLM to do CadQuery: https://arxiv.org/pdf/2505.14646
I maintain [1], which provides the models with the ability to render a screenshot from any angle and as far as I can tell, visually driven feedback does not work that well at this point. The models probably don't get enough of "lovecraftian garbled 3D model mess" in the training data or something...
The simulator lets the LLM request renders from different angles/times, so the LLM can get visual feedback. For failures, the simulator also returns status codes like `object_fell` or `mount_initially_collided_with_object` depending on what happened. You can see what the tool call looks like by looking at the Transcript tab, e.g. here https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__...
I agree it's not clear how much benefit models get from iteration. Many of the successful runs are one-shots. You can see some examples of basic spatial reasoning e.g. here https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__... :
> The initial collision is because the mount was positioned at the same height as the mug's body center (z=-22), causing overlap. I need to lower the mount significantly so the mug starts above it and drops into the cradle.
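As a rough sketch of how status codes like these could be turned into textual feedback for the model, something along the following lines might work. The two failure codes are the ones named above; the `SimResult` shape, the `"ok"` success value, and the hint wording are my assumptions and not the simulator's actual interface.

```python
from dataclasses import dataclass
from typing import Optional

# The two keys are the status codes mentioned above; the hint text is assumed.
FAILURE_HINTS = {
    "object_fell": "The object did not stay on the mount.",
    "mount_initially_collided_with_object": (
        "The mount overlaps the object at the start; "
        "move the mount so the object begins clear of it."
    ),
}


@dataclass
class SimResult:
    """Hypothetical result record; the real tool's response may look different."""
    status: str
    render_path: Optional[str] = None


def feedback_for(result: SimResult) -> str:
    """Turn a simulation result into a short message for the next LLM turn."""
    if result.status == "ok":  # assumed success value
        return "Simulation passed."
    hint = FAILURE_HINTS.get(result.status, "Unrecognized status; inspect the renders.")
    return f"Simulation failed with {result.status}: {hint}"


print(feedback_for(SimResult("mount_initially_collided_with_object")))
```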
    a = b = []

has the same semantics here as

    b = []
    a = b

which I don't find surprising.

Unfortunately certain commands like `rg` will return non-zero by design when there are no matches, which could be an intentional outcome.
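One way to cope with this is to treat the exit code as a three-way result instead of a pass/fail flag. The sketch below assumes the grep-style convention that ripgrep documents (0 for matches, 1 for no matches, anything else for an error); the function names are mine, and a Python one-liner stands in for `rg` so the example runs without it installed.

```python
import subprocess
import sys
from typing import Sequence


def search_outcome(returncode: int) -> str:
    """Map a grep-style exit code to an outcome instead of a boolean."""
    if returncode == 0:
        return "matches"
    if returncode == 1:
        return "no_matches"  # a legitimate result, not a failure
    return "error"


def run_search(argv: Sequence[str]) -> str:
    # check=False so a non-zero exit does not raise CalledProcessError.
    completed = subprocess.run(list(argv), capture_output=True, text=True, check=False)
    return search_outcome(completed.returncode)


# Stand-in for something like `rg pattern path` that finds nothing and exits 1.
stand_in = [sys.executable, "-c", "import sys; sys.exit(1)"]
print(run_search(stand_in))
```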
I think this tech has become "production-ready" recently due to a combination of research progress (the seminal paper was published in 2023 https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/) and improvements to differentiable programming libraries (e.g. PyTorch) and GPU hardware.