You're right that archgw handles routing at the infrastructure level, which is perfect for centralized control. any-llm simply gives you the option to handle routing in your application code when that makes sense (for example, premium users get Opus-4). We leave the architectural choice to you, whether that's adding a proxy, keeping routing in your app, using both, or just using any-llm directly.
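To make that concrete, here's a minimal sketch of the application-level pattern with any-llm. The tier check and the specific model identifiers are illustrative assumptions, not anything any-llm prescribes:

```python
from any_llm import completion

def route_model(user_tier: str) -> str:
    # Hypothetical routing rule: premium users get a stronger model.
    # Model ids below are assumptions; use whatever your providers expose.
    if user_tier == "premium":
        return "anthropic/claude-opus-4-20250514"
    return "openai/gpt-4o-mini"

def ask(user_tier: str, prompt: str) -> str:
    # The routing decision lives in your app code, not in a proxy.
    response = completion(
        model=route_model(user_tier),  # "<provider_id>/<model_id>"
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The point is just that the branch is ordinary application logic, so you can key it off anything you already know about the request (user tier, tenant, feature flag).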
Arch-Router[1] takes a different approach. Instead of focusing on benchmark scores, it lets developers define routing policies in plain language based on their preferences, like “contract analysis → GPT-4o” or “lightweight brainstorming → Gemini Flash.” Our 1.5B model learns to map prompts (along with conversational context) to these policies, enabling routing decisions that align with real-world expectations, not abstract leaderboards. Our approach also doesn't require retraining the router model when new LLMs are swapped in or when preferences change.
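If you want to poke at the router model directly, here's a rough transformers sketch. The prompt below is abbreviated, not the actual template (that's documented on the model card [1]), and the route names/descriptions are made-up examples:

```python
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "katanemo/Arch-Router-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Plain-language routing policies; purely illustrative.
routes = [
    {"name": "contract_analysis", "description": "review and analyze legal contracts"},
    {"name": "brainstorming", "description": "lightweight, open-ended idea generation"},
]

# Abbreviated instruction; use the full template from the model card [1].
system = (
    "You are a router. Given the conversation, pick the best route from:\n"
    + json.dumps(routes)
    + '\nReply with JSON like {"route": "<name>"}.'
)
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": "Can you look over this NDA for red flags?"},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True))
# Expected output (illustrative): {"route": "contract_analysis"}
```

Note the router only returns a policy name. The mapping from policy to an actual LLM lives in your config, which is why swapping models in and out doesn't require retraining.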
Hope this helps.
Today we’re extending that approach to Claude Code via Arch Gateway[2], bringing multi-LLM access into a single CLI agent with two main benefits:
1. Model Access: Use Claude Code alongside Grok, Mistral, Gemini, DeepSeek, GPT or local models via Ollama.
2. Preference-aware Routing: Assign different models to specific coding tasks (see the config sketch after this list), such as:
– Code generation
– Code reviews and comprehension
– Architecture and system design
– Debugging
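As a rough illustration, the gateway config for this looks something like the sketch below. Treat the field names and model identifiers as assumptions and check the repo [2] for the current schema:

```yaml
# arch_config.yaml — sketch only; schema and model ids are assumptions, see [2]
version: v0.1

llm_providers:
  - model: openai/gpt-4o-mini
    access_key: $OPENAI_API_KEY
    default: true  # fallback when no preference matches

  - model: anthropic/claude-sonnet-4-20250514
    access_key: $ANTHROPIC_API_KEY
    routing_preferences:
      - name: code generation
        description: generating new functions, classes, or boilerplate

  - model: deepseek/deepseek-chat
    access_key: $DEEPSEEK_API_KEY
    routing_preferences:
      - name: code reviews and comprehension
        description: reviewing diffs and explaining existing code
```

Claude Code can then be pointed at the gateway via its ANTHROPIC_BASE_URL environment variable, e.g. `ANTHROPIC_BASE_URL=http://localhost:12000 claude` (the port here is an assumption; match it to your listener config).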
Why not route based on public benchmarks? Most routers lean on performance metrics — public benchmarks like MMLU or MT-Bench, or raw latency/cost curves. The problem: they miss domain-specific quality, subjective evaluation criteria, and the nuance of what a “good” response actually means for a particular user. They can be opaque, hard to debug, and disconnected from real developer needs.
[1] Arch-Router: https://huggingface.co/katanemo/Arch-Router-1.5B
[2] Arch Gateway: https://github.com/katanemo/archgw