honorable_coder (u/honorable_coder)

honorable_coder commented on Show HN: Claude Code 2.0 router – preference-aligned routing to multiple LLMs github.com/katanemo/archg... · Posted by u/adilhafeez

honorable_coder · 3 months ago

You mean Claude Code 2.0 Router? What's 2.0 about your router, isn't it the v1? And its not packaged into a CLI agent - its integrated into Claude Code (meaning you don't support other agents yet). Correct?

honorable_coder commented on Claude Code 2.0 Router – Aligning LLM routing to preferences, not benchmarks github.com/katanemo/archg... · Posted by u/honorable_coder

honorable_coder · 3 months ago

Hi HN — we're the team behind Arch-Router (https://huggingface.co/katanemo/Arch-Router-1.5B), A 1.5B preference-aligned LLM router that guides model selection by matching queries to user-defined domains (e.g., travel) or action types (e.g., image editing). Offering a practical mechanism to encode preferences and subjective evaluation criteria in routing decisions.

Today we’re extending that approach to Claude Code via Arch Gateway[1], bringing multi-LLM access into a single CLI agent with two main benefits:

1. Model Access: Use Claude Code alongside Grok, Mistral, Gemini, DeepSeek, GPT or local models via Ollama.

2. Preference-based Routing: Assign different models to specific coding tasks, such as – Code generation – Code reviews and comprehension – Architecture and system design – Debugging

Why not route based on public benchmarks? Most routers lean on performance metrics — public benchmarks like MMLU or MT-Bench, or raw latency/cost curves. The problem: they miss domain-specific quality, subjective evaluation criteria, and the nuance of what a “good” response actually means for a particular user. They can be opaque, hard to debug, and disconnected from real developer needs.

[1] Arch Gateway repo: https://github.com/katanemo/archgw

honorable_coder commented on Model literals, semantic aliases, and preference-aligned routing for LLMs docs.archgw.com/guides/ll... · Posted by u/honorable_coder

honorable_coder · 3 months ago

Today we’re shipping a major update to ArchGW (an edge and service proxy for agents [1]): a unified router that supports three strategies for directing traffic to LLMs — from explicit model names, to semantic aliases, to dynamic preference-aligned routing. Here’s how each works on its own, and how they come together.

Preference-aligned routing decouples task detection (e.g., code generation, image editing, Q&A) from LLM assignment. This approach captures the preferences developers establish when testing and evaluating LLMs on their domain-specific workflows and tasks. So, rather than relying on an automatic router trained to beat abstract benchmarks like MMLU or MT-Bench, developers can dynamically route requests to the most suitable model based on internal evaluations — and easily swap out the underlying moodel for specific actions and workflows. This is powered by our 1.5B Arch-Router LLM [2]. We also published our research on this recently[3]

Modal-aliases provide semantic, version-controlled names for models. Instead of using provider-specific model names like gpt-4o-mini or claude-3-5-sonnet-20241022 in your client you can create meaningful aliases like "fast-model" or "arch.summarize.v1". This allows you to test new models, swap out the config safely without having to do code-wide search/replace every time you want to use a new model for a very specific workflow or task.

Model-literals (nothing new) lets you specify exact provider/model combinations (e.g., openai/gpt-4o, anthropic/claude-3-5-sonnet-20241022), giving you full control and transparency over which model handles each request.

P.S. we routinely get asked why we didn't build semantic/embedding models for routing use cases or use some form of clustering technique. Clustering/embedding routers miss context, negation, and short elliptical queries, etc. An autoregressive approach conditions on the full context, letting the model reason about the task and generate an explicit label that can be used to match to an agent, task or LLM. In practice, this generalizes better to unseen or low-frequency intents and stays robust as conversations drift, without brittle thresholds or post-hoc cluster tuning.

[1] https://github.com/katanemo/archgw [2] https://huggingface.co/katanemo/Arch-Router-1.5B [2] https://arxiv.org/abs/2506.16655