sparacha commented on     · Posted by u/sparacha
sparacha · 3 months ago
Hi HN — we’re the team behind Arch-Router [1], A 1.5B preference-aligned LLM router that guides model selection by matching queries to user-defined domains (e.g., travel) or action types (e.g., image editing). Offering a practical mechanism to encode preferences and subjective evaluation criteria in routing decisions.

Today we’re extending that approach to Claude Code via Arch Gateway[2], bringing multi-LLM access into a single CLI agent with two main benefits:

1. Model Access: Use Claude Code alongside Grok, Mistral, Gemini, DeepSeek, GPT, or local models via Ollama.

2. Preference-aware Routing: Assign different models to specific coding tasks, such as:
– Code generation
– Code reviews and comprehension
– Architecture and system design
– Debugging
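The idea above can be sketched in a few lines: plain-language task preferences map to models, and a request is routed by its task. The task names and model identifiers here are illustrative stand-ins, not archgw's actual configuration format.

```python
# Minimal sketch of preference-aware routing: user-defined coding tasks
# map to preferred models. Task and model names below are illustrative.
ROUTING_PREFERENCES = {
    "code generation": "claude-sonnet",
    "code review": "gpt-4o",
    "architecture and system design": "claude-opus",
    "debugging": "deepseek-coder",
}

DEFAULT_MODEL = "claude-sonnet"  # fallback when no preference matches

def route(task: str) -> str:
    """Return the preferred model for a task, falling back to a default."""
    return ROUTING_PREFERENCES.get(task.lower(), DEFAULT_MODEL)
```

In the real system, classifying a prompt into one of these tasks is the router model's job; the table itself stays under the developer's control.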

Why not route based on public benchmarks? Most routers lean on performance metrics — public benchmarks like MMLU or MT-Bench, or raw latency/cost curves. The problem: they miss domain-specific quality, subjective evaluation criteria, and the nuance of what a “good” response actually means for a particular user. They can be opaque, hard to debug, and disconnected from real developer needs.

[1] Arch-Router: https://huggingface.co/katanemo/Arch-Router-1.5B

[2] Arch Gateway: https://github.com/katanemo/archgw

sparacha commented on Arch-Router: Aligning LLM Routing with Human Preferences   arxiv.org/abs/2506.16655... · Posted by u/handfuloflight
sparacha · 5 months ago
Hey! I built this. AMA. The model router is built into the proxy layer here: https://github.com/katanemo/archgw
sparacha commented on Show HN: Any-LLM – Lightweight router to access any LLM Provider   github.com/mozilla-ai/any... · Posted by u/AMeckes
AMeckes · 5 months ago
To clarify: any-llm is just a Python library you import, not a service to run. When I said "put it behind a proxy," I meant your app (which imports any-llm) can run behind a normal proxy setup.

You're right that archgw handles routing at the infrastructure level, which is perfect for centralized control. any-llm simply gives you the option to handle routing in your application code when that makes sense (for example, premium users get Opus-4). We leave the architectural choice to you, whether that's adding a proxy, keeping routing in your app, using both, or just using any-llm directly.
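The "premium users get Opus-4" case is the simplest form of in-application routing: the decision lives in app code, not a proxy. A minimal sketch, with hypothetical tier names and model identifiers:

```python
# Sketch of app-level routing as described above: pick a model based on
# the user's subscription tier. Tier and model names are hypothetical.
def model_for_user(tier: str) -> str:
    """Route premium users to a stronger model, everyone else to a cheap one."""
    if tier == "premium":
        return "anthropic/claude-opus-4"
    return "anthropic/claude-3-haiku"
```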

sparacha · 5 months ago
But you can also use tokens to implement routing decisions in a proxy, which makes RBAC natively available to all agents without touching application code. The trade-off is incremental feature work in code versus an out-of-process server: one gets you going super fast, the other offers a design choice that (I think) scales a lot better.
sparacha commented on Show HN: Any-LLM – Lightweight router to access any LLM Provider   github.com/mozilla-ai/any... · Posted by u/AMeckes
sparacha · 5 months ago
There is liteLLM, OpenRouter, Arch (although that’s an edge/service proxy for agents) and now this. We all need a new problem to solve
sparacha commented on Show HN: ArchGW – An intelligent edge and service proxy for agents   github.com/katanemo/archg... · Posted by u/honorable_coder
mutant · 5 months ago
Huh, this is pretty dope. I tried this example https://github.com/katanemo/archgw/blob/main/demos/samples_p...

And was pleased with what I was able to do. Thanks

sparacha · 5 months ago
That’s an example of what the edge component could do. Did you give the preference-based automatic routing a try?
sparacha commented on Show HN: Arch-Router – 1.5B model for LLM routing by preferences, not benchmarks    · Posted by u/adilhafeez
_nh_ · 6 months ago
How do you compare with RouteLLM?
sparacha · 6 months ago
RouteLLM is essentially a benchmark-driven approach. Their framework chooses between a weak and a strong model and helps developers optimize for a metric called APGR (Average Performance Gap Recovered) — a measure of how much of the stronger model’s performance can be recovered when routing some queries to the weaker, cheaper model. However, their routing models are trained to maximize performance on public benchmarks like MMLU, BBH, or MT-Bench. These benchmarks may not capture subjective, domain-specific quality signals that surface in practice.

Arch-Router takes a different approach. Instead of focusing on benchmark scores, it lets developers define routing policies in plain language based on their preferences — like “contract analysis → GPT-4o” or “lightweight brainstorming → Gemini Flash.” Our 1.5B model learns to map prompts (along with conversational context) to these policies, enabling routing decisions that align with real-world expectations, not abstract leaderboards. Our approach also doesn't require retraining the router model when new LLMs are swapped in or when preferences change.
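The no-retraining property follows from a decoupling: the router maps a prompt to a named policy, and a separate table maps policies to models, so swapping in a new LLM only edits the table. A toy sketch of that split — the keyword matching here is an illustrative stand-in for the 1.5B model's learned mapping:

```python
# Policies (plain-language names) map to models. Swapping a model only
# changes this table; the router below is untouched.
POLICY_TO_MODEL = {
    "contract analysis": "gpt-4o",
    "lightweight brainstorming": "gemini-flash",
}

def route_policy(prompt: str) -> str:
    """Toy stand-in for the learned router: match policy-name words
    against the prompt, defaulting to brainstorming."""
    lowered = prompt.lower()
    for policy in POLICY_TO_MODEL:
        if any(word in lowered for word in policy.split()):
            return policy
    return "lightweight brainstorming"

def resolve(prompt: str) -> str:
    """Two-step routing: prompt -> policy -> model."""
    return POLICY_TO_MODEL[route_policy(prompt)]
```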

Hope this helps.

sparacha commented on Show HN: Core – open source memory graph for LLMs – shareable, user owned   github.com/RedPlanetHQ/co... · Posted by u/Manik_agg
_joel · 6 months ago
You can run OpenAI compatible servers locally. vLLM, ollama, LMStudio and others.
