1. Balance / schedule incoming requests to the right backend
2. Model server replicas that can run on multiple hardware topologies
3. Prefix caching hierarchy with well-tested variants for different use cases
So it's a 3-tier architecture. The biggest difference from Dynamo is that llm-d uses the inference gateway extension - https://github.com/kubernetes-sigs/gateway-api-inference-ext... - which brings Kubernetes-owned APIs for managing model routing, request priority and flow control, LoRA support, etc.
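To make tier 1 concrete, here's a minimal sketch of prefix-cache-aware scheduling: route each request to the replica whose KV cache shares the longest token prefix with it. This is illustrative only - the function names and the dict-of-prefixes cache model are my own assumptions, not llm-d's or the gateway extension's actual API.

```python
# Hypothetical sketch of prefix-cache-aware routing. Each replica
# advertises the token prefixes it has cached; we score replicas by
# the longest shared prefix with the incoming request.

def shared_prefix_len(a, b):
    """Length of the common prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pick_replica(request_tokens, replica_caches):
    """replica_caches: dict mapping replica name -> list of cached token prefixes.
    Returns the replica with the best cache overlap (ties go to the first seen)."""
    best, best_score = None, -1
    for replica, prefixes in replica_caches.items():
        score = max(
            (shared_prefix_len(request_tokens, p) for p in prefixes),
            default=0,  # replica with an empty cache scores 0
        )
        if score > best_score:
            best, best_score = replica, score
    return best
```

A real scheduler would also weigh load, queue depth, and request priority; this only shows the cache-affinity signal that makes a prefix-caching tier pay off.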
* The "stack-centric" approach such as vLLM production stack, AIBrix, etc. These set up an entire inference stack for you including KV cache, routing, etc.
* The "pipeline-centric" approach such as NVIDIA Dynamo, Ray, BentoML. These give you more of an SDK so you can define inference pipelines that you can then deploy on your specific hardware.
It seems like llm-d is the former. Is that right? What prompted you to go down that direction instead of Dynamo's?
- They used QwQ to generate training data (with some cleanup using GPT-4o-mini)
- The training data was then used to fine-tune Qwen2.5-32B-Instruct (a non-reasoning model)
- The result: Sky-T1 performs slightly worse than QwQ but much better than the base Qwen2.5 on reasoning tasks
There are a few dismissive comments here, but I actually think this is pretty interesting: it shows how you can fine-tune a foundation model to get better at reasoning.
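The distillation data step above can be sketched in a few lines: collect teacher (QwQ) reasoning traces, filter/clean them (Sky-T1 used GPT-4o-mini for the cleanup; a trivial length check stands in here), and emit chat-format examples for supervised fine-tuning of the student. All function names and thresholds are my own illustrations, not the Sky-T1 code.

```python
# Hedged sketch of building an SFT dataset from teacher reasoning traces.
# A "trace" is assumed to be {"problem": str, "response": str}.

def is_clean(trace, max_chars=8192):
    """Stand-in cleanup filter: drop empty or overlong responses.
    (Sky-T1's actual cleanup used GPT-4o-mini to rewrite/reject traces.)"""
    resp = trace.get("response", "")
    return 0 < len(resp) <= max_chars

def to_sft_example(trace):
    """Convert one teacher trace into a chat-format training example."""
    return {"messages": [
        {"role": "user", "content": trace["problem"]},
        {"role": "assistant", "content": trace["response"]},
    ]}

def build_sft_dataset(traces):
    """Filter teacher traces and format them for fine-tuning the student."""
    return [to_sft_example(t) for t in traces if is_clean(t)]
```

The resulting list of `messages` dicts is the common chat format most SFT trainers accept; the student (Qwen2.5-32B-Instruct here) is then fine-tuned on it as ordinary supervised data.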
We're a well-funded, pre-seed cybersecurity startup focused on data security. I'm looking for a founding AI lead with experience in fine-tuning LLMs (expertise in RL and reasoning models is a big plus). This person would own the full AI stack, from data to training to eval to test-time compute.
Who's a good fit:
* You've always thought about starting a company but, for whatever reason (funding, life, idea), haven't. This is a great opportunity to be part of the founding team. We're 2 people right now.
* You enjoy understanding customer problems and their use cases, and then figuring out the best solution (sometimes technical, sometimes not) to their problems.
* You want to help figure out what a company looks like in this AI era.
* You enjoy teaching and sharing knowledge.
Questions or interest? Just email jobs@polarsky.ai.