https://github.com/ScalingIntelligence/tokasaurus/blob/65efb...
I’m honestly impressed that a pure Python implementation can beat out vLLM and SGLang. Granted, they lean on FlashInfer, and of course torch.compile has gotten incredibly powerful in the last few years. Dynamic shapes have still been a huge thorn in my side, though, so I’ll need to look closer at how they pulled it off…
In addition to Dev Discuss, a number of core contributors are also active on Twitter. Two particularly helpful and prolific voices are @ezyang and @cHHillee.
Finally, don’t overlook GitHub issues—they’re a surprisingly effective way to start conversations. If you’ve found a bug or have ideas on how to improve the APIs, opening an issue is always welcome.