*this is incorrect per the author’s response, my apologies.
For instance, it goes into (nano)vLLM internals and doesn’t mention PagedAttention once (one of the core ideas that vLLM is based on)[1].
It also mentions that Part 2 will cover dense vs. MoE models, which is odd because nanovllm hardcodes a dense Qwen3 into the source.
Here are better (imo) explainers about how vLLM works:
- https://hamzaelshafie.bearblog.dev/paged-attention-from-firs...
- https://www.aleksagordic.com/blog/vllm
- https://huggingface.co/blog/continuous_batching
Aleksa’s blog is a bit in the weeds for my taste but it’s really worth working through.
A lot of the magic of vLLM happens in the PagedAttention kernels, which are really succinctly implemented in nanovllm. And the codebase is great and readable by itself!
—
https://groups.google.com/g/golang-announce/search?q=form
5 fixes in 2 years related to HTTP form parsing (url-encoded and multipart).
- Go 1.20.1 / 1.19.6: Multipart form parsing could consume excessive memory and disk (unbounded memory accounting and unlimited temp files)
- Go 1.20.3 / 1.19.8: Multipart form parsing could cause CPU and memory DoS due to undercounted memory usage and excessive allocations
- Go 1.20.3 / 1.19.8: HTTP and MIME header parsing could allocate far more memory than required from small inputs
- Go 1.22.1 / 1.21.8: Request.ParseMultipartForm did not properly limit memory usage when reading very long form lines, enabling memory exhaustion
- Go 1.25.6 / 1.24.12: Request.ParseForm (URL-encoded forms) could allocate excessive memory when given very large numbers of key-value pairs
Probably every HTTP server implementation in every language has similar vulnerabilities. And these are logic errors, not even memory safety bugs.
https://soniox.com/compare