Readit News
lukax commented on Soniox: Real-time transcription in 60 languages   soniox.com/... · Posted by u/lukax
lukax · 11 days ago
Also see how it compares to other providers:

https://soniox.com/compare

lukax commented on Nano-vLLM: How a vLLM-style inference engine works   neutree.ai/blog/nano-vllm... · Posted by u/yz-yu
jbarrow · 13 days ago
The whole thing feels AI written, generated from the codebase.*

*this is incorrect per the author’s response, my apologies.

For instance, it goes into (nano)vLLM internals and doesn’t mention PagedAttention once (one of the core ideas that vLLM is based on)[1].

It also mentions that Part 2 will cover dense vs. MoE models, which is weird because nanovllm hardcodes a dense Qwen3 into the source.

Here are better (imo) explainers about how vLLM works:

- https://hamzaelshafie.bearblog.dev/paged-attention-from-firs...

- https://www.aleksagordic.com/blog/vllm

- https://huggingface.co/blog/continuous_batching

Aleksa’s blog is a bit in the weeds for my taste but it’s really worth working through.

A lot of the magic of vLLM happens in the PagedAttention kernels, which are really succinctly implemented in nanovllm. And the codebase is great and readable by itself!

1. https://arxiv.org/abs/2309.06180

lukax · 13 days ago
Not really in the PagedAttention kernels. Paged attention was integrated into FlashAttention so that the FlashAttention kernels can be used for both prefill and decoding with a paged KV cache. The only paged-attention-specific kernels are the ones for copying KV blocks (device to device, device to host, and host to device). At least for FA2 and FA3, vLLM maintained a fork of FA with paged attention patches.
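To make the "paged KV" idea concrete, here is a toy sketch of a block table in plain Python. It is not vLLM's or FlashAttention's code: `PagedKVCache`, `BLOCK_SIZE`, and `copy_block` are hypothetical names, and lists stand in for GPU memory. It only shows the indirection (logical token position to physical block) that an attention kernel has to follow, plus the block copy that stays as a separate operation.

```python
# Toy paged KV cache. All names here are hypothetical, for illustration only;
# this is not the vLLM or FlashAttention API.
BLOCK_SIZE = 4  # tokens per physical block (assumed value)


class PagedKVCache:
    def __init__(self, num_blocks: int) -> None:
        # Physical storage: each block holds up to BLOCK_SIZE KV entries.
        self.blocks = [[None] * BLOCK_SIZE for _ in range(num_blocks)]
        self.free = list(range(num_blocks))
        # Per-sequence block table: logical block index -> physical block id.
        self.block_tables: dict[int, list[int]] = {}
        self.lengths: dict[int, int] = {}

    def append(self, seq_id: int, kv) -> None:
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:
            # No block yet, or the last one is full: grab a free block.
            # Raises IndexError when the pool is exhausted.
            table.append(self.free.pop())
        self.blocks[table[n // BLOCK_SIZE]][n % BLOCK_SIZE] = kv
        self.lengths[seq_id] = n + 1

    def get(self, seq_id: int, pos: int):
        # The lookup a paged kernel performs per token:
        # block table first, then the offset inside the block.
        table = self.block_tables[seq_id]
        return self.blocks[table[pos // BLOCK_SIZE]][pos % BLOCK_SIZE]

    def copy_block(self, src: int, dst: int) -> None:
        # Stand-in for a device-to-device KV block copy.
        self.blocks[dst] = list(self.blocks[src])
```

Sequences never need contiguous storage here: appending six tokens to one sequence allocates two blocks that can sit anywhere in the pool, and `get` resolves each position through the table.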
lukax commented on Vibecoding #2   matklad.github.io/2026/01... · Posted by u/ibobev
jacobtomlinson · 25 days ago
Instructions unclear, Claude just spent three days and millions of tokens rebuilding SLURM from the ground up. /s
lukax · 25 days ago
Maybe AWS ParallelCluster, which is managed SLURM on AWS.
lukax commented on CVEs affecting the Svelte ecosystem   svelte.dev/blog/cves-affe... · Posted by u/tobr
lukax · a month ago
It's not that simple to safely parse an HTTP request form. Just look at the Go security releases related to form parsing (a new fix was released just today).

https://groups.google.com/g/golang-announce/search?q=form

5 fixes in 2 years related to HTTP form (url-encoded and multipart).

- Go 1.20.1 / 1.19.6: Multipart form parsing could consume excessive memory and disk (unbounded memory accounting and unlimited temp files).

- Go 1.20.3 / 1.19.8: Multipart form parsing could cause CPU and memory DoS due to undercounted memory usage and excessive allocations.

- Go 1.20.3 / 1.19.8: HTTP and MIME header parsing could allocate far more memory than required from small inputs.

- Go 1.22.1 / 1.21.8: Request.ParseMultipartForm did not properly limit memory usage when reading very long form lines, enabling memory exhaustion.

- Go 1.25.6 / 1.24.12: Request.ParseForm (URL-encoded forms) could allocate excessive memory when given very large numbers of key-value pairs.

Probably every HTTP server implementation in every language has similar vulnerabilities. And these are logic errors, not even memory safety bugs.
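As an illustration of the kind of limits these fixes add, here is a minimal Python sketch that caps both the body size and the number of key-value pairs before parsing a URL-encoded form. `parse_form`, `MAX_BODY_BYTES`, and `MAX_FIELDS` are hypothetical names with assumed values; `max_num_fields` is a real parameter of the standard library's `urllib.parse.parse_qs`. This is not how Go's `Request.ParseForm` is implemented, just the same idea in another language.

```python
from urllib.parse import parse_qs

# Assumed limits for illustration; real values depend on the application.
MAX_BODY_BYTES = 64 * 1024
MAX_FIELDS = 100


def parse_form(body: bytes) -> dict[str, list[str]]:
    """Parse a URL-encoded form body with explicit resource limits."""
    # Reject oversized bodies before doing any parsing work.
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("form body too large")
    # parse_qs raises ValueError if the field count exceeds max_num_fields,
    # so a small body made of thousands of tiny pairs is rejected too.
    return parse_qs(body.decode("utf-8"), max_num_fields=MAX_FIELDS)
```

In a real server the byte limit should be enforced while reading from the socket, not after the whole body is already in memory; the check above only marks where the limit belongs.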

lukax commented on Locating a Photo of a Vehicle in 30 Seconds with GeoSpy   geospy.ai/blog/locating-a... · Posted by u/kachapopopow
avidiax · a month ago
Wondering how thieves can sell a stolen car. Do they have fake paperwork?
lukax · a month ago
You can buy a totaled car for cheap and use its VIN.
lukax commented on VSCode rebrands as "The open source AI code editor"   code.visualstudio.com... · Posted by u/michidk
lukax · 2 months ago
I guess there's a lot of pressure from Cursor and Google's Antigravity. Also, with Zed you can bring your own API key, which VS Code didn't support for a long time.

u/lukax

Karma: 784 · Cake day: January 11, 2012