Unlike your average LLM benchmark, this benchmark focuses on location and time as variables since these are the biggest factors for networking systems (I was a developer for networking tools in a past life). The idea is to run benchmarks from multiple geographic locations over time to see how each platform performs under different conditions.
Basic setup: echo agent servers can create and connect to temporary rooms to echo back after receiving messages. Since Pipecat (Daily) and LiveKit Python SDKs can't coexist in the same process, I have to run separate agent processes on different ports. Benchmark runner clients send pings over WebRTC data channels and measure RTT for each message. Raw measurements get stored in InfluxDB, then the dashboard calculates aggregate stats (P50/P95/P99, jitter, packet loss) and visualizes everything with filters and side-by-side comparisons.
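As a rough sketch of the aggregation step (not the dashboard's actual code), here is how the stats could be computed from one run of RTT samples. The function names, the nearest-rank percentile method, and the definition of jitter as the mean absolute difference between consecutive RTTs are all assumptions on my part; a lost ping is represented as `None`.

```python
import math
from typing import Optional, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty sequence."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Aggregate one run of pings; None marks a ping that never got an echo."""
    received = [r for r in rtts_ms if r is not None]
    if not received:
        raise ValueError("no echoes received")
    ordered = sorted(received)
    # Assumed jitter definition: mean absolute difference between
    # consecutive received RTTs, in arrival order.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        "jitter": sum(diffs) / len(diffs) if diffs else 0.0,
        "loss": 1 - len(received) / len(rtts_ms),
    }
```

With only a handful of samples P95 and P99 collapse to the maximum, which is one reason the tail numbers need many data points before they mean much.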
I struggled with creating a fair comparison since each platform has different APIs. Ended up using data channels (not audio) for consistency, though this only measures data message transport, not the full audio pipeline (codecs, jitter buffers, etc).
Latency is hard to measure precisely, so I'm estimating it based on server processing time, which is admittedly not perfect. I'm only testing data channels, not the full audio path. And it's just Pipecat (Daily) and LiveKit for now; I'd like to add Agora and others.
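One way to read "estimating based on server processing time" is sketched below. It assumes the echo agent reports how long it held each message, and the client subtracts that from the RTT it measured on its own clock. The function name and the `server_processing_ms` field are hypothetical, not taken from the project.

```python
def network_rtt_ms(sent_ns: int, received_ns: int, server_processing_ms: float) -> float:
    """Client-measured RTT minus the time the echo agent reports holding the message.

    Both timestamps come from the same client clock (e.g. time.perf_counter_ns()),
    so no clock synchronization between client and server is needed.
    """
    total_ms = (received_ns - sent_ns) / 1e6
    # Clamp at zero in case the reported processing time exceeds the measured RTT.
    return max(0.0, total_ms - server_processing_ms)
```

This still lumps together everything the server does not account for (SDK queuing, event-loop delays), so the result is an upper bound on transport time, not an exact figure.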
The README screenshot shows synthetic data resembling early results. Not posting raw results yet since I'm still working out some measurement inaccuracies and need more data points across locations over time to draw solid conclusions.
This is functional but rough around the edges. Happy to keep building it out if people find it useful. Any ideas on better methodology for fair comparisons or improving measurements? What platforms would you want to see added?
Stack: Python, TypeScript (React), InfluxDB
If you keep your PRs small I guess the end result is the same, but even then I like things in individual commits for ease of review.
It's not an "if"; it's necessary for the sake of the people reviewing your code. Unless you work alone on your pet project and always push to master, you never work alone.
The helix situation is still miles better for getting up and running ASAP compared to dancing with files/Lua on LazyVim. Just having to refer to docs to install a plugin, write sane remaps, etc. eats up time. If you really can speedrun everything in under an hour, good for you. But for the rest, an LSP is one package-manager install away (even on Windows, where scoop seems to have become the de facto choice), and editing a TOML file is much easier than fiddling with the Lua API/Vimscript "just" to set some variables.
(Not a helix user, though I have tried vim, nvim, and helix.)
The main problem for me was that the keybindings work well until my vim instincts kick in, at which point I slow down. The other one was the lack of plugins.