- Pinned to 6 cores: 28k QPS
- Pinned to 12 cores: 56k QPS
- All 24 cores: 62k QPS
I'm not sure how this applies to realistic workloads where you're using all of the cores but not maxing them out, but it looks like hyperthreading only adds ~10% performance in this case.
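For anyone who wants to check the arithmetic behind that ~10% figure, here is a rough sketch using only the three QPS numbers above. It assumes the box has 12 physical cores exposed as 24 logical CPUs, which the comment implies but doesn't state outright; the function names are just made up for illustration.

```python
# QPS figures quoted in the comment above, keyed by number of CPUs the
# benchmark was pinned to. Assumption: 12 physical cores, 24 logical CPUs.
results = {6: 28_000, 12: 56_000, 24: 62_000}


def per_cpu_qps(results):
    """Throughput divided by the number of CPUs used in each run."""
    return {cpus: qps / cpus for cpus, qps in results.items()}


def relative_gain(before, after):
    """Fractional throughput increase going from `before` to `after`."""
    return after / before - 1.0


per_cpu = per_cpu_qps(results)
# Extra throughput from using the second hardware thread on every core.
ht_gain = relative_gain(results[12], results[24])
# Fraction of peak throughput already reached with half the logical CPUs busy.
half_share = results[12] / results[24]

for cpus, qps in results.items():
    print(f"{cpus:>2} CPUs: {qps:>6} QPS total, {per_cpu[cpus]:,.0f} QPS per CPU")
print(f"Gain from 12 -> 24 CPUs: {ht_gain:.1%}")
print(f"Share of peak QPS reached at 12 of 24 CPUs: {half_share:.1%}")
```

Throughput per CPU is flat from 6 to 12 and then drops sharply at 24, so a machine reporting about 50% utilization in this benchmark would already be most of the way to its peak QPS.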
(1) Measure MIPS with perf (2) Compare that to max MIPS for your processor
Unfortunately, MIPS is too vague since the amount of work done depends on the instruction, and there's no good way to measure max MIPS for most processors. (╯°□°)╯︵ ┻━┻
Up to a hair over 60% utilization the queuing delays on any work queue remain essentially negligible. At 70% they become noticeable, and at 80% they've doubled. And then it just turns into a shitshow from there on.
The rule of thumb is that queuing delay is effectively zero at 60%, and 80% is the knee where delays start climbing steeply.
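If you want to see the shape of that curve, here is a minimal sketch using the textbook Erlang C (M/M/c) formula. That model is my assumption, not something the comment specifies: it treats arrivals as Poisson, service times as exponential, and each core as an identical server, none of which is exactly true of a real request-serving box. The function names and the choice of 12 servers are hypothetical.

```python
from math import factorial


def erlang_c_wait_probability(servers, utilization):
    """Probability that an arriving job has to queue in an M/M/c system."""
    offered = servers * utilization  # offered load in Erlangs
    top = offered**servers / factorial(servers) / (1.0 - utilization)
    bottom = sum(offered**k / factorial(k) for k in range(servers)) + top
    return top / bottom


def mean_queue_wait(servers, utilization):
    """Mean time spent waiting in queue, in units of one mean service time."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    p_wait = erlang_c_wait_probability(servers, utilization)
    return p_wait / (servers * (1.0 - utilization))


if __name__ == "__main__":
    servers = 12  # hypothetical core count
    for rho in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95):
        wait = mean_queue_wait(servers, rho)
        print(f"utilization {rho:.0%}: mean queue wait = {wait:.4f} service times")
```

The exact numbers depend on the server count and on how bursty the real arrivals are, so treat this as a way to see where the knee sits rather than as a prediction. The one thing the model does guarantee is that the wait grows without bound as utilization approaches 100%.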
The biggest cluster I ran, we hit about 65% CPU at our target P95 time, which is pretty much right on the theoretical mark.
It's gotta be at least 2 out of every 3 chip generations, going back to the original implementation, where you're better off without hyperthreading than with it.
At 51% reported CPU utilization, it's doing about 80% of the maximum requests per second, and it can't get above 80% utilization.
I also added a section: https://www.brendanlong.com/cpu-utilization-is-a-lie.html#bo...