On my older system I had a WD_BLACK SN850X, but it was connected to an M.2 slot that may have been limiting it. That is where I measured the 1-2 ms latency.
Is there a good place to get numbers on what is possible with enterprise hardware today? I've struggled for some time to find a good source.
[1] https://blocksandfiles.com/2023/08/07/kioxias-rocketship-dat...
Update: about 800 µs on a more modern system.
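For what it's worth, a rough way to sanity-check figures like these is to time small synchronous writes yourself. The sketch below is just one possible methodology in Python (the comments above don't say how the 1-2 ms and 800 µs numbers were taken); the file path, block size, and sample count are arbitrary:

    import os
    import time

    PATH = "latency_test.bin"   # hypothetical test file on the drive under test
    N = 1000
    payload = os.urandom(4096)  # one 4 KiB block per write

    samples = []
    fd = os.open(PATH, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        for _ in range(N):
            t0 = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)        # force the write down to stable storage
            samples.append(time.perf_counter() - t0)
    finally:
        os.close(fd)
        os.unlink(PATH)

    samples.sort()
    print(f"p50: {samples[N // 2] * 1e6:.0f} us")
    print(f"p99: {samples[int(N * 0.99)] * 1e6:.0f} us")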
In my last job, we used Rabbit to move about 15k messages per second across about 2000 queues, with 200 producers (which produced to all queues) and 2000 consumers (which each read from their own queue). Any time any of the consumers would slow down or fail, Rabbit would run out of memory and crash, causing a sitewide failure.
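(For anyone who hasn't run a topology like that, here is roughly what it looks like in pika. The exchange type and names are my guesses; a fanout exchange is the simplest way to get every producer's messages into every queue, but the comment doesn't say what was actually used.)

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()

    # One exchange that copies every published message to every bound queue.
    ch.exchange_declare(exchange="events", exchange_type="fanout", durable=True)

    # One queue per consumer; each consumer reads only its own queue.
    for consumer_id in range(2000):
        q = f"consumer.{consumer_id}"
        ch.queue_declare(queue=q, durable=True)
        ch.queue_bind(queue=q, exchange="events")

    # A producer just publishes to the exchange and RabbitMQ fans it out.
    ch.basic_publish(exchange="events", routing_key="", body=b"hello")
    conn.close()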
Additionally, Rabbit would invent network partitions out of thin air, which would cause it to lose messages, since when a partition is healed, all messages on an arbitrarily chosen side of it are discarded. (See https://aphyr.com/posts/315-jepsen-rabbitmq for more details about Rabbit's issues, plus some recommendations for running Rabbit that sound worse to me than just using something else.)
We experimented with "high availability" mode, which caused the cluster to crash more frequently and lose more messages; with "durability", which also caused the cluster to crash more frequently and lose more messages; and with colocating all of our Rabbit nodes on the same rack, which did not fix the constant partitions and caused a total outage when that rack lost power, as you'd expect.
These are not theoretical problems. At one point, I spent an entire night fighting with this stupid thing alongside four other competent infrastructure engineers. The only long-term solution we found was to deprecate our use of Rabbit entirely and use Kafka instead.
To anyone considering Rabbit, please reconsider! If you're OK with losing messages, then simply making an asynchronous fire-and-forget RPC directly to the relevant consumers may be a better solution for you, since at least there isn't more infrastructure to maintain.
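To make that last suggestion concrete, fire-and-forget can be as little as posting the event from a thread pool and ignoring the result. The endpoint URL and payload below are invented, and errors are deliberately swallowed, since accepting message loss is the premise:

    import json
    from concurrent.futures import ThreadPoolExecutor
    from urllib import request

    pool = ThreadPoolExecutor(max_workers=32)

    def fire_and_forget(event, url="http://consumer.internal/events"):
        def _send():
            req = request.Request(
                url,
                data=json.dumps(event).encode(),
                headers={"Content-Type": "application/json"},
            )
            try:
                request.urlopen(req, timeout=2)
            except Exception:
                pass  # dropping the event on any error is the trade-off being accepted
        pool.submit(_send)

    fire_and_forget({"type": "user_signup", "user_id": 123})
    pool.shutdown(wait=True)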
This worked fine until things got behind, and then we couldn't keep up. We were able to work around that by using a hashed exchange that spread messages across 4 queues, hashing on a timestamp inserted by a timestamp plugin. Since all operations for a queue happen in the same event loop, any sort of backup led to pub and sub operations fighting for CPU time; by spreading the load across 4 queues we wound up with 4x the CPU capacity for that particular exchange. With 2000 queues you probably didn't run into that issue very often.
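Roughly, that workaround looks like this with pika and the consistent-hash exchange plugin (exchange type "x-consistent-hash"). The original setup hashed on a timestamp added broker-side by a timestamp plugin; in this sketch the publisher stamps a header itself, which is enough to show the sharding idea, and all names are invented:

    import time
    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()

    # Declare a consistent-hash exchange that hashes on a message header.
    ch.exchange_declare(
        exchange="work.hashed",
        exchange_type="x-consistent-hash",
        arguments={"hash-header": "shard-key"},
    )

    # Bind 4 queues; the binding key is a weight, so "1" for each gives a
    # roughly even split of messages across the queues.
    for i in range(4):
        q = f"work.shard.{i}"
        ch.queue_declare(queue=q)
        ch.queue_bind(queue=q, exchange="work.hashed", routing_key="1")

    # Publish with a per-message timestamp header; the exchange routes each
    # message to one of the 4 queues, so each queue's event loop only sees
    # about a quarter of the load.
    ch.basic_publish(
        exchange="work.hashed",
        routing_key="",
        body=b"payload",
        properties=pika.BasicProperties(headers={"shard-key": str(time.time_ns())}),
    )
    conn.close()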