A lock-free queue by itself isn't very useful. You need a polling strategy that doesn't involve a busy loop. If you use mutexes and condvars, you've basically turned it into a lock based queue. Eventcounts work much better.
If I run more threads than CPUs and enough work so I get time slice ends, I get about 1160 nsecs avg enq/deq for mutex version, and about 146 nsecs for eventcount version.
Timings will vary based on how man threads you use and cpu affinity that takes your hw thread/core/cache layout into consideration. I have gen 13 i5 that runs this slower than my gen 10 i5 because of the former's efficiency cores even though it is supposedly faster.
And yes, a queue is a poster child for cache contention problems, une enfant terrible. I tried a back off strategy at one point but it didn't help any.
I rewrote the entire queue with lock-free CAS to manage insertions/removals on the list and we finally got some better numbers. But not always! We found it worked best either as a single thread, or during massive contention. With a normal load it wasn't really much better.
I asked it about a paper I was looking at (SLOG [0]) and it basically lost the context of what "slog" referred to after 3 prompts.
1. I asked for an example transaction illustrating the key advantages of the SLOG approach. It responded with some general DB transaction stuff.
2. I then said "no use slog like we were talking about" and then it gave me a golang example using the log/slog package
Even without the weird political things around Grok, it just isn't that good.
[0] https://www.vldb.org/pvldb/vol12/p1747-ren.pdf
I'll say that grok is really excellent at helping my understand the codebase, but some miss-named functions or variables will trip it up..