Is the bug that the precedence of || is higher than the precedence of != ? Consider writing it as: if ((event->state != PERF_EVENT_STATE_ACTIVE) || (event->hw_state & PERF_HES_STOPPED))
This is coming from a person who has too many scars from not parenthesizing the expressions in my conditionals to ensure they work the way I meant them to.
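For what it's worth, here is a small sketch of where the parentheses actually matter in C. As far as the language rules go, both != and & bind tighter than ||, so the condition above parses the same way with or without the extra parentheses; the classic trap is mixing & with == or !=, because the comparison binds tighter than the bitwise AND. The constants and function names below are hypothetical and chosen only for illustration; they are not the kernel's values.

```c
#include <stdbool.h>

/* Hypothetical values for illustration only (not the kernel's). */
#define STATE_ACTIVE 1
#define HES_STOPPED  0x01u

/* != and & both bind tighter than ||, so these two forms agree. */
static bool skip_plain(int state, unsigned hw_state)
{
    return state != STATE_ACTIVE || hw_state & HES_STOPPED;
}

static bool skip_parenthesized(int state, unsigned hw_state)
{
    return (state != STATE_ACTIVE) || (hw_state & HES_STOPPED);
}

/* The real trap: == binds tighter than &, so this parses as
 * flags & (mask == mask), which is flags & 1. */
static bool all_bits_set_buggy(unsigned flags, unsigned mask)
{
    return flags & mask == mask;
}

/* Intended meaning: are all bits of mask set in flags? */
static bool all_bits_set(unsigned flags, unsigned mask)
{
    return (flags & mask) == mask;
}
```

With flags = 0x02 and mask = 0x02 the buggy version tests bit 0 and reports false, while the parenthesized one reports true. Compilers usually warn about the unparenthesized form, but explicit parentheses make the intent obvious to human readers too.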
[1]: https://www.youtube.com/watch?v=u7-S-Hn-7Do
[2]: https://netflixtechblog.com/netflix-flamescope-a57ca19d47bb
Heatmaps are amazing for pattern spotting. I also use them when hunting irregular hiccups or outliers. More people should know about this feature.
In most cases it's even better to just store a maximum per thread separately and loop over all threads once to compute the current maximum if you really need it.
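A rough sketch of the per-thread approach, assuming C11 atomics, a fixed upper bound on the number of threads, and a 64-byte cache line. MAX_THREADS, CACHE_LINE, record_sample and current_max are names I made up for the example:

```c
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 64  /* hypothetical upper bound on thread count */
#define CACHE_LINE  64  /* assumed cache-line size, to avoid false sharing */

struct thread_max {
    _Alignas(CACHE_LINE) _Atomic uint64_t value;
};

/* One slot per thread; static storage starts out as zero. */
static struct thread_max per_thread_max[MAX_THREADS];

/* Called only by the thread that owns slot tid, so a plain
 * load/compare/store is enough: there is no competing writer. */
static void record_sample(unsigned tid, uint64_t sample)
{
    uint64_t cur = atomic_load_explicit(&per_thread_max[tid].value,
                                        memory_order_relaxed);
    if (sample > cur)
        atomic_store_explicit(&per_thread_max[tid].value, sample,
                              memory_order_relaxed);
}

/* Reader side: one pass over all slots, only when the maximum is needed. */
static uint64_t current_max(void)
{
    uint64_t max = 0;
    for (unsigned i = 0; i < MAX_THREADS; i++) {
        uint64_t v = atomic_load_explicit(&per_thread_max[i].value,
                                          memory_order_relaxed);
        if (v > max)
            max = v;
    }
    return max;
}
```

The writers never contend on a shared cache line, and the cost of combining the values is paid only by whoever actually asks for the maximum. The result is a snapshot, so it can be slightly stale if threads are still recording.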
Edit: I could not find any pass that pattern-matches CAS loops in order to replace them. The closest thing I could find is this pass: https://github.com/llvm/llvm-project/blob/06fb26c3a4ede66755... I reckon one could write a similar pass to recognize CAS idioms, but its usefulness would probably be rather limited and not worth the effort and risk.
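For reference, this is the kind of CAS idiom I have in mind, sketched as an atomic max with C11 atomics. I am assuming the maximum discussed above is the motivating case; atomic_fetch_max_u64 is a name I made up, not a standard function:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Raise *target to candidate if candidate is larger.
 * Returns the value observed before any update. */
static uint64_t atomic_fetch_max_u64(_Atomic uint64_t *target,
                                     uint64_t candidate)
{
    uint64_t observed = atomic_load_explicit(target, memory_order_relaxed);

    /* Retry while our candidate would still raise the maximum.
     * On failure the CAS writes the current value into observed. */
    while (observed < candidate &&
           !atomic_compare_exchange_weak_explicit(target, &observed,
                                                  candidate,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
        /* nothing to do: observed was refreshed by the failed CAS */
    }
    return observed;
}
```

A pass would have to prove that a loop like this is equivalent to a single atomic max before rewriting it, and real code tends to vary the loop shape, the comparison, and the memory orderings, which is part of why I doubt it would match often.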
I do not have much experience with it, but I think you can see the kernel call stack with it, and I know you can also see the return value (in eax). That would be less effort than QEMU + gdb + disabling kernel ASLR, etc.