Readit News
jerrinot commented on A kernel bug froze my machine: Debugging an async-profiler deadlock   questdb.com/blog/async-pr... · Posted by u/bluestreak
broken_broken_ · 4 hours ago
Nice article, thank you. Did you also consider using bpftrace while debugging?

I do not have much experience with it, but I think you can see the kernel call stack with it, and I know you can also see the return value (in eax). That would be less effort than QEMU + GDB + disabling kernel ASLR, etc.

jerrinot · 4 hours ago
I have no practical experience with bpftrace, so it did not occur to me. I'll give it a try and perhaps there's gonna be a 2nd part of this investigation.
jerrinot commented on A kernel bug froze my machine: Debugging an async-profiler deadlock   questdb.com/blog/async-pr... · Posted by u/bluestreak
everlier · 13 hours ago
I'm glad to hear I'm not alone. Due to the nature of what I do, I often accumulate ~800-900GB of Docker images and volumes on my machine, sometimes running 20-30 containers at once and starting/stopping them concurrently. Rarely, but still quite often (once every couple of weeks), this somehow leads to a complete deadlock somewhere inside the kernel, due to some crazy race condition that I am in no way able to reproduce reliably.
jerrinot · 13 hours ago
It's much tougher when it's so hard to reproduce. Perhaps the NMI watchdog could help? https://docs.kernel.org/admin-guide/lockup-watchdogs.html
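Before relying on the lockup detectors described on that page, it is worth checking whether they are enabled at all. The sketch below reads the relevant files under /proc/sys/kernel. `read_sysctl_int` and `print_lockup_settings` are hypothetical helpers, not a kernel or libc API, and which files exist depends on how the kernel was configured, so a missing file is reported rather than treated as an error.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads one integer from a sysctl file such as
 * /proc/sys/kernel/watchdog_thresh. Returns 0 on success, -1 otherwise. */
static int read_sysctl_int(const char *path, int *out)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = fscanf(f, "%d", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Prints the lockup-detector knobs. Files that do not exist on this kernel
 * (e.g. because a detector was not built in) are reported, not fatal. */
static void print_lockup_settings(void)
{
    static const char *const knobs[] = {
        "/proc/sys/kernel/watchdog",
        "/proc/sys/kernel/nmi_watchdog",
        "/proc/sys/kernel/watchdog_thresh",
        "/proc/sys/kernel/softlockup_panic",
        "/proc/sys/kernel/hardlockup_panic",
    };
    for (size_t i = 0; i < sizeof knobs / sizeof knobs[0]; i++) {
        int value;
        if (read_sysctl_int(knobs[i], &value) == 0)
            printf("%s = %d\n", knobs[i], value);
        else
            printf("%s: not available\n", knobs[i]);
    }
}
```

Setting `softlockup_panic` or `hardlockup_panic` to 1 turns a detected lockup into a panic, which, together with a crash-dump setup such as kdump, can leave something to inspect after a freeze that is otherwise hard to reproduce.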
jerrinot commented on A kernel bug froze my machine: Debugging an async-profiler deadlock   questdb.com/blog/async-pr... · Posted by u/bluestreak
ChuckMcM · 14 hours ago
Question, isn't this a bug?

    static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
    {
    -       if (event->state != PERF_EVENT_STATE_ACTIVE)
    +       if (event->state != PERF_EVENT_STATE_ACTIVE ||
    +           event->hw.state & PERF_HES_STOPPED)
                    return HRTIMER_NORESTART;

The bug being that || has higher precedence than !=? Consider writing it as:

    if ((event->state != PERF_EVENT_STATE_ACTIVE) || (event->hw.state & PERF_HES_STOPPED))

This is coming from a person with too many scars from not parenthesizing expressions in conditionals to make sure they work the way I meant them to.

jerrinot · 14 hours ago
Wow, someone is actually reading the article in detail, that's a good feeling! In C, the != operator has higher precedence than the || operator. That said, extra parentheses never hurt readability.
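The claim is easy to check mechanically. In the sketch below, `STATE_ACTIVE` and `HES_STOPPED` are hypothetical stand-ins for the kernel's `PERF_EVENT_STATE_ACTIVE` and `PERF_HES_STOPPED` (the values are illustrative; only the parse matters). Because both `!=` and `&` bind tighter than `||`, the patched condition and the fully parenthesized version agree on every input. The compile-time check at the end shows the kind of trap that does earn scars: `==` binds tighter than `&`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PERF_EVENT_STATE_ACTIVE / PERF_HES_STOPPED. */
#define STATE_ACTIVE 1
#define HES_STOPPED  0x01u

/* Condition as written in the patch, without extra parentheses. */
static bool stop_plain(int state, unsigned hw_state)
{
    return state != STATE_ACTIVE || hw_state & HES_STOPPED;
}

/* Same condition, fully parenthesized as suggested in the comment. */
static bool stop_paren(int state, unsigned hw_state)
{
    return (state != STATE_ACTIVE) || (hw_state & HES_STOPPED);
}

/* Exhaustively compares both forms over a small input range. */
static bool forms_agree(void)
{
    for (int state = -1; state <= 2; state++)
        for (unsigned hw = 0; hw < 4; hw++)
            if (stop_plain(state, hw) != stop_paren(state, hw))
                return false;
    return true;
}

/* The real trap: == binds tighter than &, so `x & m == m`
 * parses as `x & (m == m)`, i.e. `x & 1`. */
static_assert((0x02u & 0x02u == 0x02u) == 0u,
              "0x02 & (0x02 == 0x02) is 0x02 & 1, which is 0");
```

Compilers with `-Wparentheses` (part of `-Wall` in GCC) warn about the last expression, which is a good reason to keep the extra parentheses even when they are not strictly needed.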
jerrinot commented on A kernel bug froze my machine: Debugging an async-profiler deadlock   questdb.com/blog/async-pr... · Posted by u/bluestreak
SerCe · 14 hours ago
Great article! Just yesterday I watched a Devoxx talk by Andrei Pangin [1], the creator of async-profiler where I learned about the new heatmap support. To many folks it might not sound that exciting, until you realise that these heatmaps make it much easier to see patterns over time. If you’re interested there’s a solid blog post [2] from Netflix that walks through the format and why it can be incredibly useful.

[1]: https://www.youtube.com/watch?v=u7-S-Hn-7Do

[2]: https://netflixtechblog.com/netflix-flamescope-a57ca19d47bb

jerrinot · 14 hours ago
Thanks for the kind words!

Heatmaps are amazing for pattern spotting. I also use them when hunting irregular hiccups or outliers. More people should know about this feature.
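For readers who have not used these heatmaps: the layout in the Netflix post places samples by wall-clock second (columns) and by offset within that second (rows), so a hiccup that recurs at the same point every second shows up as a horizontal band instead of being averaged away. The sketch below bins sample timestamps that way. The 10-second by 50-row grid (20 ms per row) and the `heatmap` type are assumptions for illustration, not async-profiler's actual format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed grid: 10 one-second columns, 50 rows of 20 ms each. */
#define HM_SECONDS 10
#define HM_ROWS    50
#define HM_ROW_MS  (1000 / HM_ROWS)

typedef struct {
    uint32_t count[HM_SECONDS][HM_ROWS]; /* count[second][row] */
} heatmap;

static void heatmap_clear(heatmap *hm)
{
    assert(hm != NULL);
    memset(hm, 0, sizeof *hm);
}

/* Records one sample taken ts_ms milliseconds after recording started:
 * column = whole second, row = offset within that second. */
static void heatmap_add(heatmap *hm, uint64_t ts_ms)
{
    assert(hm != NULL);
    uint64_t sec = ts_ms / 1000;
    uint64_t row = (ts_ms % 1000) / HM_ROW_MS;
    if (sec < HM_SECONDS) /* samples beyond the grid are dropped */
        hm->count[sec][row]++;
}
```

With these numbers, a stall 300 ms into every second lands in row 15 (300 / 20) of every column and forms a horizontal band.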

jerrinot commented on A kernel bug froze my machine: Debugging an async-profiler deadlock   questdb.com/blog/async-pr... · Posted by u/bluestreak
jerrinot · 15 hours ago
Author here. I've always been kernel-curious despite never having worked on one myself. Consider this either a collection of impractical party tricks or a hands-on way to get a feel for kernel internals.
jerrinot commented on IBM to acquire Confluent   confluent.io/blog/ibm-to-... · Posted by u/abd12
mliezun · 8 days ago
Maybe this whole thing is because Snowflake acquired Redpanda earlier this year: https://www.investors.com/news/technology/snowflake-stock-re...
jerrinot · 8 days ago
Snowflake did not acquire RP after all.
jerrinot commented on From Rust to reality: The hidden journey of fetch_max   questdb.com/blog/rust-fet... · Posted by u/bluestreak
delifue · 3 months ago
When reading, I expected it to mention that having each thread maintain a thread-local max and periodically sync it to a global atomic can improve performance.
jerrinot · 3 months ago
I expect candidates to suggest similar optimisations, but I felt it was unnecessary for the article itself.
jerrinot commented on From Rust to reality: The hidden journey of fetch_max   questdb.com/blog/rust-fet... · Posted by u/bluestreak
orlp · 3 months ago
AArch64 does indeed have a proper atomic max, but even on x86-64 you can get a wait-free atomic max as long as you only need to support integers below 64. In that case you can simply do a `lock or` with 1 << i, where i is the value being recorded, and read the maximum back as the highest set bit. You can even support larger sizes by using multiple registers, e.g. four 64-bit registers for a u8 maximum.

In most cases it's even better to just store a maximum per thread separately and loop over all threads once to compute the current maximum if you really need it.

jerrinot · 3 months ago
That’s a neat trick, albeit with limited applicability given the very narrow range. Thanks for sharing!
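A sketch of the `lock or` trick for the u8 case, assuming C11 `<stdatomic.h>`: recording a value sets its bit in one of four 64-bit words with `atomic_fetch_or`, and the maximum is the highest set bit. `bitmax_record` and `bitmax_get` are hypothetical names. Whether the fetch-or becomes a single `lock or` is up to the compiler; GCC and Clang typically emit it on x86-64 when the returned old value is unused.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* One bit per possible u8 value: four 64-bit words. */
static _Atomic uint64_t seen[4];
static_assert((sizeof seen / sizeof seen[0]) * 64 == 256, "one bit per u8 value");

/* Wait-free record: set bit v. No retry loop, unlike a CAS-based max. */
static void bitmax_record(uint8_t v)
{
    atomic_fetch_or_explicit(&seen[v >> 6], UINT64_C(1) << (v & 63),
                             memory_order_relaxed);
}

/* Returns the highest recorded value, or -1 if nothing was recorded.
 * The words are read one at a time, so this is not a single atomic
 * snapshot; since bits are only ever set, the result is always a value
 * that has actually been recorded. */
static int bitmax_get(void)
{
    for (int w = 3; w >= 0; w--) {
        uint64_t bits = atomic_load_explicit(&seen[w], memory_order_relaxed);
        for (int b = 63; b >= 0; b--)
            if ((bits >> b) & 1)
                return w * 64 + b;
    }
    return -1;
}
```

The cost moves from writers to the reader, which may scan all 256 bits; that is why the narrow value range matters.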
jerrinot commented on From Rust to reality: The hidden journey of fetch_max   questdb.com/blog/rust-fet... · Posted by u/bluestreak
yshui · 3 months ago
That's a cool find. I wonder whether LLVM also does the reverse: pattern-matching handwritten CAS loops and transforming them into native ARM64 instructions.
jerrinot · 3 months ago
That's a very good question. A proper compiler engineer would know, but I will do my best to find something and report back.

Edit: I could not find any pass that pattern-matches CAS loops and replaces them. The closest thing I could find is this pass: https://github.com/llvm/llvm-project/blob/06fb26c3a4ede66755... I reckon one could write a similar pass to recognize CAS idioms, but its usefulness would probably be rather limited and not worth the effort/risks.

u/jerrinot

Karma: 269 · Cake day: October 8, 2017