It's easy to use and pairs beautifully with the unintrusive perf tool; the combination is a joy to work with.
And, if combined with a codebase opened in QtCreator, you can click on a hotspot in the flamegraph, and it will bring you automagically to the correct file and line in QtCreator, without any explicit linking required between the two programs. I discovered that feature accidentally, and the fact that it just worked seamlessly really impressed me. (Tested on a Debian-based Linux).
A big thanks to KDAB for making this tool available to us!
And to people using other IDEs/editors - in the hotspot settings you can configure which one gets opened when you click on a source line. QtCreator is just the default (when it is installed).
but they all support gdb
I'm not debating that manual GDB sampling has its place and value. I'm debating that perf is "lying" or that it's impossible to get hold of off-CPU samples, or profiling of multithreaded code in general.
I think, firstly, that spending 15s trying the CTRL-c approach is a worthwhile tradeoff. If you don't find anything, then sure, spend another 30m - 60m setting up perf, KDAB, etc. Maybe more if you're on an embedded device.
Secondly, the author seems to say that he's used this on embedded devices with no output but a serial line for the debugger. This is also a 15s effort[1].
It's basically a very low effort task, takes seconds to determine if it worked or not, and if it doesn't work you've only lost a few seconds.
[1] I'm assuming that if you're developing on a device supporting a serial GDB connection, you've already got the debugger working.
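For anyone who wants to automate the Ctrl-C approach a little, here is a minimal sketch of a sampling loop around gdb. It only uses `gdb -p`, `-batch` and `-ex "thread apply all bt"`; the function names `sample_stacks` and `count_top_frames` and the one-second interval are my own assumptions, not something from the linked article, and the awk parsing is naive (it will trip over C++ names that contain spaces).

```shell
# Hypothetical helper: take N backtraces of all threads of a running process.
# Attaching briefly stops the target each time, so keep N small.
sample_stacks() {
    pid=$1
    n=${2:-10}
    i=0
    while [ "$i" -lt "$n" ]; do
        gdb -p "$pid" -batch -ex "thread apply all bt" 2>/dev/null
        sleep 1
        i=$((i + 1))
    done
}

# Hypothetical helper: read gdb backtrace text on stdin and count how often
# each function is the innermost frame (#0), most frequent first.
# gdb prints either "#0  0xADDR in func (...)" or "#0  func (...) at file:line".
count_top_frames() {
    awk '$1 == "#0" {
             fn = ($3 == "in") ? $4 : $2
             count[fn]++
         }
         END { for (f in count) print count[f], f }' | sort -rn
}
```

Usage would be something like `sample_stacks "$(pidof myapp)" 20 | count_top_frames`, where `myapp` is a placeholder for your binary. If one function dominates the list, you have found your hotspot without setting up anything else.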
Furthermore, note how your reasoning is quite different from what the website you linked to says - it basically says "there are no good tools" (which is untrue) whereas you are saying "manual GDB sampling might be good enough and is easier to set up than a good tool" (which is certainly true).
to GP: What you describe sounds like https://github.com/koute/not-perf to me
eu-stack -i -p $(pidof ...)
Thanks to debuginfod this will even give you good backtraces right away (at the cost of some initial delay to load the data from the web; subsequent runs are fast). If you get a "permission denied" error, you probably need to set kernel.yama.ptrace_scope=0.
they are easy to set up, work extremely efficiently and can nowadays also catch off-CPU time. Ctrl+C debugging finds you trivial stuff, but dismissing the real tools as unusable or inferior is simply ignorant.
> Unlike perf, gdb will give you a callstack even if the program was compiled without frame pointer support.
There is `--call-graph dwarf`, which has existed for many years, and with `-z` it is pretty efficient and just works too, unless the stack gets too long, but even then it's good enough for profiling purposes...
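To make that concrete, a rough sketch of recording with DWARF unwinding and compression. The wrapper name `record_dwarf` and the `PERF` override variable are assumptions of mine (the override is only there so the command line can be dry-run); the flags themselves are the ones named above, and `-z` needs a perf build with zstd support.

```shell
# Hypothetical wrapper: record a command with DWARF-based call graphs.
# --call-graph dwarf  unwinds via debug info, so no frame pointers are needed
# -z                  compresses the (large) stack dumps in perf.data
# PERF can be overridden, e.g. PERF=echo for a dry run that prints the arguments.
record_dwarf() {
    "${PERF:-perf}" record --call-graph dwarf -z -o perf.data -- "$@"
}
```

Something like `record_dwarf ./myapp` (with `myapp` standing in for your program) leaves a `perf.data` behind that `perf report` can read, and that hotspot should be able to open as well.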
> Also, sampling profilers are bad at tail latency - if something is usually fast but occasionally slow, you won’t be there to Ctrl-C it when it’s slow.
Very true, thankfully the flight recorder modes for profilers can help with that to a certain degree, with a potentially large sampling frequency.