This makes things so, so, so much easier. Otherwise, a lot of effort has to be built into creating an unwinder in eBPF code, essentially porting the .eh_frame CFA/RA/BP calculations.
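To make "porting .eh_frame CFA/RA/BP calculations" concrete, here is a rough sketch of what a table-driven unwinder has to do at each frame. It is a toy model in Python, not eBPF, and it assumes a 64-bit target where the return address sits one word below the CFA. The table rows, addresses, and memory contents are all made up for illustration; a real unwinder would derive the rows from .eh_frame.

```python
WORD = 8  # assumption: 64-bit target, return address stored at CFA - 8

# Hypothetical unwind table. Each row says, for a range of pc values, how to
# compute the CFA and where the caller's bp was saved relative to that CFA.
# (pc_start, pc_end, cfa_register, cfa_offset, bp_offset_from_cfa or None)
UNWIND_TABLE = [
    (0x1000, 0x1100, "sp", 16, -16),
    (0x2000, 0x2100, "sp", 32, -16),
    (0x3000, 0x3100, "sp", 8, None),
]

# Made-up stack contents: address -> 8-byte value.
MEMORY = {
    0x7000: 0x7030,  # saved bp for the first frame
    0x7008: 0x2010,  # return address of the first frame
    0x7020: 0x0,     # saved bp for the second frame
    0x7028: 0x3010,  # return address of the second frame
    0x7030: 0x0,     # zero return address marks the outermost frame
}


def find_row(pc):
    """Return (cfa_register, cfa_offset, bp_offset) for pc, or None."""
    for start, end, reg, off, bp_off in UNWIND_TABLE:
        if start <= pc < end:
            return reg, off, bp_off
    return None


def unwind(pc, sp, bp, memory, max_frames=64):
    """Collect pc values by applying the CFA/RA/BP rules frame by frame."""
    frames = []
    for _ in range(max_frames):
        frames.append(pc)
        row = find_row(pc)
        if row is None:
            break  # no unwind info for this pc
        reg, off, bp_off = row
        cfa = (sp if reg == "sp" else bp) + off
        ra = memory.get(cfa - WORD)
        if not ra:
            break  # unreadable or zero return address: stop
        if bp_off is not None:
            bp = memory.get(cfa + bp_off, bp)  # restore the caller's bp
        pc, sp = ra, cfa  # the caller's sp is the CFA
    return frames


print([hex(a) for a in unwind(0x1010, 0x7000, 0x0, MEMORY)])
# ['0x1010', '0x2010', '0x3010']
```

The table lookup and the per-row rules are the part that has to be reimplemented inside the eBPF program when frame pointers are not available.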
They claim to have event profilers for non-native languages (e.g. Python). Does this mean that they use something similar to https://github.com/benfred/py-spy? Otherwise, it's not obvious to me how they can read Python state.
Lastly, the GitHub repo https://github.com/facebookincubator/strobelight is pretty barebones. I wonder when they'll update it.
1) native unwinding: https://www.polarsignals.com/blog/posts/2022/11/29/dwarf-bas...
2) python: https://www.polarsignals.com/blog/posts/2023/10/04/profiling...
Both available as part of the Parca open source project.
(Disclaimer: I work on Parca and am the founder of Polar Signals.)
I have multiple questions if you don’t mind answering them:
Is there significant overhead to native unwinding and Python unwinding in eBPF? eBPF needs to constantly read and copy from user space to read data structures.
I ask this because unwinding with frame pointers can be done by reading without copying in userland.
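For comparison, a rough sketch of the frame pointer walk I have in mind. It is a toy model in Python, assuming the usual 64-bit layout where the saved bp sits at [bp] and the return address at [bp + 8]; the addresses and memory contents are invented.

```python
WORD = 8  # assumption: 64-bit target

# Made-up stack contents: address -> 8-byte value.
STACK = {
    0x7000: 0x7020, 0x7008: 0x2010,  # innermost frame: saved bp, return address
    0x7020: 0x7040, 0x7028: 0x3010,  # next frame up
    0x7040: 0x0,    0x7048: 0x0,     # outermost frame: zeros end the chain
}


def walk_frame_pointers(pc, bp, memory, max_frames=64):
    """Follow the saved-bp chain and collect return addresses."""
    frames = [pc]
    for _ in range(max_frames - 1):
        if bp == 0 or bp not in memory:
            break  # end of chain or unreadable frame
        saved_bp = memory[bp]
        ret = memory.get(bp + WORD, 0)
        if ret == 0:
            break
        frames.append(ret)
        if saved_bp <= bp:
            break  # stack grows down, so caller frames must be at higher addresses
        bp = saved_bp
    return frames


print([hex(a) for a in walk_frame_pointers(0x1010, 0x7000, STACK)])
# ['0x1010', '0x2010', '0x3010']
```

There is no table lookup here, just two reads per frame, which is why I'd expect it to be cheap.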
Python can be run with different engines (CPython, PyPy, etc.) and versions (3.7, 3.8, …), and compilers can reorganize offsets. Reading from fixed offsets seems handwavy to me. Does this work well in practice, and when has it failed?
Overhead ultimately depends on the sampling frequency. It defaults to 19 Hz per core, at which it's less than 1%, and that has been tried and tested with all sorts of super heavy Python, JVM, Rust, etc. workloads. Since it's per core, there tend to be plenty of stacks to build statistical significance quickly. The profiler is essentially a thread-per-core model, which certainly helps for perf.
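As a back-of-the-envelope illustration of why frequency dominates, here is a small sketch. The 64-core machine, the 10-second window, and the 100 µs cost per sample are assumptions picked for the arithmetic, not measurements from Parca.

```python
def expected_samples(hz_per_core, cores, seconds):
    """Stack samples collected across all cores in the given window."""
    return hz_per_core * cores * seconds


def sampling_overhead(hz_per_core, cost_per_sample_us):
    """Fraction of one core's time spent taking samples."""
    return hz_per_core * cost_per_sample_us / 1_000_000


# Hypothetical machine: 64 cores sampled for 10 seconds at 19 Hz per core.
print(expected_samples(19, 64, 10))        # 12160

# Hypothetical cost of 100 microseconds per sample.
print(f"{sampling_overhead(19, 100):.2%}")  # 0.19%
```

Overhead scales linearly with the frequency, while the number of stacks collected scales with both the frequency and the core count.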
The offset approach has evolved a bit. Today it's mixed with some disassembling, and with that combination it's rock solid. It is dependent on the engine, and in the case of Python we only support CPython today.
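For readers unfamiliar with the offset approach, a minimal sketch of the idea: keep a table of field offsets per runtime build and use it to pick values out of raw bytes copied from the target process. This is not Parca's code. The build names, field names, and offsets below are placeholders and do not correspond to any real CPython layout.

```python
import struct

# Hypothetical layouts: runtime build -> {field name: byte offset}.
OFFSETS = {
    "build-A": {"code_ptr": 0, "prev_frame": 8},
    "build-B": {"code_ptr": 16, "prev_frame": 8},
}


def read_u64(buf, offset):
    """Read a little-endian unsigned 64-bit value from buf at offset."""
    return struct.unpack_from("<Q", buf, offset)[0]


def read_field(buf, build, field):
    """Look up the field's offset for this build and read it from buf."""
    try:
        layout = OFFSETS[build]
    except KeyError:
        raise LookupError(f"no offsets known for {build!r}") from None
    return read_u64(buf, layout[field])


# Fake "remote memory" for each build: the same field lives at different offsets.
frame_a = struct.pack("<QQ", 0xAAAA, 0xBBBB)
frame_b = struct.pack("<QQQ", 0x0, 0xBBBB, 0xAAAA)

print(hex(read_field(frame_a, "build-A", "code_ptr")))  # 0xaaaa
print(hex(read_field(frame_b, "build-B", "code_ptr")))  # 0xaaaa
```

An unknown build fails loudly instead of reading garbage, which is the failure mode a fixed-offset reader has to guard against.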