brancz (u/brancz) - Readit News

brancz commented on Strobelight: A profiling service built on open source technology engineering.fb.com/2025/0... · Posted by u/birdculture

maknee · 6 months ago

Thanks! Those blogs are incredibly useful. Nice work on the profiler. :)

I have multiple questions if you don’t mind answering them:

Is there significant overhead to native unwinding and Python unwinding in eBPF? eBPF needs to constantly read & copy from user space to read data structures.

I ask this because unwinding with frame pointers can be done by reading without copying in userland.

Python can be run with different engines (cpython, pypy, etc) and versions (3.7, 3.8,…) and compilers can reorganize offsets. Reading from offsets seems handwavy to me. Does this work well in practice, and when has it failed?

brancz · 6 months ago

Thank you!

Overhead ultimately depends on the sampling frequency. It defaults to 19 Hz per core, at which overhead is less than 1%; that has been tried and tested with all sorts of super heavy Python, JVM, Rust, etc. workloads. Since sampling is per core, it tends to produce plenty of stacks to build statistical significance quickly. The profiler is essentially a thread-per-core model, which certainly helps for perf.
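To put "plenty of stacks" into numbers, here is a rough back-of-the-envelope sketch. Only the 19 Hz default comes from the comment above; the core count and time window are hypothetical, and the result is an upper bound that assumes every core is busy for the whole window.

```python
# Back-of-the-envelope sample counts for per-core sampling.
# 19 Hz is the default mentioned in the comment; the 64 cores and
# 60-second window are made-up values for illustration.

def samples_collected(freq_hz: int, cores: int, seconds: int) -> int:
    """Upper bound on stack samples when every core is busy for the whole window."""
    return freq_hz * cores * seconds

per_minute = samples_collected(freq_hz=19, cores=64, seconds=60)
print(per_minute)  # 72960
```

Even at a low per-core frequency, a large machine can yield tens of thousands of samples per minute, which is why a low rate can still be statistically useful.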

The offset approach has evolved a bit: today it’s mixed with some disassembling, and with that combination it’s rock solid. It is dependent on the engine, and in the case of Python we only support CPython today.

brancz commented on Strobelight: A profiling service built on open source technology engineering.fb.com/2025/0... · Posted by u/birdculture

maknee · 6 months ago

"All of this is made possible with the inclusion of frame pointers in all of Meta’s user space binaries, otherwise we couldn’t walk the stack to get all these addresses (or we’d have to do some other complicated/expensive thing which wouldn’t be as efficient)"

This makes things so, so, so much easier. Otherwise, a lot of effort has to be put into creating an unwinder in eBPF code, essentially porting .eh_frame cfa/ra/bp calculations.
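For readers wondering why frame pointers make this so much easier, here is a minimal sketch of a frame-pointer walk over simulated memory. It assumes the usual x86-64 layout (saved frame pointer at `[fp]`, return address at `[fp + 8]`). All addresses are made up, and this is not Strobelight's or Parca's actual unwinder, which runs in eBPF against real process memory.

```python
# Simulated frame-pointer unwinding (x86-64 layout assumed):
#   memory[fp]     -> caller's saved frame pointer
#   memory[fp + 8] -> return address into the caller
# All addresses below are hypothetical.

WORD = 8

def walk_frame_pointers(memory: dict, fp: int, max_depth: int = 128) -> list:
    """Follow the saved-frame-pointer chain and collect return addresses."""
    return_addresses = []
    while fp != 0 and len(return_addresses) < max_depth:
        if fp not in memory or fp + WORD not in memory:
            break  # unreadable frame: stop rather than guess
        return_addresses.append(memory[fp + WORD])
        next_fp = memory[fp]
        if next_fp != 0 and next_fp <= fp:
            break  # stack grows down, so a caller's frame must sit at a higher address
        fp = next_fp
    return return_addresses

# Three nested frames: leaf -> middle -> main
memory = {
    0x7000: 0x7020, 0x7008: 0x401234,  # leaf frame: saved fp, return into middle
    0x7020: 0x7040, 0x7028: 0x401100,  # middle frame: saved fp, return into main
    0x7040: 0x0,    0x7048: 0x401000,  # main frame: chain terminates with fp == 0
}

stack = walk_frame_pointers(memory, fp=0x7000)
print([hex(addr) for addr in stack])  # ['0x401234', '0x401100', '0x401000']
```

Each step is two fixed-offset reads and nothing else. Without frame pointers, the unwinder has to look up per-address rules from .eh_frame to compute the same values.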

They claim to have event profilers for non-native languages (e.g. python). Does this mean that they use something similar to https://github.com/benfred/py-spy ? Otherwise, it's not obvious to me how they can read python state.

Lastly, the github repo https://github.com/facebookincubator/strobelight is pretty barebones. Wonder when they'll update it

brancz · 6 months ago

Already been done:

1) native unwinding: https://www.polarsignals.com/blog/posts/2022/11/29/dwarf-bas...

2) python: https://www.polarsignals.com/blog/posts/2023/10/04/profiling...

Both available as part of the Parca open source project.

https://www.parca.dev/

(Disclaimer: I work on Parca and am the founder of Polar Signals)

brancz commented on Strobelight: A profiling service built on open source technology engineering.fb.com/2025/0... · Posted by u/birdculture

brancz · 6 months ago

We’re working hard to bring a lot of Strobelight to everyone through Parca[0] as OSS and Polar Signals[1] as the commercial version. Some parts already exist, with much more to come this year! :)

[0] https://www.parca.dev/

[1] https://www.polarsignals.com/

(Disclaimer: founder of Polar Signals)

brancz commented on Rust: Doubling Throughput with Continuous Profiling and Optimization polarsignals.com/blog/pos... · Posted by u/mesto1

fbergen · 6 months ago

Shouldn't "long term snapshot for profiling and inverting call stacks" be table stakes by now?

brancz · 6 months ago

The point of the first one is that you can create snapshots from within the product, where profiling data isn't kept forever. This is so you can use the pprof.me link in a GitHub issue, PR, or elsewhere and trust that the data never goes away even if the original data went out of retention. We actually originally built pprof.me out of frustration that users of Prometheus (several of us are Prometheus maintainers) at best submitted screenshots of profiling data when all we wanted was an easy way to explore it.

I agree that neither of these is a terribly complicated feature, but as far as I know no other product on the market actually has this combination. (Yes, you can export data from most systems and use a different visualization tool, but the point of products is to provide a single integrated package.)

(disclaimer: Founder of the company that offers the product featured in this case study.)

brancz commented on Rust: Doubling Throughput with Continuous Profiling and Optimization polarsignals.com/blog/pos... · Posted by u/mesto1

zigzag312 · 6 months ago

I thought it's going to be about some form of PGO.

brancz · 6 months ago

Keep an eye on our blog; we're working on some interesting things in this area!

brancz commented on Rust: Doubling Throughput with Continuous Profiling and Optimization polarsignals.com/blog/pos... · Posted by u/mesto1

diath · 6 months ago

This is such a weird article: not only does it seem like the company itself is trying to glaze itself in the third person, but claiming that it's a single line of change when they're swapping out a library feature is just disingenuous; you also don't need sophisticated tooling to find an unoptimized hotspot, perf record would suffice.

brancz · 6 months ago

I mentioned this on another thread as well: the point isn't that perf can't catch something like this, it's that having continuous profiling set up makes it way easier to make profiling data an everyday part of your development process, and ultimately nothing behaves quite like production. This allows things that gradually sneak into the codebase to be easily spotted, since you don't have to go through the whole shebang of getting representative production profiling data over time; you just always have it available. Continuous profiling also lets you spot intermittent things more easily, and so on.

(disclaimer: I'm the founder of the product showcased in this case study.)

brancz commented on Rust: Doubling Throughput with Continuous Profiling and Optimization polarsignals.com/blog/pos... · Posted by u/mesto1

josephg · 6 months ago

I don’t really see what “continuous profiling and optimisation” has to do with your code change here. 68% of your cpu time is spent computing SHA256? That’ll pop out as a hotspot using any competent profiling tool. It doesn’t matter what tool you use, or what approach. It just matters that you look, at all.

I’m not saying your special profiling tools are bad. But I can’t tell what’s special about them when they’re simply doing something you could do using literally any profiler.

brancz · 6 months ago

Nobody is saying that a regular profiling tool can't detect it. However, if you don't profile regularly, these are easy things to miss. With a continuous profiler set up you skip everything regarding collection of the right data (and you can see aggregates across time, which was important in the second example of the blog post because the memory allocations aren't necessarily seen in a 10s profile). It lowers the barrier to including profiling data in your everyday software engineering considerably.

You can capture memory or CPU metrics with top as well, and that's useful, but it's not the same thing as a full-blown metrics system, e.g. Prometheus.

brancz commented on Rust: Doubling Throughput with Continuous Profiling and Optimization polarsignals.com/blog/pos... · Posted by u/mesto1

Blackarea · 6 months ago

I am in the midst of writing a little toy axum project for my pi and also want to measure cpu/mem performance for that. In JVM land I used Prometheus for such tasks. Is there a straightforward tool/crate that anyone can recommend for this?

brancz · 6 months ago

Prometheus gives you total CPU/memory metrics; the profiler used in the article gets you much higher resolution: down to the line number of your source code.

If you're looking to optimize your project I would recommend using a profiler rather than metrics that just tell you the total.
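As a rough illustration of the difference, here is a minimal sketch that takes the same set of sampled stacks and reports them first as a single total, then broken down by source line. The file names, line numbers, and sampling period are all hypothetical; a real profiler gets these from its unwinder and symbolizer.

```python
from collections import Counter

# Hypothetical sampled stacks: each sample is the leaf frame as (file, line).
SAMPLE_PERIOD_S = 1 / 19  # assumed: one sample stands for this much CPU time

samples = [
    ("hash.rs", 42), ("hash.rs", 42), ("hash.rs", 42),
    ("server.rs", 10), ("hash.rs", 57),
]

# What a metrics system reports: one total for the whole process.
total_cpu_seconds = len(samples) * SAMPLE_PERIOD_S

# What a profiler adds: the same total attributed to source locations.
by_line = Counter(samples)
hottest, count = by_line.most_common(1)[0]

print(f"total: {total_cpu_seconds:.3f}s")
print(f"hottest: {hottest[0]}:{hottest[1]} with {count} of {len(samples)} samples")
```

The total tells you that CPU is being spent; the per-line breakdown tells you where to start optimizing.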

(disclaimer: I'm the founder of the company that offers the product shown in the blog post, but I also happen to be a Prometheus maintainer)

brancz commented on Rust: Doubling Throughput with Continuous Profiling and Optimization polarsignals.com/blog/pos... · Posted by u/mesto1

mesto1 · 6 months ago

It would have been nice to see that one-line change; it's not really clear to me what it would be.

brancz · 6 months ago

Thank you for the feedback! Quickly worked with the S2 team to get the screenshot from the change added (it's just enabling the hardware acceleration feature in the sha2 crate)!

brancz commented on Show HN: Perforator – cluster-wide profiling tool for large data centers github.com/yandex/perfora... · Posted by u/BigRedEye

BigRedEye · 7 months ago

Yes. Although we are studying CSSPGO, which uses a mixed (LBR + software-sampled stacks) approach.

brancz · 7 months ago

I'm familiar with the paper, but it doesn't improve the situation in terms of LBR availability on cloud providers, does it?