I don’t really see what “continuous profiling and optimisation” has to do with your code change here. 68% of your CPU time is spent computing SHA256? That’ll pop out as a hotspot using any competent profiling tool. It doesn’t matter what tool you use, or what approach. It just matters that you look, at all.
I’m not saying your special profiling tools are bad. But I can’t tell what’s special about them when they’re simply doing something you could do using literally any profiler.
Nobody is saying that a regular profiling tool can't detect it. But it's one of those things that's easy to miss if you aren't profiling regularly. With a continuous profiler set up, you skip all the work of collecting the right data, and you get aggregates across time, which mattered in the second example of the blog post because the memory allocations aren't necessarily visible in a 10s profile. It makes the barrier for including profiling data in your everyday software engineering way lower.
You can capture memory or CPU metrics with top as well, and that's useful, but it's not the same thing as a full-blown metrics system eg. Prometheus.
How do you not profile something? It is right there on the call stack millions of times. Do you explicitly filter it? Do you forego the idea of profiling when something slows down the system?
This is a good demonstration that autodetection-based CPU acceleration should be on by default.
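For illustration, the runtime checks already exist in std::arch; a crate could use something like this to pick a hardware SHA-256 path on its own (a sketch of the general idea, not how sha2 actually wires it up):

```rust
// Runtime CPU feature detection from the standard library. A crypto crate
// could branch on this instead of requiring a build-time feature flag.
#[cfg(target_arch = "aarch64")]
fn sha256_hw_available() -> bool {
    // ARMv8 "sha2" feature covers the SHA-256 instructions.
    std::arch::is_aarch64_feature_detected!("sha2")
}

#[cfg(target_arch = "x86_64")]
fn sha256_hw_available() -> bool {
    // x86 SHA extensions.
    std::arch::is_x86_feature_detected!("sha")
}

#[cfg(not(any(target_arch = "aarch64", target_arch = "x86_64")))]
fn sha256_hw_available() -> bool {
    false
}

fn main() {
    println!("hardware SHA-256 available: {}", sha256_hw_available());
}
```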
Also, it looks like several of your projects use zlib. You might take a look at zlib-rs, which is at this point faster than zlib-ng and easier to make use of from Rust. It's available as a flate2 backend. (Note that the folks working on zlib-rs are looking for funding for its development at the moment, if you're interested.)
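The swap is mostly a Cargo.toml change; a rough sketch (the `zlib-rs` feature name is from memory, so double-check the flate2 docs, and the version pin here is a guess):

```rust
// In Cargo.toml (illustrative):
//   flate2 = { version = "1", default-features = false, features = ["zlib-rs"] }
// Call sites stay exactly the same:
use flate2::{write::GzEncoder, Compression};
use std::io::Write;

fn gzip(data: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
    encoder.write_all(data)?;
    encoder.finish() // returns the compressed Vec<u8>
}
```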
I am in the midst of writing a little toy axum project for my Pi and also want to measure CPU/memory performance for it. In JVM land I used Prometheus for such tasks. Is there a straightforward tool/crate that anyone can recommend for this?
Prometheus gives you total CPU/memory metrics; the profiler used in the article gets you much higher resolution, down to the line number in your source code.
If you're looking to optimize your project I would recommend using a profiler rather than metrics that just tell you the total.
(disclaimer: I'm the founder of the company that offers the product shown in the blog post, but I also happen to be a Prometheus maintainer)
Yeah, my comment wasn't that well thought through. Obviously if I want to find bottlenecks in my code, profiling is what I want; if I want to monitor actual load and peaks in prod, I want some kind of monitoring. I have good experiences with Prometheus, but I went with the sysinfo crate, which is probably light-years more basic/big-picture. I love that I can just record my own metrics, log them to CSV, and summarize peaks for a live view exposed through axum in HTML. It's neat for my toy project, which doesn't aim to be industry standard.
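For anyone curious, the core of it looks roughly like this (APIs paraphrased from memory; sysinfo's interface and units have shifted between versions, so check the docs for whichever release you're on):

```rust
// Minimal sysinfo sketch: sample overall CPU usage and memory once.
use std::{thread, time::Duration};
use sysinfo::System;

fn main() {
    let mut sys = System::new_all();

    // CPU usage needs two refreshes separated by a short interval.
    sys.refresh_all();
    thread::sleep(Duration::from_millis(500));
    sys.refresh_all();

    let avg_cpu: f32 =
        sys.cpus().iter().map(|c| c.cpu_usage()).sum::<f32>() / sys.cpus().len() as f32;

    println!("avg CPU: {avg_cpu:.1}%");
    println!("memory: {} used of {}", sys.used_memory(), sys.total_memory());
}
```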
I'm not sure if this is what you are looking for but I found https://github.com/wolfpld/tracy to work rather well. There is an integration for the tracing crate that can get you very far: https://lib.rs/crates/tracing-tracy. If you're just looking for a very high level report then this might be a bit too much detail.
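Wiring it up is roughly this (from memory of recent tracing-tracy releases; older ones construct the layer with `TracyLayer::new()` instead):

```rust
// Attach the Tracy layer to a tracing subscriber so spans/events stream to
// the Tracy profiler UI when it connects.
use tracing_subscriber::layer::SubscriberExt;

fn main() {
    let subscriber = tracing_subscriber::registry()
        .with(tracing_tracy::TracyLayer::default());
    tracing::subscriber::set_global_default(subscriber)
        .expect("failed to set tracing subscriber");

    // Any span from here on shows up in Tracy.
    let _span = tracing::info_span!("startup").entered();
}
```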
> Depending on the configuration Tracy may broadcast discovery packets to the local network and expose the data it collects in the background to that same network. Traces collected by Tracy may include source and assembly code as well.
I'm more curious about how much manual effort was required to find said line of code. It strikes me that most of these optimizations are super easy to verify but very difficult to find.
(S2 dev) It took a bit of time to figure out what was going on. The profile showed us that the sha2 crate was using its `soft` backend when we needed the hardware-optimized one. We thought being on Neoverse V1 would mean hardware optimization gets detected and used automatically, but that wasn't the case, so we looked at the sha2 crate closely and figured out that enabling the asm feature fixes it!
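If you want to sanity-check something like this yourself, a rough (hypothetical, not what we actually measured) way is to time SHA-256 over a fixed buffer with and without the feature enabled in Cargo.toml (the version pin below is a guess):

```rust
// Toggle `sha2 = { version = "0.10", features = ["asm"] }` in Cargo.toml
// and compare the timings of this tiny hash loop.
use sha2::{Digest, Sha256};
use std::time::Instant;

fn main() {
    let data = vec![0u8; 64 * 1024 * 1024]; // 64 MiB of zeroes
    let start = Instant::now();
    let digest = Sha256::digest(&data);
    println!(
        "hashed 64 MiB in {:?} (first digest byte {:02x})",
        start.elapsed(),
        digest[0]
    );
}
```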
Cool, makes you wonder how many more of these one-liners are scattered across the codebase (in the application and also the OS, etc.).
Sidenote: I wonder how close we are to detecting these automatically. From profiling many different applications, it feels like that should be a tractable problem.
This is such a weird article. Not only does it seem like the company is glazing itself in the third person, but claiming that it's a single line of change when they're swapping out a library feature is disingenuous; you also don't need sophisticated tooling to find an unoptimized hotspot, perf record would suffice.
I mentioned this on another thread as well, but the point isn't that perf can't catch something like this; it's that having continuous profiling set up makes it way easier to make profiling data an everyday part of your development process, and ultimately nothing behaves quite like production. Things that gradually sneak into the codebase become easy to spot, since you don't have to go through the whole shebang of collecting representative production profiling data over time: you just always have it available. Continuous profiling also makes intermittent issues easier to spot, and so on.
(disclaimer: I'm the founder of the product showcased in this case study.)
hi, S2 dev here. I found that if you explicitly set the checksum algorithm (crc32c), the AWS SDK would ignore the checksum we provided and compute its own, so we were doing the work twice. I also found a related issue: https://github.com/awslabs/aws-sdk-rust/issues/1103
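Roughly the shape of it (hedged sketch; the aws-sdk-s3 builder method names are from memory and the actual fix may differ, so check against the SDK docs):

```rust
// Pass a precomputed CRC32C to PutObject without also setting the checksum
// algorithm, so the SDK doesn't recompute it.
use aws_sdk_s3::{primitives::ByteStream, Client};

async fn put_with_precomputed_checksum(
    client: &Client,
    bucket: &str,
    key: &str,
    body: ByteStream,
    crc32c_b64: &str, // CRC32C computed once upstream, base64-encoded as S3 expects
) -> Result<(), aws_sdk_s3::Error> {
    client
        .put_object()
        .bucket(bucket)
        .key(key)
        .body(body)
        // Provide the checksum value directly. Also calling
        // .checksum_algorithm(ChecksumAlgorithm::Crc32C) made the SDK recompute
        // the checksum and ignore this value, per the linked issue.
        .checksum_crc32_c(crc32c_b64)
        .send()
        .await?;
    Ok(())
}
```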
In my experience (working on a heavily checksummed system), if you don't carefully think about where checksums are actually required for data integrity, it's very easy to end up with every library/system/etc. taking a wasteful checksum "just in case", and as a bonus you might miss some important checksums that aren't obvious when transforming data.
I'd be interested in hearing what you changed to work around this.