I don’t really see what “continuous profiling and optimisation” has to do with your code change here. 68% of your CPU time is spent computing SHA256? That’ll pop out as a hotspot using any competent profiling tool. It doesn’t matter what tool you use, or what approach. It just matters that you look, at all.
I’m not saying your special profiling tools are bad. But I can’t tell what’s special about them when they’re simply doing something you could do using literally any profiler.
Nobody is saying that a regular profiling tool can't detect it. But it's one of those things that's easy to miss if you aren't profiling regularly. With a continuous profiler set up, you skip all the work of collecting the right data, and you get aggregates across time, which mattered in the second example of the blog post because the memory allocations aren't necessarily visible in a 10s profile. It makes the barrier for including profiling data in your everyday software engineering way lower.
You can capture memory or CPU metrics with top as well, and that's useful, but it's not the same thing as a full-blown metrics system eg. Prometheus.
How do you not profile something? It is right there on the call stack millions of times. Do you explicitly filter it? Do you forego the idea of profiling when something slows down the system?
This is a good demonstration that autodetection-based CPU acceleration should be on by default.
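For illustration, the runtime checks already exist in std::arch; a crate could use something like this to pick a hardware SHA-256 path on its own (a sketch of the general idea, not how sha2 actually wires it up):

```rust
// Runtime CPU feature detection from the standard library. A crypto crate
// could branch on this instead of requiring a build-time feature flag.
#[cfg(target_arch = "aarch64")]
fn sha256_hw_available() -> bool {
    // ARMv8 "sha2" feature covers the SHA-256 instructions.
    std::arch::is_aarch64_feature_detected!("sha2")
}

#[cfg(target_arch = "x86_64")]
fn sha256_hw_available() -> bool {
    // x86 SHA extensions.
    std::arch::is_x86_feature_detected!("sha")
}

#[cfg(not(any(target_arch = "aarch64", target_arch = "x86_64")))]
fn sha256_hw_available() -> bool {
    false
}

fn main() {
    println!("hardware SHA-256 available: {}", sha256_hw_available());
}
```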
Also, it looks like several of your projects use zlib. You might take a look at zlib-rs, which is at this point faster than zlib-ng and easier to make use of from Rust. It's available as a flate2 backend. (Note that the folks working on zlib-rs are looking for funding for its development at the moment, if you're interested.)
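The swap is mostly a Cargo.toml change; a rough sketch (the `zlib-rs` feature name is from memory, so double-check the flate2 docs, and the version pin here is a guess):

```rust
// In Cargo.toml (illustrative):
//   flate2 = { version = "1", default-features = false, features = ["zlib-rs"] }
// Call sites stay exactly the same:
use flate2::{write::GzEncoder, Compression};
use std::io::Write;

fn gzip(data: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
    encoder.write_all(data)?;
    encoder.finish() // returns the compressed Vec<u8>
}
```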
I am in the midst of writing a little toy axum project for my Pi and also want to measure CPU/memory performance for it. In JVM land I used Prometheus for such tasks. Is there a straightforward tool/crate that anyone can recommend for this?
Prometheus gives you total CPU/memory metrics; the profiler used in the article gets you much higher resolution, down to the line number in your source code.
If you're looking to optimize your project I would recommend using a profiler rather than metrics that just tell you the total.
(disclaimer: I'm the founder of the company that offers the product shown in the blog post, but I also happen to be a Prometheus maintainer)
Yeah, my comment wasn't that well thought through. Obviously if I want to find bottlenecks in my code, profiling is what I want; if I want to monitor actual load and peaks in prod, I want some kind of monitoring. I have good experiences with Prometheus, but I went with the sysinfo crate, which is probably light-years more basic/big-picture. I love that I can just record my own metrics, log them to CSV, and summarize peaks for a live view exposed through axum in HTML. It's neat for my toy project, which doesn't aim to be industry standard.
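For anyone curious, the core of it looks roughly like this (APIs paraphrased from memory; sysinfo's interface and units have shifted between versions, so check the docs for whichever release you're on):

```rust
// Minimal sysinfo sketch: sample overall CPU usage and memory once.
use std::{thread, time::Duration};
use sysinfo::System;

fn main() {
    let mut sys = System::new_all();

    // CPU usage needs two refreshes separated by a short interval.
    sys.refresh_all();
    thread::sleep(Duration::from_millis(500));
    sys.refresh_all();

    let avg_cpu: f32 =
        sys.cpus().iter().map(|c| c.cpu_usage()).sum::<f32>() / sys.cpus().len() as f32;

    println!("avg CPU: {avg_cpu:.1}%");
    println!("memory: {} used of {}", sys.used_memory(), sys.total_memory());
}
```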
I'm not sure if this is what you are looking for but I found https://github.com/wolfpld/tracy to work rather well. There is an integration for the tracing crate that can get you very far: https://lib.rs/crates/tracing-tracy. If you're just looking for a very high level report then this might be a bit too much detail.
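Wiring it up is roughly this (from memory of recent tracing-tracy releases; older ones construct the layer with `TracyLayer::new()` instead):

```rust
// Attach the Tracy layer to a tracing subscriber so spans/events stream to
// the Tracy profiler UI when it connects.
use tracing_subscriber::layer::SubscriberExt;

fn main() {
    let subscriber = tracing_subscriber::registry()
        .with(tracing_tracy::TracyLayer::default());
    tracing::subscriber::set_global_default(subscriber)
        .expect("failed to set tracing subscriber");

    // Any span from here on shows up in Tracy.
    let _span = tracing::info_span!("startup").entered();
}
```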
> Depending on the configuration Tracy may broadcast discovery packets to the local network and expose the data it collects in the background to that same network. Traces collected by Tracy may include source and assembly code as well.
I'm more curious about how much manual effort was required to find said line of code. It strikes me that most of these optimizations are super easy to verify but very difficult to find.
(S2 dev) It took a bit of time to figure out what was going on. The profile showed us that the sha2 crate was using its `soft` backend when we needed the hardware-optimized one. We thought being on Neoverse V1 would mean hardware optimization gets detected and used automatically, but that wasn't the case, so we looked at the sha2 crate closely and figured out that enabling the asm feature fixes it!
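If you want to sanity-check something like this yourself, a rough (hypothetical, not what we actually measured) way is to time SHA-256 over a fixed buffer with and without the feature enabled in Cargo.toml (the version pin below is a guess):

```rust
// Toggle `sha2 = { version = "0.10", features = ["asm"] }` in Cargo.toml
// and compare the timings of this tiny hash loop.
use sha2::{Digest, Sha256};
use std::time::Instant;

fn main() {
    let data = vec![0u8; 64 * 1024 * 1024]; // 64 MiB of zeroes
    let start = Instant::now();
    let digest = Sha256::digest(&data);
    println!(
        "hashed 64 MiB in {:?} (first digest byte {:02x})",
        start.elapsed(),
        digest[0]
    );
}
```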
Cool, makes you wonder how many more of these one-liners are scattered across the codebase (in the application and also the OS, etc.).
Sidenote: I wonder how close we are to detecting these automatically. From profiling many different applications, it feels like that should be a tractable problem.
This is such a weird article. Not only does it seem like the company is glazing itself in the third person, but claiming that it's a single line of change when they're swapping out a library feature is disingenuous; you also don't need sophisticated tooling to find an unoptimized hotspot, perf record would suffice.
I mentioned this on another thread as well, but the point isn't that perf can't catch something like this; it's that having continuous profiling set up makes it way easier to make profiling data an everyday part of your development process, and ultimately nothing behaves quite like production. Things that gradually sneak into the codebase become easy to spot, since you don't have to go through the whole shebang of collecting representative production profiling data over time: you just always have it available. Continuous profiling also makes intermittent issues easier to spot, and so on.
(disclaimer: I'm the founder of the product showcased in this case study.)
hi, S2 dev here. I found that if you explicitly set the checksum algorithm (crc32c), the AWS SDK would ignore the checksum we provided and compute its own, so we were doing the work twice. I also found a related issue: https://github.com/awslabs/aws-sdk-rust/issues/1103
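Roughly the shape of it (hedged sketch; the aws-sdk-s3 builder method names are from memory and the actual fix may differ, so check against the SDK docs):

```rust
// Pass a precomputed CRC32C to PutObject without also setting the checksum
// algorithm, so the SDK doesn't recompute it.
use aws_sdk_s3::{primitives::ByteStream, Client};

async fn put_with_precomputed_checksum(
    client: &Client,
    bucket: &str,
    key: &str,
    body: ByteStream,
    crc32c_b64: &str, // CRC32C computed once upstream, base64-encoded as S3 expects
) -> Result<(), aws_sdk_s3::Error> {
    client
        .put_object()
        .bucket(bucket)
        .key(key)
        .body(body)
        // Provide the checksum value directly. Also calling
        // .checksum_algorithm(ChecksumAlgorithm::Crc32C) made the SDK recompute
        // the checksum and ignore this value, per the linked issue.
        .checksum_crc32_c(crc32c_b64)
        .send()
        .await?;
    Ok(())
}
```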
In my experience (working on a heavily checksummed system), if you don't carefully think about where checksums are actually required for data integrity, it's very easy to end up with every library/system/etc. taking a wasteful checksum "just in case", and as a bonus you might miss some important checksums that aren't obvious when transforming data.
I'd be interested in hearing what you changed to work around this.