riquito · 2 years ago
My go-to reference when I want to reduce Rust binary size is the excellent https://github.com/johnthagen/min-sized-rust, a set of guidelines on how to reduce size, with explanations of the consequences.
pdimitar · 2 years ago
It's really a shame that Rust includes the stdlib piecemeal in binary form, debug symbols and all, in every final binary.

I do love Rust but binary sizes have always annoyed me greatly and I always had this nagging feeling that some programmers don't take Rust seriously because of it. And I actually have witnessed, several times in the last 2-ish years, older-school programmers berating and ignoring Rust on that basis alone (so the author is quite right to call this out as a factor).

Looking at the https://github.com/johnthagen/min-sized-rust repo, final binary size of 51 KB when compilation / linking / stripping takes stdlib into account (and not just blindly copy-pasting the 4MB binary blob) is acceptable and much more reasonable. I wouldn't care for further micro-optimizations e.g. going to 20KB or even 5KB (further down the README file).

I also don't use nightly in my Rust work so I guess I'll have to wait several more years. :(

nindalf · 2 years ago
> so I guess I'll have to wait several more years

A feature that's landed in Rust nightly will be part of the next beta release (at most 6 weeks away) and then the stable release that follows it (exactly 6 weeks after that).

For this feature in particular, rustbot added a tag of 1.77.0, which is releasing on 21st March 2024, less than 2 months away.

It's possible you've confused this with more complex features that stay on nightly for a long time while they are tested. This is not one of those features.

WhyNotHugo · 2 years ago
Some features remain in nightly for years.

A very relevant example is `build-std`, which builds the standard library (and does LTO) instead of copying a pre-built one. This feature has been on nightly for at least a couple of years.

sestep · 2 years ago
Are you sure? If so then this is awesome news, but I'm a bit confused; the commit in that min-sized-rust repo adding `build-std` to the README was merged in August 2021: https://github.com/johnthagen/min-sized-rust/pull/30

Are you saying that the feature still hadn't "landed in Rust nightly" until recently? If so then what's the difference between a feature just being available in Rust nightly, vs having "landed"?

The Cargo doc page on `build-std` says that it is an "unstable feature" which will only eventually end up in stable once it is stabilized: https://doc.rust-lang.org/cargo/reference/unstable.html#buil... and Kobzol's linked post above indicates that `build-std` is "sadly still unstable."

pdimitar · 2 years ago
I indeed am confusing it with those features that seem to be in nightly forever. Thank you, compiling with beta is much more acceptable for me. (EDIT: or simply waiting for stable 1.77)
lifthrasiir · 2 years ago
But why? I mean, I'm also obsessed with byte size from time to time and I do often optimize for size when it's doable, but in practice anything below 1 MB seems small enough that you don't need to optimize further. There is so much low-hanging fruit when it comes to Rust binary size...
gred · 2 years ago
I don't have a horse in this race, but a programming language that prides itself on "zero-cost abstractions" and then generates a Hello World program that is 90% wasted space doesn't leave a great impression.
0x000xca0xfe · 2 years ago
I work in embedded software and we fight for every megabyte of NAND flash. Multiple smaller executables add up, too.

Reducing the binary size is usually more important than performance (and often more important than memory safety, if we are totally honest...).

Personally, I hate bloated software just as much as slow software. It's crazy what you can do with 64 kilobytes, let the demoscene blow your mind: https://www.youtube.com/watch?v=ZfuierUvx1A

Would be awesome if compilers could do the same :)

zer00eyz · 2 years ago
Out in JavaScript land no one cares about file size or wire time any more (but they should).

Over in C, plenty of people work on things where file size matters. It is a big deal. System constraints (embedded), wire constraints (far away systems where internet is slow and ephemeral), legacy systems where everything is going to be slow, even moving that fat file around, and you have to be present to update (ATM, ticket, vending machines)... The list goes on.

So it may NOT matter for the desktop, or mobile or servers, but that's a tiny fraction of the computing out there.

bipson · 2 years ago
Depends on what you're aiming for.

If we talk about ultra-low-power platforms, e.g. energy-harvesting IoT devices, 1MB is still quite a lot.

If we are going to argue that Rust can compete with C/C++, it needs to have similar performance, also regarding binary size.

pdimitar · 2 years ago
Mostly because CPU caches matter a lot. Executing 7-8 different Rust binaries, in a loop, that are each north of 10MB+ is bound to be less performant than corresponding smaller programs.

Never had the time or dedication to actually verify this but I've been bitten by programs and OS-es that thrash the cache too much and I've seen humanly perceivable lags because of it. But maybe in this case I am overreacting.

pjmlp · 2 years ago
That won't fit into an ESP32, Arduino or embedded hardware with similar restrictions, leaving that space to C and C++, or to the Basic and Pascal compiler vendors still enjoying it, like Mikroe.
JohnFen · 2 years ago
> in practice anything below 1 MB seems small enough that you don't need to optimize further.

I think that it depends on what sort of machine you're aiming for the binary to run on. I develop for a few platforms where a 1MB executable size would be completely unacceptable.

formerly_proven · 2 years ago
Adds up with larger applications using a bunch of distinct binaries or many libraries.
josephg · 2 years ago
> (and not just blindly copy-pasting the 4MB binary blob)

Where is this 4 MB claim coming from? I just built a hello world on macOS in release mode, with no special flags, and the result was 400 KB. That's absolutely larger than it should be, but it's a lot smaller than 4 MB.

Is it really that much worse on linux?

ckok · 2 years ago
Mach-O executables don't contain debug info. Just references to .o files.
lifthrasiir · 2 years ago
You are looking at the aftermath. Also, you could have manually put `strip = "debuginfo"` to avoid the 4 MB binary even before that; the issue is about the default when no `strip` option was given.
weinzierl · 2 years ago
I was wondering if dead code elimination also removes the debug symbols related to the dead code?

Are the ~4 MiB of debug symbols the article talks about for the whole libstd or just the portions that actually ended up in the binary?

I think the new default makes sense but I'd love to have the option to build a lean but debuggable release binary with just the needed symbols.

kobzol · 2 years ago
You can't really use dead code elimination on debug symbols, because you don't know which symbols you will need. You would need to know where and how your program will crash, or whether the user will want to use a debugger on it.
weinzierl · 2 years ago
I don't understand. If a function isn't in the binary I will never need its name, so it never has to be in the debug information. Likewise for variables.
anonymoushn · 2 years ago
So what's in the remaining huge 415KB hello world program? Can we get rid of 90% of it a couple more times?
lifthrasiir · 2 years ago
Buffered I/O and synchronization primitives to avoid corrupting the buffer. The underlying vector implementation and memory allocator. Panic support and backtrace printer, which includes a Rust-specific name demangler and a not-so-small DWARF parser in Linux. Path support because backtraces would include source file paths. Zlib-compatible decompressor because ELF allows compressed sections (!). Then you have several formatters that are often pretty large (e.g. f32 and f64 would add at least 30 KB of binary, and some depend on Unicode grapheme clusters). They are all essential for edge cases that can happen even for such a simple program.
owl57 · 2 years ago
> several formatters that are often pretty large (e.g. f32 and f64 would add at least 30 KB of binary …)

I know floats are full of scary corner cases, but… Assuming tens of bytes per if statement, hundreds of corner cases just in formatting? Is it really that bad?
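
For a rough feel of what a float formatter has to cover, here is a small sketch that only exercises `std`'s existing `Display` and `LowerExp` impls. The helper name `round_trips` is made up for illustration. My understanding (an assumption, not something measured here) is that the size comes less from individual `if` statements and more from having to print a short decimal string that parses back to exactly the same value, which needs big-number arithmetic or lookup tables.

```rust
/// Hypothetical helper: format a float with `Display`, parse the text back,
/// and check that exactly the same bit pattern comes out.
fn round_trips(x: f64) -> bool {
    let text = format!("{}", x);
    match text.parse::<f64>() {
        Ok(y) => y.to_bits() == x.to_bits(),
        Err(_) => false,
    }
}

fn main() {
    // A few awkward inputs: a classic rounding case, a huge value, the
    // smallest subnormal, the largest finite value, negative zero, infinity.
    let samples = [0.1 + 0.2, 1e21, 5e-324, f64::MAX, -0.0, f64::INFINITY];
    for x in samples {
        println!("{:e} -> {} (round-trips: {})", x, x, round_trips(x));
    }
    // NaN never compares equal to itself, so it is just printed.
    println!("{}", f64::NAN);
}
```

Every one of those inputs has to produce sensible text, in both plain and exponent notation, for `f32` as well as `f64`.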

temac · 2 years ago
Why does it need to parse ELF?
riquito · 2 years ago
32 KB with the following configuration, on a Linux target. It's not the default, you're using nightly, it's more complex to build and there are tradeoffs. You can even go lower than that if that's your thing.

  [profile.release]
  strip = true
  opt-level = "z"
  lto = true
  codegen-units = 1
  panic = "abort"

  cargo +nightly build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-unknown-linux-gnu --release

As mentioned in another thread, I've simply followed https://github.com/johnthagen/min-sized-rust

flohofwoe · 2 years ago
These look like sensible defaults to me for release mode. What are the tradeoffs?
kobzol · 2 years ago
Mostly the formatting machinery from stdlib. You can go sub 100 KiB or even less without it, but unless you target embedded, there's really no reason to do that, IMO.
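
A minimal sketch of what "without the formatting machinery" can look like in user code: write the bytes directly instead of going through `println!`. The function name `write_greeting` is made up for illustration, and the size effect is not measured here. With the prebuilt std, formatting code may still get linked in through the panic path, which is presumably why the `build-std` route keeps coming up.

```rust
use std::io::{self, Write};

/// Write the greeting as raw bytes; no `format_args!` is involved here.
fn write_greeting<W: Write>(out: &mut W) -> io::Result<()> {
    out.write_all(b"Hello, World!\n")
}

fn main() {
    // Exit with a status code instead of panicking or printing the error,
    // since both of those would go through formatting again.
    if write_greeting(&mut io::stdout()).is_err() {
        std::process::exit(1);
    }
}
```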
lionkor · 2 years ago
Initial binary size, like that of a hello world, is a great indicator of how much abstraction the language has that you're also paying for.

For example, Java has to set up a constants pool, parse it from the classfile, run `<init>` and `<clinit>`, resolve references, allocate a frame, and so on, just to get to the entrypoint.

Compared to C without stdlib which needs to basically just run a few bytes to syscall write().

There are obviously massive advantages to Java for which these steps are needed, but you do pay for it.

frfl · 2 years ago
I'm not trying to be rude, but what does your comment have to do with the actual article linked? I suspect you're commenting based solely on the title? If you read the article, it mentions a 90% reduction if you remove the debug symbols from the Rust std lib that get bundled in the final hello world binary. A conversation about Java's size and abstraction is quite irrelevant to the article.
kelnos · 2 years ago
The article talks about how "hello world" binary size can be part of the first impression someone new to your language has. Java is a nice illustration of that. Rust probably isn't, because many of Rust's abstractions are zero-cost, so it's not as obvious a comparison.
lionkor · 2 years ago
My comment is pretty unrelated to the content of the article, but more has to do with what others said here in the comments
nindalf · 2 years ago
It’s funny, because your claim that it’s a great indicator is contradicted by the article we’re discussing. Rust was widely considered to be “only pay for what you use” despite having 4MB hello world programs. And now that it’s 400KB that continues to be true. So in this case, 4MB wasn’t a good indicator was it?
weinzierl · 2 years ago
It's funny that it was already considered to be “only pay for what you use” when the binaries contained half of H. P. Lovecraft's œuvre.

See issue #13871

https://github.com/rust-lang/rust/issues/13871

I exaggerate; the point is that we've come a long way and are still getting better. Different people look at different metrics, and the more popular the language becomes, the bigger the variety of metrics that come into focus.

lionkor · 2 years ago
It's still a great indicator, it's just that indicators don't mean very much sometimes.
rpigab · 2 years ago
Out of curiosity, I looked at a basic hello world in C, and compiled it naively using gcc on Ubuntu, by default the executable produced was 15960 bytes, but when adding the -Os option (from the man page: "-Os Optimize for size") to enable many optimisation flags including code size, I was able to reduce it to 15968.

  #include <stdio.h>
  int main() { printf("Hello, World!"); return 0;}
Of course, if I had read more than 3 words in the man page, the answer was easy to understand: "-Os Optimize for size. -Os enables all -O2 optimizations except those that often increase code size", so you can't really get a lower size than the default using only the optimization system. There's also "-finline-functions" included in -Os, but it won't help you in a Hello World.

lifthrasiir · 2 years ago
This program is not the same as the Rust version. A more faithful version, assuming glibc, would look like this:

    #include <stdio.h>
    #include <stdlib.h>
    #include <execinfo.h>
    #include <errno.h>
    #include <string.h>  /* strerror */
    
    void print_backtrace(void) {
        void *traces[50];
        char **strings;
        int num_traces, i;
    
        num_traces = backtrace(traces, sizeof(traces) / sizeof(*traces));
        strings = backtrace_symbols(traces, num_traces);
        if (!strings) return;
        for (i = 0; i < num_traces; ++i) {
            fprintf(stderr, "%d: %s\n", i + 1, strings[i]);
        }
        free(strings);
    }
    
    int main(void) {
        static const char FMT[] = "Hello, World!\n";
        static int EXPECTED = (int) (sizeof(FMT) - 1);
        int ret = printf(FMT);
        if (ret < 0) {
            fprintf(stderr, "printf failed: %s\n", strerror(errno));
            print_backtrace();
            return 1;
        }
        if (ret != EXPECTED) {
            fprintf(stderr, "printf failed: only %d characters were written\n", ret);
            print_backtrace();
            return 1;
        }
        return 0;
    }
While this is still substantially different (for example, Rust's I/O buffering is different from C), this should be enough to demonstrate that this comparison is very unfair.

rfoo · 2 years ago

    $ gcc -Os a.c && ls -al a.out && size
    -rwxr-xr-x 1 user user 15952 Jan 24 17:49 a.out
       text    data     bss     dec     hex filename
       1316     584       8    1908     774 a.out
    $ gcc -s -Os a.c && ls -al a.out && size
    -rwxr-xr-x 1 user user 14472 Jan 24 17:50 a.out
       text    data     bss     dec     hex filename
       1316     584       8    1908     774 a.out
    $ gcc -s -Os -fuse-ld=lld a.c && ls -al a.out && size
    -rwxr-xr-x 1 user user 4552 Jan 24 17:50 a.out
       text    data     bss     dec     hex filename
       1199     528       1    1728     6c0 a.out

flohofwoe · 2 years ago

    zig cc -Os -target x86_64-linux-musl hello.c -o hello
...which basically calls Clang under the hood, but comes with out-of-the-box cross-compilation support for Linux and musl, and creates a 5136-byte executable.

ragnese · 2 years ago
Debug symbols increase the binary size and don't really have anything to do with the abstraction level of the language. My understanding is that you also don't really "pay for" the debug symbols at runtime (until you generate a stack trace and need them). There could be some nuance that I'm missing there.
edflsafoiewq · 2 years ago
More abstraction generally means more functions which means more debug symbols. For example, iterating over an array in Rust involves iterators, slices, options. There are lots of function calls, which all contribute debug info. In C, iterating over an array is a for loop and pointer arithmetic, no function calls, so much less debug info.
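
To make the comparison concrete, here is a sketch of the same loop written both ways. The names `sum_iter` and `sum_ptr` are made up for illustration. An optimized build will usually inline the iterator version down to much the same machine code, so the difference is mainly in how many functions and types there are to describe in the debug info.

```rust
/// Iterator style: goes through `Iter`, `Option` and `Sum` in the source,
/// even though an optimizing build usually inlines all of it away.
fn sum_iter(xs: &[u32]) -> u32 {
    xs.iter().sum()
}

/// C style: a start pointer, an end pointer and a loop, with no calls.
fn sum_ptr(xs: &[u32]) -> u32 {
    let mut total = 0u32;
    let mut p = xs.as_ptr();
    // SAFETY: `p.add(xs.len())` is the one-past-the-end pointer of the slice,
    // and `p` is only dereferenced while it is strictly before `end`.
    let end = unsafe { p.add(xs.len()) };
    while p != end {
        unsafe {
            total += *p;
            p = p.add(1);
        }
    }
    total
}

fn main() {
    let data = [1, 2, 3, 4, 5];
    println!("{} {}", sum_iter(&data), sum_ptr(&data));
}
```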
raggi · 2 years ago
It'd be nice if the default was external symbols rather than none at all
berkes · 2 years ago
I'd find that a bit counterintuitive. To me, --release is an alternative for --debug. Debug implies debug stuff is in there, therefore the alternative removes it.

But I do find it useful now and then to have debug symbols in production, so that monitoring and telemetry gets some context. But to me, it's rather logical that I need to add this to the build.

I would, however, love it if it were trivial to get these symbols set up externally.

raggi · 2 years ago
GNU binutils and LLVM both have support for debuginfod server topologies now, it would be great to see Rust tools get on the train.

https://github.com/llvm/llvm-project/commit/36f01909a0e29c10...

And naturally we already have a port: https://crates.io/crates/debuginfod-rs

LAC-Tech · 2 years ago
It's great to see rust maturing. That people care about binary size and work to improve it is a good sign.
flohofwoe · 2 years ago
It's not a great sign that this bug existed for 7 years, people noticed, asked about it, wrote tickets, everybody agreed that it should be fixed, but then nobody took the time to actually do it (and the eventual fix is a rather crude hack that just strips the output binary instead of preventing debug symbols from slipping into a release binary in the first place).

Looks more like a systemic issue in the Rust development process to me tbh.

What's more shocking though is that even after a 90% size reduction, a vanilla hello world is still 415 KBytes. That's about 10..100x bigger than I would expect from a low level "systems programming language".

lifthrasiir · 2 years ago
A vast majority of that 415 KB is due to the backtrace support, which is amazingly complicated. (See my other comment for specifics.) It depends on some more machinery from std, and a "simple" hello world can always panic if stdout is closed or so, therefore that part of std cannot be easily removed unless you are fine with useless backtraces.
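
A small sketch of that failure mode, using a stand-in writer rather than a really closed stdout. `ClosedPipe` and `say_hello` are made-up names. `println!` is documented to panic when the write to stdout fails, while `writeln!` hands the error back to the caller, as shown here.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a closed stdout: every write fails.
struct ClosedPipe;

impl Write for ClosedPipe {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::BrokenPipe, "stdout is closed"))
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Same greeting as a hello world, but the write error is returned
/// to the caller instead of turning into a panic.
fn say_hello<W: Write>(out: &mut W) -> io::Result<()> {
    writeln!(out, "Hello, World!")
}

fn main() {
    match say_hello(&mut ClosedPipe) {
        Ok(()) => println!("written"),
        Err(err) => eprintln!("write failed: {}", err),
    }
}
```

With `println!` the same failure would go through the panic handler, which is where the backtrace printer comes in.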

Also, Rust has no direct platform support unlike C. So everything has to be statically linked to be portable. A statically linked glibc is indeed much larger than that (~800 KB on my machine). Conversely, you can sacrifice portability and link to `libstd*.so` dynamically to get a very small binary (~17 KB on my machine, both for C and Rust).

alpaca128 · 2 years ago
> nobody took the time to actually do it

Because in the end almost nobody actually cares about it enough to create a fix.

Small binaries are great, but people care mainly about how fast it compiles and how fast it runs, and in the few cases where the binary size is important it was already possible to shrink it significantly (more than with this new change). In my entire life I have never heard a user complain about the size of the binary, what people really care about is efficiency/speed at runtime. That's why people regularly mention VS Code using Electron, but not that its installation package alone has >500MB.

shzhdbi09gv8ioi · 2 years ago
If you read the article you would know that this is just about changing defaults. You already could achieve this by editing your project's Cargo.toml. As is documented and discussed in several places over the years (google "shrink rust binary size").

The `strip` option was added to Rust nightly in 2020.

1: https://github.com/johnthagen/min-sized-rust

2: https://kerkour.com/optimize-rust-binary-size

3: https://rustrepo.com/repo/johnthagen-min-sized-rust

4: https://sing.stanford.edu/site/publications/rust-lctes22.pdf

5: https://arusahni.net/blog/2020/03/optimizing-rust-binary-siz...

zozbot234 · 2 years ago
> but then nobody took the time to actually do it

That happens all the time, it's called prioritizing. If you don't let people prioritize, they will burn out and leave the project. That's not what you want.

milianw · 2 years ago
I applaud this initiative. Yet I wonder: the ideal situation would be to not throw away the debug info (ever), but rather to put it into an external file. Even Linux has supported this for many years, and nowadays we even have direct support with dwarf5/dwo. Has Rust taken any action in that direction?

after all: without debug info (which you do not want to ship to customers necessarily), you cannot do profiling or debugging in any meaningful way...

lifthrasiir · 2 years ago
Cargo does support `split-debuginfo` [1] but it still has some rough edges in my experience. I do think that is the ultimate way to go in the future.

[1] https://doc.rust-lang.org/cargo/reference/profiles.html#spli...

emi2k01 · 2 years ago
Splitting the debug info from the executable is supported on the 3-big-OSes but only enabled by default on Windows (and maybe macOS?)
goku12 · 2 years ago
Many Linux distributions have debuginfod servers that supply split debugging symbols on demand. So one could argue that it's implemented by default on Linux too.