Arnavion · a year ago
Distributions optimize for software in their repos. Software in their repos is compiled against libraries in their repos, so dynamic linking has no downsides and has the upside of reducing disk usage and runtime memory usage (sharing pages).

Your problem with "libFlac.8.so is missing" happens with using software not from the distro repos. Feel free to statically link it, or run it via AppImage or Flatpak or Podman or whatever you want that provides the environment it *was* compiled for. Whether the rest of the distro is dynamically linked or not makes no difference to your ability to run this software, so there's no reason to make the rest of the distro worse.

I personally do care about disk usage and memory usage. I also care about using software from distro repos vs Flatpak etc wherever possible, because software in the distro repos is maintained by someone whose values align with mine rather than the upstream software author's. E.g. the firefox package from distro repos lets me load my own extensions without Mozilla's gatekeeping, the Audacity package from distro repos did not have the telemetry that the Audacity devs added to their own builds, etc.

cbmuser · a year ago
The main argument for using shared libraries isn’t memory or disk usage, but simply security.

If you have a thousand packages linking statically against zlib, you will have to update a thousand packages in case of a vulnerability.

With a shared zlib, you will have to update only one package.

pizlonator · a year ago
That's also a good argument for shared libraries, but the memory usage argument is a big one. I have 583 processes running on my Linux box right now (I'm posting from FF running on GNOME on Pop!_OS), and it's nice that when they run the same code (like zlib, but also lots of things that are bigger than zlib), they load the same shared library, which means they are using the same mapped file and so the physical memory is shared. It means that memory usage due to code scales with the amount of code, not with the number of processes times the amount of code.
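
For the curious, here's a minimal sketch of how to see that sharing (Linux-specific; the library name is just an example): it prints this process's mappings for a given shared object out of /proc/self/maps. Compare /proc/<pid>/maps for two unrelated processes and you'll find them mapping the very same .so file, whose read-only text pages exist in physical memory only once.

    /* sketch: list this process's file-backed mappings for a given library
       (Linux-specific; reads /proc/self/maps). Every process that maps the
       same .so file shares the clean, read-only pages of its text segment. */
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv) {
        const char *needle = (argc > 1) ? argv[1] : "libc";  /* library name to look for */
        FILE *f = fopen("/proc/self/maps", "r");
        if (!f) { perror("fopen"); return 1; }

        char line[4096];
        while (fgets(line, sizeof line, f)) {
            /* each line: address range, perms, offset, dev, inode, path */
            if (strstr(line, needle))
                fputs(line, stdout);
        }
        fclose(f);
        return 0;
    }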

I think that having a sophisticated desktop OS where every process had distinct physical memory for duplicated uses of the same code would be problematic at scale, especially on systems with less RAM. At least that physical memory would still be disk-backed, but that only goes so far.

That said, the security argument is also a good argument!

Arnavion · a year ago
That's not a difference of security, only of download size (*). A statically-linking distro would queue up those thousand packages to be rebuilt and you would receive them via OS update, just as you would with a dynamically-linking distro. The difference is just in whether you have to download 1000 updated packages or 1.

And on a distribution like OpenSUSE TW, the automation is set up such that those thousand packages do get rebuilt anyway, even though they're dynamically linked.

(*): ... and build time on the distro package builders, of course.

imoverclocked · a year ago
> The main argument for using shared libraries isn’t memory or disk usage, but simply security.

“The” main argument? In a world filled with diverse concerns, there isn’t just one argument that makes a decision. Additionally, security is one of those things where practically everything is a trade off. Eg: by having lots of things link against a single shared library, that library becomes a juicy target.

> With a shared zlib, you will have to update only one package.

We are back to efficiency :)

sunshowers · a year ago
The solution here is to build tooling to track dependencies in statically linked binaries. There is no inherent reason that has to be tightly coupled to the dynamic dispatch model of shared objects. (In other words, the current situation is not an inherent fact about packaging. Rather, it is path-dependent.)

For instance, many modern languages use techniques that are simply incompatible with dynamic dispatch. Some languages like Swift have focused on dynamic dispatch, but mostly because it was a fundamental requirement placed on their development teams by executives.

While there is a place for dynamic dispatch in software, there is also no inherent justification for dynamic dispatch boundaries to be exactly at organizational ones. (For example, there is no inherent justification for the dynamic dispatch boundary to be exactly at the places a binary calls into zlib.)

edit: I guess loading up a .so is more commonly called "dynamic binding". But it is fundamentally dynamic dispatch, ie figuring out what version of a function to call at runtime.
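
As a toy illustration of the kind of tooling meant here (a sketch only; real scanners such as distro security trackers work from dependency metadata recorded at build time rather than by scraping strings): many libraries embed a recognizable banner in their objects (zlib, for example, ships version/copyright strings), so even a statically linked binary can be checked for a marker. The needle passed on the command line is whatever pattern you choose to look for, not a fixed format.

    /* sketch: search a binary for an embedded marker string, e.g.
     *   ./scanstatic ./some-binary "inflate 1.2"      (needle is just an example)
     * Real dependency tracking records metadata at build time instead. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        if (argc != 3) { fprintf(stderr, "usage: %s <binary> <needle>\n", argv[0]); return 2; }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }
        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        void *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        /* memmem (GNU extension) finds a byte pattern anywhere in the mapped file */
        void *hit = memmem(map, st.st_size, argv[2], strlen(argv[2]));
        printf("%s: %s\n", argv[1], hit ? "marker found" : "marker not found");

        munmap(map, st.st_size);
        close(fd);
        return 0;
    }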

nmz · a year ago
Does static compilation use the entire library instead of just the parts that are used? If I'm just using a single function from this library, why include everything?
t-3 · a year ago
If a vulnerability in a single library can cause security issues in more than one package, there are much more serious issues to consider with regard to that library than the need to recompile 1000 dependents. The monetary/energy/time savings of being able to update libraries without having to rebuild dependents are of far greater significance than the theoretical improvement in security.
anotherhue · a year ago
If you're in an adversarial relationship with the OEM software developer, there's not a whole lot the distro maintainers can do; it's probably time to find a fork/alternative. (Forks exist for both your examples.)

I say this as a casual maintainer of several apps, and I'm loath to carry manual patches rather than upstreaming any fix.

Arnavion · a year ago
I'm not going to switch to a firefox fork over one line in the configure script invocation. Forks have their own problems with maintenance and security. It's not useful to boil it down to one "adversarial relationship" boolean.
skissane · a year ago
> I also care about using software from distro repos vs Flatpak etc wherever possible, because software in the distro repos is maintained by someone whose values align with me and not the upstream software author.

The problem one usually finds with distro repo packages is that they tend to be out of date compared to upstream – especially if you are running a stable distro release as opposed to the latest bleeding edge. You can get in a situation where you are forced to upgrade your whole distro to some unstable version, which may introduce lots of other issues, just because you need a newer version of some specific package. Upstream binary distributions, Flatpak/etc, generally don't have that issue.

> the firefox package from distro repos enables me to load my own extensions without Mozilla's gatekeeping, the Audacity package from distro repos did not have telemetry enabled that Audacity devs added to their own builds, etc

This is mainly a problem with "commercial open source", where an open source package is simultaneously a commercial product. "Community open source" – where the package is developed in people's spare time as a hobby, or even by commercial developers where the package is just some piece of platform infrastructure not a product in itself, is much less likely to have this kind of problem.

jauntywundrkind · a year ago
With Debian, one can apt-pin packages from different releases as well. So you can run testing, for example, but have oldstable, stable, unstable and experimental all pinned and available.

That maximizes your chance of being able to satisfy a particular dependency like libflac.8.so. Sometimes that might not actually be practical to pull in or might involve massively changing a lot of your installed software to satisfy the dependencies, but often it can be a quick easy way to drop in more libraries.

Sometimes libraries don't have a version number on them, so it'll keep being libflac even across major versions. That's a problem, because ideally you want to install old version 8 alongside newer version 12. But generally Debian is pretty good about allowing multiple major versions of packages. Here for example is libflac12, on stable and unstable both. https://packages.debian.org/search?keywords=libflac12

o11c · a year ago
> Feel free to statically link it

Actually, please don't. There's a nontrivial chance that you'll ship a copy with bugs that somebody wants to fix.

Instead, to get ALL the advantages of static linking with none of the downsides, what you should do is:

* dynamically link, shipping a copy of the library alongside the binary

This allows the user to find a newer build of the same-version library. It's also much less likely to be illegal (not that anybody cares about laws).

* use rpath to load the library

Remember: rpath is NOT a security hole in general. The only time that can happen is if the program is both privileged (setuid, setcap, etc.) and its rpath contains a path writable by someone other than the owner, neither of which is likely to happen for the kind of software people complain about shipping.

* do not use dlopen, please

`dlopen` works well for plugins, but terribly for actually-needed dependencies.
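
To make that concrete, a minimal sketch (assuming you bundle a copy of, say, libz in a lib/ directory next to the binary; the library, layout and build command are just examples, and per the note above, don't do this for setuid/setcap programs):

    /* sketch: an app that dynamically links a bundled library via rpath.
     *
     * layout:   app            (this binary)
     *           lib/libz.so.1  (the copy shipped alongside it)
     *
     * build ($ORIGIN must reach the linker literally, hence the quotes):
     *   gcc main.c -o app -lz -Wl,-rpath,'$ORIGIN/lib'
     *
     * At load time the dynamic linker expands $ORIGIN to the directory the
     * binary lives in, so the bundled copy is found first, but a user can
     * still drop in a newer build of the same-SOVERSION library.
     */
    #include <stdio.h>
    #include <zlib.h>

    int main(void) {
        /* any call into the bundled library; zlibVersion() is part of zlib's API */
        printf("using zlib %s\n", zlibVersion());
        return 0;
    }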

red016 · a year ago
Install Gentoo.
odo1242 · a year ago
Something worth noting with shared dependencies is that yes, they save on disk space, but they also save on memory. A 200MB non-shared dependency will take up 600MB across three apps, but a 200MB shared dependency can be loaded on its own and save 400 megabytes. (Most operating systems manage this at the physical page level, by mapping multiple instances of a shared library to the same physical memory pages.)

400 megabytes of memory usage is probably worth more than 400 megabytes of storage. It may not be a make-or-break thing on its own, but it's one of the reasons Linux can run on lower-end devices.

packetlost · a year ago
When you statically compile an application, you generally only store the text (code) of functions you actually use, so unless you're using all 400MB of code you're not going to have a 400MB+ binary. I don't think I've ever seen a dependency that was 400MB of compiled code if you stripped debug information and weren't embedding graphical assets, so I'm not sure how relevant this is in the first place. 400MB of opcodes is... a lot.
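
A quick way to convince yourself (a sketch; the flags are GCC/binutils-specific and the exact output will vary): the linker only pulls archive members that resolve symbols you actually reference, and with per-function sections it can drop unused functions too.

    /* sketch: only code you reference survives static linking.
     *
     * When linking against a static archive (libfoo.a), the linker pulls in
     * only the object files that satisfy undefined symbols. With
     * -ffunction-sections each function gets its own section, and
     * -Wl,--gc-sections discards the sections nothing references:
     *
     *   gcc -Os -ffunction-sections -fdata-sections demo.c -o demo \
     *       -Wl,--gc-sections -Wl,--print-gc-sections
     *
     * --print-gc-sections should report .text.never_called being removed.
     */
    #include <stdio.h>

    int used(int x)         { return x * 2; }
    int never_called(int x) { return x * x * x; }  /* candidate for removal */

    int main(void) {
        printf("%d\n", used(21));
        return 0;
    }
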
AshamedCaptain · a year ago
I have seen a binary that is approximately 1.1GB of text. That is without debug symbols. With debug symbols it would hit GDB address space overflow bugs all the time. You have not seen what 30-year-old engineering software houses can produce. And this is not the largest such software by any means.

Also, sibling comments argue that kernel samepage merging can help avoid the bloat of static linking. But what you argue here will make every copy of the shared libraries ever so slightly different and therefore prevent KSM from working at all. Really, no one is thinking this through very well. Even distributions that do static linking in all but name (such as NixOS) do still technically use dynamic linking for the disk space and memory savings.

vvanders · a year ago
Yes, this is lost in most discussions when it comes to DSOs. Not only do you have the complexity of versioning and vending, but you also can't optimize with LTO and other techniques (which can make a significant difference in final binary size).

If you've got 10-15+ consumers of a shared library or want to do plugins/hot-code reloading and have a solid versioning story by all means vend a DSO. If you don't however I would strongly recommend trying to keep all dependencies static and letting LTO/LTCG do its thing.

Lerc · a year ago
In practice I don't think this results in memory savings. By having a shared library and shared memory use, you also have distributed the blame for the size of the application.

It would be true that this saves memory if applications did not increase their memory requirements over time, but the fact is that they do, and the rate at which they increase their memory use seems to be dictated not by how much memory they intrinsically need but by how much is available to them.

There are notable exceptions. AI models, Image manipulation programs etc. do actually require enough memory to store the relevant data.

On the other hand I have used a machine where the volume control sitting in the system tray used almost 2% of the system RAM.

Static linking enables the cause of memory use to be more clearly identified. That enables people to see who is wasting resources. When people can see who is wasting resources, there is a higher incentive to not waste them.

pessimizer · a year ago
This is a law of averages argument. There is no rational argument for bloat in order to protect software from bloat. This is like saying that it doesn't matter that we waste money, because we're going to spend the entire budget anyway.
bbatha · a year ago
Only if you don’t have LTO on. If you have LTO on you’re likely to use a fraction of the shared dependency size even across multiple apps.
odo1242 · a year ago
That is a good point.
tdtd · a year ago
This is certainly true on Windows, where loaded DLLs share their base addresses across processes even with ASLR enabled, but is it the case on Linux, where ASLR forces randomization of .so base addresses per process, so relocations will make the data in their pages distinct? Or is it the case that on modern architectures with IP-relative addressing (like x64), relocations are so uncommon that most library pages contain none?
saagarjha · a year ago
Code pages are intentionally designed to stay clean; relocations are applied elsewhere in pages that are different for each process.
pradn · a year ago
It's possible for the OS to recognize that several pages have the same content (ie: doing a hash) and then de-duplicate them. This can happen across multiple applications. It's easiest for read-only pages, but you can swing it for mutable pages as well. You just have to copy the page on the first write (ie: copy-on-write).

I don't know which OSs do this, but I know hypervisors certainly do this across multiple VMs.
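
Linux does this as KSM (kernel samepage merging), and notably it only scans memory that a program has explicitly opted in with madvise, which is why in practice it's mostly hypervisors like QEMU that benefit. A minimal sketch of the opt-in (assuming a kernel built with CONFIG_KSM and ksmd switched on via /sys/kernel/mm/ksm/run):

    /* sketch: opting two identical anonymous buffers into KSM merging.
     * Requires CONFIG_KSM and ksmd running (echo 1 > /sys/kernel/mm/ksm/run);
     * merged pages show up in /sys/kernel/mm/ksm/pages_sharing. Merged pages
     * stay copy-on-write, so a later write to either buffer unshares it. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        size_t len = 64 * 1024 * 1024;  /* 64 MiB per buffer */

        char *a = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        char *b = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (a == MAP_FAILED || b == MAP_FAILED) { perror("mmap"); return 1; }

        memset(a, 0x5a, len);           /* identical contents... */
        memset(b, 0x5a, len);

        /* ...but the kernel only considers them for merging after this opt-in */
        if (madvise(a, len, MADV_MERGEABLE) || madvise(b, len, MADV_MERGEABLE)) {
            perror("madvise(MADV_MERGEABLE)");  /* e.g. kernel without CONFIG_KSM */
            return 1;
        }

        printf("pid %d: buffers marked mergeable; watch /sys/kernel/mm/ksm/pages_sharing\n",
               (int)getpid());
        pause();                        /* keep the mappings alive while ksmd scans */
        return 0;
    }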

slabity · a year ago
Even if the OS could perfectly deduplicate pages based on their contents, static linking doesn't guarantee identical pages across applications. Programs may include different subsets of library functions, and the linker can throw out unused ones. Library code isn't necessarily aligned consistently across programs or within their pages. And if you're doing any sort of LTO, then that can change function behavior, inlining, and code layout.

It's unlikely for the OS to effectively deduplicate memory pages from statically linked libraries across different applications.

ChocolateGod · a year ago
Correct me if I'm wrong, but Linux only supports KSM (memory deduping) between processes when it's done between VMs, as QEMU provides information to the kernel to perform it.

indigodaddy · a year ago
I remember many years ago VMware developed a technology to take advantage of these shared library savings across VMs as well.

https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsp...

abhinavk · a year ago
Linux’s KVM has it too. It’s called KSM.
whatshisface · a year ago
Isn't 200MB a little large for a dependency? The Linux kernel is ~30MB.
odo1242 · a year ago
It's around the size of OpenCV, to be specific. I do see your argument though.
anotherhue · a year ago
Assuming it is worth it with modern memory sizes (I think not), could this present a negative on NUMA systems? Forcing contention when a simple copy would have been sufficient?

I assume there's something optimising that away but I'm not well versed.

intelVISA · a year ago
Please don't soapbox under false pretenses - in no world does this "200mb non-shared dependency" comparison exist: that would imply static linkage is just appending a shared object into the executable at compile-time.
839302 · a year ago
Hello. No
tannhaeuser · a year ago
> Sure, we now have PipeWire and Wayland. We enjoy many modern advances and yet, the practical use for me is worse than it was 10 years ago.

That's what has made me leave Linux for Mac OS, and I'm talking about things like the touchpad (libinput) causing physical pain and crashing every couple of minutes, while the "desktop" wants to appeal to a hypothetical casual tablet user even Microsoft left long behind in the Windows 8.1 era (!). Mind, Mac OS is far from perfect and also regressing (hello window focus management, SIP, refactored-into-uselessness Exposé, etc, etc), but Mac OS at least has a wealth of new desktop apps created in this millennium to make up for it, unlike the Linux desktop, which struggles to keep the same old apps running while it's on a refactoring spree to fix self-inflicted problems (like glibc, ld.so) and still isn't attracting new developers. I wish I could say, like the author, that containers are the solution, but the canonical example of browser updates is also a case of unwarranted and rampant complexity piling up without the slightest actual benefit for the user as the web is dying.

imiric · a year ago
I agree to an extent, but Linux is actually very usable if you stick to common, quality hardware and a carefully curated collection of robust, simple software. I realize this is impractical for most people, but having a limited ecosystem is what essentially allows Apple to deliver the exceptional user experience they're known for.

The alternative of supporting an insane amount of hardware, and making it work with every combination of software, both legacy and cutting-edge, is much, much harder to achieve. It's a miracle of engineering that Linux works as well as it does, while also being developed in a globally distributed way with no central company driving the effort. Microsoft can also pull this off arguably better, but with a completely different development and business model, and all the resources in the world, so it's hardly comparable.

The sad part is that there is really no alternative to Linux if you want full control over your devices while also having a decent user experience. Windows and macOS are walled gardens, and on the opposite end of the spectrum BSDs and other niche OSs are nowhere near as usable. So the best we can do is pick a good Linux distro, something we're thankfully spoiled for choice on, and customize it to our needs, which unfortunately does take a lot of time and effort. I still prefer this over the alternatives, though.

desumeku · a year ago
> while the "desktop" wants to appeal to a hypothetical casual tablet user even Microsoft left long behind in the Windows 8.1 era

This sounds like a GNOME issue, not a Linux one.
shiroiushi · a year ago
>while the "desktop" wants to appeal to a hypothetical casual tablet user even Microsoft left long behind in the Windows 8.1 era (!)

Using GNOME is a personal choice that you made. There are lots of Linux distros that use better desktop environments like KDE and XFCE.

jwells89 · a year ago
In my opinion, what desktop Linux needs to attract the sort of developers that create the high-polish third-party software that macOS is known for is a combination DE and UI toolkit that provides the same breadth and depth that's found in AppKit.

With AppKit, one can build just about any kind of app imaginable by importing a handful of stock frameworks. The need to use third party libraries is the lowest I've seen on any platform, with many apps not needing any at all. That's huge for lowering the amount of activation energy and friction involved in building things, as well as for reducing ongoing maintenance burden (no fear of the developer of libfoo suddenly disappearing, requiring you to chase down the next best fork or to fork it yourself).

The other thing is not being afraid of opinionated design choices, which is also a quality of AppKit. This is going to chafe some developers, but I see it as a necessary evil because it's what allows frameworks to have "happy paths" for any given task that are well-supported, thoroughly tested, and most importantly work as expected.

GTK and Qt are probably the closest to this ideal but aren't quite there.

MrDrMcCoy · a year ago
I've been meaning to get into Qt dev for a while now, as it seems to be a pretty big, portable framework with batteries included, which would be a big step up from the little headless Python tools I've been writing. What would you say is missing from Qt and the related KDE frameworks compared with AppKit?
nosioptar · a year ago
I hate libinput and its lack of configuration so much that I've contemplated going back to Windows. I probably would have if there were a legit way to get Windows LTSC as an individual.
BeetleB · a year ago
Been running Gentoo for over 20 years. It's as (un)stable now as it was then. It definitely has not regressed in the last N years.

I don't see this as a GNU/Linux problem, but a distro problem.

Regarding shared libraries: What do you do when a commonly used library has a security vulnerability? Force every single package that depends on it to be recompiled? Who's going to do that work? If I maintain a package and one of its dependencies is updated, I only need to check that the update is backward compatible and move on. You've now put a huge amount of work on my plate with static compilation.

Finally: What do shared libraries have to do with GNU/Linux? I don't think they're a fundamental part of either. If I make a distro tomorrow that is all statically compiled, no one will come after me and tell me not to refer to it as GNU/Linux. This is an orthogonal concern.

anon291 · a year ago
The main issue is mutability, not shared objects. Shared objects are a great memory optimization for little cost. The dependency graph is trivially tracked by sophisticated build systems and execution environments like NixOS. We live in 2024. Computers should be able to track executable dependencies and keep around common shared object libraries. This is a solved technical problem.
tremon · a year ago
Immutability actually destroys the security benefits that shared objects bring, because with every patch the location of the library changes. So you're back to the exact same situation as without dynamic linking: every dependency will need to be recompiled anyway against the new library location. And that means that even though you may have a shared object that's already patched, every other package on your system that's not yet been recompiled is still vulnerable.
rssoconnor · a year ago
In NixOS, you can use system.replaceDependencies to create a new derivation that performs a search and replace on references to dynamically linked libraries in the old derivation, recursively swapping them out for the new library without recompiling.
anon291 · a year ago
It's no better than static linking then, which is one of the proposed solutions here. In reality, if you're using the latest nixpkgs, there's no recompile, since Hydra's already done that, and the binaries come updated with shared dependencies.

Again... solved problem

saagarjha · a year ago
Applications rarely link against versioned shared libraries. It is rare that a security patch would require every application to have its list of dependent libraries edited.
ChocolateGod · a year ago
> I am all in for AppImages or something like that. I don't care if these images are 10x bigger. Disk space now is plenty, and they solve the issue with "libFlac.8.so is missing"

They don't solve the dependency problem, they solve a distribution problem (in a way that's bad for security). What the AppImage provides is up to the author, and once you go out of the "Debian/Ubuntu" sphere, you run into problems with distributions such as Arch and Fedora, which provide newer packages or do things slightly differently. You can have them fail to run if you're missing Qt, or your Qt version does not match the version it was compiled against; same with GTK, Mesa, curl, etc.

The moment there's an ABI incompatibility with the host system (not that uncommon), it breaks down. Meanwhile, a Flatpak produced today should run in 20 years time as long as the kernel doesn't break user-space.

They don't run on my current distribution choice of NixOS. Meanwhile Flatpaks do.

AshamedCaptain · a year ago
I really doubt an X11 Flatpak from today will run on the Fedora of 10 years from now, much less 20 years from now. They will break XWayland (in the name of "security") much before that. They will break D-Bus much before that.

In addition, the kernel breaks ABI all the time; sometimes this is partially worked around thanks to dynamic linking (e.g. OSS and solutions like aoss). Other times not so much.

I feel that every time someone introduces a "future proof" solution for 20 years, they should make the effort to run 20-year-old binaries on their Linux system of today and extrapolate from it.

o11c · a year ago
As someone who actually has run at-least-15-year-old binaries (though admittedly not x11 ones for my use case), I can strongly state the following advice:

* do not use static linking for libc, and probably not for other libraries either

C-level ABI is not the only interface between parts of a program. There are also other things like e.g. well-known filepaths, and the format of those can change. A statically-linked program has no way to work across a break, whereas a dynamically-linked one (many important libraries have not broken SOVERSION in all that time) will know how to deal with the actual modern system layout.

During my tests, all the statically-linked programs I tried from that era crashed immediately. All the dynamically-linked ones worked with no trouble.

saagarjha · a year ago
The kernel breaks ABI? Since when?
mfuzzey · a year ago
I haven't seen such problems on Debian or Ubuntu; I guess it's par for the course with a bleeding edge distro.

The author seems to be focusing on the disk space advantage and claiming it's not enough to justify the downsides today. I can understand that, but I don't think disk space savings are the main advantage of shared dependencies; rather, it's centralized security updates. If every package bundles libfoo, what happens when there's a security vulnerability in libfoo?

cbmuser · a year ago
> If every package bundles libfoo, what happens when there's a security vulnerability in libfoo?

That’s actually the key point that many people in this discussion seem to miss.

pmontra · a year ago
What happens is that libfoo gets fixed, possibly by the maintainers of the distro, and all the apps using it are good to go again.

With multiple versions bundled to multiple apps, a good number of those apps will never be updated, at least not in a timely manner, and the computer will be left vulnerable.

sunshowers · a year ago
Then you get an alert that your libfoo has a vulnerability (GitHub does a pretty good job here!) and you roll out a new version with a patched libfoo.
anotherhue · a year ago
Since I switched to nixos all these articles read like people fiddling with struct-packing and optimising their application memory layout. The compiler does it well enough now that we don't have to, so it is with nix and your application filesystem.
__MatrixMan__ · a year ago
I had a similar feeling during the recent crowdstrike incident. Hearing about how people couldn't operate their lathe or whatever because of an update, my initial reaction was:

> Just boot yesterday's config and get on with your life

But then, that's one of those NixOS things that we take for granted.

sshine · a year ago
Not just NixOS. Ubuntu with ZFS creates a snapshot of the system on every `apt install` command.

But yeah, it’s pretty great to know that if your system fails, just `git restore --staged` and redeploy.

thot_experiment · a year ago
Using shared libraries is optimizing for a very different set of constraints than NixOS, which IIRC keeps like 90 versions of the same thing around just so everyone can have the one they want. There are still people who are space constrained. (I haven't touched Nix in years, so maybe I'm off base on this.)

> The compiler does it well enough now that we don't have to

You know, I see people say this, and then I see some code with nested loops running 2x as fast as code written with list comprehensions, and I remember that it's actually:

"The compiler does it well enough now that we don't have to as long as you understand the way the compiler works at a low enough level that you don't use patterns that will trip it up and even then you should still be benchmarking your perf because black magic doesn't always work the way you think it works"

Struct packing too can still lead to speedups/space gains if you were previously badly aligned, which is absolutely something that can happen if you leave everything on auto.

matrss · a year ago
> Using shared libraries is optimizing for a very different set of constraints than nixos, which iirc keeps like 90 versions of the same thing around just so everyone can have the one they want.

This isn't really true. One version of nixpkgs (i.e. a specific commit of https://github.com/NixOS/nixpkgs) generally has one version of every package and other packages from the same nixpkgs version depending on it will use the same one as a dependency. Sometimes there are multiple versions (different major versions, different compile time options, etc.) but that is the same with other distros as well.

In that sense, NixOS is very similar to a more traditional distribution. The difference is that NixOS' functional package management better encapsulates the process of making changes to its package repository, compared to the ad-hoc nature of a mutable set of binary packages in traditional distros, and makes it possible to see and rebuild the dependency graph at every point in time, while a more traditional distro doesn't give you e.g. the option to pretend that it's 10 days or months ago.

You only really get multiple versions of the same packages if you start mixing different nixpkgs revisions, which is really only a good idea in edge cases. Old ones are also kept around for rollbacks, but those can be garbage collected.

thomastjeffery · a year ago
The problem with Nix is that it's a single monolithic package archive. Every conceivable package must go somewhere in the nixpkgs tree, and is expected to be as vanilla as possible.

On top of that, there is the all-packages.nix global namespace, which implicitly urges everyone to use the same dependency versions; but in practice just results in a mess of redundant names like package_version.1.12_x-feature-enabled...

The move toward flakes only replaces this problem with intentional fragmentation. Even so, flakes will probably end up being the best option if it ever gets coherent documentation.

anotherhue · a year ago
No argument; if you're perf sensitive and aren't benchmarking every change, then it's a roll of the dice as to whether LLVM will bless your build.

The usual claim stands, though: on a LoC basis, a vanishingly small amount of code is perf sensitive (embedded is likely more, TBF).

jeltz · a year ago
The C compiler does not optimize struct packing at all. Some languages like Rust allow optimizing struct layouts, but even in Rust struct layout can matter if you care about cache locality and vector operations.
cbmuser · a year ago
The compiler takes care of vulnerabilities?

There was a recent talk that discussed the security nightmare with dozens of different versions of shared libraries in NixOS and how difficult it is for the distribution maintainers to track and update them.