Lots of brain cycles are spent on "programming language theory". We've roughly figured out the primitives required to express real-world computation.
In contrast, we apparently have no "package management theory". We have not figured out the primitives required to express dependencies. As a result, we keep building new variants and features, until we end up with <script>, require(), import, npm, yarn, pnpm, (py)?(v|virtual|pip)?env, (ana)?conda, easy_install, eggs and wheels ...
Is it just a "law of software" that this must happen to any successful language? Or are there examples where this has not happened, and what can we learn from them? Is there a "theory of package management", or a "lambda calculus of package management" out there?
We have a good hunch. The basic theory behind Nix definitely goes in the right direction, and if we look away from all the surface-level nonsense going on in Nix, it's conceptually capable (e.g. [0]) of being a first-class language dependency manager.
For this to work at scale we'd need to overcome a few large problems though (in ascending order of complexity):
1. A better technical implementation of the model (working on it [1]).
2. A mindset shift to make people understand that "binary distribution" is not a goal, but a side-effect of a reasonable software addressing and caching model. Without this conceptual connection, everything is 10x harder (which is why e.g. Debian packaging is completely incomprehensible - their fundamental model is wrong).
3. A mindset shift to make people understand that their pet programming language is not actually a special snowflake. No matter what the size of your compilation units is, whether you call modules "modules", "classes" or "gorboodles", whether you allow odd features like mutually-recursive dependencies and build-time arbitrary code execution etc.: Your language fits into the same model as every other language. You don't have to NIH a package manager.
This last one is basically impossible at the current stage. Maybe somewhere down the line, if we manage to establish such a model successfully in a handful of languages and people can see the benefits for themselves, this will change - but for now we just have to hold out.
[0]: https://code.tvl.fyi/about/nix/buildGo
[1]: https://cs.tvl.fyi/depot/-/tree/tvix/
1. The package definitions are just written in a normal, battle-proven, very well defined, general-purpose programming language with good support for a functional style (Scheme).
2. There is no conceptual difference between a package definition in the public Guix system and a self-written package definition which a developer makes to build and test their own package, or to build and run a specific piece of software. The difference is as small as that between using an Emacs package and configuring that package in one's .emacs configuration file.
And sure, maybe you're right that distro packaging is the "wrong model" - but again, that is then a problem for the distros. Users are stuck on Ubuntu or whatever, so they don't have the option to do the "right" thing; they use the mishmash of packaging/repo systems as just the cost of doing business.
For development you may want to test your code against multiple versions of the system libraries. This is not easy using a distro package manager.
Nixpkgs doesn't do version or feature resolution, but other tools (that are not nixpkgs) can and do.
[0] Edit: rebuilding your dependencies when you need to, and handling that seamlessly, CAN be hard on Debian. It is something that even Nix struggles with, even though it is best in class by far. It is also completely different from the notion of compiling a program. I wonder what you consider the goal of a package system.
Quite likely! The whole concept of separately building a source/binary package, and then uploading/"deploying" that binary package, already violates the notion of being "just a cache" for me. There might be a tool in Debian where I can seamlessly say "do not download this package from the repository, but build it locally" - but even if so, it would surprise me if it can give me the same guarantees as Nix (i.e. guarantees about the artifact being repeatable, and being addressable by its inputs rather than a human-curated tag or version number).
> Something that even Nix struggles with
Nix only struggles with it to the degree that some of the higher-level abstractions in the existing package set (which are built on top of the fundamental model, not part of it) can be confusing/underdocumented, but conceptually this is a simple thing to do in Nix. Even practically at this point it is usually simple - unless you're dealing with Haskell or Javascript, of course.
> I wonder what you consider the goal of a package system
I want it to let me describe a desired state, and then make it so. That state is best represented as a graph, like a Merkle tree, of the instructions for the individual steps and a way to address them. An individual step is a transformation (i.e. some program) executed over some sources, yielding some result (likely in the filesystem). I want any distribution of binaries to be a result of using this addressing scheme, and looking up/copying an already built equivalent artifact for that thing. I want a text file containing the string "Hello HN" to be represented by the same abstraction that represents a specific .so file, the `emacs` package, the configuration of my entire system, the configuration of a cluster of systems and so on. I want this system to be programmable so I can work around the shortcomings that its original designers missed.
Nix (and Guix) do large parts of this already, and are conceptually (though not yet technically) capable of doing all of it. Technically, so is something like Bazel - but its complexity and maintenance requirements make that prohibitively expensive (basically only Google can use Bazel like that, and even they have delineated areas where they "give up" on wrapping things in Bazel).
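To make the addressing part concrete, here is a minimal sketch in Go (all names here are made up, not Nix's or Bazel's actual API): every build step is addressed by hashing its own instructions together with the addresses of its inputs, Merkle-style, and a "binary cache" is then nothing more than a lookup table keyed by that address.

    package main

    import (
        "crypto/sha256"
        "fmt"
        "sort"
    )

    // Step is one node in the build graph: a transformation over some inputs.
    type Step struct {
        Builder string   // the program to run, e.g. a compiler or "cp"
        Args    []string // its instructions/arguments
        Inputs  []*Step  // other steps whose outputs this step consumes
    }

    // Address hashes the step's own instructions plus the addresses of its
    // inputs. Identical descriptions yield identical addresses, so fetching
    // a pre-built artifact is just a cache lookup on this value.
    func (s *Step) Address() string {
        h := sha256.New()
        fmt.Fprintln(h, s.Builder)
        for _, a := range s.Args {
            fmt.Fprintln(h, a)
        }
        addrs := make([]string, 0, len(s.Inputs))
        for _, in := range s.Inputs {
            addrs = append(addrs, in.Address())
        }
        sort.Strings(addrs)
        for _, a := range addrs {
            fmt.Fprintln(h, a)
        }
        return fmt.Sprintf("%x", h.Sum(nil))
    }

    func main() {
        src := &Step{Builder: "fetch", Args: []string{"https://example.com/hello.txt"}}
        out := &Step{Builder: "cp", Args: []string{"hello.txt", "out/"}, Inputs: []*Step{src}}
        fmt.Println(out.Address()) // check the cache for this before building
    }

The "Hello HN" text file, the .so, the emacs package and the whole-system configuration are all just Steps of different sizes under this scheme.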
I'm on a team that works on a pet prog lang for distributed systems, and we did some research into using existing package management systems. We've settled on NPM for now, but god I wish there were a better generic package manager out there.
Again, maybe you're already aware of it, but I think it's a nice example of genericising a concern common to many languages which sounds similar to what you're asking for (albeit unfortunately in a slightly different space).
[0] https://github.com/asdf-vm/asdf
https://docs.racket-lang.org/denxi-guide/index.html
https://www.channable.com/tech/nix-is-the-ultimate-devops-to...
(It doesn't go very much in depth on the conceptual model, but touches on the main ideas)
Everything else is mostly written to teach people how to use Nix, and the more recent the thing is the more it will focus on surface-level features of the C++ implementation of Nix.
[0]: https://edolstra.github.io/pubs/phd-thesis.pdf
My experience is that the older gen languages you mention had to invent package management, made lots of understandable mistakes and now are in a backwards compat hellscape.
Rust and Go built their packaging story with the benefit of lessons learned from those other systems, and in my experience the difference is night and day.
I can go on, but it's a terrible hodge-podge of systems. It works nicely for simple cases (consuming libraries off Github), but it's awful when you go into details. And it's not even used by its creators - since Google has a monorepo and they actually use their internal universal build tool to just compile everything from source.
The flip side of this is that it never has to worry about naming collisions or namespacing: Your public package name must be a URL you control.
Additionally, there is no requirement for a centralized package facility to be run. The Golang project currently runs pkg.go.dev, but that has only existed for the last few years; and if they decided to get rid of it, it wouldn't significantly impact the development environment.
Finally, the current system makes "typo-squatting attacks" harder to do. Consider the popular golang package github.com/mattn/go-sqlite3. The only way to "typosquat" the package is to typosquat somewhere up the dependency tree; e.g., by creating github.com/matn/go-sqlite3 or something. You can't typosquat github.com/mattn/sqlite3, or github.com/mattn/go-sqlite, because you don't own those namespaces; whereas with non-DNS-based package systems, the package would be called `go-sqlite3`, and `sqlite3` or `go-sqlite` would be much easier to typosquat.
All those things I find really valuable; and honestly it's something I wish the Rust ecosystem had picked up.
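For reference, this is roughly what the URL-as-name scheme looks like from the consumer's side (the import path is the real one for that package; the program itself is just a sketch, and the version in the comment is only an example):

    package main

    import (
        "database/sql"
        "fmt"

        // The import path *is* the URL-shaped name; go.mod would contain a
        // matching line like: require github.com/mattn/go-sqlite3 v1.14.x
        _ "github.com/mattn/go-sqlite3"
    )

    func main() {
        db, err := sql.Open("sqlite3", ":memory:")
        if err != nil {
            panic(err)
        }
        defer db.Close()
        fmt.Println("opened an in-memory sqlite database")
    }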
> It requires users of your code to update all of their import statements throughout their code whenever you move your hosting.
This is a necessary cost of the item above. It can be somewhat annoying, but I believe this can be done with a one-line change to the go.mod. I'd much rather occasionally deal with this.
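If I'm reading this right, one way to spell that one-line change is a `replace` directive in the consuming module's go.mod; a sketch with made-up module paths and versions:

    module example.com/myapp

    go 1.21

    require github.com/olduser/coollib v1.2.3

    // The one added line: fetch from the new hosting location while the
    // import paths in the source keep saying github.com/olduser/coollib.
    replace github.com/olduser/coollib => github.com/newuser/coollib v1.2.3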
> It requires you to physically move all code to a new folder in your version control if you want to increase the major version number.
And the benefit of this is that legacy code will continue to compile into the future. I do tend to find this annoying, but it was an explicit trade-off that was decided back when they were developing their packaging system.
Packaging is a hard problem, with lots of trade-offs; I think Go has done a pretty good job.
One way in which Go and Rust have it easier than Python or Node is that the former only have to deal with developers; the latter have to deal with both developers and users, whose requirements are often at odds with one another.
> It requires you to physically move all code to a new folder in your version control if you want to increase the major version number.
This is untrue.
> It requires users of your code to update all of their import statements throughout their code whenever you move your hosting.
This is only true if not using a vanity URL, but is sadly often the case.
> It takes arcane magic to support multiple Go modules in the same repo.
I don’t know what you’re calling arcane magic here, but we maintain repos at work with 6-7 Go modules in them without it being an issue whatsoever, and no “arcane magic” required, so I’m going to go ahead and say this is untrue too.
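For what it's worth, since Go 1.18 the supported way to hack on several modules in one repo at once is a single go.work file at the repository root; a sketch with made-up directory names:

    go 1.21

    // Each listed directory contains its own go.mod; the workspace just
    // lets them resolve each other locally during development.
    use (
        ./api
        ./worker
        ./internal/tools
    )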
That's simply not true. That's only one way you can do it. Another way is to create a branch.
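Concretely, on that branch (or even just at a new tag) the only required change is the module directive gaining a major-version suffix - no code moves. A sketch with a made-up module path:

    // go.mod on the v2 branch: the path gains a /v2 suffix, the code stays put.
    module github.com/someuser/coollib/v2

    go 1.21

Consumers then import github.com/someuser/coollib/v2/... and the v2.x.y tags resolve against that go.mod.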
https://sourcehut.org/blog/2023-01-09-gomodulemirror/
That was my impression some time ago.
But last week I attempted to compile a couple of (not very big) tools with cargo. And it ended up downloading hundreds of dependencies and gigabytes of packages.
Looks like node_modules.jpg all over again :(
Further, a major difference is that you don't need those dependencies after you've built. You can blow them away. Doing that with node is not as straightforward, and in many cases, not possible.
I wrote a post highlighting Go's mod system: https://verdverm.com/go-mods/
imo, it is the best-designed dependency system I know of. One of the nice things is that Go uses a shared module cache, so there is only one copy on your computer when multiple projects use the same dependency@version.
Version incrementing, packaging wasms, dancing around code generation – all doable, but not standardized.
There's release-please to automate all that, but it's not an easy task to set it up in all of your repos.
Besides, if in addition to Rust projects, you have projects in other languages like JavaScript, then you have to do it twice and struggle with understanding all of the package management systems provided by all languages you have.
A single swiss-army-knife package manager would be amazing.
Plus none of them handle binary library distribution the way some of the packaging models that came before them did.
We should have left CommonJS behind a long time ago, while keeping backwards compatibility.
At the same time, what I see with the node+npm system is that everything "just doesn't work by default".
Having 10 other package managers doesn't work either, they are faster, but don't solve this problem.
And given that packages have to depend on other packages, and cannot depend on a git repository, that feature is mostly useful for testing bug fixes, private repos for leaf packages, stuff like that.
Back when those languages were designed, you'd manually download the few modules you needed, if you downloaded any packages at all. In C you'd normally build your own world, since it came before the www times, and C++ kind of inherited that. But languages which came out later decided that we now live in a world where most of the code that is executed is packages, most likely packages which live on Github. So Julia and Rust build this into the language. Julia in particular, with Project.toml and Manifest.toml for fully reproducing environments; its package manager simply uses git and lets you grab the full repository with `]dev packagename`, and its package registry system lets you extend it with private package worlds.
I think the issue is that dependencies are central, so you can never remove old package systems because if that's where the old (and rarely updated) dependencies live, then you need to keep it around. But for dependencies to work well, you need all dependencies to be resolved using the same package system. So package systems don't tend to move very fast in any language, whatever you had early has too much momentum.
I cringe hard when I see projects depending on git repos, without pinning a version or commit.
Have you tried to package a random Python/Ruby/etc. CLI program, for Debian? Or how about for Homebrew? Each one involves a cacophony of PL scripts calling shell-scripts calling PL scripts, and internal/undocumented packager subcommands calling other internal/undocumented packager subcommands. It takes ~forever to do a checked build of a package in these systems, and 99% of it is just because of how spread out the implementation of such checks is over 100 different components implemented at different times by different people. It could all be vastly simplified by reimplementing all the checks and transforms in a single pass that gradually builds up an in-memory state, making assertions about it and transforming it as it goes, and then emitting it if everything works out. You know — like a compiler.
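A toy sketch in Go of that compiler-like shape (all names made up, not modeled on any real packaging tool): one in-memory state, a list of check/transform passes run over it, then emit.

    package main

    import (
        "errors"
        "fmt"
    )

    // PkgState is the single in-memory representation every pass works on.
    type PkgState struct {
        Name    string
        Version string
        Files   map[string]string // path -> contents
    }

    // A Pass either checks the state (returning an error on failure) or
    // transforms it in place.
    type Pass func(*PkgState) error

    func checkHasVersion(s *PkgState) error {
        if s.Version == "" {
            return errors.New("package has no version")
        }
        return nil
    }

    func addMetadataFile(s *PkgState) error {
        s.Files["METADATA"] = s.Name + " " + s.Version
        return nil
    }

    func main() {
        state := &PkgState{Name: "mytool", Version: "1.0.0", Files: map[string]string{}}
        for _, pass := range []Pass{checkHasVersion, addMetadataFile} {
            if err := pass(state); err != nil {
                panic(err) // a failed check stops the build, compiler-style
            }
        }
        fmt.Println("would emit package:", state.Files)
    }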
There's no reason why a package-management system needs to be language-specific; dependencies are often cross-language. Hell, even some blocks of code contain more than one language.
The package-management system is responsible for deploying packages. The way a package is deployed should depend on the operating environment, not on the language. These language-specific packaging arrangements typically deploy into some private part of the file-system, organized in its own idiosyncratic way.
Using git as a repository is just nuts. Git is privately-owned, and stuffed with all kinds of unaudited junk. You can't audit everything you install by hand; so these systems force you to install unaudited code, or simply not install.
I've been using Debian derivatives for years. I appreciate having an audited repository, and an installation system that deploys code to more-or-less predictable filesystem locations.
Do you mean github? Git is open source and one of the few pieces of software that works in a truly distributed fashion.
Also: what do I care that people store unaudited and insecure stuff in there?
1) "npm install" or "go get" or what have you, works on every platform (barring bugs), while "apt install" only works on some.
2) Most platform package managers aren't good at handling multiple versions of dependencies (which, neither are many language package managers, but they're easier to sandbox away with supplemental tools than system package managers are)
3) Most platform package managers lag way behind language-specific package managers, and may also lack tons and tons of packages that are available on those.
> Git is privately-owned
Git... hub, you mean?
Anaconda is probably the closest, as it also packages many non-Python packages. Nix is also similar, but it doesn't support Windows at all (without WSL).
Git is owned by the same owner as Linux. If you've been using Debian derivatives for years it seems you must have some trust to give that private entity? Unless the derivative you speak of is Debian GNU/k*BSD?
Furthermore, if you give trust to Debian derivative projects, why not trust their Git builds? If you trust everything else in their distribution Git is a curious omission. Do you have a personal beef with Torvalds or something?
This sounds a lot like a case of https://xkcd.com/927/ . Languages have different ways of importing and installing dependencies, trying to create a package manager over all of those is just going to end up making things even more complex, especially if you target all platforms at once.
> Using git as a repository is just nuts. Git is privately-owned, and stuffed with all kinds of unaudited junk.
Git is fully open source. Are you confusing Git and GitHub?
They're not actually different. They call things differently, and they have different methods of passing the required lookup paths/artifacts/sources to their compilers/interpreters/linkers, but in the end all of them are conceptually the same thing.
We self-organize into communities of practice and select the package management strategy that works best. Reaching across communities to develop a grand centralized strategy that fits everyone's needs would be __possible__, but involves significant communication and coordination overhead. So instead we fracture, and the tooling ecosystem reflects ad-hoc organizational units within the community.
Ecosystems like Rust cargo that have batteries included from the start have an advantage, virtually all Rust developers have a single obvious path to package management because of this emergent social organization.
Ecosystems like Python's seem like the wild west, there is deep fracturing (particularly between data science and software engineering) and no consensus on the requirements or even the problems. So Python fractures further, ironically in a search for something that can eventually unify the community. And python users feel the strain of this additional overhead every day, needing to side with a team just to get work done.
I'd argue both of these cases are driven by consequences easily predictable from Conway's Law.
But to fully appreciate it, it helps to understand syntax transformation in Racket. Once the rigorous phase system forces non-kludgy static rules about when things are evaluated, your syntax transformers and the code on which they depend could cause a mess of shuffling code among multiple files to solve dependency problems... until you use submodules with the small set of visibility rules, and then suddenly your whole tricky package once again fits in a single file cleanly.
I leveraged this for some of my embedded doc and test experiments, without modifying the Racket core. (I really, really like single-source-file modules that embed doc, test, and package metadata all in the same file, in logical places.)
Python and Node both need a way to compile the code down to a single statically-linked binary like more modern languages (Go, Rust), solving the distribution problem once and for all.
There are module systems that aren't insane, like Go's module system. It uses semantic versioning to mediate version conflicts. Programs can import multiple major versions of the same module. The module requirements files ensure that every checkout of the code gets the exact same bytes of the code's dependencies. The compiler is aware of modules and can fetch them, so on a fresh workstation, "go test ./..." or "go install ./cmd/cool-thing" in a module-aware project works without running any other command first. It is actually so pleasant to use that I always think twice "do I want to deal with modules" before using a language like Javascript or Python, and usually decide "no".
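A small, runnable illustration of the multiple-major-versions point: each major version gets its own import path (gopkg.in-style below; plain module paths use a /vN suffix instead), so both can coexist in one build.

    package main

    import (
        "fmt"

        yamlv2 "gopkg.in/yaml.v2"
        yamlv3 "gopkg.in/yaml.v3"
    )

    func main() {
        var a, b map[string]interface{}
        // Two major versions of the same YAML package, side by side in one
        // program; go.mod simply lists both paths at their own versions.
        if err := yamlv2.Unmarshal([]byte("x: 1"), &a); err != nil {
            panic(err)
        }
        if err := yamlv3.Unmarshal([]byte("x: 1"), &b); err != nil {
            panic(err)
        }
        fmt.Println(a, b)
    }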
npm and pip are the DARK AGES. That's why you're struggling. The community has been unable to fix these fundamental flaws for decades, despite trying in 100 incompatible ways. I personally have given up on the languages as a result. The standard library is never enough. You HAVE to solve the module problem on day 1.
I don't use Python much these days, but it's not as bad as Perl 15 years ago. I see blog posts like "how to set up the perfect Python dev environment with Docker" and it makes me very sad, but at least teams are getting their work done. The edge cases of Python packaging, though, are really really bad. For example, the C compiler that Python was compiled with (and C library; musl vs. glibc) affects installability of modules through pip. This really forces your Linux distribution to be the arbiter of what modules you can use, which is always too out of date for development. Also, the exact requirements (what are the sha256s and URLs of the code files that this code depends on) depends on architecture, compiler, etc. As far as I can tell, you have to run code to find the whole dependency tree. That is the fatal flaw I see with Python's packaging system.
I spent a lot of time trying to use Bazel to unify all the work my company does across languages. Python was the killer here. We do depend on a lot of C extensions, and I have a C toolchain built with Bazel (so that arm64 mac users can cross-compile to produce Linux releases; maybe a dumb requirement, but it would be too unpopular to discriminate against certain developer devices), but getting Python built with that toolchain, and using that Python to evaluate requirements.txt, didn't work. (There are bugs open; they all end with "Python has to fix this".) As a result, you don't get compatible versions of, say, numpy with this setup. Dealbreaker. Partially Bazel's fault, partially Python's fault.
(It pained me to use Bazel for Go, but it did all work. While the workflow wasn't as nice as what Go has built in, easy things were hard and hard things were possible. I had working binaries within a few hours, buildable on any supported workstation type, with object and test caching in the cloud.)