I'm pretty immune to most JS ecosystem churn, but on package managers I'm feeling it.
All I want is a package manager that is correct, causes very few day-to-day issues, and is going to work the same way for many years. Yet everyone seems to be optimizing disk usage and speed without even getting the basics (make it work, make it right) fully covered.
I don't understand why people are optimizing for disk space at all, tbh. Like, have you ever edited a video project, used Docker, installed Xcode? I cannot imagine what you must be doing for all your node_modules combined to take up more than maybe 100 GB on disk.
pnpm seems to be the lightest of the bunch, which is nice but why even mess with symlinks and confuse other tools? Just put all the files in there, nest them and duplicate them and everything. I'll happily live with my 10 GB node_modules folder that never breaks and sometimes gives me a nice coffee break.
Possibly I'm actually just salty that Metro doesn't support symlinks and would otherwise be on the pnpm love-train.
NPM just has too much institutional inertia to avoid. The moment you make the decision to use something else, you are simply trading one set of warts for another. I can't even tell you how many projects I have seen waste countless hours of dev time on Yarn/NPM discrepancies. If you are working on anything with more than two people, you really need to just use the standard tooling that everyone is familiar with and that the entire ecosystem is based around. Anything else is yak shaving.
I went back to composer for a bit recently (PHP), and I was baffled when my install command did nothing other than install exactly the packages I’d specified. When I ran update, it didn’t modify any of my files, but went to the latest version matching the restrictions I’d specified in composer.json.
Very much the opposite of my experience. I'd gladly have 30 copies of left-pad living in my project rent-free if it meant I never had to see "Your requirements could not be resolved to an installable set of packages" ever again.
PNPM’s primary feature, even if it gets lost in all the optimization, is make it work, make it right. With an emphasis on the former serving the latter as the goal. FS links are an implementation detail. The thing it gets right is resolving only the dependencies you specify, unless you very clearly say otherwise. The way it does so just also happens to be efficient. If it’s confusing any other tools, they’re doing something wrong too. Links aren’t some wild new concept.
I'm not sure if this is the _main_ reason, but one thing that makes node_modules size more than an aesthetic concern is serverless.
Booting a serverless function with hundreds of MBs of modules takes appreciable time, to the point where some folks webpack-bundle their serverless code.
Are people actually downloading dependencies on the fly like this? IMO a bundler is absolutely essential. What if npm is down? What if the version of a dependency has been deleted for some reason? These are surprises you absolutely do not want when a function is starting.
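To make the bundling approach concrete, here's a rough sketch of a webpack config for a Node-targeted serverless handler; the entry and output paths are just placeholder names:

    // webpack.config.js -- bundle the handler so the deployed artifact is
    // self-contained and nothing is fetched from npm when the function boots.
    const path = require('path');

    module.exports = {
      entry: './src/handler.js',   // hypothetical handler entry point
      target: 'node',              // keep Node built-ins (fs, path, ...) external
      mode: 'production',          // minify and tree-shake what the handler actually uses
      output: {
        path: path.resolve(__dirname, 'dist'),
        filename: 'handler.js',
        libraryTarget: 'commonjs2', // so the platform can require() the exported handler
      },
    };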
100% agreed here. If your package manager confuses established tooling and libraries, it's garbage. I recently started working at a company that uses pnpm and, horrifyingly, a monorepo, and their serverless lambdas are all at least 70+ MB. I can't even fix it by using the webpack plugin, because pnpm breaks webpack. Also worth noting that 90% of the "problems" that yarn and pnpm try to address were addressed by later versions of npm. Node, being very much tied to npm, doesn't need more package managers; it needs consensus, and collaboration to improve on that consensus, without breaking libs.
Other than the biggest OS pretty much choking on the node_modules black hole whenever you do a global operation on it (try deleting a js project on Windows)
Some years back, when npm itself didn't deduplicate modules, it was impossible to work with some projects on Windows due to the OS path length limit (the 260-character MAX_PATH).
Summarizing the 3 major JS package management approaches:
* Classic node_modules: Dependencies of dependencies that can't be satisfied by a shared hoisted version are nested as true copies (OSes may apply copy-on-write semantics on top of this, but from FS perspective, these are real files). Uses standard Node.js node_modules resolution [1].
* pnpm: ~1 real copy of each dependency version, and packages use symlinks in node_modules to point to their deps. Also uses standard resolution. Requires some compatibility work for packages that wrongly refer to transitive dependencies or to peer dependencies.
* pnp[2]: 1 real copy of each dependency version, but it's a zip file with Node.js and related ecosystem packages patched to read from zips and traverse dependency edges using a sort of import map. In addition to the compatibility work required from pnpm, this further requires compatibility work around the zip "filesystem" indirection.
In our real-world codebase, where we've done a modest but not exhaustive amount of package deduplication, pnpm confers around a 30% disk utilization savings, and pnp around an 80% savings.
Interestingly, the innovations on top of classic node_modules are so compelling that the package managers that originally implemented pnpm and pnp (pnpm and Yarn, respectively) have implemented each other's linking strategies as optional configs [3][4]. If macOS had better FUSE ergonomics, I'd be counting down the days for another linking strategy based on that too.
[1] - https://nodejs.org/api/modules.html#loading-from-node_module... [2] - https://yarnpkg.com/features/pnp [3] - https://github.com/pnpm/pnpm/issues/2902 [4] - https://github.com/yarnpkg/berry/pull/3338
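To make "standard Node.js node_modules resolution" concrete: it boils down to walking up the directory tree looking for a node_modules folder containing the package. A rough sketch, ignoring package.json "exports", file extensions, self-references, and core modules:

    // Simplified version of Node's classic lookup, which both the nested layout
    // and pnpm's symlinked layout rely on.
    const fs = require('fs');
    const path = require('path');

    function resolvePackageDir(pkgName, fromDir) {
      let dir = fromDir;
      for (;;) {
        const candidate = path.join(dir, 'node_modules', pkgName);
        // existsSync follows symlinks, which is why pnpm's link farm works with
        // the same algorithm: the link just points at the one real copy in the store.
        if (fs.existsSync(candidate)) return candidate;
        const parent = path.dirname(dir);
        if (parent === dir) return null; // reached the filesystem root
        dir = parent;
      }
    }

    console.log(resolvePackageDir('react', __dirname)); // 'react' is just an example

pnp replaces this walk with a lookup table (.pnp.cjs) mapping each dependency edge to the zip that holds the target package, which is where the extra compatibility work comes from.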
Node is doing the right thing: if two of your dependencies in Maven have conflicting dependencies of their own, Maven just picks an arbitrary one as _the_ version, which results in running with an untested version of that dependency (it ends up depending on a version its developers never specified). Because Node allows the same package to be included multiple times at different versions, npm and friends can make sure that every dependency gets the right version of its own dependencies.
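A small illustration of that: because lookup starts from the requiring package's own directory, two dependents can each load the version they declared. Hypothetical layout, with lodash standing in for any duplicated dependency:

    // Suppose "a" pins an older lodash (nested copy) while the app uses the
    // hoisted one. Both resolve, each to the version its dependent was tested with.
    const path = require('path');

    const fromA = require.resolve('lodash', {
      paths: [path.resolve('node_modules/a')], // resolve as dependency "a" sees it
    });
    const fromApp = require.resolve('lodash', {
      paths: [path.resolve('.')],              // resolve as the app itself sees it
    });

    console.log(fromA);   // .../node_modules/a/node_modules/lodash/... (nested copy)
    console.log(fromApp); // .../node_modules/lodash/...                (hoisted copy)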
That’s kind of incredible that Yarn PnP outperforms pnpm. If that’s generally true across most projects, then I’m really glad that Turborepo decided to use it for project subslicing.
The practical disk usage difference between pnp and pnpm appears to be almost entirely accounted for by the fact that pnp zips almost all packages. Both store ~1 version of each package-version on disk; it's just that one's zipped and one's not. The mapping entries for package edges in both cases (.pnp.cjs for pnp and the symlink farms for pnpm) are very small in comparison.
Disk utilization is only one metric. The trade-off for Yarn PNP is that it incurs runtime startup cost. For us (~1000+ package monorepo), that can be a few seconds, which can be a problem if you're doing things like CLI tools.
Also, realistically, with a large enough repo, you will need to unplug things that mess w/ file watching or node-gyp/c++ packages, so there is some amount of duplication and fiddling required.
Problems long solved before, but problems that don't matter to the JavaScript crowd... I think they actually love that things take so long. It makes them think they're doing important work: "We're compiling and initting."
We recently started sponsoring pnpm[1] as well as adding zero-config support for deployments and caching. I think pnpm is an incredible tool. Excited to see it grow further.
[1]: https://vercel.com/changelog/projects-using-pnpm-can-now-be-...
As a person who uses npm just for some hobby coding projects, it's quite frustrating that there are new, partly incompatible package managers for the JavaScript ecosystem: npm, pnpm, yarn, yarn 2.
Some packages need one, some another, so I tried to switch to yarn (or yarn 2) for a package that I wanted to try out, but then other packages stopped working.
If there are clearly better algorithms, why not refactor npm and add them in experimental flags to npm, and then set them as the default as they mature (with safe switching from one data structure to another)?
Generally I've found sticking with npm to be best. It's not the super-slow thing that it was before, and I can't remember the last time a package didn't install because it wasn't compatible with npm.
I tried pnpm and it didn't just work, so I gave up. I would revisit it, but npm works.
These days I don't really see a reason to use yarn (but would like to hear them).
For what it's worth, Yarn 3 implements essentially all modes. It can do standard node_modules, its own Plug'n'Play, as well as pnpm-style hardlinking to a global cache.
Edit: I just learned from another comment that PNPM also supports Plug'N'Play :) Thanks steven-xu!
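For reference, switching modes in Yarn 3 is a one-line setting; a sketch of the relevant .yarnrc.yml values:

    # .yarnrc.yml -- pick the linking strategy per project
    nodeLinker: node-modules   # classic hoisted layout
    # nodeLinker: pnp          # Plug'n'Play (.pnp.cjs + zipped packages), the Yarn 2+ default
    # nodeLinker: pnpm         # pnpm-style store with links (Yarn 3.1+)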
Given what a dumpster fire the npm ecosystem is security-wise, it's best to run the whole build chain in a container anyway, at least for frontend apps. This way you also don't care about the chosen package manager or Node.js version; you can just set it as you wish in the Dockerfile. It does take more disk space, but to me it's a nice compromise.
Containers don't provide much protection from malware, unless you're running it rootless under an unprivileged user (no sudo access, no ssh keys or anything else interesting in the home directory, etc; and even then it's limited because the attack surface is enormous).
> If there are clearly better algorithms, why not refactor npm and add them in experimental flags to npm
While node_modules has many flaws, in the current ecosystem all modes have their own pros and cons, and there isn't a "clearly better" algorithm: node_modules has less friction, PnP is sounder, and pnpm's symlinks attempt to be kind of an in-between, offering half the benefits at half the "cost".
Like in many computer science things, it's tradeoffs all the way. Part of why Yarn implements all three.
I recently migrated a fairly large monorepo (20+ packages) that used Lerna and npm to pnpm, and the improvement in developer experience was pretty massive.
Dependency install times went down by a huge amount, and all the strange issues we had with Lerna and npm sometimes erroring out, requiring us to remove all node_modules folders and re-install everything, are just gone.
We even noticed that the size of some of our production bundles went down. Before some dependencies that were used in multiple packages were being duplicated in our webpack bundles needlessly, but the way pnpm symlinks dependencies instead of duplicating them fixed that as well.
The non-flat node_modules structure did break some things as well, since in some places we had imports pointing to packages that were transitive dependencies and not defined in package.json. I see this as a positive though, since all those cases were just bugs waiting to happen.
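Those imports of undeclared transitive dependencies are the classic "phantom dependency" failure mode. A tiny sketch of the kind of line that works under a hoisted layout but (rightly) fails under pnpm; express/body-parser are just familiar stand-ins:

    // Our package.json declares only "express", yet with hoisting this still
    // resolves, because express's own dependency ends up flattened next to it.
    // Under pnpm's strict layout the same require throws MODULE_NOT_FOUND,
    // surfacing the missing declaration instead of hiding it.
    const bodyParser = require('body-parser'); // never declared by us

    module.exports = bodyParser;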
Probably because Lerna is already working for them. We're just using straight Yarn workspaces ourselves. Rush is great, but for our project it's overkill (lerna was as well).
Though we're using git submodules + yarn workspaces right now. The submodules will likely go away eventually.
I migrated from yarn to pnpm two days ago, and I could tell the difference the first time I hit install. I'm working in a workspace, so I have multiple packages with nested dependencies. For that case I thought yarn (the classic version) was the ideal solution, until I discovered pnpm. Thanks to its non-flat algorithm the packages have clean dependencies. Previously with yarn, if you installed `foo` in any package you could reuse it anywhere in the workspace, even if it wasn't listed in that package's dependencies. With pnpm that's not the case, which means a clean dependency tree and serves the purpose of what a workspace is meant for. If you want to share a dependency, you install it in the root, which makes sense to me. Another big advantage is the recursive/parallel commands, something I couldn't do without Lerna. And it's fast: install once and it's there on disk, so if you manage multiple projects, dependency installation isn't something you wait for; it's just there.
It works very fast in CI, its cache is smaller, and it builds node_modules much faster. I don't feel comfortable caching the node_modules folder itself, because it has caused side effects for me before, even incidents; I'm not sure if it's supposed to be cached like that. The speed difference is 15s vs 35s for our use case, which is pretty significant.
This approach works well with other tools (e.g. I can force create-react-app to install packages with pnpm).
Such a breath of fresh air…
Makes sense when you think about it I guess. Only used bundlers for front end stuff.
https://pnpm.io/npmrc#node-linker
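That setting lets pnpm emulate the other layouts as well; a sketch of the documented .npmrc values:

    # .npmrc -- pnpm's node-linker options (see the page linked above)
    node-linker=isolated   # default: symlinked, strict layout
    # node-linker=hoisted  # flat node_modules, npm/yarn-classic style
    # node-linker=pnp      # Plug'n'Play, no node_modules at all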
I've been using pnpm and rush in my projects, and going back to npm at my employer's every day is such a chore.
npm showed me that I lack creativity, for I could not imagine anything worse than maven.
The ~/organization/project/release dir structure is the ONE detail Maven got right. (This is the norm, the Obviously Correct Answer[tm], right?)
And npm just did whatever. Duplicate copies of dependencies. Because reasons.
Here is a feature comparison: https://pnpm.io/feature-comparison
Now you have to maintain three different code paths, two of which depend on the behaviour of external projects, so you're always playing catch up.
That's such a bad idea on so many levels.
But I think it is best to use Yarn for PnP and pnpm for the symlinked node_modules structure.
I have zero trust in NPM.
I guess their benchmarks cover both [0]. But I'm also curious about independent figures.
[0] https://pnpm.io/benchmarks