ristos · a year ago
Micro-libraries are really good actually: they're highly modular, self-contained code, which often makes it really easy to understand what's going on.

Another advantage is that because they're so minimal and self-contained, they're often "completed", because they achieved what they set out to do. So there's no need to continually patch it for security updates, or at least you need to do it less often, and it's less likely that you'll be dealing with breaking changes.

The UNIX philosophy is also built on the idea of small programs, just like micro-libraries: doing one thing and doing it well, and composing those things to make larger things.

I would argue the problem is how dependencies in general are added to projects, which the blog author pointed out with left-pad. Copy-paste works, but I would argue the best way is to fork the libraries and add submodules to your project. Then if you want to pull a new version of the library, you can update the fork and review the changes. It's an explicit approach to managing it that can prevent a lot of pitfalls like malicious actors, breaking changes leading to bugs, etc.
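Concretely, something like this (the repo URL, remote, and branch names are just placeholders):

    git submodule add https://github.com/you/forked-micro-lib.git vendor/micro-lib
    # later, to pull a new upstream version into your fork:
    cd vendor/micro-lib
    git fetch upstream
    git diff HEAD upstream/main   # review the changes before merging them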

foul · a year ago
Micro-libraries anywhere else are everything you said: building blocks that come after a little study of the language and its stdlib and will speed up development of non-trivial programs.

In JS and NPM they are a plague, because they promise to be a substitute for competence in basic programming theory, for competence in JS itself, for the gaps and bad APIs inside JS, and for de-facto standards in the programming community like the oldest functions in libc.

There are a lot of ways to pad a number in JS, and a decent dev would keep their own utility library, or hell, a single function to copy-paste for that. But no. npm users are taught to fire and forget, and to update everything, with no concept of vendoring (which would have made incidents like left-pad, faker, and colors less maddening; vendoring is even built into npm, and it's very good!). For years they have been copy-pasting in the wrong window, really: they should copy-paste blocks of code, not npm commands. And God help you if you type out your npm commands by hand, because bad actors have bought into the trend and made millions of libraries with a hundred different scams waiting for fat fingers.

Given that backend JS optimizes for reducing cost whatever the price, becoming Smalltalk for the browser and for PHP devs, you would expect some kind of standard to emerge for having a single way to do routine stuff. Instead in JS-world you get TypeScript, and in the future maybe WASM. JS is just doomed. Like, we are doomed if JS isn't, to be honest.

ivan_gammel · a year ago
The whole web stack must die and be replaced. JS, CSS, HTML, and HTTP are a huge cost center for the global economy.
orhmeh09 · a year ago
Could you link to somebody who is teaching npm users to "fire and forget?" Someone who is promising a substitute for competence in basic programming theory? Clearly you and I do not consume the same content.
porcoda · a year ago
The UNIX philosophy is being a bit abused for this argument. Most systems that fall under the UNIX category are more or less like a large batteries-included standard library: lots of little composable units that ship together. UNIX in practice is not about getting a bare system and randomly downloading things from a bunch of disjointed places like tee and cat and head and so on, and then gluing them together and perpetually having to keep them updated independently.
ristos · a year ago
They ship together because all of those small composable units, once developed by random people, were turned into a meta-package at some point. I agree with you that randomly downloading a bunch of disjointed things without auditing and forking them isn't good practice.

I'm also not arguing against a large popular project with a lot of contributors if it's made up of a lot of small, modular, self-contained code that's composed together and customizable. All the smaller tools will probably work seamlessly together. I think UNIX still operates under this sort of model (the BSDs).

There's a lot of code duplication and bad code out there, and way too much software that you can't really modify or customize for your use case, because that becomes an afterthought. Even if you did learn a larger codebase, if it's not made up of smaller modular parts, then whatever you modify has a significantly higher chance of breaking once the library gets updated: you changed internal code, and the library authors aren't going to worry about breaking changes for someone maintaining a fork that touches internals.

syncsynchalt · a year ago
> randomly downloading things from a bunch of disjointed places like tee and cat and head and so on, and then gluing them together and perpetually having to keep them updated independently.

I have distressing news about my experience using Linux in the '90s

wizzwizz4 · a year ago
We should totally have a system like that, though. It'd be such a great learning environment.
ivan_gammel · a year ago
> So there's no need to continually patch it for security updates, or at least you need to do it less often, and it's less likely that you'll be dealing with breaking changes.

Regardless of how supposedly good or small the library is, the frequency at which you need to check for updates is the same. It doesn't have anything to do with the perceived or original quality of the code. Every 3rd-party library has at least a dependency on the platform, and platforms are big: they have vulnerabilities and introduce breaking changes. Then there's the question of trust and the consistency of your delivery process. You won't adapt your routines to the specifics of every tiny piece of 3rd-party code, so you probably check for updates regularly and for everything at once. At that point their size is no longer an advantage.

> Copy-paste works, but I would argue the best way is to fork the libraries and add submodules to your project. Then if you want to pull a new version of the library, you can update the fork and review the changes.

This sounds “theoretical” and is not going to work at scale. You cannot seriously expect application-level developers to understand the low-level details of every dependency they want to use. For a meaningful code review of merges they must be domain experts; otherwise the effectiveness of such an approach will be very low, and they will inevitably have to trust the authors and just merge without going into details.

ristos · a year ago
They don't need to understand the low-level dependencies. People can create metapackages out of a bunch of self-contained libraries that have been audited and forked, and devs can pull in the metapackages. The advantage is the modularity, which makes the code easier to audit and more self-contained.

When's the last time ls, cat, date, tar, etc. needed to be updated on your Linux system? Probably almost never. And composing them together always works. This set of Linux tools (call it sbase, ubase, plan9 tools, etc.) is one version of a metapackage. How often does a very large package need to be updated for bug fixes, security patches, or new versions?

GuB-42 · a year ago
If these libraries are so small, self-contained and "completed", why not just copy-paste these functions?

Submodules can work too, but do you really need the extra lines in your build scripts, the extra files and directories, and the import lines, just for a five-line function? Copy-pasting is much simpler, with maybe a comment referring to the original source.

Note: there may be some legal reasons for keeping "micro-libraries" separate, or for not using them at all, though. But IANAL, as they say.

5Qn8mNbc2FNCiVV · a year ago
As soon as source code is in your repo, it's way more likely to get touched. I'd never open that box, because I don't want to waste my team's time touching, or reviewing, code that they shouldn't.

If you want the same functionality, build it according to the conventions in the codebase and strip out everything else that isn't required for the exact use case (since it's not a library anymore)

Barrin92 · a year ago
">The UNIX philosophy is also build on the idea of small programs, just like micro-libraries, of doing one thing and one thing well, and composing those things to make larger things."

The Unix philosophy is also built on willful neglect of systems thinking. The complexity of a system isn't in the complexity of its parts but in the complexity of the interactions between its parts.

Putting ten micro-libraries together, even if each is simple, doesn't mean you have a simple program; in fact it doesn't even mean you have a working program, because that depends entirely on how your libraries play together. When you implement the content of micro-libraries yourself, you have to be conscious not just of what your code does, but of how it works, and that's a good first defense against putting together parts that don't fit.

ristos · a year ago
It's not a willful neglect of systems thinking. Functional programmers have been able to build very large programs made primarily of pure functions composed together, and that makes them much easier to debug, because everything is self-contained and you can easily decompose parts of the program. The same goes for effectful code, leveraging things like algebraic effects.
alerighi · a year ago
> The UNIX philosophy is also built on the idea of small programs, just like micro-libraries: doing one thing and doing it well, and composing those things to make larger things.

They have small programs, but those don't come from different projects. For example, all the basic Linux utilities are developed and distributed as part of the GNU coreutils package.

It's the same as having a modular library with multiple functions in it that you can choose from. In fact the problem is that functions like isNumber shouldn't even be libraries; they should be in the language's standard library itself.

tgv · a year ago
> I would argue the problem is how dependencies in general are added to projects

But you need the functionality anyway, so there are two possible dependencies: on your own code, or on someone else's code. You can't avoid the dependency, and it comes at a cost.

If you don't know how to code the functionality, or it would take too much time, a library is a way out. But if you need leftPad or isNumber as an external dependency, that's so far in the other direction that it's practically a sign of incompetence.
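For reference, a minimal sketch using the built-in String.prototype.padStart (available since ES2017):

    // left-pads str with ch up to length len
    const leftPad = (str, len, ch = ' ') => String(str).padStart(len, ch);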

6510 · a year ago
If you're incompetent, it provides a way to be sure?

Could you, for laughs, explain which cases these are for, why they are needed, and why they did it this way?

1) num-num === 0

2) num.trim() !== ''

3) Number.isFinite(+num)

4) isFinite(+num)

5) return false;

6) Why this specific order of testing? Why prefer Number.isFinite over isFinite?

https://www.npmjs.com/package/is-number

   module.exports = function(num) {
     if (typeof num === 'number') {
       return num - num === 0;
     }
     if (typeof num === 'string' && num.trim() !== '') {
       return Number.isFinite ? Number.isFinite(+num) : isFinite(+num);
     }
     return false;
   };
I would have just....

    isNumber = num => isFinite(num+''.trim());
Why is that not precisely the same? (it isn't)

how about...

   function isNumber(num){
     switch(typeof num){
       case "number" : return !isNaN(num);
       case "string" : return isFinite(num) && !!num.trim();
     }
   }
Is there a difference?
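A few inputs that seem to separate them, if I'm reading the coercion rules right:

    isNumber(Infinity); // is-number: false (Infinity - Infinity is NaN);
                        // the switch version: true (!isNaN(Infinity))
    isNumber('   ');    // is-number: false (the trim() guard);
                        // the arrow version: true, because num+''.trim()
                        // parses as num + ''.trim(), i.e. num + '', and the
                        // global isFinite coerces '   ' to 0
    isNumber(true);     // is-number: false; the switch version hits no case
                        // and returns undefined, not false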

IMHO NPM should have a discussion page for this. There are probably interesting answers for all of those looking to copy and paste.

reaperducer · a year ago
> The UNIX philosophy is also built on the idea of small programs, just like micro-libraries: doing one thing and doing it well, and composing those things to make larger things.

This year I started learning FORTH, and it's very much this philosophy. To build a building, you don't start with a three-story slab of marble. You start with hundreds of perfect little bricks and fit them together.

If you come from a technical ecosystem outside the Unix paradigm, it can be hard to grasp.

ristos · a year ago
Yeah, exactly! FORTH looks really awesome, I haven't gotten around to learning it much though. I heard it's addictive and fun.

Yeah, it's all concatenative programming: FORTH, unix pipes, function composition as monoids, effect composition as Kleisli composition and monads, etc.

That makes it super useful for code readability (once you're familiar with the paradigm) and for debugging, since you can split up and decompose any part of your program to inspect and test it in isolation.
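In JS terms, a sketch of the composition idea (nothing framework-specific assumed):

    // compose(h, g, f)(x) === h(g(f(x))): each piece stays testable in isolation
    const compose = (...fns) => x => fns.reduceRight((v, f) => f(v), x);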

bborud · a year ago
This has nothing in common with the UNIX approach. Awk, grep, sort, less and the like are perhaps small, but not that small and not that trivial.
samatman · a year ago
Unix has yes, tr, cut, true, false, uniq, nl, id, fold, sort, sleep, head, tail, touch, wc, date, cal, echo, cat...

These are tiny programs.

I mean, sort has put on some weight over the years, sure. But if it were packaged up for npm people would call it a micro-library and tell you to just copy it into your own code.

kazinator · a year ago
Right! So if it is indeed so easy to understand what is going on, why would you need to make it an external dependency that can update itself behind your back?

If you understand what is going on, paste it into your tree.

mattlondon · a year ago
> Micro-libraries are really good actually, they're highly modular, self-contained code

Well, I think that is the point: they're not self-contained. You are adding mystery stuff, and who knows how deep the chain of dependencies goes. See the left-pad fiasco that broke so much stuff, because the chain of transitive dependencies ran deep and wide.

NPM is a dumpster fire in this regard. I try to avoid it - is there a flag you can set to say "no downstream dependencies" or something when you add a dependency? At least that way you can be sure things really are self-contained.

a_wild_dandan · a year ago
There is a "no downstream dependencies" option; it's called writing/auditing everything yourself. Everything else -- be it libraries, monolithic SaaS platforms, a coworker's PR, etc. -- is a trade off between your time and your trust. Past that, we're all just playing musical chairs with where to place that trust. There's no right answer.
ristos · a year ago
Yeah, there's a way to do that: yarn and pnpm can flatten the dependency tree. You can add the fork directly too:

yarn add <path/to/your/forked/micro-library.git>

pnpm add <path/to/your/forked/micro-library.git>

IgorPartola · a year ago
I remember adding a random date picker that pulled in a copy of React with it to a non-React project. NPM is a dumpster fire at a nuclear facility.
Toutouxc · a year ago
Do you know what else is all of that? Writing the five lines of code by hand. Or just letting a LLM generate it. This and everything else I want to reply has already been covered in the article.
ristos · a year ago
Nothing wrong with that either; like I said, copy-paste works too. A lot of minimalist projects will just copy the code into another project.

Forking the code and using that is arguably nicer though, IMO: it makes it easier to pull in new updates, and to track changes and bug fixes. I've tried both and find this approach nicer overall.

jvanderbot · a year ago
Micro libraries are ok - TFA even says you can use self-contained blocks as direct source.

Micro-dependencies are a goddamn nuisance, especially with all the transitive micro-dependencies that come along, often with different versions, alternative implementations, etc.

Ygg2 · a year ago
If you're writing micro-libraries without intending to reuse them, why are you making them libraries?
jaredsohn · a year ago
> I would argue the problem is how dependencies in general are added to projects

I haven't done anything with this myself (just brainstormed a bit with chatgpt) but I wonder if the solution is https://docs.npmjs.com/cli/v10/commands/npm-ci

Basically, enforce that all libraries have lock files and when you install a dependency use the exact versions it shipped with.
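In CI that looks like:

    npm ci   # installs exactly what package-lock.json pins, and errors out
             # if the lockfile and package.json disagree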

Edit: Can someone clarify why this doesn't work? Wouldn't it make installing node packages work the same way as it does in python, ruby, and other languages?

ristos · a year ago
I'm not sure why you're getting downvoted. The left-pad incident on npm primarily impacted projects that didn't have lockfiles or weren't pinning exact versions of their dependencies. I knew a few functional programmers who would freeze dependencies to exact versions before lockfiles came around, just to ensure builds were reproducible and wouldn't break in the future. Part of what was to blame was bad developer practice. I like npm ci.
mewpmewp2 · a year ago
These days with LLMs, doing leftPad yourself is incredibly easy, I would just do that.
VonGallifrey · a year ago
With LLMs? I don't think something like leftPad was ever difficult to create.
prng2021 · a year ago
Why even stop at micro-libraries? Instead of "return num - num === 0", why not create the concept of pico-libraries people can use, like "return isNumberSubtractedFromItselfZero(num)"? It's basically plain English, right?

You could say that if all the popular web frameworks in use today were rewritten to import and use hundreds of thousands of pico-libraries, their codebases would be, as you say, composed of many highly modular, self-contained pieces that are easy to understand.

/s


oftenwrong · a year ago
The primary cause of the left-pad incident was that left-pad was removed from the npm registry. Many libraries depended on left-pad. The same could have occurred with any popular library, whether micro or not.

To reformulate the statement made in the intro of this post: "maybe it’s not a great idea to outsource _any critical_ functionality to random people on the internet."

It has long been a standard, best practice in software engineering to ensure dependencies are stored in and made available from first-party sources. For example, this could mean maintaining an internal registry mirror that permanently stores any dependencies that are fetched. It could also be done by vendoring dependencies. The main point is to take proactive steps to ensure your dependencies will always be there when you need them, and to not blindly trust a third-party to always be there to give your dependencies to you.
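With npm, for example, that can be as simple as an .npmrc pointing installs at a mirror you control (the hostname here is made up):

    # .npmrc
    registry=https://npm-mirror.internal.example.com/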

klabb3 · a year ago
> To reformulate the statement made in the intro of this post: "maybe it’s not a great idea to outsource _any critical_ functionality to random people on the internet."

Well everything is critical in the sense that a syntax error could break many builds and CI systems.

This is what lock files are for. If they're used properly, and the registry is available, there are no massive issues. This is how things are supposed to work; all the tooling is made this way.

In short, I think the lessons from the leftpad debacle are (1) people don't use existing versioning tooling, (2) there is a surprising number of vendors involved if you look at the dep trees for completely normal functionality, and (3) the JS ecosystem is particularly fragmented, with poor API discipline and a non-existent stdlib.

EDIT: Just read up on it again and I misremembered. The author removed leftpad from NPM due to a dispute with the company regarding an unrelated package. That’s more of a mismanaged registry situation. You can’t mutate and remove published code without breaking things. Thus NPM wasn’t a good steward of their registry. If there’s a need to unpublish or mutate anything, there needs to be leeway and a path to migrate.

oftenwrong · a year ago
The key point is "If ... the registry is available", and the dependencies contained therein. We take on risk by relying on NPM to always be there and always provide us the dependencies we have already invested in. I'm arguing that organisations should take a more defensive stance against dependencies becoming unavailable. If you depend on it, keep a copy of it somewhere that you control.
Brian_K_White · a year ago
The problem with micro is that 100 micros means 100x more surface area, and 100x more chances for something to go wrong, than 1.
xg15 · a year ago
Micro-libraries are worse than no libraries at all, but I maintain they are still better than gargantuan "frameworks" or everything-but-the-kitchen-sink "util"/"commons" packages, where you end up using only a tiny fraction of the functionality but have to deal with the maintenance cost and attack surface of the whole thing.

If you're particularly unlucky, the unused functionality pulls in transitive dependencies of its own - and you end up with libraries in your dependency tree that your code is literally not using at all.

If you're even more unlucky, those "dead code" libraries will install their own event handlers or timers during load, or will be picked up by some framework autodiscovery mechanism, and will actually execute some code at runtime, just not any code that provides anything useful to the project. I think an apt name for this would be "undead code". (The examples I have seen were from Java frameworks like Spring and from webapps with too many autowired request filters, so I do hope this isn't such an issue in JS yet.)

zahlman · a year ago
> but I maintain they are still better than gargantuan "frameworks" or everything-but-the-kitchen-sink "util"/"commons" packages, where you end up using only a tiny fraction of the functionality but have to deal with the maintenance cost and attack surface of the whole thing.

Indeed. Several toy projects I've done were blown up in size by four orders of magnitude because of NumPy.

I only want multi-dimensional arrays that support reshaping and basic element-wise arithmetic, maybe matrix multiplication; I'm not even that concerned about performance.

But I have to pay for countless numerical algorithms I've never even heard of, provided by decades-old C and/or FORTRAN projects; even more higher-math concepts implemented in Python; NumPy's extensive (and fragmented: there's even compiled code for testing that's outside of any test folders) test suite that I'll never run myself; a bunch of backwards-compatibility hacks completely irrelevant to my use case; a python-to-fortran interface wrapper generator; a vendored copy of distutils even in the wheel; over 3MiB of .so files for random number generators; a bunch of C header files...

[Edit: ... and if I distribute an application, my users have to pay for all of that, too. They won't use those pieces either; and the likelihood that they can install my application into a venv that already includes NumPy is pretty low.]

I know it's fashionable to complain about dependency hell, but modularity really is a good thing. By my estimates, the total bandwidth used daily to download copies of NumPy from PyPI is on par with that used to stream the Baby Shark video from YouTube - assuming it's always viewed in 1080p. (Sources: yt-dlp info for file size; History for the Wikipedia article on most popular YouTube videos; pypistats.org for package download counts; the wheel I downloaded.)

DonHopkins · a year ago
Sometimes importing zombie "undead code" libraries can be beneficial!

I just refactored a bunch of python computer vision code that used detectron2 and yolo (both of which indirectly use OpenCV and PyTorch and lots of other stuff), and in the process of cleaning up unused code, I threw out the old imports of the yolo modules that we weren't using any more.

The yololess refactored code, which really didn't have any changes that should measurably affect the speed, ran a mortifying 10% slower, and I could not for the life of me figure out why!

Benchmarking and comparing each version showed that the yololess version was spending a huge amount of time with multiple threads fighting over locks, which the yoloful code wasn't doing.

But I hadn't changed anything relating to threads or locks in the refactoring -- I had just rearranged a few of the deck chairs on the Titanic and removed the unused yolo import, which seemed like a perfectly safe innocuous thing to do.

Finally after questioning all of my implicit assumptions and running some really fundamental sanity checks and reality tests, I discovered that the 10% slow-down in detectron2 was caused by NOT importing the yolo module that we were not actually using.

So I went over the yolo code I was originally importing line by line, and finally ran across a helpfully commented top-level call to fix an obscure performance problem:

https://github.com/ultralytics/yolov5/blob/master/utils/gene...

    cv2.setNumThreads(0)  # prevent OpenCV from multithreading (incompatible with PyTorch DataLoader)
Even though we weren't actually using yolo, just importing it, executing that one line of code fixed a terrible multithreading performance problem with OpenCV and PyTorch DataLoader fighting behind the scenes over locks, even if you never called yolo itself.

So I copied that magical incantation into my own detectron2 initialization function (not as top level code that got executed on import of course), wrote some triumphantly snarky comments to explain why I was doing that, and the performance problems went away!

The regression wasn't yolo's or detectron2's fault per se, just an obscure, invisible interaction between other modules they were both using. But yolo shouldn't be doing anything globally systemic like that at import time, before you've actually initialized it.

But then I would have never discovered a simple way to speed up detectron2 by 10%!

So if you're using detectron2 without also importing yolo, make sure you set the number of cv2 threads to zero or you'll be wasting a lot of money.

conradludgate · a year ago
This is mortifying. It should not be acceptable for imports to implicitly run code simply by existing.
franciscop · a year ago
Seems a lot like the classic "I list only a couple of the strong advantages and enumerate everything I can think of as a disadvantage". While I'm biased (I've done a bunch of these micro-libraries myself), there are more reasons I/OSS devs make them! To name other advantages (as a dev consuming them):

- Documentation: they are usually well documented, at least a lot better than your average internal piece of code.

- Portability: you learn it once and can use it in many projects, a lot easier than potentially copy/pasting a bunch of files from project to project (I used to do that and ugh what a nightmare it became!).

- Semi-standard: everyone on the team is on the same page about how something works. This works on top of the previous two TBF, but is distinct as well; e.g. if you use Axios, 50% of front-end devs will already know how to use it (edit: removed express since it's arguably not micro though).

- Plugins: now with a single "source" other parties or yourself can also write plugins that will work well together. You don't need to do it all yourself.

- Bugs! When there are bugs, now you have two distinct "entities" that have strong motivation to fix the bugs: you+your company, and the dev/company supporting the project. Linus's eyeballs and all (yes, this has a negative side, but those are also covered in the cons in the article already!).

- Bugs 2: when you happen upon a bug, a 3rd party might've already found it and fixed it, or offered an alternative solution! In fact I did just that today [1]

That said, I do have some projects where I explicitly recommend copy/pasting the code straight into your project, e.g. https://www.npmjs.com/package/nocolor (you can still install it though).

[1] https://github.com/umami-software/node/issues/1#issuecomment...

pton_xd · a year ago
Every team should eventually have some internal libraries of useful project-agnostic functionality. That addresses most of your points.

Copy-paste the code into your internal library and maintain it yourself. Don't add a dependency on { "assert": "2.1.0" }. It probably doesn't do what you actually want, anyway.

I think the more interesting point is that most projects don't know what they actually need and the code is disposable. In that scenario micro-libraries make some amount of sense. Just import random code and see how far you can get.

franciscop · a year ago
That's what I do for personal projects: I just run "npm publish"[1] on those and BAM, it's managed, secured, and versioned by npm, instead of having to copy/paste or dig for old/new versions in Git history.

[1] I lied, I don't even run npm publish, I made my own tool for easy publishing so I just run `happy "Fixed X bug" --patch`

qwerty456127 · a year ago
> Micro-libraries should never be used. They should either be copy-pasted into your codebase, or not used at all.

I would prefer them to be built straight into the languages.

jdminhbg · a year ago
Yes, everyone seems to take the wrong lesson from left-pad. The reason left-pad happened on NPM isn't that there's something uniquely wrong with how NPM was built, but that JS has a uniquely barren standard library. People aren't writing their own left-pad functions in Java or Go or Python, it's just in the stdlib.
_xiaz · a year ago
At the same time, Go is quite barren when it comes to list (slice) functions, but I largely agree about Java and Python.
flysand7 · a year ago
I was about to jump into the comment section and say something along the lines of "but no one really thinks they're actually good, right?", only to see the top comment arguing they're good.
_xiaz · a year ago
Astounding that this is as polarizing of a take as it seems to be
userbinator · a year ago
> and because it updates fairly frequently

I fail to comprehend how a single-function-library called "isNumber" even needs updating, much less "fairly frequently".

The debate around third-party code vs. self-developed is eternal. IMHO if you think you can do better than existing solutions for your use-case, then self-developed is the obvious choice. If you don't, then use third-party. This of course says a lot about those who need to rely on trivial libraries.

foul · a year ago
> I fail to comprehend how a single-function-library called "isNumber" even needs updating, much less "fairly frequently".

If someone uses isNumber as a fundamental building block and a surrogate for Elm or TypeScript (a transpiler intermediate that would treat numbers more soundly, I hope), this poor soul, whom I deeply pity, will encounter a lot of strange edge cases (like the one stated in the article: is NaN a number or not?), and if they fear the burden of forking the library they will try to inflict that burden upstream, enabling feature or config bloat.

I insinuate that installing isNumber is, like most of these basic microlibs, a symptom of incompetence in the use of the language. A worn JS dev would try isNaN(parseInt(num+'')) and sometimes succeed.

flysand7 · a year ago
> [...] and sometimes succeed

Nothing is ever certain when you program in javascript.

guestbest · a year ago
I think the updates are more for bugfixes around edge cases than feature additions.
consteval · a year ago
> I fail to comprehend how a single-function-library called "isNumber" even needs updating

Never underestimate the complexity and footgunny nature of JS' type system.

shiroiushi · a year ago
Until I read the comments here, I thought from the title that this was about those small neighborhood "libraries" that are basically a box the size of a very large birdhouse, mounted on a post, with a bunch of donated books inside that passersby are free to borrow. I was really wondering why someone would have a problem with these, unless they work for a book publisher.