My rule is, don't use a dependency to implement your core business. Is JSON parsing our core business? No, so why would we ever write -- and thereby commit to supporting for its entire lifetime -- JSON parsing code? All the code you write and support should be directly tied to what you as a business decide are your fundamental value propositions. Everything else you write is just fat waiting to be cut by someone who knows how to write a business case.
To be clear, this is about the lifetime support of code. It's very, very rare that code can be written once and never touched. But that long tail of support eats up time and money, and is almost always discounted in these conversations. I don't even care that Jackson JSON parsing has years of work behind it, when I can hack together a JSON parser in a day. I care that Jackson will continue to improve their offering without any further input, while that's not true of my version.
> don't use a dependency to implement your core business
In logic language, you're saying "If X is your core business, don't outsource X".
> Is JSON parsing our core business? No, so why would we ever write -- and thereby commit to supporting for its entire lifetime -- JSON parsing code? All the code you write and support should be directly tied to what you as a business decide are your fundamental value propositions. Everything else you write is just fat waiting to be cut by someone who knows how to write a business case.
The rest of your argument is interpreted as "If X is not your core business, don't in-house X".
These two logical implication statements are not equivalents of each other, but are converses. Casual language often conflates If, Only-If, and If-And-Only-If.
"You should spend time implementing your core business" implies that you shouldn't spend time implementing things that aren't your core business; otherwise the first statement is pretty useless.
outsource = O
object = x
core business = C (my label, to make the symbols line up)
x ∉ C ↔ O(x)
If the symbols don't show up: if-and-only-if x is not in C, O(x).
I think the problem is that the individual contributor has decided to make that chunk of logic their business. This will probably not benefit the team or the organization.
Well, one special edge-case would be where you only need to parse some extremely tiny subset of JSON (for example: you only need to parse dictionaries whose keys and values are positive integers, like {1:2,3:4}). Then, depending how expensive the full json parser is, it might be worth your while just writing the limited parser yourself.
Of course, you might say, inevitably feature-creep will expand the list of things your parser needs to parse, but that's not a law of physics. Sometimes in certain limited, well-defined projects, it really is true that YAGNI.
Your example is more apt than intended: That's not valid json, which only allows string keys. If you use a library it'll either barf now or later when they fix it, so if you're forced to work with an API like that and can't change it, a custom parser is really the only way to go.
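For illustration, a minimal sketch of such a limited parser — hypothetical names, and assuming the only inputs are flat {int:int} dictionaries like the example above, with no strings, nesting, or fancy whitespace:
// Hypothetical sketch: parse the non-standard "{1:2,3:4}" shape into a Map.
// Assumes flat dictionaries with non-negative integer keys and values only.
function parseIntDict(text) {
  const body = text.trim();
  if (body[0] !== '{' || body[body.length - 1] !== '}') {
    throw new SyntaxError('expected a {...} dictionary');
  }
  const result = new Map();
  const inner = body.slice(1, -1);
  if (inner === '') return result; // empty dictionary
  for (const pair of inner.split(',')) {
    const parts = pair.split(':');
    if (parts.length !== 2 || !/^\d+$/.test(parts[0]) || !/^\d+$/.test(parts[1])) {
      throw new SyntaxError('expected integer:integer, got "' + pair + '"');
    }
    result.set(Number(parts[0]), Number(parts[1]));
  }
  return result;
}
// parseIntDict('{1:2,3:4}') -> Map { 1 => 2, 3 => 4 }
If the format later grows beyond that shape, swapping this out for a real parser is a contained change, which is exactly the point made below about keeping the implementation replaceable.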
You can also apply YAGNI to 'do we need our own custom parser'?
You don't know what your requirements are. The customers haven't told you yet.
If you pick a library with a straightforward interface, especially one that isn't too opinionated, you can always drop in a custom implementation later on. Frameworks, not so much (but that cuts both ways; the people who will write libraries often love writing frameworks too)
> Of course, you might say, inevitably feature-creep will expand the list of things your parser needs to parse
If you've done your parser correctly, you'll be able to replace its implementation with the new dependency, with little to no need for extra refactoring in the rest of the codebase.
I think a JSON parser is not a good example though — it takes longer than a few hours or an afternoon to write a JSON parser, add tests, fix bugs, and cover the corner cases. More like a week, or weeks, ...
... Look, a tiny json parser — Not an afternoon project: https://github.com/rafagafe/tiny-json/blob/master/tiny-json....
And a question about small JSON parsers — didn't see any afternoon projects among the answers: https://stackoverflow.com/questions/6061172/smallest-less-in...
I suppose a JSON parser was just an example. Made the whole answer sound weird to me though :- ) when the blog is about afternoon-projects and then a reply is about a week(s), could be month(s), long project.
Same with CSV. It looks easy, but it isn't. I've never seen anyone who writes their own CSV parser actually implement features necessary to conform to the standard like quoting and escape sequences. The end result is software that breaks when delimiters or quotes appear in user input. Honestly, I prefer xlsx spreadsheets because of that. Nobody fools themselves into implementing the parser or serializer for the format themselves. The only tiny pitfall with them is when people create spreadsheets manually in excel and write numbers as text, but parsing strings to numbers is absolutely trivial. You have to do that with CSV anyway.
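To make the quoting point concrete, here is a minimal sketch of RFC 4180-style field escaping — the part home-grown CSV writers usually skip (function names are hypothetical):
// Quote a field only when it contains a delimiter, quote, or newline;
// embedded quotes are doubled, per RFC 4180.
function csvField(value) {
  const s = String(value);
  if (/[",\r\n]/.test(s)) {
    return '"' + s.replace(/"/g, '""') + '"';
  }
  return s;
}
function csvRow(fields) {
  return fields.map(csvField).join(',');
}
// csvRow(['ok', 'has "quotes"', 'a,b']) -> 'ok,"has ""quotes""","a,b"'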
> I think a thing is not a good example though — takes longer than a few hours / an afternoon, to write a thing, add tests, fix bugs, corner cases. More like a week, or weeks, ...
You're making my point for me. This is exactly what I meant by the lifetime of support you're signing up for by writing lines of code. Once you write that code, you're now in the business of supporting that code. Was that a good decision for your business?
There's a fair middle ground when the dependency itself doesn't have dependencies, and is small enough, with a permissive license, that the entirety of its code can be dropped into your project. Especially for very specific functionality. I have used such tiny XML parsers, and I'm not affected by the fact that my copy is no longer the latest version. It's not so far from copying and pasting snippets of existing code.
Great rule. I was wondering, though: how do you manage updating the Jackson JSON parsing package? What if you have 100 such packages and they get updated weekly with breaking changes?
If you have a hundred direct dependencies and they all break their API on a weekly basis, then either you are at a scale where you can handle that, or you are using the wrong dependencies, or you are doing something wrong.
I could understand at most 10 dependencies iterating that quickly, but only if they are your own internal dependencies — and those should definitely not be breaking their API weekly.
For what reason are you updating your packages? Is there a severe security issue in that package, or, if it works today, could you pin it to that version and wait until there is a compelling reason to update it?
Here's some reasoning: if this project were brought in house, would we detect and patch it any quicker? Would we have a dev constantly assigned to it, pushing out patches to the rest of the team... or is it the sort of software we'd write once and then leave until there's a compelling reason to invest more in it? Whether software is in-house or outsourced, you still retain the decision about how much time to invest in its maintenance.
Only update dependencies when your code requires the new version, depends on a bug fix or it fixes a security vulnerability. Otherwise, continue using the same version.
Have good test coverage to catch bugs that may originate in dependencies and subscribe to a third-party service to track vulnerabilities in your dependencies.
There's lots of opinions on this, all with good justification. My current team leaves most dependencies unlocked and depends on good automated tests to sniff out broken dependencies. If necessary we lock dependencies to a particular version or range (e.g. <2.0.0). Once tested, we freeze for distribution.
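As a concrete sketch of those options in npm terms (package names are hypothetical): an exact version pins a dependency entirely, while a bounded range like the `<2.0.0` above still takes patch releases but blocks the next major rewrite.
{
  "dependencies": {
    "some-csv-parser": "1.4.2",
    "some-http-client": ">=3.2.0 <4.0.0"
  }
}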
Some people just never upgrade until they need to. That's workable, though when you do need to upgrade a package you may be spending the rest of the week working out a cascade of breaking changes.
The solution to that is simple, stop using node.js ;)
Besides _lifetime support_, working on a core business feature makes us _understand_ that feature deeply.
I've seen people pull in a dependency for their core business. It helped them get started fast, but it created a blockage that required deeper understanding to overcome.
So you're saying that I should implement my own OR mapper just because my product is using a database?
And even when that's not the case, writing everything yourself means it all ends up in your own hands: no bug fixes, no patches, no improvements without spending your own people's time.
I've worked in such a company, and it was a mess, accompanied by dev leads too proud of their code to allow any change.
I'm confused by your response. Is your core business mapping objects to databases? As in, that's what you get paid for? If not, my heuristic is that you should not be writing an ORM tool.
Quite agree: every single line of code written requires lifetime support. Code adds up and gradually reduces productivity, so only write code for your core business logic.
"This'll take an afternoon" - three weeks later......
Programmers are notorious for this.
BUT even apart from this problem ... you absolutely should use every dependency you can that will save you time.
Try to write less code, not more. When you write code you write bugs, add complexity, add scope, increase the need for testing, increase the cognitive load required to comprehend the software, and introduce the need for documentation..... there's a vast array of reasons to use existing code even if you truly could estimate it and build it in an afternoon.
You also assume that you understand all the edge cases and fickle aspects of the dependency, all the weird ins and outs that the dependency author probably spent much resources understanding, fixing and bug hunting.
There's a hard fact that proves the above poster to be wrong..... how many dependencies took only an afternoon of time in total to write? Hard to say (maybe look at the github commit history) but I'd guess almost none. It didn't take the dependency author an afternoon, so why will it take you an afternoon?
Even worse .... you just lost an afternoon coding features for your core application.
Multiply this by every dependency that "you could build in an afternoon" and you'll be in Duke Nukem Forever territory.
I'd advise doing the opposite of this article's suggestion.
Find a dependency that will save you an afternoon? Grab it.
Dependencies have costs:
- Dependencies break over time. They have a nonzero maintenance cost.
- They impose API boundaries on you that may not fit your existing data structures
- It's harder to change underlying bugs
- They might introduce security issues
Sure, use dependencies. But there's a reasonable position between "never write any code" and "never take on dependencies" — and NPM is one of the few ecosystems sitting at one of the extremes.
And when you run into a bug or design problem in a dependency of a dependency of a dependency?
It often takes less time to write some code than to understand someone else's code.
Most programmers I've worked with get lost easily when jumping through layers of other people's code. I certainly do.
Solid, well tested dependencies that solve hard problems are worthwhile. But dependencies have a cost in debuggability and maintenance, so it's worth using them with care. And often, they aren't worth the time, when compared to writing a dozen lines of code.
If you think it'll just take an afternoon then, for the sake of this article's argument, it had better actually take just an afternoon!
But conceding that charitable assumption to the article, I agree with its basic premise: dependencies cost a lot of time in diffuse, non-codey ways.
There are AAA dependencies you pull into every project, but most other dependencies require a good degree of due diligence, evaluation, risk, and their own long-term maintenance.
It's not that this always tips the scales all the way to 'roll your own', but I think the cost of new dependencies is underrated.
> you absolutely should use every dependency you can that will save you time
> Find a dependency that will save you an afternoon? Grab it.
Agree. The point of the article, though, is that dependencies are often saving much less time than they promise - so much less that it's better to avoid them.
> "This'll take an afternoon" - three weeks later......
> Programmers are notorious for this.
From my experience with these personal failings, the problem usually comes from the question being phrased like, "before you begin working on this, how long do you think it will take you to complete?". If there's no opportunity to scope — which requires not-insignificant work towards the solution — the estimates will always be wrong. If I understand the actual scope of the problem, which means having the architecture mostly worked out, and have a bit of experience (and luck), my estimates can be pretty close, usually eaten up by that oh-so-seductive feature creep that ruins my work-life balance.
Exactly. I'm not reinventing the wheel. I may write some convenience wrapping around Spring Security, for instance, but why would I rewrite auth-z when it's a solved problem?
> you absolutely should use every dependency you can that will save you time.
Absolutely. As long as it does save you that time over the foreseeable lifetime of the project. Or you are deliberately incurring a technical debt because of some deadline.
On the other hand, saving an afternoon (or even a week), over the next two weeks means very little.
Essentially, it'll take you an afternoon to write and then weeks of work properly fixing the bugs and handling the edge cases. Potentially and probably, while you're trying to do something else.
That was my first thought: I've seen these projects before — they're where you find 5 slightly different implementations of similar logic, no logging or tests, failures as soon as someone uses Unicode, etc. and I get an order of magnitude performance improvement by replacing that code with an external module which has had the other 19 afternoons' worth of work it actually takes.
> and I get an order of magnitude performance improvement
Have you heard the adage about premature optimization being the root of all evil? Yes, even with the second part. What is the premature optimization here, in your opinion?
In most cases developers are creating something new — that's the state of the industry now, not great, but it's how it is. If you were refactoring existing code, sure: find the problem, design the solution, have reasons for going from A to B. If, however, you're writing new functionality, you don't know yet whether you'll have problems of this kind with this code — so optimize for ease of development. You can remove those excessive crutches later, if and when you need to. In my experience, having them beats staring at code mere months later trying to figure out what it does — your own code, that is.
It really depends (haha) on what it is. I needed to copy a file in npm scripts. I can't use `cp` because that fails on Windows, so I looked on npm for something to copy a file. First hit: 197 dependencies, 1,170 files, 47,000 lines of JavaScript.
Taking 197 dependencies means 197 things that need updates several times a year at a minimum. Any of those updates could break my code, introduce a bug, add a vulnerability on top of the ones already in the packages. So it's not like adding more dependencies is magically free.
There are two separate claims here:
- You should absolutely use community-supported tools to solve your problems.
- You should substitute idiomatic code for libraries.
You have made an argument for the latter that does not detract from the former.
Lots of things can go wrong when writing a file: https://danluu.com/deconstruct-files/
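For what it's worth, the specific copy-a-file case no longer needs a package at all: Node's standard library has had a cross-platform copy built in since roughly Node 8.5. A minimal sketch, assuming a hypothetical scripts/copy.js invoked from an npm script:
// scripts/copy.js (hypothetical): node scripts/copy.js <src> <dest>
// Uses Node's built-in fs.copyFileSync, so it behaves the same on Windows and Unix.
const fs = require('fs');
const [src, dest] = process.argv.slice(2);
if (!src || !dest) {
  console.error('usage: node scripts/copy.js <src> <dest>');
  process.exit(1);
}
fs.copyFileSync(src, dest);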
I’ve been working on the same medium-size (fewer than 1M LoC) codebase for about 7 years now. I feel like over the years, my estimates of how long something will take have gotten better for one reason: I’ve found the scaling factor I have to apply to my intuitive estimate that brings into the realm of reason.
So, if I think something looks like about a day’s work, I’ll actually estimate it at about 3.5 or 4 days. Thus, for a project to qualify as “just an afternoon,” I’d have to naively estimate it at under an hour.
I rarely have time to spare, but I also rarely go over by more than maybe a third.
Your multiplier may vary depending on how horrifying your codebase is. On a side project with good test coverage, my multiplier is only about 2.
This goes both ways. When was the last time someone properly scoped the maintenance effort of an external library? This goes double for external systems, like kafka or mysql. I've never seen anyone so far even get within two orders of magnitude of the real cost of operating kafka, much less an organization that accurately compared that to the cost of DIY.
The "don't reinvent the wheel" argument often acts as though using a 3rd party lib is "free", and building it yourself is costly with no benefit.
This is sometimes true, but often not. From SFTP libraries to SVG rendering libraries, there have probably been about 3-5 major dependencies of my company's project that I have had to learn and extend or fix bugs in to make them work just in the last year.
And sometimes this means using our own fork that we have to keep maintained.
I'm not saying I would have rather written these particular dependencies from scratch, but they were definitely not cost free. Nor are they all of better quality than what I would have produced had I written them from scratch.
That's the other common refrain - to "defer to the expertise of the crowd".
Don't get me wrong, many 3rd party libraries are of great quality by amazing men and women who I am very thankful for. But certainly not all of them.
There's no magic that says "every third party library is made by an expert with the highest standards".
Is that really 5 minutes? (For when left-pad was relevant)
var cache = [
  '',
  ' ',
  '  ',
  '   ',
  '    ',
  '     ',
  '      ',
  '       ',
  '        ',
  '         '
];
function leftPad (str, len, ch) {
  // convert `str` to a `string`
  str = str + '';
  // `len` is the `pad`'s length now
  len = len - str.length;
  // doesn't need to pad
  if (len <= 0) return str;
  // `ch` defaults to `' '`
  if (!ch && ch !== 0) ch = ' ';
  // convert `ch` to a `string` cuz it could be a number
  ch = ch + '';
  // cache common use cases
  if (ch === ' ' && len < 10) return cache[len] + str;
  // `pad` starts with an empty string
  var pad = '';
  // loop
  while (true) {
    // add `ch` to `pad` if `len` is odd
    if (len & 1) pad += ch;
    // divide `len` by 2, ditch the remainder
    len >>= 1;
    // "double" the `ch` so this operation count grows logarithmically on `len`
    // each time `ch` is "doubled", the `len` would need to be "doubled" too
    // similar to finding a value in binary search tree, hence O(log(n))
    if (len) ch += ch;
    // `len` is 0, exit the loop
    else break;
  }
  // pad `str`!
  return pad + str;
}
The issue I have with this is a lack of specification. Left pad _what_?
Numbers or ASCII-only printing? OK, that's reasonable. Is there a desired overflow behavior?
Past that it becomes more an issue of where and why. The suddenly not-trivial example includes questions about fonts, layout, and multi-byte characters. Emoji, etc.
Incidentally, in pseudocode:
Create a valid full-space pad string (termination / etc), then decrement back from the end of the source string and over-write the pad characters from the end to the start of the string, exiting either on no more pad characters or no more input.
A second algorithm might combine those two steps into one pass, filling the output buffer from back to front. Only for C-style strings would this be an issue, given the dynamic end point of the data structure.
This is why you time box things. Spend XX hours trying to get a thing working and if you aren't close, you grab a library and move on.
We all too often forget the scope: requirements, developing, testing, to say the least.
My favorite example is NPM. While the author has a point, I tend to rely on the wisdom of the crowd. Sometimes there is a reason why a couple of million developers - in the case of NPM packages - seem to be lazy.
In my experience, we ended up copy/pasting and modifying some code and syncing it with the "superfluous" package. Good intentions, badly executed.
Leftpad was the right itch at the right time and people found better ways to deal with NPM. NPM got better after that, as well as native implementations.
Better cope with NPM than fight it, my 2 cents.
Besides, in reality pulling in and using the dependency takes time as well. There's no real guarantee it's cheaper in terms of developer time.
Ironically, these days with front end development I'm finding it hard to accurately scope how long it will take to incorporate 3rd-party dependencies. The docs make it seem straightforward enough, but they don't cover how to use it correctly under TypeScript instead of ES, or how to use it with Angular instead of React, or how to build it with Rollup instead of webpack, and I often spend an entire day googling obscure blog posts on how to get a dependency working in my own ecosystem.
Well, what about when the programmer has been burned too many times by incorrect scoping before?
Don't buy generalized statements like "programmers always underestimate the effort needed" or even, for that matter, "a task always expands to fill all the time it might take" (Parkinson's law). There are exceptions to them :) which, in a good team, sometimes hold more reliably than the "laws" themselves.
I do this all the time. My head tells me "five lines, tops" -- corresponding to about 10 minutes of "programming." Add in testing, bugs, another 10-20 lines of comments and docs, we're looking at an afternoon.
Never do I give that raw 10-minute estimate to anybody, because it can be wrong by a factor of 10.
They just won't have unit tests, and they'll probably have lots of defects and other technical debt.
A year ago I needed a min-heap to build a priority queue at work.
So first I grabbed 'heap' from npm (272k weekly downloads) and set it to work. But a few days later I realized my code was executing slower than expected because it sometimes needed to clone the data structure, and the clone instantiation would break the heap invariant in the array internals. It turned out there's been an issue open about this since early 2017.
Then I went for the 'collections' package (35k weekly downloads) and brought in its heap implementation. That worked like a charm for about six months until a bug came in that made it seem like a completely different package was breaking. After almost a whole day of debugging, it turns out that 'collections' silently shims the global Array.from function (scream emoji) without mimicking its behavior when dealing with non-Array iterables presented by the other package.
So finally I wrote my own heap -- well, I cribbed from Eloquent JavaScript [0] but I did have to briefly remember a little bit about how they're supposed to work. So while I don't totally buy the "Never..." rule in the post title, thinking more carefully about writing versus importing a dependency would have saved me a great deal of headache in this case.
[0] https://eloquentjavascript.net/1st_edition/appendix2.html
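For a sense of scale, the hand-rolled structure really is small. A rough sketch along the lines of the binary heap in that appendix, trimmed to push/pop with a caller-supplied comparator (class and method names are hypothetical):
// Minimal binary min-heap: compare(a, b) < 0 means a comes out first.
class MinHeap {
  constructor(compare) {
    this.items = [];
    this.compare = compare;
  }
  push(value) {
    // append, then bubble up until the parent is no larger
    this.items.push(value);
    let i = this.items.length - 1;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (this.compare(this.items[i], this.items[parent]) >= 0) break;
      [this.items[i], this.items[parent]] = [this.items[parent], this.items[i]];
      i = parent;
    }
  }
  pop() {
    // remove the root, move the last item up, then sift it back down
    const top = this.items[0];
    const last = this.items.pop();
    if (this.items.length > 0) {
      this.items[0] = last;
      let i = 0;
      for (;;) {
        const left = 2 * i + 1;
        const right = 2 * i + 2;
        let smallest = i;
        if (left < this.items.length &&
            this.compare(this.items[left], this.items[smallest]) < 0) smallest = left;
        if (right < this.items.length &&
            this.compare(this.items[right], this.items[smallest]) < 0) smallest = right;
        if (smallest === i) break;
        [this.items[i], this.items[smallest]] = [this.items[smallest], this.items[i]];
        i = smallest;
      }
    }
    return top;
  }
  get size() { return this.items.length; }
}
// new MinHeap((a, b) => a.priority - b.priority) gives a priority queue ordered by .priority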
This would be a perfect spot to solve those issues and make the packages better... Or y'know, publish your implementation and have it used by people with the same issue
OP here: A lot of people are objecting, "What if you estimate wrong, and it takes more than an afternoon?" This objection is very bad.
It is not possible to add a new dependency in less than an afternoon, because you need to evaluate alternatives, test it, learn how it works, make sure it's not accidentally GPL, etc. So there are not two methods, the less-than-an-afternoon method and the more-than-an-afternoon method. There are two methods that both take at least one afternoon. If you estimate wrong and you can't write the code in an afternoon… Then stop working on your handwritten version and find a dependency? But now you know what the dependency is really supposed to do and why it's non-trivial, so you're in an even better position to evaluate which ones are good.
Plenty of people do add a dependency in less than an afternoon, though. The trick is to:
1. Not evaluate any alternatives
2. Not read the docs
3. Not check the code
4. Not check the license
;)
(Thanks for a great blog post that articulates something I've felt for a long time.)
> But now you know what the dependency is really supposed to do and why it's non-trivial, so you're in an even better position to evaluate which ones are good.
I came in here to say this. If you think you're not qualified to write the function, you're probably also equally unqualified to choose someone else's implementation of it.
There is a lot of stuff out there-- stuff which is widely used-- which is not fit for your purposes, ... perhaps not for anyone's. And there is no replacement for a bit of domain expertise.
Not a lot of people can correctly write cryptography code on the first try, but we definitely advocate for people pulling well known cryptography libraries and using them instead of building their own, for obvious reasons. Not many people are qualified to write a lot of things, but are capable of making sound dependency judgements with heuristics. The trick is to use good heuristics and to not use a library for every tiny thing.
I probably spent almost 2 months evaluating hashmaps.
There are a dozen separate maps in the FreePascal standard library, but they all have some issues: a max key length of 255 chars, only working with orderable keys, not rehashing themselves, being a treemap instead of a hashmap, or simply not working...
In the end I used a map from another library and modified it heavily. It still has the big issue of not really deleting items — it only keeps a tombstone that isn't removed until rehashing — but the advantage is that it keeps insertion order.
> It is not possible to add a new dependency in less than afternoon because you need to evaluate alternatives, test it, learn how it works, make sure it's not accidentally GPL, etc
For a small module that would take less than an afternoon. Checking a module license takes less than it took to read the comment.
This sounds like premature optimization. If the library does what you need it to do, use it. If it becomes a problem later, then optimize.
The last thing you want to do is spend a bunch of time reimplementing code when: 1) it may not matter at all, 2) you might miss important edge cases, or 3) you get everything right but still have to maintain it forever.
If it's going to take you three days to integrate the library, maybe it's not such a good library, or maybe it's really complicated because there are a lot of edge cases. In that case, dig into the code and see if you can figure out what it's doing.
But if you think you can spend an afternoon rewriting a library that would take three days to integrate, there is a good chance you might be missing something important.
I remember starting with an overloaded library from NPM where I used some basic functionality. That worked fine for a while. When later I got lost fixing a defect in the tangle, I just ripped out the parts I needed and made a trimmed version. The interface remained the same, for the parts I was using. Nothing to adapt.
In this way I had little investment in the beginning. And once I knew what I needed, it was another small investment to clean the code.
It's a matter of judgement, but here's a few observations:
- With a little experience, you know what gets fiddly and what doesn't. Today for instance, I needed a way to remove tags in an SVG document, which looks a lot like HTML tags. I quickly ended up finding that Regex is not the solution (a well known guy on SO wrote an answer that looks like a huge warning sign). I also couldn't enumerate all the corner cases. So I found a lib that does it, along with an SO answer that turns it into a two-liner.
- Dependencies vary in quality. Some are basically like another standard lib. Boost for instance is very well used. The tough ones are where the lib seems to be "finished", where there seem to be few commits recently, but the project was once lively and functional. IIRC libev comes to mind here. And then there are the totally dead projects, where there's a load of issues open and nobody saying anything.
- Try to lock down versions. If you get a thing working with a certain version, there's no reason you need the newest new as soon as it's pushed. You can probably live with doing a scan for updates now and again.
- Your afternoon of programming needs to have a clear end. That hashmap you wrote will very likely spew out issues over the next few days. CSV parser, maybe. Bessel function, that'll work.
For those who are in today's 10000, you might mean this piece of art: https://stackoverflow.com/questions/1732348/regex-match-open...
https://blog.codinghorror.com/regular-expressions-now-you-ha...
Specifically on the SVG filtering example, which I think is a good illustration of when to use or not use a dependency:
Writing an SVG (or at least XML) parser is a necessary task for writing a filter that doesn’t get stuck due to weirdo issues. That is way more than an afternoon of work! But once you have a parser, dropping tags you don’t want or transforming them somehow is totally an afternoonable task size. So, do use a dependency for SVG parsing, but don’t look for a special “SVG filter all” package. Just do the filtering yourself.
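A minimal sketch of that "afternoonable" part, assuming a browser-style DOM environment (DOMParser / XMLSerializer) and that the tags to drop are known up front (function name hypothetical):
// Parse the SVG, drop unwanted elements, serialize it back.
// Assumes an environment providing DOMParser and XMLSerializer.
function stripSvgTags(svgText, selectorToDrop) {
  const doc = new DOMParser().parseFromString(svgText, 'image/svg+xml');
  doc.querySelectorAll(selectorToDrop).forEach((el) => el.remove());
  return new XMLSerializer().serializeToString(doc);
}
// stripSvgTags(svg, 'script, foreignObject') removes scripting and embedded HTML;
// the hard part -- the parser -- stays in the dependency.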
>Try to lock down versions. If you get a thing working with a certain version, there's no reason you need the newest new as soon as it's pushed. You can probably live with doing a scan for updates now and again.
Agree! It irks me a lot that I often see update bots tracking new releases... it is just begging to be exposed to regressions.
We need to find a happy medium, though. Otherwise, whenever you actually do need to update something (e.g. a new dependency you want only supports a version of one of your existing dependencies that is 20 releases ahead of yours), you have a huge version gap to cover.
Your dependency that you could code "in an afternoon" may handle far more corner cases than you suspect. (That may be what you meant by "fiddly".) Sure, you don't care about covering all those corner cases... but you might care about some, even some that you haven't thought about yet. And you might care about some more next month. That can make that "afternoon" take a lot longer than you expect.
Avoiding dependencies is a noble goal, and something to be valued, but this simple rule is too simplistic.
The problem lies in the fact that there are a great many things I can hack together in an afternoon to "replace" some kind of external dependency, but the quality discrepancy of these hacks is highly variant. My understanding of what can or should be done in an afternoon might differ with my colleagues'.
Unfortunately, like all things in engineering, you have to carefully reason about the pros/cons, requirements, and costs. After that analysis, you can make a judgment on depend vs. build (also, buy vs. build).
Agreed. For libs that are "afternoon-y" in their scope (so, not an HTTP server or crypto), if you need to get off the fence you can use some cheap heuristics to assess the quality of a library without auditing its code. For instance, you can look at its popularity (in downloads or Github stars), its release/version history, its number of open issues, and its development activity. If I see high issue counts and many major releases with breaking changes, I'm going to avoid it. If I see 2+ years of stability with mostly minor releases, low issue counts, and high use rates, I figure it's going to probably be better than whatever half-baked solution I could scribble in an afternoon.
I wouldn't consider a high number of open issues a problem on its own. All big popular projects with a history have a high number of open issues. There are some exceptions, which may be closing issues aggressively, but that is more about a style of managing those issues than about project health.
Over time an issue tracker inevitably becomes a collection of hard-to-reproduce bugs, incomplete patches, underspecified feature requests, random tracebacks, etc. Maintainers can choose to just close everything which is not actionable immediately, or be in comfort with such issues, and let them live in the bug tracker. I personally like a style when an issue is closed only if it is fixed, or if it doesn't contain useful information, or if it is a duplicate.
A better indicator is activity and responsiveness of the maintainers in the issue tracker.
I don't really worry about something I could write in an afternoon.
I can look at the code, get a good grasp of it (hopefully), judge the quality, docs, prospects of getting updates/needing updates/being able to update it myself, pretty comfortably. In other words, the risk evaluation is incredibly straight forward.
Additionally, the risk itself is fairly low. If it goes out of date or stops working or just turns out to suck, the most I risked is an afternoon of work. Leftpad was a debacle due to its scale, but fixing Leftpad was pretty easy (I'm not recommending importing one-liners as dependencies, mind you).
-
But when it comes to stuff that isn't small, it's usually also the kind of stuff that holds the most insane amounts of risk for a project and is the hardest to evaluate.
Stuff like application frameworks, threading frameworks, massive networking libraries, etc.
The interface is _huge_. To the point that even when you try and wrap their complexity in nice packages with separation of concern and encapsulation they leak out into the rest of your code and end up being a nightmare to ever change.
Instead of spending an afternoon writing dependencies like this, spend that time investigating your "too-big-to-fail" dependencies. Try and keep a finger on their pulse, because they're the ones that will really come back to bite you if things go south.
> Additionally, the risk itself is fairly low. If it goes out of date or stops working or just turns out to suck, the most I risked is an afternoon of work.
Sometimes, the opportunity cost (time spent) is the largest term in the risk equation, but often there are other terms that might be orders of magnitude larger. For example, the risk of depending on the wrong abstraction, or becoming coupled to a hack.
What you're saying makes sense. My only point is that there's a lot more subtle judgment required in these decisions than often meets the eye.
A simple example would be an HTTP client. It’s easy to write a naive thing that makes GET requests with no request body, TLS, connection pooling, etc. Why should I use a dependency when I can write it in an afternoon? Well, I used to think that before I tried writing one :) The first draft was easy. Adding features got messy.
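The deceptively easy first draft looks something like this sketch using Node's built-in https module — no redirects, retries, pooling, timeouts, or request bodies, which is exactly where it starts to get messy (function name hypothetical):
// A naive GET: fine for one happy-path request, missing everything else.
const https = require('https');
function get(url, callback) {
  https.get(url, (res) => {
    let body = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => callback(null, res.statusCode, body));
  }).on('error', (err) => callback(err));
}
// get('https://example.com/', (err, status, body) => { /* ... */ });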
I had the opposite experience. All I needed was a way to do a simple GET. That's it (and that's all it still is, by the way). Instead of spending half an hour writing the code, I decided to use libcurl---that's what it's for, right?
Until I found it wasn't installed on some of our test machines (it was needed for testing, not for production and for reasons beyond my pay grade, libcurl was not available on the machines). Then I thought, well, I could include the libcurl into our vendor repo. It worked, but it was a nightmare to use. It took way too long to figure out the proper "configure" options to use for what systems, it nearly tripled the time to build it on the build servers, and even then, it was hit-or-miss.
After several years of this misery, I removed libcurl, and wrote what I should have years earlier. Using libcurl as a dependency did NOT save us any time.
> The problem lies in the fact that there are a great many things I can hack together in an afternoon to "replace" some kind of external dependency, but the quality discrepancy of these hacks is highly variant.
Perhaps it's a domain-specific thing, but when someone uses the words "hack together" I imagine it means using dependencies without really understanding what's going on in them, precisely to avoid figuring out how to code a solution properly.
Writing it yourself obviously needs to also imply doing it correctly. Even if that means you must learn a bit about what is the right way to do it (a side benefit, though usually viewed as a downside).
"This HN discussion https://news.ycombinator.com/item?id=24123878 is topical for me: at this very moment, I am implementing C++ MFCC code myself, because my attempt to integrate Kaldi (on windows) was unpleasant. It already took more than an afternoon, but I learned good things! \
I'm more sympathetic to the Use-All-The-Dependencies crowd than some might suppose. It definitely isn't my way, but I see them as a fellow subclass of programmer, evolved for other environments. It is amazing what can be cobbled together in a weekend now. \
The old Knuth vs McIlroy story is relevant: http://leancrew.com/all-this/2011/12/more-shell-less-egg/
Generally, use-the-tools is correct, but sometimes you really do want a Knuth (or maybe a Carmack)."
"
The support aspect of internal libraries, especially in the age of Stack Overflow, is widely overlooked by the very people who Must Be Stopped.
"This HN discussion https://news.ycombinator.com/item?id=24123878 is topical for me: at this very moment, I am implementing C++ MFCC code myself, because my attempt to integrate Kaldi (on windows) was unpleasant. It already took more than an afternoon, but I learned good things! \ I'm more sympathetic to the Use-All-The-Dependencies crowd than some might suppose. It definitely isn't my way, but I see them as a fellow subclass of programmer, evolved for other environments. It is amazing what can be cobbled together in a weekend now. \ The old Knuth vs McIlroy story is relevant: http://leancrew.com/all-this/2011/12/more-shell-less-egg/
Generally, use-the-tools is correct, but sometimes you really do want a Knuth (or maybe a Carmack)." "
However in almost all cases, you - the reader - cannot achieve in an afternoon what Carmack can achieve in an afternoon.
Few developers have the ability to build an alternative to a speech recognition dependency.