Self-Documenting Code

My cut:

    const passwordRules = [/[a-z]{1,}/, /[A-Z]{1,}/, /[0-9]{1,}/, /\W{1,}/];

    async function createUser(user) {
        const isUserValid = validateUserInput(user);
        const isPasswordValid = user.password.length >= 8 && passwordRules.every((rule) => rule.test(user.password));

        if (!isUserValid) {
            throw new Error(ErrorCodes.USER_VALIDATION_FAILED);
        }

        if (!isPasswordValid) {
            throw new Error(ErrorCodes.INVALID_PASSWORD);
        }

        const userExists = await userService.getUserByEmail(user.email);

        if (userExists) {
            throw new Error(ErrorCodes.USER_EXISTS);
        }

        user.password = await hashPassword(user.password);
        return userService.create(user);
    }

1. Don't use a bunch of tiny functions. This makes it harder for future eng to read the code because they have to keep jumping around the file(s) in order to understand control flow. It's much better to introduce a variable with a clear name.

2. Don't use the `a || throw()` structure. That is not idiomatic JS.

2a. Don't introduce `throwError()`. Again, not idiomatic JS.

3. Use an enum-like object for error codes for clarity.

4. If we must use passwordRules, at least extract it into a global constant. (I don't really like it though; it's a bit too clever. What if you want to enforce a password length minimum? Yes, you could hack a regex for that, but it would be hard to read. Much better would be a list of arrow functions, for instance `(password) => password.length > 8`.)

5. Use TypeScript!
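The list-of-arrow-functions idea from point 4 might look something like the sketch below. The exact thresholds and the `isPasswordValid` helper name are illustrative assumptions, not part of the original code.

```javascript
// Each rule is a plain predicate, so a length check sits next to the
// character-class checks without any regex tricks.
// The thresholds are illustrative assumptions.
const passwordRules = [
  (password) => password.length >= 8,
  (password) => /[a-z]/.test(password),
  (password) => /[A-Z]/.test(password),
  (password) => /[0-9]/.test(password),
  (password) => /\W/.test(password),
];

// A password is valid only when every rule accepts it.
function isPasswordValid(password) {
  return passwordRules.every((rule) => rule(password));
}

console.log(isPasswordValid("Abcdef1!")); // true
console.log(isPasswordValid("Ab1!"));     // false (too short)
```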

OptionOfT · 10 months ago

My issue with this is that you're using exceptions for control flow. A user not being valid is expected (duplicate username). A password not matching a regex is also expected.

Then, in general (not seen here as there are no types), I like to use a lot of types in my code. The incoming user would be of type UnvalidatedUser, whereas the return type of this function would be StoredUser or something like that, to distinguish the incoming user type from the outgoing one. I like to attach semantics to a type, not to conditions: https://existentialtype.wordpress.com/2011/03/15/boolean-bli...
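One way to read the first point is to return expected outcomes as values and keep `throw` for genuinely unexpected failures. A minimal sketch, assuming a hypothetical `checkNewUser` function and an in-memory set of taken emails standing in for `userService`:

```javascript
// Hypothetical stand-in for the user store.
const takenEmails = new Set(["taken@example.com"]);

// Expected outcomes (bad input, duplicate email) come back as values;
// nothing here throws for them.
function checkNewUser(user) {
  if (typeof user.email !== "string" || !user.email.includes("@")) {
    return { ok: false, reason: "INVALID_EMAIL" };
  }
  if (takenEmails.has(user.email)) {
    return { ok: false, reason: "USER_EXISTS" };
  }
  return { ok: true, user };
}

const result = checkNewUser({ email: "taken@example.com" });
if (!result.ok) {
  console.log(`Rejected: ${result.reason}`); // Rejected: USER_EXISTS
}
```

In TypeScript, the two result shapes could be expressed as a discriminated union so the compiler forces callers to check `ok` before touching `user`.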

aleksiy123 · 10 months ago

I would add one suggestion/comment. Use a known set of standard error codes and not a unique error code/error type for each new situation.

Error codes are there to hint to clients/callers what action they should potentially take (retry? back off?).

Don't make callers handle 100s of different potential error codes/types.

If the whole internet can work with 70, your app can work with fewer.

All of Google uses fewer than 20:

https://github.com/googleapis/googleapis/blob/master/google/...

Put more specific information in the error message or a secondary status code.
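A rough sketch of that split between a coarse code and a specific reason. The `AppError` class and `shouldRetry` helper are hypothetical; the three code names are borrowed from gRPC's canonical status codes, which is an assumption about what "standard" might mean here rather than something the original code uses.

```javascript
// A deliberately small set of coarse codes (names borrowed from gRPC's
// canonical status codes; choosing them here is an assumption).
const StatusCode = Object.freeze({
  INVALID_ARGUMENT: "INVALID_ARGUMENT",
  ALREADY_EXISTS: "ALREADY_EXISTS",
  UNAVAILABLE: "UNAVAILABLE",
});

// Hypothetical error class: a coarse code for callers to branch on, plus a
// specific reason and message for humans and logs.
class AppError extends Error {
  constructor(code, reason, message) {
    super(message);
    this.name = "AppError";
    this.code = code;
    this.reason = reason;
  }
}

// Callers only need the coarse code to decide what to do next.
function shouldRetry(error) {
  return error.code === StatusCode.UNAVAILABLE;
}

const weakPassword = new AppError(
  StatusCode.INVALID_ARGUMENT,
  "INVALID_PASSWORD",
  "Password must be at least 8 characters long."
);
const duplicateUser = new AppError(
  StatusCode.ALREADY_EXISTS,
  "USER_EXISTS",
  "A user with this email already exists."
);

console.log(shouldRetry(weakPassword));  // false
console.log(shouldRetry(duplicateUser)); // false
```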

johnfn · 10 months ago

That's a great point, I didn't even think of that. I would use error types as well, yes.

jansommer · 10 months ago

It's a good point, especially because different return types are well supported in TS.

I made the same in plpgsql recently and opted for returning an implicit union of UserSession | Error, by returning UserSession in the signature and raising errors in the function. The alternative was to return json where you'd have to look at the body of the function to figure out what it returns (when successful), as opposed to the signature.

I'm not sure if I'm striking the right balance. Yes, the signature is "self-documenting" - until you hit an error!

rerdavies · 10 months ago

My issue with that is that absolutely NOTHING will ever convince me that returning error codes is a better idea than throwing exceptions. And that you seem to be using 'expected' in some weird cargo-culty sense of the word. An invalid user name is an error, not an expected case.

jagged-chisel · 10 months ago

“Don't use a bunch of tiny functions. This makes it harder for future eng to read the code …”

This is where the naming things bit comes in. You name the function correctly, then when the body is read to understand that it works as named, you can remove that cognitive complexity from your brain and continue on. Once you’ve built trust in the codebase that things do what they claim, you can start getting a top-down view of what the code does.

That is the power of proper abstraction.

johnfn · 10 months ago

I don't know. I really don't see any clarity improvements between, `user.password.length >= 8 && passwordRules.every((rule) => rule.test(user.password))` and `validatePassword(password)`. What if you want to add that the password must contain one special character? You don't actually know, by reading the name "validatePassword", if that work has already been done or not. You need to go read the function definition and check. And so even for such a small function, there is no name that you can choose to truly do what you claim and "remove the cognitive complexity from your brain".

Once a function gets to a certain threshold of size and reuse, I tend to agree with you. But I think that threshold is quite large - like at least 40-50 lines, or reused at least 3 times.

dietr1ch · 10 months ago

Abstraction is way better, I don't really want to know how the password is validated unless I know I'm facing issues with validation (which proper logging tells you about before you even dive into the code).

I don't understand why some people prefer being swarmed with details. It's not that they want details, but that they just hate navigating files (layer 8 + tooling problem) or that they "need" to know the details because not knowing them haunts them at night somehow. Also, not having that as a free function makes me think it's not tested at all (although there might be some integration tests that hopefully catch all errors at once, but I'm sure they don't either).

hinkley · 10 months ago

He even knows what the missing function should be called:

> isPasswordValid

TrianguloY · 10 months ago

My only nitpick is that the const isPasswordValid = ... should be just before its use (between the first two ifs). Other than that, I prefer this approach (although I would inline the booleans in the ifs to avoid the one-use variables. But that's ok).

> Don't use a bunch of tiny functions

Exactly this. I only do that when the function is used in more than 10 places and it provides some extra clarity (like something as clamp(minVal,val,maxVal){return max(minVal,min(val,maxVal))} if your language doesn't already have it, of course).

I also apply that to variables though, everything that is only used once is inlined unless it really helps (when you create a variable, you need to remember it in case it is used afterwards, which for me is a hard task)
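For reference, the `clamp` helper mentioned above, written out as runnable JavaScript (the argument order follows the comment; the assumption is that `minVal <= maxVal`):

```javascript
// Restricts val to the inclusive range [minVal, maxVal].
// Assumes minVal <= maxVal.
function clamp(minVal, val, maxVal) {
  return Math.max(minVal, Math.min(val, maxVal));
}

console.log(clamp(0, 15, 10)); // 10
console.log(clamp(0, -3, 10)); // 0
console.log(clamp(0, 7, 10));  // 7
```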

xigoi · 10 months ago

> My only nitpick is that the const isPasswordValid = ... should be just before its use (between the first two ifs).

Wouldn’t that cause the regexes to be recompiled every time you call the function?
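In the snippet as posted, `passwordRules` is declared outside `createUser`, so moving the `const isPasswordValid = ...` line would not move the regex literals themselves. The concern does apply to regex literals written inside a function body: each evaluation of a regex literal produces a new `RegExp` object, although whether the engine actually recompiles the pattern is an implementation detail. A small sketch with hypothetical function names:

```javascript
// Created once, when the module is loaded.
const DIGIT_RULE = /[0-9]/;

function getSharedRule() {
  return DIGIT_RULE;
}

// The literal is evaluated on every call, producing a new RegExp object.
function getFreshRule() {
  return /[0-9]/;
}

console.log(getSharedRule() === getSharedRule()); // true
console.log(getFreshRule() === getFreshRule());   // false
```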

rjbwork · 10 months ago

This is one of my pet peeves. I had an engineer recently wrap Dapper up in a bunch of functions. Like, the whole point of dapper to me is that it gets out of your way and lets you write very simple, declarative SQL database interactions and mapping. When you start wrapping it up in a bunch of function calls, it becomes opaque.

DRY has been taken as far too much gospel.

skydhash · 10 months ago

I always prefer having loadBooks than “select * from books” everywhere. I prefer my code to be just the language version of how I would write an explanation if you ask me one specific question. Not for DRY, but for quick scanning and only diving into details when needed

dclowd9901 · 10 months ago

Yep, I’d hire you, and not OOP. There was some really questionable code structuring in there and I always wince at abstraction for abstraction’s sake. It always becomes a hindrance when something goes wrong, which is exactly when you don’t want to have a hard time deciphering code.

eru · 10 months ago

> 1. Don't use a bunch of tiny functions. This makes it harder for future eng to read the code because they have to keep jumping around the file(s) in order to understand control flow. It's much better to introduce a variable with a clear name.

If you have nested functions, that's not a problem.

Btw, why do you use regular expressions for some rules, but not for others? Regular expressions are perfectly capable of expressing the length requirement.
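For comparison, here is a sketch of what folding the length requirement into a single regex could look like, next to the original rule list. The function names and sample passwords are illustrative assumptions; the two checks agree on these samples, and readers can judge the readability trade-off for themselves.

```javascript
// Original approach: an explicit length check plus four regex rules.
const passwordRules = [/[a-z]{1,}/, /[A-Z]{1,}/, /[0-9]{1,}/, /\W{1,}/];

function isValidWithRules(password) {
  return password.length >= 8 && passwordRules.every((rule) => rule.test(password));
}

// Single-regex version: one lookahead per character class, then the length
// requirement as .{8,}. The "s" flag lets "." match line breaks as well.
const singlePasswordRegex = /^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*\W).{8,}$/s;

function isValidWithSingleRegex(password) {
  return singlePasswordRegex.test(password);
}

const samples = ["Abcdef1!", "abcdef1!", "Ab1!", "Abcdefg1", "Abcdef1_"];
for (const sample of samples) {
  console.log(sample, isValidWithRules(sample), isValidWithSingleRegex(sample));
}
```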

ilrwbwrkhv · 10 months ago

Something about Javascript and Typescript is really ugly to my eyes. I think it is the large keywords at the beginning of every line. Makes it hard to parse and read. I find C++ style much better to read.

I don't find this easier to read:

    !(await userService.getUserByEmail(user.email)) || throwError(err.userExists);

I guess if I worked in a codebase that used that pattern consistently I'd get used to it pretty quickly, but if I dropped into a new codebase that I didn't work on often I'd take a little bit longer to figure out what was going on.

mega_dean · 10 months ago

After that step, they say "The resulting code is shorter and has no nested logic." The resulting code has the same logic as before, it's just not visually represented as being nested. I've seen the same argument ("nesting is bad so indentation is a code smell") used to say that it's better to use early returns and omit the `else` block, eg:

    if (some_condition) {
      // do stuff here
      return;
    }
    // do other stuff here

is "better" than:

    if (some_condition) {
      // do stuff here
    } else {
      // do other stuff here
    }

If you have very-deeply nested code then it usually becomes easier to work with after splitting it up into smaller pieces. But IMO rewriting code like this to save a single level of indentation is bikeshedding.

jonathanlydall · 10 months ago

I would say (as all good technical people know) it depends.

I have come to appreciate the style of early returns rather than else statements as I have found over the years it generally makes the code easier for me to follow when I’m looking at it possibly years later.

It really depends on the particular condition, but sometimes it just reads better to me to not use the else, and this is because as a style I tend to try to have “fail conditions” cause an early return with a success being at the end of the method. But again there are regularly exceptions where trying to do this “just because” would contort the code, so returning an early success result happens often enough.

I have however found that sometimes ReSharper’s “avoid nesting” suggestion (particularly in examples like yours) results in less clear code, but it’s almost always at least not worse and maybe slightly better for the sake of consistency.

EDIT: Having thought about this more, here is why I find early returns generally easier to read than else statements.

With an early return the code is generally more linear to read as when I get to the end of the if block I can instantly see there is nothing else of relevance in the method, I save myself having to needlessly scan for the end of the else block, or even worse, past more code blocks only to find that the rest of the method’s code is irrelevant.

Again, not a hard rule, but a consistent style in a code base also generally makes it easier to read.

jonhohle · 10 months ago

I usually find the opposite (personally). Get rid of all the exceptional cases and error handling up front like you have in the first example and then spend the remaining body of the function doing the main work of that function.

It’s not so much indentation that’s an issue, but coupling control flow with errors and exceptions.

Swift does a nice job with `guard` statements that basically bake this in at the language level - a condition succeeds or you must return or throw.

If that control flow is part of business logic, I don’t think there’s any issue with your second example. That’s what it’s there for.

RaftPeople · 10 months ago

> I don't find this easier to read:

I agree. The previous iteration shown is simpler IMO.

I've really shifted how I code to making things just plain simple to look at and understand.

f1yght · 10 months ago

That's the way it should be, easy to understand. This set up might be short but it's complex to read.

amonith · 10 months ago

It could be "dangerous" even sometimes if you're not paying attention. In JS/TS "||" operator evaluates the right side when the left side is "falsy". "Falsy" doesn't mean only null/undefined, but also "", 0, NaN, and... well... false. So if you make a method like "isUserActive" or "getAccountBalance" and do a throw like that, you'll get an error for valid use cases.
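A small sketch of that pitfall. `throwError` mirrors the helper from the article under discussion; `getAccountBalance` and `requireBalance` are hypothetical example functions.

```javascript
// Mirrors the helper from the article; only exists so throw can be used
// in expression position.
function throwError(message) {
  throw new Error(message);
}

// Hypothetical example function.
function getAccountBalance(account) {
  return account.balance;
}

const emptyAccount = { balance: 0 };

// Pitfall: 0 is falsy, so a perfectly valid balance triggers the throw.
try {
  const balance = getAccountBalance(emptyAccount) || throwError("balance missing");
  console.log(balance);
} catch (error) {
  console.log(error.message); // balance missing
}

// An explicit check for the actual failure condition avoids the problem.
function requireBalance(account) {
  const balance = getAccountBalance(account);
  if (balance === undefined || balance === null) {
    throw new Error("balance missing");
  }
  return balance;
}

console.log(requireBalance(emptyAccount)); // 0
```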

jay_kyburz · 10 months ago

What's more, the isUserValid function can't return a more detailed error about what was not valid about the User. It can only return falsy.
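One possible alternative is a validator that reports what is wrong instead of returning a bare boolean. This is only a sketch: `findUserInputProblems`, `assertUserInputIsValid`, and the two rules are hypothetical, since the original `validateUserInput` is not shown.

```javascript
// Hypothetical validator: returns a list of specific problems
// (empty when the input is fine) instead of true/false.
function findUserInputProblems(user) {
  const problems = [];
  if (typeof user.email !== "string" || !user.email.includes("@")) {
    problems.push("email must contain an @");
  }
  if (typeof user.name !== "string" || user.name.trim() === "") {
    problems.push("name must not be empty");
  }
  return problems;
}

// The caller can still throw, but now the error says what went wrong.
function assertUserInputIsValid(user) {
  const problems = findUserInputProblems(user);
  if (problems.length > 0) {
    throw new Error(`User validation failed: ${problems.join("; ")}`);
  }
}

try {
  assertUserInputIsValid({ email: "not-an-email", name: "" });
} catch (error) {
  console.log(error.message);
  // User validation failed: email must contain an @; name must not be empty
}
```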

LorenPechtel · 10 months ago

Agreed. I do not like that line at all. I might take that approach but if I did it would be a separate IsDataValid function that checked things, one condition per line. (Might be a string of ||s, but never run together like that.) As much as possible I want one line to do one thing only.

    cheapFunction(...) || expensiveFunction(...) // saves us a few cycles
    car = car || "bmw" // setting default values, common pattern
    funcA(...) && funcB_WhichMightBreakWithoutFuncA(...) // func A implies func B
    ... // probably a few other cases I don't remember

    function insertion_sort(a) {
        for (let i = 1; i < a.length; i++) {
            let key = a[i];
            let j = i;
            while (j > 0 && key < a[j-1]) {
                a[j] = a[j-1];
                j--;
            }
            a[j] = key;
        }
    }

alilleybrinker · 10 months ago

There is no such thing as universally self-documenting code, because self-documentation relies on an assumption of an audience — what that audience knows, what patterns are comfortable for them — that does not exist in general.

Self-documenting code can work in a single team, particularly a small team with strong norms and shared knowledge. Over time as that team drifts, the shared knowledge will weaken, and the "self-documenting" code will no longer be self-documenting to the new team members.

simonw · 10 months ago

Chris_Newton · 10 months ago

If I were reviewing the original code, the first thing I’d question is the line

    user.password = await hashPassword(user.password);

1. As a rule, mutations are harder to understand than giving new names to newly defined values.

2. The mutation here apparently modifies an object passed into the function, which is a side effect that callers might not expect after the function returns.

3. The mutation here apparently changes whether user.password holds a safe hashed password or a dangerous plain text password, which are bad values to risk mixing up later.

4. It’s not immediately obvious why hashing a password should be an asynchronous operation, but there’s nothing here to tell the reader why we need to await its result.

At least three of those problems could trivially be avoided by naming the result hashedPassword and, ideally, using TypeScript to ensure that mixing up plain text and hashed passwords generates a type error at build time.

I do agree with many of the other comments here as well. However, I think the above is more serious, because it actually risks the program behaving incorrectly in various ways. Questions like whether to use guard clauses or extract the password check into its own function are more subjective, as long as the code is written clearly and correctly whichever choices are made.
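A minimal sketch of the `hashedPassword` suggestion, assuming Node.js and its built-in `crypto.scrypt` as the hash. The original `hashPassword` implementation is not shown, so this stand-in, the `salt:hash` storage format, and the `toStoredUser` name are all assumptions.

```javascript
const crypto = require("crypto");
const { promisify } = require("util");

const scrypt = promisify(crypto.scrypt);

// Key derivation is deliberately slow, so the callback-based scrypt runs in
// the background instead of blocking the event loop; that is one reason
// hashing can be asynchronous. The salt:hash format is an assumption.
async function hashPassword(plainTextPassword) {
  const salt = crypto.randomBytes(16);
  const derivedKey = await scrypt(plainTextPassword, salt, 64);
  return `${salt.toString("hex")}:${derivedKey.toString("hex")}`;
}

// Builds a new record instead of mutating the caller's object, so the plain
// text password never shares a property name with the hashed one.
async function toStoredUser(user) {
  const { password, ...profile } = user;
  const hashedPassword = await hashPassword(password);
  return { ...profile, hashedPassword };
}

const input = { email: "new@example.com", password: "Abcdef1!" };
toStoredUser(input).then((storedUser) => {
  console.log("password" in storedUser); // false
  console.log(input.password);           // Abcdef1! (input left untouched)
});
```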

tetha · 10 months ago

> At least three of those problems could trivially be avoided by naming the result hashedPassword and, ideally, using TypeScript to ensure that mixing up plain text and hashed passwords generates a type error at build time.

Going further down that path, you end up with what a few code bases I've worked with do: pull the two domains apart into a "UserBeingCreated" and an existing "User".

This felt a bit weird at first, but the more I think about it, the more sense it makes. One point leaning towards this: You are dealing with different trust levels. One is a registered and hopefully somewhat validated user, which can be trusted a bit. The other thing could just be a drive by registration attempt.

And you're dealing with different properties. Sure, there is some overlap - username, mail, firstname, lastname. But only a UserBeingCreated needs validation errors or a clear text password. Other things - like groups, roles and other domain properties only make sense after the user is properly registered.

I’ve had this debate a few times too. Personally I am in the camp that says you’re talking about two interfaces — your external UI or API, and your internal database schema — so even though you’ll often have a lot of overlap between types representing analogous entities in those two interfaces, they aren’t really the same concept and coding as if they will or should always have identical representations is a trap. I would almost always prefer to define distinct types and explicit conversion between them, even though it’s somewhat more verbose, and the password hashing here is a good example of why.

I wrote more about this in a Reddit post a while back if anyone’s interested: https://www.reddit.com/r/Python/comments/16w97i6/flask_300_r...

jcparkyn · 10 months ago

Agreed, and then there's the time of check/time of use issue with creating a user. Probably not a vulnerability if userService is designed well, but still a bit dubious.

You’re right, that’s potentially a correctness issue as well. Ideally we’d have a creation interface that would also perform the pre-existence check atomically, so there would be no need for the separate check in advance and the potential race condition would not exist. This does depend on the user service providing a convenient interface like that, though, and alas we aren’t always that lucky.
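A minimal sketch of such an atomic creation interface, using a hypothetical in-memory `InMemoryUserService` with a `createIfAbsent` method (neither is part of the original `userService`):

```javascript
// Hypothetical in-memory user service. A real service would typically lean
// on something like a unique index in its database for the same guarantee.
class InMemoryUserService {
  constructor() {
    this.usersByEmail = new Map();
  }

  // The existence check and the insert happen in one synchronous step,
  // so no other caller can run between them.
  createIfAbsent(user) {
    if (this.usersByEmail.has(user.email)) {
      return { created: false, reason: "USER_EXISTS" };
    }
    this.usersByEmail.set(user.email, user);
    return { created: true, user };
  }
}

const userService = new InMemoryUserService();
console.log(userService.createIfAbsent({ email: "a@example.com" }).created); // true
console.log(userService.createIfAbsent({ email: "a@example.com" }).created); // false
```

With an interface like this, the caller no longer needs a separate `getUserByEmail` check before creating the user.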

cjfd · 10 months ago

Typescript looks much, much better than what he ends up with. The typescript is more or less the same thing but with comment tokens removed. How is just removing the comment tokens not an obvious improvement in readability?

Honestly, I think all of jsdoc, pydoc, javadoc, doxygen is stuff that most code should not use. The only code that should use these is code for libraries and for functions that are used by hundreds or thousands of other people. And then we also need to notice that these docs in comments are not sufficient for documentation either. When a function is not used by hundreds or thousands of people, just write a conventional comment or perhaps not write a comment at all if the function is quite straightforward. Documentation that explains the big picture is much more important but that is actually somewhat hard to write compared to sprinkling jsdoc, pydoc, javadoc or doxygen worthless shit all over the place.


joecarrot · 10 months ago

If one of my developers used "||" that way I would definitely throw some side eye

JellyBeanThief · 10 months ago

I was thinking exactly the same. You can write

    if (cond) { cons }

on one line and get more readable code, admittedly a few chars longer.

65 · 10 months ago

Don't even need the curly braces. I do

    if (cond) doSomething();

all the time.

Code patterns are social! What is strange to one is normal to another.

The kind of pattern used here with the `||` might seem weird to some JavaScript developers, but it's pretty normal in shell scripts, and it's pretty normal in Ruby with `unless`!

MetaWhirledPeas · 10 months ago

I must be their target audience because as soon as they used the example with || it all started making sense.

This would have been fine too but it would trigger some people not to use {}

    if (!validateUserInput(user)) throwError(err.userValidationFailed);

My preferred style might be closer to this.

    if (!userInputIsValid(user)) throwError(err.userValidationFailed);

gnarlouse · 10 months ago

If one of my developers threw in a throw like that I would throw up in their mouth.

dvt · 10 months ago

The writer here misunderstands how short-circuit evaluation is supposed to be used. The idea is that you should use SCE in a few, pretty standard, cases:

Using it to handle control flow (e.g. throwing exceptions, as a makeshift if-then, etc.) is a recipe for disaster.

trealira · 10 months ago

Short-circuiting evaluation is also useful for things like this:

If short circuit evaluation didn't exist, then "key < a[j - 1]" would be evaluated even in the case where j = 0, leading to the array being indexed out of bounds.

38 · 10 months ago

I would go further and say that syntax should never be used. For example, with Go:

> cheapFunction(...) || expensiveFunction(...)

is not valid unless both functions return bool

> car = car || "bmw"

is not valid at all, because both types would need to be bool

> funcA(...) && funcB_WhichMightBreakWithoutFuncA(...)

not valid unless functions return bool. I think Go smartly realized this syntax is just sugar that causes more problems than it solves.

lolinder · 10 months ago

This has nothing to do with syntax and short circuiting and everything to do with Go's type system. Go, like most compiled languages, has no concept of "truthiness". JavaScript is not Go and has truthiness.

We can debate the merits of truthiness and using it this way, but let's have that debate on the merits, not by invoking other languages with completely different design constraints.

Your argument here is similar to what got us "no split infinitives" in English (grammarians wanted English to be more like Latin).

variadix · 10 months ago

Types are the best form of documentation because they can be used to automatically check for user error, are integral to the code itself, and can provide inline documentation. The more I program in dynamically typed (or even weakly statically typed) languages the more I come to this conclusion.

einpoklum · 10 months ago

You're not wrong, but if your code is full of bespoke types that are only relevant to a couple of places where they're used, you hurt interoperability; you lock yourself in to how things are done now; and you may just be shifting the burden of making sense to someplace else in the code.

If you are able to formulate types which can be grasped without reading and re-reading their code - that's a win; and if you are able to express more of your code in terms of more generic non-trivial types and data structures, which are language-idiomatic - then that's a double win, because you're likely to be able to apply library functions directly, write less code, and have your code readers thinking "Oh, variadix is just doing FOO on the BAR structure, I know what that means".

callc · 10 months ago

> if your code is full of bespoke types that are only relevant to a couple of places where they're used, you hurt interoperability; you lock yourself in to how things are done now; and you may just be shifting the burden of making sense to someplace else in the code

What is an example of bespoke types? Is it all compound types (structs, classes)? If you need interop or extensibility, make an API. Feel free to use whatever types in your library you want, just make sure to agree to the API interface.

I’m honestly not sure what style of programming you are advocating for einpoklum. Can you provide an example?

All non-primitive types are essentially n-ary trees of types of sub-members (ignoring unions) with primitive types as leaves. Passing your non-primitive type to some external library function is accomplished by the library declaring that the type it receives must adhere to an interface.

> types that are only relevant to a couple of places

do not create architectural lock-in. Friction in type declaration comes from too many cooks in the kitchen. If only two chunks of code see a type, just change it.