It’s a matter of style, and like cooking, either too much or too little salt will ruin a dish.
In this case I hope nobody is proposing a single 1000-line god function. Nor is a maximum of 5 lines per function going to read well. So where do we split things?
This requires judgment, and yes, good taste. Also iteration. Just because the first place you tried to carve an abstraction didn’t work well, doesn’t mean you give up on abstractions; after refactoring a few times you’ll get an API that makes sense, hopefully with classes that match the business domain clearly.
But at the same time, don’t be over-eager to abstract, or mortally offended by a few lines of duplication. Premature abstraction often ends up coupling code that should not have to evolve together.
As a stylistic device, extracting a function which will only be called in one place to abstract away a unit of work can really clean up an algorithm; especially if you can hide boilerplate or prevent mixing of infra and domain concerns like business logic and DB connection handling. But again I’d recommend using this judiciously, and avoiding breaking up steps that should really be at the same level of abstraction.
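A sketch of that in Python (all names, fields, and the schema here are invented for illustration): the single-use helper hides the infrastructure concern, leaving the domain logic pure and trivially testable:

```python
# Hypothetical sketch: `load_unshipped_orders` is called exactly once,
# but extracting it keeps DB plumbing out of the pricing logic below.

def load_unshipped_orders(db):
    # Infra concern: querying and row-to-dict conversion live here.
    rows = db.execute("SELECT id, total FROM orders WHERE shipped = 0")
    return [{"id": r[0], "total": r[1]} for r in rows]

def apply_bulk_discount(orders, threshold=100, rate=0.1):
    # Domain concern: pure business logic, no connection handling in sight.
    return [
        {**o, "total": o["total"] * (1 - rate) if o["total"] >= threshold else o["total"]}
        for o in orders
    ]
```

The second function can be unit-tested without a database at all, which is most of the payoff.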
> In this case I hope nobody is proposing a single 1000-line god function. Nor is a maximum of 5 lines per function going to read well.
This is the key. Novice devs tend to write giant functions. Zealot devs who read books like Clean Code for the first time tend to split things into a million functions, each one a few lines long (pretty sure the book itself says no more than 5 lines for each function). I worked with a guy who extracted each and every boolean condition to a function because "it's easier to read", while never writing any comments because "comments are bad" (according to the book). I hate that book; it creates these zealots who mindlessly follow its bad advice.
Or, the fun one I run into is devs who write a mix of 1000 line functions and tiny little 5 line functions with no discernible pattern to which option is chosen when.
The truth is that what makes code readable is not really (directly!) about function size in the first place. It's about human perceptual processing and human working memory. Readable code is easily skimmable, and should strive to break the code up into well-defined contexts that allow the programmer to only have to carry a handful of pieces of information in their head at any given moment. Sometimes the best way to do that is a long, linear function. Sometimes it's a mess of small functions. Sometimes it's classes. Which option you choose ultimately needs to be responsive to the natural structure of the domain logic you're implementing.
And, frankly, I think that both versions do a pretty poor job of that, because, forget the style, the substance is a mess. They're both haphazardly newing up objects and mutating shit all over the place. This code reads to me like the end product of about four sprints' worth of rushing the code so you can get the ticket closed just in time for sprint review.
I mean, let's just think about this as if we were describing how things work in a real kitchen, since I think that's pretty much what the example is asking us to do, anyway: on what planet does a pizzeria create a new, disposable oven for every single pizza? What the heck does
pizza.Ready = box.Close()
even mean? Now we've got a box containing a pizza that's storing information about the state of the object that contains it, for some reason? Demeter is off in a corner crying somewhere. What on earth is going on with that 'if order.kind == "Veg"' business? Why aren't we just listing the ingredients on the order and then iterating over that list, adding the items to the pizza? The logic for figuring out which ingredients go on the pizza never belonged in this routine in the first place; it's "ready, aim, fire", not "ready, fire, aim". Etc.
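For what it's worth, the "list the ingredients on the order" version might look something like this (a hypothetical sketch in Python rather than the article's language, all names invented):

```python
# Decide *what* goes on the pizza in one place ("ready, aim")...
TOPPINGS_BY_KIND = {
    "Veg": ["tomato", "mozzarella", "peppers"],
    "Meat": ["tomato", "mozzarella", "pepperoni"],
}

def toppings_for(order):
    return TOPPINGS_BY_KIND[order["kind"]]

def assemble(order):
    # ...("fire") and keep the assembly routine itself branch-free.
    pizza = {"base": "dough", "toppings": []}
    for topping in toppings_for(order):
        pizza["toppings"].append(topping)
    return pizza
```

The assembly loop never needs to know why a topping is on the list, which is exactly the separation the original code is missing.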
Oh man, it's easy to spot someone who blindly follows Clean Code. I personally don't like it, but I am a fan of all of Martin's other books. It's just aggressively opinionated in a way that I can't get behind. I'm sure I'm not alone, but reading that book made me feel insane, since he described things as objectively good that I found awful.
I was this dev early in my career. A sharp overreaction to a giant ball of mud architecture with no tests and minimal consistency. I read all those books looking for some better way and inflicted all those rules on people.
I don't regret the learning, but I do regret being dogmatic. It was interesting that no one around me knew better either way, or felt they could provide reasonable mentorship, so we went too far with it. These days I write the pizza function on the left, and use comments sparingly where they add context and reasoning.
Clean Code says "Functions should not be 100 lines long. Functions should hardly ever be 20 lines long".
I think both 100 and 20 are a bit low, but much better than 5. As I mentioned in a comment a few days ago when I also corrected someone that misremembered a detail from the book, I am not a huge fan. But I also think it is mostly correct about most things, and not as terribly bad as some say. Listening to fans of the book is more annoying than to actually read the book.
(And that other comment when I corrected someone was about bad comments. Clean Code definitely does not say that you shall never comment anything.)
> I worked with a guy who extracted each and every boolean condition to a function because "it's easier to read"
Obviously, readability is important, but I've also seen things like this so often in my career where it's used as an excuse for anything. Most recently, trying to stop a teammate turning nearly every class into a singleton for the sake of "simplicity" and "readability", which I thought was a real stretch.
>In this case I hope nobody is proposing a single 1000-line god function.
Why not? Who said it's worse? What study settles the issue?
Some times a "1000-line god function" is just what the domain needs, and can be way more readable, with the logic and operations consolidated, than 20 50 line functions, that you still have to read to understand the whole thing (and which then someone will be tempted to reuse a few, adjust them for 2-3 different needs not had by your original operation, and tie parts of the functions implementing your specific logic to irrelevant to it use cases).
And if it's a pure 1000-line function, it could even be 10,000 lines for all I care, and it would still be fine.
Yeah, when code gets spread out across too many classes and functions, it's like you're trying to navigate a maze without a map. You hit a breakpoint, and you're left scratching your head, trying to figure out what the heck each class is supposed to do. Names can be deceptive, and before you know it, the whole architecture feels like a jigsaw puzzle. It's a cognitive load, having to keep track of all these quirks. Maybe it was easier for the author to do it that way when they started from scratch, but once they're finished, it's another story for everyone else.
1000-10000 lines typically means the developer just doesn't know how to abstract. Don't go overboard with the function extraction, but also don't make me read every line of your code so I can find the one tiny part I want to change. Pseudo-functions, like the commented segments of code in the linked post, help, but it's not obvious which data those segments of logic depend on.
If we go with the cooking analogy: if you have to describe to someone how to cook a meal, and at one part of the meal you have to put the fond in, it is reasonable to explain how to make the fond in a separate section. The fond is its own thing and it has one touching point with the food, therefore it is okay (or even beneficial) to move it out.
Also: cooking recipes are also very abstracted. When they say you need to lightly fry onions they assume you know a way to cut onions and a lightly frying algorithm already. If they would inline everything it would become unreadable.
Code is very similar. If you want it strictly without abstractions it will be as low level as your language allows you, and that is definitely not readable code.
If you, e.g., instead of using Python's "decode" method, tried to do Unicode decoding yourself, it would become very hard to understand what your program is actually about. Now there are probably zero people who would do that, because the language provides a simple and well tested abstraction — but what makes that different from you creating your own simple and well tested abstraction and using it throughout the actual business logic of your code?
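To make the comparison concrete (the helper below is invented): the built-in abstraction and a home-grown one play exactly the same role:

```python
# The language's abstraction: nobody reimplements UTF-8 by hand.
raw = b"caf\xc3\xa9"
text = raw.decode("utf-8")  # -> "café"

# Your own, analogous abstraction (a hypothetical helper): one small,
# well-tested function used throughout the business logic, instead of
# inlining the parsing everywhere it's needed.
def parse_order_line(line):
    """Parse 'quantity x item' lines like '2 x Margherita'."""
    qty, _, item = line.partition(" x ")
    return int(qty), item.strip()
```

Both hide mechanical detail behind a name; the only difference is who wrote and tested them.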
The hard part is creating abstractions that are so well chosen that nobody will have to ever touch them again.
To stay with the fond analogy: It gets interesting if the fond preparation involves deglazing a pan (mutable environment) with meat bits and juices left at the bottom (state/precondition). Two options:
- Linear code: The meat frying (state-producing) and deglazing (state-requiring) steps are below each other in the same recipe, so to verify that it works you can just linearly go through line by line. However if the recipe becomes long and a lot of stuff happens in between, it's no longer obvious. You'll have to use good comments ("// leave residue in the pan, we'll need it for the fond") because otherwise you might accidentally refactor in a way that violates the precondition (swaps/scrubs the pan).
- Modular code: You need to clearly describe the precondition on the fond preparation subroutine to have any chance to keep using it correctly. On one hand this forces documentation, on the other hand it's probably still easier to forget since the subroutine call ("Prepare the fond.") doesn't directly make the precondition obvious.
Either way has its advantages and drawbacks, and the right choice depends on the circumstances.
This is assuming you only want to cook this specific meal and aren't writing a cookbook - otherwise you should definitely modularize to remove repetition.
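As code, the two options might look like this (a toy sketch of the analogy; everything here is invented):

```python
# Option 1: linear. The precondition is guarded by a comment (and here,
# a defensive check) inside one long recipe.
def cook_linear(pan):
    pan["residue"] = "meat bits"  # frying leaves residue behind
    # ... many intervening steps could go here ...
    # NOTE: leave residue in the pan, we'll need it for the fond
    assert pan["residue"], "pan was scrubbed before deglazing"
    return f"fond from {pan['residue']}"

# Option 2: modular. The precondition becomes documentation (and a
# check) on the subroutine itself.
def prepare_fond(pan):
    """Deglaze `pan`. Precondition: pan still holds frying residue."""
    assert pan["residue"], "precondition violated: pan must hold residue"
    return f"fond from {pan['residue']}"

def cook_modular(pan):
    pan["residue"] = "meat bits"
    return prepare_fond(pan)
```

In the linear version the producer and consumer of the state sit in one scroll; in the modular version the contract is explicit but easier to violate from a new call site.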
> But at the same time, don’t be over-eager to abstract, or mortally offended by a few lines of duplication. Premature abstraction often ends up coupling code that should not have to evolve together.
A relatively common piece of feedback from me to the team at work is usually to take a half step back and look at the larger problem domain and consider whether these things are necessarily the same, or coincidentally the same.
Just because the lines of code look similar right now doesn't mean they need to be that way or need to stay that way. Trying to mash together two disparate use cases because "the code's basically repeated" is often how you get abstractions that, especially over time, end up not actually abstracting anything.
As the various use cases get too divergent, the implementations either move much of the logic up to the caller (shallow abstractions, little value), or expose the differences via flags and end up with two very different implementations under the hood side-by-side (less clear than two independent implementations).
Have you ever seen a well structured 1000 line function?
I'm sure they exist - maybe some sort of exceedingly complicated data transform or something. But in almost every situation I've seen, a 1000 line function has countless side effects, probably sets a few globals, takes loads of poorly named arguments, each of which is a nested data structure which it reaches deeply into and often has the same for loop copied and pasted 10 times with one character changed.
Often a 1000 line function is actually 5 or 6 20 line functions. I'm sure there are legitimate exceptions, but I've never seen them.
Going further, I'll take 1000 lines of shitty code over split-into-small-functions shitty code. In the long code, all I have to think about is the code. With the functions, I have to pay attention to what calls what, and because the code is shitty, surely the function names are too, adding two things at once to the confusion mix.
It is easy to nod along when someone speaks about different styles. But there are also a few objective truths down there, and it makes sense to try to identify them.
For example, I have been at this for over three decades now, and there are some things that almost never fail. From the article, the kind of person who advocates for the more "testable" code with a few more lines and more abstractions is never the same person who can maintain that codebase a handful of years later.
That should tell us something. For what it's worth, I agree with the article that simpler is better, which often coincides with fewer lines of code. I personally wouldn't have chosen objects that look like "pizza.Sliced = box.SlicePizza()" but most of the time the structure is already in place and it is best to go along with it.
As to that 1000 line function, if it is in an imperative style it might well be the easiest form to read. Have you seen the Python source code? That language's success owes a lot to a simple interpreter with ginormous functions that anyone and their brother can read from top to bottom and dare to modify without having a brain the size of a planet.
> In this case I hope nobody is proposing a single 1000-line god function.
this made me feel a certain type of way. (don't ever look at video game source code, by the way; 1000 lines is quite short by some standards)
if a 1000-line long main is what makes sense then you should do that.
I find 1000-line long methods which are linear far easier to read than code which has every method call broken out into its own method. it's so bad I literally can't read JavaScript that is written in the contemporary style anymore. absolutely impenetrable for me.
it's true that I am not a "real" developer in that I don't work on code full-time, but I've written probably millions of lines of code in my 30-year career. I am not a novice.
if the solution calls for a 1000-line main method, then that's what I'm writing, "best practices" can go in the corner and cry. I'm writing what I need to solve the problem and nothing more.
My biggest pain is JavaScript developers who get too high on Java concepts, most often after using NestJS. Providers, Models, Services and what not.
I remember an import script I wrote in Express.js. It was like 50 lines. It did things like copy databases, clean up config, etc. There were hardly any layered ifs, just steps; I didn't see much use in breaking it up, and it was easy to read.
Another developer, who was smart but liked abstract concepts, overengineered the hell out of it, moving it into 20 places with a bunch of providers, and I could never find it or make sense of it after that; it was very hard to read what was going on. It was such a pain to update.
The main reason I have a distaste for dependency injection is because of this, promotes separating code into multiple places and over-abstracting things, making code hard to follow. Most of the times it is not worth the trade-off.
Doing module mocking for unit tests instead of dependency injection in runtime code is almost always a better idea in my opinion. Dependency injection was invented for languages that can't do module mocking.
Some programming language implementations and operating systems have more overhead for function calls, green threads, threads, and processes.
If each function call creates a new scope, and it's not a stackless language implementation, there's probably a hashmap/dict/object allocated for each function call unless TCO (tail-call optimization) has occurred.
Though, function call overhead may be less important than readability and maintainability.
The compiler or interpreter can in some cases minimize e.g. function call overhead with a second pass or "peephole optimization".
Code linting tools measure (McCabe) cyclomatic complexity but not algorithmic complexity (or the overhead of O(1) lookup after data structure initialization).
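As a rough illustration of the trade-off being described (CPython, invented helper names): both loops compute the same thing, and the difference between them is pure call overhead, which is usually dwarfed by readability concerns:

```python
import timeit

def add_one(x):
    return x + 1

def inlined(n):
    total = 0
    for i in range(n):
        total += i + 1          # work done inline, no call frame
    return total

def with_calls(n):
    total = 0
    for i in range(n):
        total += add_one(i)     # one call frame per element
    return total

# Both compute the same result; CPython creates a frame per call and
# performs no tail-call optimization, so the second is somewhat slower.
t_inline = timeit.timeit(lambda: inlined(10_000), number=100)
t_calls = timeit.timeit(lambda: with_calls(10_000), number=100)
```

The absolute numbers vary by machine and interpreter; the point is only that the overhead exists and is measurable, not that it should drive the design.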
Sometimes I use an anonymous scope instead of extracting a single use function. This is especially nice when you would otherwise have many parameters/returns
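Python has no anonymous block scopes, but the closest equivalent, an inner function used exactly once, gets a similar effect (invented example): the intermediate names stay local, and the closure avoids a long parameter/return list:

```python
def report(orders):
    header = "id,total"

    def format_rows():
        # Sees `orders` from the enclosing scope: no parameters, no
        # returns beyond the one value we actually want.
        return [f"{o['id']},{o['total']}" for o in orders]

    return "\n".join([header] + format_rows())
```

In brace-scoped languages the same trick is just an anonymous `{ ... }` block; either way, nothing leaks into the surrounding function.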
The example code is very simplistic, so of course that linear code is more readable, but the idea doesn't scale.
I think you have to consider things like reusability and unit-test-ability as well, and having all your code in a single function can make reasoning about it more difficult due to all the local variables in scope that you need to consider as possibly (maybe or maybe not) relevant to the block of code you’re reading.
That being said, when I look back on my younger, less experienced days, I often fell into the trap of over-refactoring perfectly fine linear code into something more modular, yet less maintainable due to all the jumping around. There is something to be said for leaving the code as you initially wrote it, because it is closer to how your mind was thinking at the time, and how a readers mind will also probably be interpreting the code as well. When you over-refactor, that can be lost.
So I guess in summary, this is one of those “programming is a craft” things, where experience helps you determine what is right in a situation.
> The example code is very simplistic, so of course that linear code is more readable, but the idea doesn't scale.
One of the best reviewed functions I wrote at work is a 2000 line monster with 9 separate variable scopes (stages) written in a linear style. It had one purpose and one purpose only: it was supposed to convert some individual HTML pages used in one corner of our app on one platform into a carousel that faked the native feel of another platform. We only needed that in one place, and the whole process was incredibly specific to that platform and that corner of the app.
You could argue that every one of those 9 scopes could be a separate function, but then devs would be tempted to reuse them. Yet each step had subtle assumptions about what happened before. The moment we spent effort making them distinct functions, we would have had to recheck our assumptions, generalize, and verify that the methods work on their own... for code that's barely ever needed elsewhere. We even had some code that was similar to some of the middle parts of the process... but it just slightly didn't fit here. Changing that code caused other aspects of our software to fail.
The method was not any less debuggable, it still had end to end tests, none of the intermediate steps leaked state outside of the function. In fact 2 other devs contributed fixes over time. It worked really well. Not to mention that it was fast to write.
Linear code scales well and solves problems. You don't always want that but it sure as hell makes life easier in more contexts than you'd expect.
Note: initial reactions to the 2000 line monster were not positive. But spend 5 minutes with the function, and yeah... you couldn't really find practical flaws, just fears that didn't really manifest once you had a couple of tests for it.
I don't know if it is still like this, but the code for dpkg used to be like this, and it was amazing: if you ever needed to know in exactly what order various side effects of installing a package happened in, you could just scroll through the one function and it was obvious.
To this end, I'd say it is important to be working in a language that avoids messing up the logic with boiler plate, or building some kind of mechanism (as dpkg did) to ease error handling and shove it out of the main flow; this is where the happy path shines: when it reads like a specification.
I don't think the fact that a function works well is a good enough reason to write a 2000 line function. Sometimes there are long pieces of code that implement complex algorithms that are difficult to break into smaller pieces of code, but those cases are limited to the few you mentioned.
>The moment we would have spent effort to make them distinct functions we would have had to recheck our assumptions, generalize, verify that methods work on their own
Why? Why can't the functions say "to be used by <this other function>, makes assumptions based on that function, do not use externally"? Breaking out code into a function so that the place it came from is easier to maintain... does not mandate that the code broken out needs to be "general purpose".
I worked with an engineer that wrote the most clear and elegant linear code. It was remarkable, never seen anything like it since. I can't reproduce it but I do have an idea of what a well designed linear function looks like.. a story.
At first I thought how horrible, but basically you have sort of 9 functions within the same scope, each having a docstring. So I guess not too different from splitting them up.
I read you have "end to end" tests.
One question though: wouldn't each part benefit from having its own unit tests?
If the sub-functions could be reused and people would be tempted to change them, then that's what your tests are for. In fact, it's often tricky to test the sub-function logic without pulling it out, because to write the test you have to figure out how to trick the outer function into certain states. Follow the Beyoncé rule: if you like it, put a test on it. Otherwise it's on you if someone breaks it.
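For example (hypothetical pizza-flavored names): once the step is its own function, putting a test on it is trivial, whereas testing it through the outer function would mean steering a whole order into the right state:

```python
def half_price_if_stale(pizza):
    # Pulled-out step: directly testable on any input we like.
    if pizza["age_minutes"] > 30:
        return {**pizza, "price": pizza["price"] / 2}
    return pizza

def checkout(pizzas):
    # Outer function: exercising the discount branch through here would
    # require constructing a whole order in just the right state.
    return sum(half_price_if_stale(p)["price"] for p in pizzas)
```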
Isn’t that the fucking point? Having a 2000 line function is a code smell so bad, I don’t care how well the function works. It’s an automatic review fail in my book. Abstractions, closures, scope, and most importantly - docs to make sure others use your functions the way you intended them. Jesus.
So where's the proof that the function'd code scales? As the complexity of the overall code grows, so would something that gets chopped into dozens of functions to the point of being unreadable.
Suddenly, you realize that the dozens of functions __need to be called in specific orders__, and they are each only ever used once. So really what you're doing is forcing someone to know the magic order these functions are composed in order for them to be of any use.
Unfortunately organizing your code along the right lines of abstraction is something that just takes skill and can't easily be summarized in the form of "just always do this and your code will be better"
If you organize your code into units that are easy to recompose and remix, well you get huge benefits when you want recompose and remix things.
If you organize your code into units that can't be easily recomposed, then yes you've added complexity for no benefit. But why make units that can't be treated individually?
"As the complexity of the overall code grows, so would something that gets chopped into dozens of functions to the point of being unreadable."
So the answer to this is, "don't chop it into functions in a way that leaves it unreadable, instead chop it into functions in a way that leaves it more readable."
That may be unsatisfying, but it gets to the point that blindly applying rules is not always going to lead to better code. But it doesn't mean that an approach has no value.
The API shouldn't be that. Expose something easy to use. That is the point of abstractions. It doesn't matter if there are a dozen methods called in order if those dozen methods are called by a helper method, beyond maybe some implementation details.
Really the question should always come up when there are more than say two ways to do things. If I can make a pizza from scratch, reheat a chilled pizza, create a pizza and chill it, reheat a half dozen pizzas, or make three pizzas of the same kind and chill them suddenly the useful abstractions are probably something you can figure out between those helper methods.
Honestly, that is the real fear with the left-hand way of thinking. If you add quantity, whether-to-cook, and whether-to-chill parameters, you end up with a hard-to-use API where certain combinations of parameters don't make sense.
Have a clean API and make the implementation as simple as is feasible. Reuse via functions when it makes sense but don't add them willy nilly.
Aka "it is a craft and you figure things out" as someone said in the comments here
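A sketch of that shape (all names invented): small helpers underneath, and a public API that only exposes the combinations that actually make sense, instead of one `make_pizza(quantity, cook, chill)` with dubious flag combinations:

```python
# Helpers: each does one thing and returns a new value.
def make_pizza(kind):
    return {"kind": kind, "state": "raw"}

def bake(pizza):
    return {**pizza, "state": "baked"}

def chill(pizza):
    return {**pizza, "state": "chilled"}

def reheat(pizza):
    return {**pizza, "state": "baked"}

# Public API: thin compositions naming the workflows that make sense.
def fresh_pizza(kind):
    return bake(make_pizza(kind))

def pizzas_for_the_freezer(kind, quantity):
    return [chill(make_pizza(kind)) for _ in range(quantity)]
```

A caller can never ask for a nonsensical combination, because only sensible combinations have names.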
I'm very dubious of anyone resorting to "readability" as a justification.
What you're doing by breaking things into functions is trying to prevent its eventual growth into a bug-infested behemoth. In my experience, nearly every case where an area of a code base has become unmaintainable generally originates in a large, stateful piece of code that started in this fashion.
Everyone who works in said area then usually has the option of either a) making it worse by adding another block to tweak its behaviour, or b) starting to split it up and hoping they don't break stuff.
I don't want to see the "how" every time I need to understand the "what". In fact, that is going to force me to parse extraneous detail, possibly for hundreds of lines, until I find the bit that actually needs to be changed.
> Suddenly, you realize that the dozens of functions __need to be called in specific orders__, and they are each only ever used once. So really what you're doing is forcing someone to know the magic order these functions are composed in order for them to be of any use.
That's where nested functions show their true utility. You get short linear logic because everything is in functions, but the functions are all local scope so you get to modify local scope with them, and because the functions are all named, it is easy to determine what is going on.
In a decent programming language you can nest functions, so all the little functions that make up some larger unit of the program are contained within (and can only be called within) that outer function. They serve less as functions to be called and more just as names attached to bits of code. And since they can't be called anywhere else, other people don't need to worry about them unless they're working on that specific part of the program.
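A sketch in Python (toy step names): the helpers are nested, so they cannot be called, reordered, or reused from anywhere else, and the required order is spelled out at the end of the outer function:

```python
def process_order(order):
    # Named steps, nested so they exist only inside this function;
    # the "magic order" is visible in one place, at the bottom.
    def validate():
        if "kind" not in order:
            raise ValueError("order has no kind")

    def prepare():
        order["pizza"] = {"kind": order["kind"], "baked": False}

    def bake():
        order["pizza"]["baked"] = True

    validate()
    prepare()
    bake()
    return order["pizza"]
```

The inner functions act less like reusable units and more like names attached to bits of code, which is exactly the point.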
If you have dozens of functions that need to be called in specific orders, design and use a state machine and then use a dispatch function that orchestrates the state machine.
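A minimal sketch of that pattern (invented states): each step returns the name of the next state, and one dispatch loop owns the ordering, so no caller has to memorize it:

```python
# Each handler does its work on a shared context and names its successor.
def take_order(ctx):
    ctx["order"] = "Veg"
    return "prepare"

def prepare(ctx):
    ctx["pizza"] = {"kind": ctx["order"]}
    return "bake"

def bake(ctx):
    ctx["pizza"]["baked"] = True
    return "done"

STATES = {"take_order": take_order, "prepare": prepare, "bake": bake}

def run(ctx, state="take_order"):
    # The dispatch function: the only place that knows the ordering.
    while state != "done":
        state = STATES[state](ctx)
    return ctx
```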
> this is one of those “programming is a craft” things, where experience helps you determine what is right in a situation.
You are right here.
The key insight on why giant linear functions are often more readable (and desirable) is because they allow you to keep more concepts/relationships simultaneously together as a single chunk without context switching which seems to aid our comprehension. An extreme proponent is Arthur Whitney (inventor of the K language) who writes very terse (almost incomprehensible to others) code so as to accommodate as much as possible in a single screen.
Two examples from my own experience;
1) I found reading/understanding/debugging a very large Windows message handler function (i.e. a WndProc with a giant switch statement containing all the business logic) far easier than the same application rewritten in Visual C++ where the message handlers were broken out into separate functions.
2) The sample code for a microcontroller showed an ADC usage example in two different ways: one with everything in the same file, and another where the code was distributed across files, e.g. main.c/config.c/interrupts.c/timer.c/etc. Even though the LOC was <200, I found the second example hard to understand simply because of the context switch involved.
> The key insight on why giant linear functions are often more readable (and desirable) is because they allow you to keep more concepts/relationships simultaneously together as a single chunk without context switching which seems to aid our comprehension.
The problem with giant linear functions is that those concepts get separated by sometimes thousands of lines. Separating out the high-level concepts vs the nitty-gritty details, putting the latter in functions that then get called to implement the high-level concepts, does in my experience in most cases a better job of keeping related things together.
Yeah, but the moral that should be taken from that is not "it's always better to write huge, linear functions". Rather, "there are cases where huge, linear functions make sense because of the way the code needs to interact with things". Along the same lines, there are cases where breaking the code up into smaller functions, and calling them from the main function, makes more sense.
> an extreme proponent is Arthur Whitney (inventor of the K language) who writes very terse (almost incomprehensible to others) code
But k has a small set of built-in commands and a built-in database; it was made for fast analysis of stock information, so with that you have everything you need and you use the same semantics. The only thing you need to know is the data structure and you can build whatever you need.
So in this way, it's very likely that, given two tables A + B, a 'bunch of operations' X on A, and a 'bunch of operations' Y on B where Y depends on the result of X, and given the tasks to:
- create X' = X
- create XY' = X + Y
you will implement XY' without knowing X already exists, rather than figure out that X exists and reuse it.
The problem with code that isn't k (or written in a similar style; it doesn't really matter what the programming language is) is that we have learned to use the second style from the article and, more extreme, to separate everything out into layers. You cannot even reach the data model without going through a layer (or more) of abstractions, which makes it necessary not only to know the data model in detail but also to find the matching findXinAandApplyWithYToB(), where X & Y & A & B are often somewhat ambiguous and badly named entities. And then there are of course badly designed databases, which are also quite the norm as far as we see, so there is much lower data integrity, which means that if you create something without checking all the code that touches it, you might change something and the data becomes inconsistent.
I notice the same when working on systems built with stored procedures on MSSQL/Postgres; it is far quicker to oversee and (at least basically) understand the data model (even with 1000+ tables, which is rather normal for the systems we work with) than it is to understand even a fraction of a, let's say, Go codebase. So when asked to do a task XY', you are usually not searching for X'; you are simply reading the data used in X & Y and whipping up a procedure/query/whatever yourself. It's simply much faster, as you have a restricted work surface: the model and SQL (I know, you can use almost any language in Postgres, but let's not here), and you can reason about them and the tasks at hand when you shut off the internet and just use your SQL workbench.
I have seen many instances where people, just out of habit, factor out a lot of linear code that will never be reused into separate functions.
These pieces of code then often end up being private functions of a class. With state. Since they are private functions now, they are not really testable.
So now we got a lot of private functions that are only called once and typically modify side effect state. When these functions are grouped together with the caller, it is actually still a bit readable in simple cases.
But then after a while someone adds other functions in between the calling function and the factored out ones.
Now we have bits and pieces modifying different side-effect state, and no one knows whether they are called from different places without generating a call graph or searching the class file.
If you insist on making the code non-linear, I'd beg you to at least consider making these factored out private funcs inner funcs of the calling function if your language supports that. This makes it clear that these functions won't be called from anywhere else.
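A sketch of that refactor (invented names): the once-called private methods mutating instance state become inner functions, so the call order and the mutated state are visible in one place:

```python
class ReportBuilder:
    # Before: once-called private methods mutating self._rows; a reader
    # must search the class to learn who calls what, and in which order.
    def __init__(self, data):
        self._data = data
        self._rows = []

    def build(self):
        self._add_header()
        self._add_body()
        return "\n".join(self._rows)

    def _add_header(self):
        self._rows.append("name,score")

    def _add_body(self):
        for name, score in self._data:
            self._rows.append(f"{name},{score}")

# After: the same steps as inner functions; it is now structurally
# impossible for them to be called from anywhere else.
def build_report(data):
    rows = []

    def add_header():
        rows.append("name,score")

    def add_body():
        for name, score in data:
            rows.append(f"{name},{score}")

    add_header()
    add_body()
    return "\n".join(rows)
```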
As with so many things in life, in a real codebase this is not an either/or, but an art of combining the two into something that stays readable and maintainable.
If the function were truly linear, having a long function wouldn't be so bad. But it actually isn't: the example contains multiple branches!
Will people bother testing all of them? Or will they write a single test, pass in a pizza and just glance at it actually working? My guess is the latter, as testing multiple branches from outside is often tedious, vs testing smaller specialized functions.
> The example code is very simplistic, so of course that linear code is more readable, but the idea doesn’t scale.
...that's basically why common sense and taste in programming is still required, it's not a purely mechanical task. That's also why I'm not entirely a fan of automatic code formatting tools, they don't understand the concept of nuance.
Everyone saying "linear code doesn't scale" actually has it backwards - it's concise functions with a deeply nested call stack that really becomes a nightmare in large codebases. It's never obvious where new code should be added, the difficulty of understanding what the effects of your changes will be increases exponentially since you have to trace all the possible ways code can get called, you end up with duplicated subroutines, etc etc.
99% of the time, you haven't actually come up with a good abstraction, so just write some linear code. Prefer copy/pasting to dubious function semantics.
Another risk is if you add print_table() then someone else is going to find it and use it in their code, but also add a little flag to adjust the output for their use case.
I think we all know at least some functions like this in a code base. All it takes is for a newcomer to come across a complex function that they need to update some logic in but don't understand well enough to refactor, so they just add some parameters with default values and call it a day.
That’s what tests are for. And if `print_table` is factored properly then they won’t want to add flags, they’ll make a new function out of the pieces of `print_table` that has distinct behavior of its own.
Well you're describing a readability problem. And you're essentially saying readability is what causes it not to scale.
If we consider the concepts orthogonally, meaning we don't consider the fact that readability can influence scalability, then "everyone" is fully correct. Linear code doesn't scale as well as modular code. The dichotomy is worth knowing and worth considering depending on the situation.
That being said I STILL disagree with you. Small functions do not cause readability issues if those functions are PURE. Meaning they don't touch state. That and you don't inject logic into your code, so explicitly minimize all dependency injection and passing functions to other functions.
Form a pipeline of pure functions passing only data to one another, and it all becomes readable and scalable. You'll much more rarely hit an issue where you have to rewrite your logic because of a design flaw. More often than not, by composing pure functions your code becomes like Legos. Every refactoring becomes more like re-configuring and recomposing existing primitives.
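As a sketch of that Lego-like composition (the step names are made up; the point is that each stage is pure and passes only data forward):

```go
package main

import (
	"fmt"
	"strings"
)

// Each step is pure: it takes data in and returns new data,
// touching no shared state.
func trimAll(ss []string) []string {
	out := make([]string, 0, len(ss))
	for _, s := range ss {
		out = append(out, strings.TrimSpace(s))
	}
	return out
}

func dropEmpty(ss []string) []string {
	var out []string
	for _, s := range ss {
		if s != "" {
			out = append(out, s)
		}
	}
	return out
}

func join(ss []string) string { return strings.Join(ss, ",") }

func main() {
	// The pipeline reads top to bottom; recomposing it is just
	// reordering or swapping calls.
	fmt.Println(join(dropEmpty(trimAll([]string{" a ", "", "b"})))) // prints a,b
}
```

Because no step mutates anything outside itself, each one can be tested and reused independently without tracing a call graph.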
I disagree. It's not the purity of the functions, it's having to know the details of them. The details, which could have existed here, are now in two other places. If you need to figure out how a value is calculated, and you use a half dozen functions to come to that value, you now have a half dozen places you need to jump to within the codebase.
Small functions increase the chances of you having to do this. Larger ones decrease it, but can cause other issues.
Also, many small functions doesn't make code modular. Having well defined, focused interfaces (I don't mean in the OO sense) for people to use makes it modular. Small functions don't necessarily harm it, but if you're not really good at organizing things they definitely can obscure it.
I think you’re right about side effects being the missing ingredient to this discussion, that is leading people to talk past each other. The pattern’s sometimes called “imperative shell, functional core”.
And I totally agree, this is how you write large code bases without making them unmaintainable.
Where to go “linear” vs “modular” is an important design choice, but it’s secondary to the design choice of where to embed state-altering features in your program tree.
I think people dislike modular code because they want to have all the “side-effects” visible in one function. Perhaps they’ve only worked in code bases where people have made poor choices in that regard.
But if you can guarantee and document things like purity, idempotency, etc, you can blissfully ignore implementation details most of the time (i.e. until performance becomes an issue), which is definitionally what allows a codebase to scale.
The example code would be less distracting if it at least attempted to stick to the pizza metaphor in a meaningful way and weren't subpar Go code.
`prepare` is a horrible name for a function. I would expect a seasoned Gopher to call it something like `NewPizzaFromOrder`.
I don't see any reason for putting `addToppings` in its own function. If you have to have it, I personally would have made it a method on Pizza something like `func (p *Pizza) WithToppings(topping ...Topping) *Pizza { /* ... */ }`. Real pizza is mutable, so the method mutates the receiver.
Why is a new oven instantiated every time you want to bake a pizza? You should start with an oven you already have, then do `oven.Preheat()`, and then call `oven.Bake(pizza)`. You can take this further by having `oven.Preheat()` return a newtype of Oven which exposes `.Bake()` so that you can't accidentally bake something without preheating the oven first. Maybe elsewhere `Baker` is an interface, and you have a `ToasterOven` implementation that does not require you to preheat before baking because it's just not as important.
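A rough sketch of that preheat-gating idea, assuming invented type and method names:

```go
package main

import "fmt"

type Pizza struct{ Baked bool }

type Oven struct{ tempC int }

// PreheatedOven is a newtype that can only be obtained via Preheat,
// so Bake cannot be called on a cold oven.
type PreheatedOven struct{ Oven }

func (o Oven) Preheat(tempC int) PreheatedOven {
	o.tempC = tempC
	return PreheatedOven{o}
}

func (o PreheatedOven) Bake(p *Pizza) {
	p.Baked = true
}

func main() {
	oven := Oven{}
	p := &Pizza{}
	// oven.Bake(p) would not compile: Bake exists only on PreheatedOven.
	oven.Preheat(230).Bake(p)
	fmt.Println(p.Baked) // prints true
}
```

The compiler, rather than a comment, now enforces the preheat-before-bake ordering.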
Without changing the code, I'd also reorder the declarations to be more what you'd expect (so you don't have to jump up and down the page as you scan through functions that call each other).
IDK I have to leave now but there are just so, so many ways in which the code is already a deeply horrible example to even start picking apart the "which is more readable" debate.
John Carmack said much the same, and I have been following it ever since. Of course linear code is easier to read; it follows the order of execution. It minimizes eye saccades.
Some code needs to be non-linear for reuse. Then execution is a graph. If your code does not exploit code reuse from a graph structure, do not bother introducing vertexes where a single edge suffices.
Something Carmack calls out but the OP doesn't is that if you can break out logic with no side effects into its own function that's usually a good idea. I think the left side would have benefited from
`pizza.Toppings = get_pizza_toppings(order.kind)`
in this case to keep the mutation of the pizza front and center in the main function here.
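In Go, that might look roughly like this (with `getPizzaToppings` as a hypothetical pure helper, so the mutation stays visible in the caller):

```go
package main

import "fmt"

type Order struct{ Kind string }

type Pizza struct{ Toppings []string }

// getPizzaToppings is pure: same input, same output, no side effects,
// so it can be unit-tested in isolation without any pizza or oven state.
func getPizzaToppings(kind string) []string {
	switch kind {
	case "Veg":
		return []string{"tomato", "mushroom", "pepper"}
	default:
		return []string{"cheese"}
	}
}

func main() {
	order := Order{Kind: "Veg"}
	pizza := Pizza{}
	// The mutation stays front and center in the main function.
	pizza.Toppings = getPizzaToppings(order.Kind)
	fmt.Println(len(pizza.Toppings)) // prints 3
}
```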
I actually sort of agree that linear code is more readable, but readability alone isn't what makes for good code practices. So while good linear code is more readable, at least in my opinion, it's also a lot less maintainable and testable. I have a few decades of experience now, I even work a side gig as an external examiner for CS students, and the only real-world good practice I've seen hold up over the years is keeping functions small. I know, I know, I grade students on a lot of things I don't believe in. I'm not particularly fond of abstraction, or even avoiding code duplication at all costs and so on, but "as close to single purpose" functions as you can get, do that, and the future will thank you for it.
Because what is going to happen when the code in those examples runs in production over a decade is that each segment is going to change. If you’re lucky the comments will be updated as that happens, but they more than likely won’t. The unit test will also get more and more clunky as changes happen, because it’s big and unwieldy, and maybe someone is going to forget to alter the part of it that wasn’t obviously tied to a change. The code will probably also become a lot less readable as time goes by, not by intent or even incompetence but mostly due to time pressure or other human things. So yes, it’s more readable, and in a perfect world you probably wouldn’t need to separate your concerns, but we live in a very imperfect world, and the smaller and less responsibility you give your functions, the easier it’ll be to deal with that imperfection as time goes on.
Sure, it's less testable BUT in the specific case at hand it's all mutations that need to be performed in a specific sequence. IMO if you are taking an object through a specific set of states, you either split that and use types to mark the transitions (bakePizza takes a RawPizza and returns a BakedPizza, enforcing the order of calls at compile time) or you write one big function because it doesn't make sense to create a pizza and then not bake it before you box it.
I obviously prefer the former for readability, correctness, and testability etc. However, in most PL changing the type of an object involves creating a new object and has a runtime cost. For hot code path, it makes sense to mutate in place, but in that case it's better to keep it all in one linear function.
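A minimal sketch of the former option, with invented types marking each state so the compiler enforces the call order:

```go
package main

import "fmt"

// Distinct types for each state: you cannot box a pizza that has not
// been baked, because boxPizza only accepts a BakedPizza.
type RawPizza struct{ Toppings []string }
type BakedPizza struct{ Toppings []string }
type BoxedPizza struct{ Contents BakedPizza }

func bakePizza(p RawPizza) BakedPizza {
	return BakedPizza{Toppings: p.Toppings}
}

func boxPizza(p BakedPizza) BoxedPizza {
	return BoxedPizza{Contents: p}
}

func main() {
	raw := RawPizza{Toppings: []string{"cheese"}}
	// boxPizza(raw) would not compile: raw is a RawPizza.
	boxed := boxPizza(bakePizza(raw))
	fmt.Println(len(boxed.Contents.Toppings)) // prints 1
}
```

Note the runtime cost the parent mentions: each transition copies the data into a new value, which is why one big linear function can be preferable on a hot path.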
I recently started reading Sussman's Software Design for Flexibility and what you write is directly in line with that book
https://mitpress.mit.edu/9780262045490/
Hard agree. And I used to belong to the other camp.
The basic tension here is between locality [0], on the one hand, and the desire to clearly show the high-level "table of contents" view on the other. Locality is more important for readable code. As the article notes, the TOC view can be made clear enough with section comments.
There is another, even more important, reason to prefer the linear code: It is much easier to navigate a codebase writ large when the "chunks" (functions / classes / whatever your language mandates) roughly correspond to business use-cases. Otherwise your search space gets too big, and you have to "reconstruct" the whole from the pieces yourself. The code's structure should do that for you.
If a bunch of "stuff" is all related to one thing (signup, or purchase, or whatever), let it be one thing in the code. It will be much easier to find and change things. Only break it down into sub-functions when re-use requires it. Don't do it solely for the sake of organization.
I went the opposite direction: I used to be in the linear code camp, and now I'm in the "more functions" camp.
For me the biggest reason is state. The longer the function, the wider the scope of the local variables. Any code anywhere in the function can mutate any of the variables, and it's not immediately clear what the data flow is. More functions help scopes stay small, and data flow is more explicit.
A side benefit is that "more functions" helps keep indentation down.
At the same time, I don't like functions that are too small, otherwise it's hard to find out where any actual work gets done.
> Any code anywhere in the function can mutate any of the variables
Regardless of the language I'm using, I never mutate values. Counters in loops or some other hyper-local variables (for performance) might be the inconsequential exceptions to this rule.
> More functions help scopes stay small, and data flow is more explicit.
Just write your big function with local scope sections, if needed (another local exception to the rule above). Eg, in JS:
let sectionReturnVal
{
// stuff that sets sectionReturnVal
}
or even use IIFE to return the value and then you can use a const. "A function, you're cheating!" you might say, but my goal is not to avoid a particular language construct, but to maintain locality, and avoid unnecessary names and jumping around.
> A side benefit is that "more functions" helps keep indentation down.
It is also worth noting that solving this problem with function extraction can often be a merely aesthetic improvement. That is, you will still need to hold the surrounding context (if not the state) in your head when reading the function to understand the whole picture, and the extraction makes that harder.
Using early returns correctly, by contrast, can actually alleviate working memory issues, since you can dismiss everything above as "handling validation and errors". That is, even though technically, no matter what you do, you are spidering down the branches of control flow, and therefore in some very specific context, the code organization can affect how much attention you need to pay to that context.
> I don't like functions that are too small, otherwise it's hard to find out where any actual work gets done.
Precisely, just take this thinking to its logical conclusion. You can (mostly) have your cake and eat it too.
The better solution to this is to use nested functions that are immediately called, rather than top level functions. That lets you cordon off chunks of state while still keeping a linear order of definition and execution. And you don't have to worry about inadvertently increasing your API maintenance burden because people started to depend on those top level functions later.
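In Go, which supports function literals, the idea might look like this (the step contents are made up; note each chunk's temporaries stay confined to its own literal):

```go
package main

import "fmt"

// endOfDay runs its steps as immediately invoked function literals:
// the order of definition is the order of execution, and each step's
// scratch variables are invisible to the others.
func endOfDay(amounts []int) int {
	// Step 1: total, with its loop variables confined here.
	total := func() int {
		sum := 0
		for _, a := range amounts {
			sum += a
		}
		return sum
	}()

	// Step 2: fee, computed only from the previous step's result.
	fee := func() int {
		if total > 100 {
			return 10
		}
		return 0
	}()

	return total - fee
}

func main() {
	fmt.Println(endOfDay([]int{60, 70})) // prints 120
}
```

Nothing here is exported or even named at package level, so no caller can come to depend on the individual steps.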
> Only break it down into sub-functions when re-use requires it. Don't do it solely for the sake of organization
What about for testing? What about for reducing state you need to keep in mind? What about releasing resources? What about understanding the impact of a change? Etc.
Consider an end of day process with 10 non-reusable steps that must run in order and each step is 100 lines. Each step uses similar data to the step before it so variables are similar but not the same. You would really choose a 1000 line single function?
For "use-case" code like this with many steps, you are typically testing how things wire together, and so will either be injecting mocks to unit test, in which case it is not a problem, or wanting to integration or e2e test, in which case it is also not a problem.
If complex, purely logical computation is part of the larger function, and you can pull that part out into a pure function which can be easily unit tested without mocks, that is indeed a valid factoring which I support, and an exception to the general rule.
> What about for reducing state you need to keep in mind?
Typically not a problem because if the function corresponds to a business use-case, you and everybody else is already thinking about it as "one thing".
> What about releasing resources?
Not a problem I have ever once run into with backend programming in garbage collected languages. Obviously if you are in a different situation, YMMV.
> Consider an end of day process with 10 non-reusable steps that must run in order and each step is 100 lines.
I would use my judgement and might break it down. Again, I have never encountered such a situation in many years of programming.
You seem to be trying to find the (ime) rare exceptions as if those disprove the general rule. But in practice the "explode your holistic function unnecessarily into 10 parts" is a much more common error than taking "don't break it down" too far.
let DebugFlags = {StepOne: false, StepTwo: false, StepThree: true};
if (DebugFlags.StepOne) { ... }
if (DebugFlags.StepTwo) { ... }
if (DebugFlags.StepThree) { ... }
Your training in structured, DRY and OOP will recoil at this: More branches! Impossible. But your spec says "must run in order". It does this by design. Every resource can be tracked by reading it top to bottom, and the only way in which you can miss it is through a loop, which you can also aim to minimize usage of. The spec also says "uses similar data to the step before it". If variables are similar-not-same, enclose them in curly braces so that you get some scope guarding. The debug flags contain the information needed to generate whatever test data is necessary. They can alternately be organized as enumerated state instead of booleans: {All, TestOne, TestTwo, TestThree}.
Long, bespoke linear sequences can be hairy, but the tools to deal with them are present in current production languages without atomizing the code into tiny functions. Occasionally you can find a useful pattern that does call for a new function, and do a "harvest" on the code and get its size down. But you have to be patient with it before you have a good sense of where a new parameterized function gets the right effect, and where inlining and flagging an existing one will do better.
The truth is that what makes code readable is not really (directly!) about function size in the first place. It's about human perceptual processing and human working memory. Readable code is easily skimmable, and should strive to break the code up into well-defined contexts that allow the programmer to only have to carry a handful of pieces of information in their head at any given moment. Sometimes the best way to do that is a long, linear function. Sometimes it's a mess of small functions. Sometimes it's classes. Which option you choose ultimately needs to be responsive to the natural structure of the domain logic you're implementing.
And, frankly, I think that both versions do a pretty poor job of that, because, forget the style, the substance is a mess. They're both haphazardly newing up objects and mutating shit all over the place. This code reads to me like the end product of about four sprints' worth of rushing the code so you can get the ticket closed just in time for sprint review.
I mean, let's just think about this as if we were describing how things work in a real kitchen, since I think that's pretty much what the example is asking us to do, anyway: on what planet does a pizzeria create a new, disposable oven for every single pizza? What the heck does
even mean? Now we've got a box containing a pizza that's storing information about the state of the object that contains it, for some reason? Demeter is off in a corner crying somewhere. What on earth is going on with that `if order.kind == "Veg"` business; why aren't we just listing the ingredients on the order and then iterating over that list, adding the items to the pizza? The logic for figuring out which ingredients go on the pizza never belonged in this routine in the first place; it's ready-aim-fire, not ready-fire-aim. Etc.

I don't regret the learning, but I do regret being dogmatic. It was interesting that no one around me knew better either way, or felt they could provide reasonable mentorship, so we went too far with it. These days I write the pizza function on the left, and use comments sparingly where they add context and reasoning.
I think both 100 and 20 are a bit low, but much better than 5. As I mentioned in a comment a few days ago when I also corrected someone that misremembered a detail from the book, I am not a huge fan. But I also think it is mostly correct about most things, and not as terribly bad as some say. Listening to fans of the book is more annoying than to actually read the book.
(And that other comment when I corrected someone was about bad comments. Clean Code definitely does not say that you shall never comment anything.)
Obviously, readability is important, but I've also seen things like this so often in my career where it's used as an excuse for anything. Most recently, trying to stop a teammate turning nearly every class into a singleton for the sake of "simplicity" and "readability", which I thought was a real stretch.
The book was written by a Java dev who was dipping his toe into Ruby.
Go code, covered everywhere in an obnoxious rash of error handling, will be bigger.
Why not? Who said it's worse? What study settles the issue?
Sometimes a "1000-line god function" is just what the domain needs, and can be way more readable, with the logic and operations consolidated, than twenty 50-line functions that you still have to read to understand the whole thing (and then someone will be tempted to reuse a few of them, adjust them for 2-3 different needs not shared by your original operation, and tie parts of the functions implementing your specific logic to irrelevant use cases).
And if it's a pure 1000-line function, it could even be 10,000 lines for all I care, and it would still be fine.
Do you have other examples of 50+ line functions where you thought it was best not to separate concerns?
Also: cooking recipes are themselves very abstracted. When they say you need to lightly fry onions, they assume you already know a way to cut onions and a lightly-frying algorithm. If they inlined everything, it would become unreadable.
Code is very similar. If you want it strictly without abstractions it will be as low level as your language allows you, and that is definitely not readable code.
If, instead of using Python's `decode` method, you tried to do Unicode decoding yourself, it would become very hard to understand what your program is actually about. Now, there are probably zero people who would do that, because the language provides a simple and well-tested abstraction — but what makes that different from you creating your own simple and well-tested abstraction and using that throughout the actual business logic of your code?
The hard part is creating abstractions that are so well chosen that nobody will have to ever touch them again.
- Linear code: The meat frying (state-producing) and deglazing (state-requiring) steps are below each other in the same recipe, so to verify that it works you can just linearly go through line by line. However if the recipe becomes long and a lot of stuff happens in between, it's no longer obvious. You'll have to use good comments ("// leave residue in the pan, we'll need it for the fond") because otherwise you might accidentally refactor in a way that violates the precondition (swaps/scrubs the pan).
- Modular code: You need to clearly describe the precondition on the fond preparation subroutine to have any chance to keep using it correctly. On one hand this forces documentation, on the other hand it's probably still easier to forget since the subroutine call ("Prepare the fond.") doesn't directly make the precondition obvious.
Either way has its advantages and drawbacks, and the right choice depends on the circumstances. This is assuming you only want to cook this specific meal and aren't writing a cookbook - otherwise you should definitely modularize to remove repetition.
A relatively common piece of feedback from me to the team at work is usually to take a half step back and look at the larger problem domain and consider whether these things are necessarily the same, or coincidentally the same.
Just because the lines of code look similar right now doesn't mean they need to be that way or need to stay that way. Trying to mash together two disparate use cases because "the code's basically repeated" is often how you get abstractions that, especially over time, end up not actually abstracting anything.
As the various use cases get too divergent, the implementations either move much of the logic up to the caller (shallow abstractions, little value), or expose the differences via flags and end up with two very different implementations under the hood side-by-side (less clear than two independent implementations).
I’ll take a well-structured 1000-line function over a bad spaghetti of hundreds of small functions any day.
I'm sure they exist - maybe some sort of exceedingly complicated data transform or something. But in almost every situation I've seen, a 1000 line function has countless side effects, probably sets a few globals, takes loads of poorly named arguments, each of which is a nested data structure which it reaches deeply into and often has the same for loop copied and pasted 10 times with one character changed.
Often a 1000-line function is actually 5 or 6 20-line functions. I'm sure there are legitimate exceptions, but I've never seen them.
For example, I have been at this for over three decades now, and there are some things that almost never fail. From the article, the kind of person who advocates for the more "testable" code with a few more lines and more abstractions is never the same person who can maintain that codebase a handful of years later.
That should tell us something. For what it's worth, I agree with the article that simpler is better, which often coincides with fewer lines of code. I personally wouldn't have chosen objects that look like "pizza.Sliced = box.SlicePizza()" but most of the time the structure is already in place and it is best to go along with it.
As to that 1000-line function, if it is in an imperative style it might well be the easiest form to read. Have you seen the Python source code? That language's success owes much to a simple interpreter with ginormous functions that anyone and their brother can read from top to bottom and dare modify without having a brain the size of a planet.
this made me feel a certain type of way. (don't ever look at video game source code, by the way; 1000 lines is quite short by some standards)
if a 1000-line long main is what makes sense then you should do that.
I find 1000-line long methods which are linear far easier to read than code which has every method call broken out into its own method. it's so bad I literally can't read JavaScript that is written in the contemporary style anymore. absolutely impenetrable for me.
it's true that I am not a "real" developer in that I don't work on code full-time, but I've written probably millions of lines of code in my 30-year career. I am not a novice.
if the solution calls for a 1000-line main method, then that's what I'm writing, "best practices" can go in the corner and cry. I'm writing what I need to solve the problem and nothing more.
I remember an import script I wrote in ExpressJS. It was like 50 lines. It did things like copy databases, clean up config, etc. There were hardly any layered ifs, just steps; I didn't see much use in breaking it up, and it was easy to read.
Another developer, who was smart but liked abstract concepts, overengineered the hell out of it, spreading it across 20 places with a bunch of providers, and I could never find and make sense of it after that; it was very hard to tell what was going on. It was always such a pain to update.
Doing module mocking for unit tests instead of dependency injection in runtime code is almost always a better idea in my opinion. Dependency injection was invented for languages that can't do module mocking.
Cyclomatic complexity: https://en.wikipedia.org/wiki/Cyclomatic_complexity
Overhead: https://en.wikipedia.org/wiki/Overhead_(computing)
Some programming language implementations and operating systems have more overhead for function calls, green threads, threads, and processes.
If each function call creates a new scope, and it's not a stackless language implementation, there's probably a hashmap/dict/object allocated for each function call unless TCO (Tail-Call Optimization) has occurred.
Though function-call overhead may be less important than readability and maintainability.
The compiler or interpreter can in some cases minimize e.g. function call overhead with a second pass or "peephole optimization".
Peephole optimization: https://en.wikipedia.org/wiki/Peephole_optimization
Code linting tools measure (McCabe) cyclomatic complexity but not algorithmic complexity (or the overhead of O(1) lookup after data structure initialization).
C. Muratori calls this method "semantic compression": https://caseymuratori.com/blog_0015
I think you have to consider things like reusability and unit-test-ability as well, and having all your code in a single function can make reasoning about it more difficult due to all the local variables in scope that you need to consider as possibly (maybe or maybe not) relevant to the block of code you’re reading.
That being said, when I look back on my younger, less experienced days, I often fell into the trap of over-refactoring perfectly fine linear code into something more modular, yet less maintainable due to all the jumping around. There is something to be said for leaving the code as you initially wrote it, because it is closer to how your mind was thinking at the time, and to how a reader's mind will probably be interpreting the code as well. When you over-refactor, that can be lost.
So I guess in summary, this is one of those “programming is a craft” things, where experience helps you determine what is right in a situation.
One of the best-reviewed functions I wrote at work is a 2000-line monster with 9 separate variable scopes (stages) written in a linear style. It had one purpose and one purpose only. It was supposed to convert some individual HTML pages used in one corner of our app on one platform into a carousel that faked the native feel of another platform. We only needed that in one place, and the whole process was incredibly specific to that platform and that corner of the app.
You could argue that every one of those 9 scopes could be a separate function, but then devs would be tempted to reuse them. Yet each step had subtle assumptions about what happened before. The moment we spent effort to make them distinct functions, we would have had to recheck our assumptions, generalize, and verify that the methods work on their own... for code that's barely ever needed elsewhere. We even had some code that was similar to some of the middle parts of the process... but just slightly didn't fit here. Changing that code caused other aspects of our software to fail.
The method was not any less debuggable, it still had end to end tests, none of the intermediate steps leaked state outside of the function. In fact 2 other devs contributed fixes over time. It worked really well. Not to mention that it was fast to write.
Linear code scales well and solves problems. You don't always want that but it sure as hell makes life easier in more contexts than you'd expect.
Note. Initial reactions to the 2000 line monster were not positive. But, spend 5 minutes with the function, and yeah... You couldn't really find practical flaws, just fears that didn't really manifest once you had a couple tests for it.
To this end, I'd say it is important to be working in a language that avoids messing up the logic with boilerplate, or to build some kind of mechanism (as dpkg did) to ease error handling and shove it out of the main flow; this is where the happy path shines: when it reads like a specification.
Why? Why can't the functions say "to be used by <this other function>, makes assumptions based on that function, do not use externally"? Breaking out code into a function so that the place it came from is easier to maintain... does not mandate that the code broken out needs to be "general purpose".
I read you have "end to end" tests.
One question though: Wouldn't each part benefit from having its own unit tests?
Good thinking. Now they’ll just add 50 flags and ten levels of nested ifs instead which is much simpler.
Isn’t that the fucking point? Having a 2000 line function is a code smell so bad, I don’t care how well the function works. It’s an automatic review fail in my book. Abstractions, closures, scope, and most importantly - docs to make sure others use your functions the way you intended them. Jesus.
Suddenly, you realize that the dozens of functions __need to be called in specific orders__, and they are each only ever used once. So really what you're doing is forcing someone to know the magic order these functions are composed in order for them to be of any use.
Unfortunately organizing your code along the right lines of abstraction is something that just takes skill and can't easily be summarized in the form of "just always do this and your code will be better"
If you organize your code into units that are easy to recompose and remix, you get huge benefits when you want to recompose and remix things.
If you organize your code into units that can't be easily recomposed, then yes you've added complexity for no benefit. But why make units that can't be treated individually?
"As the complexity of the overall code grows, so would something that gets chopped into dozens of functions to the point of being unreadable."
So the answer to this is, "don't chop it into functions in a way that leaves it unreadable, instead chop it into functions in a way that leaves it more readable."
That may be unsatisfying, but it gets to the point that blindly applying rules is not always going to lead to better code. But it doesn't mean that an approach has no value.
Really the question should always come up when there are more than, say, two ways to do things. If I can make a pizza from scratch, reheat a chilled pizza, make a pizza and chill it, reheat a half dozen pizzas, or make three pizzas of the same kind and chill them, then the useful abstractions are probably something you can figure out from those helper methods.
Honestly, that is the real fear of the left way of thinking. If you add a quantity parameter, a whether-to-cook parameter, and a whether-to-chill parameter, you end up with a hard API where certain combinations of parameters don't make sense.
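A hedged sketch in Go (names invented, not from the article) of how such flag parameters breed combinations that can only be rejected at runtime:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical "hard" API: one function, a quantity, and two booleans.
// Some flag combinations are simply nonsense, and the type system
// can't rule them out; only runtime checks can.
func handlePizzas(quantity int, cook, chill bool) (string, error) {
	if quantity < 1 {
		return "", errors.New("need at least one pizza")
	}
	if !cook && !chill {
		return "", errors.New("cook=false, chill=false means nothing to do")
	}
	switch {
	case cook && chill:
		return fmt.Sprintf("made and chilled %d pizza(s)", quantity), nil
	case cook:
		return fmt.Sprintf("made %d pizza(s) from scratch", quantity), nil
	default:
		// chill without cook: does the caller mean "chill an existing
		// pizza" or "reheat a chilled one"? The flags can't say.
		return fmt.Sprintf("chilled %d pizza(s)", quantity), nil
	}
}

func main() {
	s, err := handlePizzas(3, true, true)
	fmt.Println(s, err)
}
```

Separate helpers for each valid combination would make the invalid states unrepresentable instead of checked.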
Have a clean API and make the implementation as simple as is feasible. Reuse via functions when it makes sense but don't add them willy nilly.
Aka "it is a craft and you figure things out" as someone said in the comments here
What you're doing by breaking things into functions is trying to prevent its eventual growth into a bug-infested behemoth. In my experience, nearly every case where an area of a code base has become unmaintainable originates in a large, stateful piece of code that started in this fashion.
Everyone who works in said area then usually has the option of either a) making it worse by adding another block to tweak its behaviour, or b) starting to split it up and hoping they don't break stuff.
I don't want to see the "how" every time I need to understand the "what". In fact, that is going to force me to parse extraneous detail, possibly for hundreds of lines, until I find the bit that actually needs to be changed.
That's where nested functions show their true utility. You get short linear logic because everything is in functions, but the functions are all local scope so you get to modify local scope with them, and because the functions are all named, it is easy to determine what is going on.
Oh my God.
You are wrong here.
> this is one of those “programming is a craft” things, where experience helps you determine what is right in a situation.
You are right here.
The key insight on why giant linear functions are often more readable (and desirable) is because they allow you to keep more concepts/relationships simultaneously together as a single chunk without context switching which seems to aid our comprehension. An extreme proponent is Arthur Whitney (inventor of the K language) who writes very terse (almost incomprehensible to others) code so as to accommodate as much as possible in a single screen.
Two examples from my own experience:
1) I found reading/understanding/debugging a very large Windows message handler function (i.e. a WndProc with a giant switch statement containing all the business logic) far easier than the same application rewritten in Visual C++ where the message handlers were broken out into separate functions.
2) The sample code for a microcontroller showed an ADC usage example in two different ways: one with everything in the same file, and another where the code was distributed across files, e.g. main.c/config.c/interrupts.c/timer.c/etc. Even though the LOC was <200, I found the second example hard to understand simply because of the context switch involved.
The problem with giant linear functions is that those concepts get separated by sometimes thousands of lines. Separating out the high-level concepts vs the nitty-gritty details, putting the latter in functions that then get called to implement the high-level concepts, does in my experience in most cases a better job of keeping related things together.
> Linear code is more readable
^ Wrong
> Linear code is sometimes more readable
^ Better
But K has a small set of built-in commands and a built-in database; it was made for fast analysis of stock information, so with that you have everything you need and you use the same semantics. The only thing you need to know is the data structure, and you can build whatever you need.
So in this way, given two tables A and B, a bunch of operations X on A, and a bunch of operations Y on B where Y depends on the result of X, and given the tasks to:
- create X' = X
- create XY' = X + Y
it's very likely faster to implement XY' from scratch without knowing X already exists than to figure out that X exists and reuse it.
The problem with code not written in K (or in a similar style; it doesn't really matter what the programming language is) is that we have learned to use the second style from the article and, more extreme, to separate everything out into layers. You cannot even reach the data model without going through a layer (or more) of abstractions, which makes it necessary not only to know the data model in detail but also to find the matching findXinAandApplyWithYToB(), where X & Y & A & B are often somewhat ambiguous and badly named entities. And then there are of course badly designed databases, which are also quite the norm as far as we see, so there is much lower data integrity; that means if you create something without checking all the code that touches it, you might change something and the data becomes inconsistent.
I notice the same when working on systems built with stored procedures on MSSQL/Postgres; it is far quicker to oversee and (at least basically) understand the data model (even with 1000+ tables, which is rather normal for the systems we work with) than it is to understand even a fraction of a, let's say, Go codebase. So when asked to do a task XY', you usually aren't searching for X'; you simply read the data used in X & Y and whip up a procedure/query/whatever yourself. It's simply much faster because you have a restricted work surface, the model and SQL (I know, you can use almost any language in Postgres, but let's not here), and you can reason about them and the tasks at hand when you shut off the internet and just use your SQL workbench.
These pieces of code then often end up being private functions of a class. With state. Since they are private functions now, they are not really testable.
So now we've got a lot of private functions that are only called once and typically modify side-effect state. When these functions are grouped together with the caller, it is actually still a bit readable in simple cases.
But then after a while someone adds other functions in between the calling function and the factored-out ones.
Now we have bits and pieces modifying different side-effect state, and no one knows whether they are called from different places without generating a call graph or doing a search in the class file.
If you insist on making the code non-linear, I'd beg you to at least consider making these factored out private funcs inner funcs of the calling function if your language supports that. This makes it clear that these functions won't be called from anywhere else.
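In Go, for example, this can be done with function literals; the helper stages are visibly local to the caller, so no reader has to wonder who else calls them (the names below are made up):

```go
package main

import (
	"fmt"
	"strings"
)

// The helper stages live inside the calling function as function
// literals, so it is obvious they are never called from anywhere else
// and cannot be reused with violated assumptions.
func renderReport(rows []string) string {
	normalize := func(s string) string {
		return strings.TrimSpace(strings.ToLower(s))
	}
	header := func(n int) string {
		return fmt.Sprintf("report (%d rows)", n)
	}

	out := []string{header(len(rows))}
	for _, r := range rows {
		out = append(out, normalize(r))
	}
	return strings.Join(out, "\n")
}

func main() {
	fmt.Println(renderReport([]string{"  Alpha ", "BETA"}))
}
```

You still get short, named, linear logic in the caller, without widening the scope of the helpers beyond the function body.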
As with so many things in life, in a real codebase this is not an either/or, but an art of combining the two into something that stays readable and maintainable.
Will people bother testing all of them? Or will they write a single test, pass in a pizza and just glance at it actually working? My guess is the latter, as testing multiple branches from outside is often tedious, vs testing smaller specialized functions.
...that's basically why common sense and taste in programming is still required, it's not a purely mechanical task. That's also why I'm not entirely a fan of automatic code formatting tools, they don't understand the concept of nuance.
99% of the time, you haven't actually come up with a good abstraction, so just write some linear code. Prefer copy/pasting to dubious function semantics.
12 months later you have:
> no_print = False
love this
Is print_table() + print_table_without_emoji() better than print_table(remove_emoji=False)?
If we consider the concepts orthogonally, meaning we don't consider the fact that readability can influence scalability, then "everyone" is fully correct. Linear code doesn't scale as well as modular code. The dichotomy is worth knowing and worth considering depending on the situation.
That being said, I STILL disagree with you. Small functions do not cause readability issues if those functions are PURE, meaning they don't touch state. That, and you don't inject logic into your code, so explicitly minimize all dependency injection and passing of functions to other functions.
Form a pipeline of pure functions passing only data to other functions, and it all becomes readable and scalable. You'll much more rarely hit an issue where you have to rewrite your logic because of a design flaw. More often than not, by composing pure functions your code becomes like Lego. Every refactoring becomes more like reconfiguring and recomposing existing primitives.
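A minimal Go sketch of that style (the Order type and the steps are invented for illustration): each step is a pure function that takes data in and returns new data, touching no shared state, and the caller just chains them in order.

```go
package main

import "fmt"

// Plain data flowing through the pipeline; no hidden state anywhere.
type Order struct {
	Items    []float64
	Subtotal float64
	Total    float64
}

// sumItems is pure: same input, same output, no side effects.
func sumItems(o Order) Order {
	s := 0.0
	for _, p := range o.Items {
		s += p
	}
	o.Subtotal = s
	return o
}

// applyTax is also pure; it receives only data, per the comment's
// advice against passing functions around.
func applyTax(o Order, rate float64) Order {
	o.Total = o.Subtotal * (1 + rate)
	return o
}

func main() {
	o := Order{Items: []float64{10, 20}}
	o = sumItems(o)
	o = applyTax(o, 0.25)
	fmt.Printf("%.2f\n", o.Total)
}
```

Reordering, removing, or inserting a step is a one-line change, which is the "Lego" property being described.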
Small functions increase the chances of you having to do this. Larger ones decrease it, but can cause other issues.
Also, many small functions doesn't make code modular. Having well defined, focused interfaces (I don't mean in the OO sense) for people to use makes it modular. Small functions don't necessarily harm it, but if you're not really good at organizing things they definitely can obscure it.
And I totally agree, this is how you write large code bases without making them unmaintainable.
Where to go “linear” vs “modular” is an important design choice, but it’s secondary to the design choice of where to embed state-altering features in your program tree.
I think people dislike modular code because they want to have all the “side-effects” visible in one function. Perhaps they’ve only worked in code bases where people have made poor choices in that regard.
But if you can guarantee and document things like purity, idempotency, etc, you can blissfully ignore implementation details most of the time (i.e. until performance becomes an issue), which is definitionally what allows a codebase to scale.
`prepare` is a horrible name for a function. I would expect a seasoned Gopher to call it something like `NewPizzaFromOrder`.
I don't see any reason for putting `addToppings` in its own function. If you have to have it, I personally would have made it a method on Pizza something like `func (p *Pizza) WithToppings(topping ...Topping) *Pizza { /* ... */ }`. Real pizza is mutable, so the method mutates the receiver.
Why is a new oven instantiated every time you want to bake a pizza? You should start with an oven you already have, then do `oven.Preheat()`, and then call `oven.Bake(pizza)`. You can take this further by having `oven.Preheat()` return a newtype of Oven which exposes `.Bake()` so that you can't accidentally bake something without preheating the oven first. Maybe elsewhere `Baker` is an interface, and you have a `ToasterOven` implementation that does not require you to preheat before baking because it's just not as important.
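One possible sketch of that newtype idea in Go (all names hypothetical, not the article's code): `Preheat` returns a distinct type, and only that type has a `Bake` method, so "bake in a cold oven" won't compile.

```go
package main

import "fmt"

type Pizza struct{ Toppings []string }

// A cold Oven has no Bake method at all.
type Oven struct{ tempC int }

// PreheatedOven wraps Oven; only it exposes Bake.
type PreheatedOven struct{ Oven }

func (o Oven) Preheat(tempC int) PreheatedOven {
	o.tempC = tempC
	return PreheatedOven{o}
}

// Bake is defined only on PreheatedOven, so the type system enforces
// the preheat-before-bake ordering.
func (o PreheatedOven) Bake(p Pizza) string {
	return fmt.Sprintf("baked pizza with %d topping(s) at %dC",
		len(p.Toppings), o.tempC)
}

func main() {
	oven := Oven{}
	hot := oven.Preheat(220)
	fmt.Println(hot.Bake(Pizza{Toppings: []string{"cheese"}}))
	// oven.Bake(...) would be a compile error: Oven has no Bake.
}
```

The same trick generalizes to any "must call A before B" protocol you want the compiler, rather than docs, to enforce.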
Without changing the code, I'd also reorder the declarations to be more what you'd expect (so you don't have to jump up and down the page as you scan through functions that call each other).
IDK I have to leave now but there are just so, so many ways in which the code is already a deeply horrible example to even start picking apart the "which is more readable" debate.
Some code needs to be non-linear for reuse; then execution is a graph. If your code does not exploit code reuse from a graph structure, do not bother introducing vertices where a single edge suffices.
http://number-none.com/blow/blog/programming/2014/09/26/carm...
Because what is going to happen when the code in those examples runs in production over a decade is that each segment is going to change. If you're lucky, the comments will be updated as that happens, but they more than likely won't. The unit test will also get more and more clunky as changes happen, because it's big and unwieldy, and maybe someone will forget to alter the part of it that wasn't obviously tied to a change. The code will probably also become a lot less readable as time goes by, not by intent or even incompetence, but mostly due to time pressure or other human things. So yes, it's more readable, and in a perfect world you probably wouldn't need to separate your concerns, but we live in a very imperfect world, and the smaller and less responsibility you give your functions, the easier it'll be to deal with that imperfection as time goes on.
I obviously prefer the former for readability, correctness, testability, etc. However, in most PLs changing the type of an object involves creating a new object and has a runtime cost. For a hot code path, it makes sense to mutate in place, but in that case it's better to keep it all in one linear function.
Discussion: https://news.ycombinator.com/item?id=12120752
The basic tension here is between locality [0], on the one hand, and the desire to clearly show the high-level "table of contents" view on the other. Locality is more important for readable code. As the article notes, the TOC view can be made clear enough with section comments.
There is another, even more important, reason to prefer the linear code: It is much easier to navigate a codebase writ large when the "chunks" (functions / classes / whatever your language mandates) roughly correspond to business use-cases. Otherwise your search space gets too big, and you have to "reconstruct" the whole from the pieces yourself. The code's structure should do that for you.
If a bunch of "stuff" is all related to one thing (signup, or purchase, or whatever), let it be one thing in the code. It will be much easier to find and change things. Only break it down into sub-functions when re-use requires it. Don't do it solely for the sake of organization.
[0] https://htmx.org/essays/locality-of-behaviour/
For me the biggest reason is state. The longer the function, the wider the scope of the local variables. Any code anywhere in the function can mutate any of the variables, and it's not immediately clear what the data flow is. More functions help scopes stay small, and data flow is more explicit.
A side benefit is that "more functions" helps keep indentation down.
At the same time, I don't like functions that are too small, otherwise it's hard to find out where any actual work gets done.
> Any code anywhere in the function can mutate any of the variables
Regardless of the language I'm using, I never mutate values. Counters in loops or some other hyper-local variables (for performance) might be the inconsequential exceptions to this rule.
> More functions help scopes stay small, and data flow is more explicit.
Just write your big function with local scope sections, if needed (another local exception to the rule above). E.g., in JS, something like:

    // a bare block gives the section its own scope;
    // `let`/`const` bindings don't leak past the braces
    {
      let subtotal = items.reduce((sum, item) => sum + item.price, 0);
      total = subtotal * (1 + taxRate);
    }

or even use an IIFE to return the value, and then you can use a const. "A function, you're cheating!" you might say, but my goal is not to avoid a particular language construct, but to maintain locality and avoid unnecessary names and jumping around.

> A side benefit is that "more functions" helps keep indentation down.
This is important and I maintain it.
See "Align the happy path to the left" (https://medium.com/@matryer/line-of-sight-in-code-186dd7cdea...)
It is also worth noting that solving this problem with function extraction can often be a merely aesthetic improvement. That is, you will still need to hold the surrounding context (if not the state) in your head when reading the function to understand the whole picture, and the extraction makes that harder.
Using early returns correctly, by contrast, can actually alleviate working memory issues, since you can dismiss everything above as "handling validation and errors". That is, even though technically, no matter what you do, you are spidering down the branches of control flow, and therefore in some very specific context, the code organization can affect how much attention you need to pay to that context.
> I don't like functions that are too small, otherwise it's hard to find out where any actual work gets done.
Precisely, just take this thinking to its logical conclusion. You can (mostly) have your cake and eat it too.
What about for testing? What about for reducing state you need to keep in mind? What about releasing resources? What about understanding the impact of a change? Etc.
Consider an end-of-day process with 10 non-reusable steps that must run in order, where each step is 100 lines. Each step uses data similar to the step before it, so variables are similar but not the same. Would you really choose a single 1000-line function?
For "use-case" code like this with many steps, you are typically testing how things wire together, and so will either be injecting mocks to unit test, in which case it is not a problem, or wanting to integration or e2e test, in which case it is also not a problem.
If complex, purely logical computation is part of the larger function, and you can pull that part out into a pure function which can be easily unit tested without mocks, that is indeed a valid factoring which I support, and an exception to the general rule.
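A small Go sketch of that exception (the proration rule here is invented): the pure part can be unit tested directly, with no mocks, while the wiring around it stays in the use-case function.

```go
package main

import "fmt"

// prorate is the purely logical computation pulled out of the larger
// use-case function. It depends on nothing but its arguments, so it
// can be unit tested exhaustively without any mocks.
func prorate(amountCents, daysUsed, daysInPeriod int) int {
	if daysInPeriod == 0 {
		return 0
	}
	return amountCents * daysUsed / daysInPeriod
}

func main() {
	// The surrounding use-case would fetch the subscription, call
	// prorate, and persist the result; only prorate needs unit tests,
	// and the wiring is covered by integration/e2e tests.
	fmt.Println(prorate(3000, 10, 30))
}
```

This keeps the general rule intact: the holistic use-case function stays linear, and only the genuinely self-contained logic is factored out.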
> What about for reducing state you need to keep in mind?
Typically not a problem because if the function corresponds to a business use-case, you and everybody else is already thinking about it as "one thing".
> What about releasing resources?
Not a problem I have ever once run into with backend programming in garbage collected languages. Obviously if you are in a different situation, YMMV.
> Consider an end of day process with 10 non-reusable steps that must run in order and each step is 100 lines.
I would use my judgement and might break it down. Again, I have never encountered such a situation in many years of programming.
You seem to be trying to find the (ime) rare exceptions as if those disprove the general rule. But in practice the "explode your holistic function unnecessarily into 10 parts" is a much more common error than taking "don't break it down" too far.
Long, bespoke linear sequences can be hairy, but the tools to deal with them are present in current production languages without atomizing the code into tiny functions. Occasionally you can find a useful pattern that does call for a new function, and do a "harvest" on the code and get its size down. But you have to be patient with it before you have a good sense of where a new parameterized function gets the right effect, and where inlining and flagging an existing one will do better.