There's a lot of bad advice being tossed around in this thread. If you are worried about having to jump through multiple files to understand what some code is doing, you should consider that your naming conventions are the problem, not the fact that code is hidden behind functional boundaries.
Coding at scale is about managing complexity. The best code is code you don't have to read because of well named functional boundaries. Without these functional boundaries, you have to understand how every line of a function works, and then mentally model the entire graph of interactions at once, because of the potential for interactions between lines within a functional boundary. The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows. The cognitive load to understand code grows as the number of possible interactions grows. Keeping methods short and hiding behavior behind well named functional boundaries is how you manage complexity in code.
The idea of code telling a story is that a unit of work should explain what it does through its use of well named variables, function/object names, and how data flows between function/objects. If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.
This is the problem right here. I don't just read code I've written, and I don't only read perfectly abstracted code. When I'm stuck reading the code of someone who loves the book and tries their best to follow its conventions, I find it far more difficult. Because I'm usually reading their code to fully understand it myself (i.e. in a review) or to fix a bug, I find it infuriating to be jumping through dozens of files just so everything looks nice on a slide. Names are great, and I fully appreciate good naming, but pretending that using a ton of extra files just to improve naming slightly isn't a hindrance is wild.
I will take the naming hit in return for locality. I'd like to be able to hold more than 5 lines of code in my head but leaping all over the filesystem just to see 3 line or 5 line classes that delegate to yet another class is too much.
Carmack once suggested that people in-line their functions more often, in part so they could “see clearly the full horror of what they have done” (paraphrased from memory) as code gets more complicated. Many helper functions can be replaced by comments and the code inlined. I tried this last year and it led to overall more readable code, imho.
The idea is that without proper boundaries, finding the line that needs to be changed may be a lot harder than clicking through files with an IDE. Smaller components also help with code reviews, since it's a lot easier to understand a line within the context of a component (or method name) without having to understand what the huge globs of code before it are doing. Also, like you said, a lot of the time a developer has to read code they didn't write, so there are other factors to consider, like how easy it is for someone from another team to make a change or whether a new employee could easily digest the code base.
I would extend this one level higher to say managing complexity is about managing risk. Risk is usually what we really care about.
From the article:
>any one person's opinions about another person's opinions about "clean code" are necessarily highly subjective.
At some point CS as a profession has to find the right balance of art and science. There's room for both. Codifying certain standards is the domain of professions (in the truest sense of the word) and not art.
Software often likens itself to traditional engineering disciplines. Those traditional engineering disciplines manage risk through codified standards built through industry consensus. Somebody may build a pressure system that doesn't conform to standards. They don't get to say "well your idea of 'good' is just an opinion so it's subjective". By "professional" standards they have built something outside the acceptable risk envelope and, if it's a regulated engineering domain, they can't use it.
This isn't to mean a coder would have to follow rigid rules constantly or that it needs a regulatory body, but that the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.
A lot of "best practices" in engineering were established empirically, after root cause analysis of failures and successes. Software is more or less evolving along the same path (structured programming, OOP, higher-than-assembly languages, version control, documented ISAs).
Go back to earlier machines and each version had its own assembly language and instruction set. Nobody would ever go back to that era.
OOP was pitched as a one-size-fits-all solution to all problems, and as a checklist of items that would turn a cheap offshored programmer into a real software engineer thanks to design patterns and abstractions dictated by a "Software Architect". We all know it to be false, and bordering on snake oil, but it still had some good ideas. Having a class encapsulate complexity and define interfaces is neat. It forces you to think in terms of abstractions and helps readability.
> This isn't to mean a coder would have to follow rigid rules constantly or that it needs a regulatory body, but that the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.
As more and more years pass, I'm less and less against a regulatory body. It would help with getting rid of snake oil salesmen in the industry and limit offshoring to barely qualified coders. And it would simplify hiring too, by having a known certification that tells you someone at least meets a certain bar.
>the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.
The problem I see with this is that programming could be described as a kind of general problem solving. Other engineering disciplines standardize methods that are far more specific, e.g. how to tighten screws.
It's hard to come up with specific rules for general problems though. Algorithms are just solution descriptions in a language the computer and your colleagues can understand.
When we look at specific domains, e.g. finance and accounting software, we see industry standards have already emerged, like dealing with fixed point numbers instead of floating point to make calculation errors predictable.
If we now start codifying general software engineering, I'm worried we will just codify subjective opinions about general problem solving. And that will stop any kind of improvement.
Instead we have to accept that our discipline is different from the others, and more of a design or craft discipline.
Yes, coding at scale is about managing complexity. No, "Keeping methods short" is not a good way to manage complexity, because...
> then mentally model the entire graph of interactions at once
...partially applies even if you have well-named functional boundaries. You said it yourself:
> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows. The cognitive load to understand code grows as the number of possible interactions grows.
Programs have a certain essential complexity. Making a function "simpler" means making it less complex, which means that that complexity has to go somewhere else. If you make all of your functions simple, then you simply need more functions to represent the same program, which increases the total number of possible interactions between nodes and therefore the cognitive load of understanding the whole graph/program.
Allowing more complexity in your functions makes them individually harder to understand, but reduces the total number of functions needed and therefore makes the entire program more comprehensible.
Also note that just because a function's implementation is complex doesn't mean that its interface also has to be complex.
And, functions with complex implementations are only themselves difficult to understand - functions with complex interfaces make the whole system more difficult to understand.
This is where Occam's Razor applies - do not multiply entities unnecessarily.
Having hundreds or thousands of simple functions is the opposite of this advice.
You can also consider this in more scientific terms.
Code is a mental model of a set of operations. The best possible model has as few moving parts as possible, there are as few connections between the parts as possible, each part is as simple as possible, and both the parts and the connections between them are as intuitively obvious as possible.
Making parts as simple as possible is just one design goal, and not a very satisfactory or useful one in its own terms.
All of this turns out to be incredibly hard, and is a literal IQ test. Mediocre developers will always, always create overcomplicated solutions. Top developers have a magical ability to combine a 10,000 foot overview with ground level detail, and will tear through complex problems and reduce them to elegant simplicity.
IMO we should spend less time teaching algorithms and testing algorithmic specifics, and more on analysing complex systems and implementing them with minimal, elegant, intuitive models.
>If you make all of your functions simple, then you simply need more functions to represent the same program
The semantics of the language and the structure of the code help hide irrelevant functional units from the global namespace. Methods attached to an object only need to be considered when operating on some object, for example. Private methods do not pollute the global namespace nor do they need to be present in any mental model of the application unless it is relevant to the context.
While I do think you can go too far with adding functions for their own sake, I don't see that they add to cognitive load in the same way that possible interactions within a functional unit do. If you're just polluting a global namespace with functions and tiny objects, then that does similarly increase cognitive load and should be avoided.
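To make that concrete, here's a minimal C++ sketch (names invented): the private helpers stay inside the object's boundary, so they never join the pool of globally visible things a reader has to keep in mind.

    #include <utility>
    #include <vector>

    // Hypothetical example: Invoice's private helpers live behind its boundary.
    // Callers only need the public surface; subtotal() and tax() never enter
    // the wider namespace of things to remember.
    class Invoice {
    public:
        explicit Invoice(std::vector<double> line_items)
            : line_items_(std::move(line_items)) {}

        double total_due() const { return subtotal() + tax(); }

    private:
        double subtotal() const {
            double sum = 0.0;
            for (double item : line_items_) sum += item;
            return sum;
        }
        double tax() const { return subtotal() * 0.07; }  // assumed flat rate

        std::vector<double> line_items_;
    };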
> No, "Keeping methods short" is not a good way to manage complexity
Agreed
> Allowing more complexity in your functions makes them individually harder to understand
I think that can mostly be avoided, by sometimes creating local scopes {..} to avoid too much state inside a function, combined with whitespace and some section "header" comments (instead of what would have been sub-function names).
Can be quite readable I think. And nice to not have to jump back and forth between myriads of files and functions
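Something like this C++ sketch (invented example) is what I have in mind: the braces keep each step's temporaries from leaking, and the "header" comments do the job sub-function names would have done.

    #include <vector>

    double process_order(const std::vector<double>& prices) {
        double subtotal = 0.0;
        // --- sum the line items ---
        {
            for (double p : prices) subtotal += p;
        }

        double discount = 0.0;
        // --- work out the discount ---
        {
            const bool bulk_order = prices.size() > 10;  // temporary stays local
            if (bulk_order) discount = subtotal * 0.05;
        }

        // --- apply it ---
        return subtotal - discount;
    }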
I have found this to be one of those A or B developer personas that are hard for someone to change, and causes much disagreement. I personally agree 100%, but have known other people who couldn't disagree more, it is what it is.
I've always felt it had a strong correlation to top-down vs bottom-up thinkers in terms of software design. The top-down folks tend to agree with your stance and the bottom-up group do not. If you're naturally going to want to understand all of the nitty gritty details you want to be able to wrap your head around those as quickly as possible. If you're willing to think in terms of the abstractions you want to remove as many of those details from sight as possible to reduce visual noise.
I wish there was an "auto-flattener"/"auto-inliner" tool that would allow you to automagically turn code that was written top-down, with lots of nicely high-level abstractions, into an equivalent code with all the actions mushed together and with infrastructure layers peeled away as much as possible.
Have you ever seen a codebase with infrastructure and piping taking up about 70% of the code, with tiny pieces of business logic thrown here and there? It's impossible to figure out where the actual job is being done (and what it actually is): all you can see is an endless chain of methods that mostly just delegate the responsibility further and further. What could've been a 100-line loop of the "foreach item in worklist, do A, B, C" kind is instead split over seven tightly cooperating classes that devote 45% of their code to multiplexing/load-balancing/messaging/job-spooling/etc, another 45% to building trivial auxiliary structures and instantiating each other, and only 10% to the actual data processing. But good luck finding those 10%, because there is a never-ending chain of calls: A.do_work() calls B.process_item() which calls A.on_item_processing() which calls B.on_processed()... wait, shouldn't there have been some work done between "on_item_processing" and "on_processed"? Yes, it was done by an inconspicuously named "prepare_next_worklist_item" function.
Ah, and the icing on the cake: looping is actually done from the very bottom of this call chain by doing a recursive call to the top-most method which at this point is about 20 layers above the current stack frame. Just so you can walk down this path again, now with the feeling.
While I think you are onto something about top-down vs. bottom-up thinkers, one of the issues with a large codebase is that literally nobody can do the whole thing bottom-up. So you need some reasonable conventions and abstractions, or the whole thing falls apart under its own weight.
I’m reminded of an earlier HN discussion about an article called The Wrong Abstraction, where I argued¹ that abstractions have both a benefit and a cost and that their ratio may change as a program evolves and which of those “nitty gritty details” are immediately relevant and which can helpfully be hidden behind abstractions changes.
The point is that bottom-up code is a siren song. It never scales. It makes it a lot easier to get started, but given enough complexity it inevitably breaks down.
Once your codebase gets to somewhere around the 10,000 line mark, it becomes impossible for a single mind to hold the entire program in their head at a single time. The only way to survive past that point is with carefully thought out, water tight layers of abstractions. That almost never happens with bottom-up. Bottom-up is a lot like natural selection. You get a lot of kludges that work great to solve their immediate problem, but behave in undefined and unpredictable ways when you extend them outside their original environment.
Bottom-up can work when you're inside well-encapsulated modular components with bounded scope and size. But there's no way to keep those modules loosely coupled unless you have an elegant top-down architecture imposing order on the large-scale structure.
> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.
That's only 1 part of the complexity equation.
When you have 100 lines in 1 function you know exactly the order in which each line will happen and under which conditions by just looking at it.
If you split it into 10 functions of 10 lines each, you now have 10! possible orderings of calling these functions (ignoring loops and branches). And since this ordering is spread across multiple places, you have to keep it in your mind. Good luck inventing names that make it obvious which of the 3,628,800 possible orderings is happening without reading through them.
Short functions are good when they fit the problem. Often they don't.
I feel like this is only a problem if the small functions share a lot of global state. If each one acts upon its arguments and returns values without side effects, ordering is much less of an issue IMO.
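For example (a rough C++ sketch with invented names): two pure helpers can be called in either order, so the only interaction left to reason about is at the call site.

    #include <algorithm>
    #include <vector>

    double min_of(const std::vector<double>& xs) {
        return xs.empty() ? 0.0 : *std::min_element(xs.begin(), xs.end());
    }

    double max_of(const std::vector<double>& xs) {
        return xs.empty() ? 0.0 : *std::max_element(xs.begin(), xs.end());
    }

    double range_of(const std::vector<double>& xs) {
        // Neither helper touches shared state, so evaluation order is irrelevant;
        // the only interaction is the subtraction here.
        return max_of(xs) - min_of(xs);
    }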
>If you split it into 10 functions of 10 lines each, you now have 10! possible orderings of calling these functions (ignoring loops and branches). And since this ordering is spread across multiple places, you have to keep it in your mind. Good luck inventing names that make it obvious which of the 3,628,800 possible orderings is happening without reading through them.
It's easy to make this argument in the abstract, but harder to demonstrate with a concrete example. Do you happen to have any 100 lines of code that you could provide that would show this as a challenge to compare to the refactored code?
You're likely missing one or more techniques that make this work well:
1. Depth first function ordering, so the execution order of the lines in the function is fairly similar to that of the expanded 100 lines. This makes top to bottom readability reasonable.
2. Explicit naming of the functions to make it clear what they do, not just part1(); part2() etc.
3. Similar levels of abstraction in each function (e.g. not having both a for loop, several if statements based on variables defined in the function, and 3 method calls; instead having 4-5 method calls doing the same thing).
4. Explicit pre/post conditions in each method are called out due to the passing in of parameters and the return values. This more effectively helps a reader understand the lifecycle of relevant variables etc.
In your example of 100 lines, the counterpoint is that now I have a method that has at least 100 ways it could work / fail. By breaking that up, I have the ability to reason about each use case / failure mode.
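As a rough C++ sketch of points 1-4 (names invented): the top-level function reads at one level of abstraction, the helpers are defined below it in roughly execution order, and the parameters and return values spell out what each step needs and produces.

    #include <cstddef>
    #include <fstream>
    #include <string>
    #include <vector>

    struct Report { std::size_t line_count = 0; };

    std::vector<std::string> load_lines(const std::string& path);
    std::vector<std::string> drop_comments(std::vector<std::string> lines);
    Report summarize(const std::vector<std::string>& lines);

    // Reads top to bottom at a single level of abstraction.
    Report build_report(const std::string& path) {
        return summarize(drop_comments(load_lines(path)));
    }

    std::vector<std::string> load_lines(const std::string& path) {
        std::vector<std::string> lines;
        std::ifstream in(path);
        for (std::string line; std::getline(in, line);) lines.push_back(line);
        return lines;
    }

    std::vector<std::string> drop_comments(std::vector<std::string> lines) {
        std::erase_if(lines, [](const std::string& l) { return !l.empty() && l[0] == '#'; });
        return lines;
    }

    Report summarize(const std::vector<std::string>& lines) {
        return Report{lines.size()};
    }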
I am surprised that this is the top answer (Edit: at the moment, was)
How does splitting code into multiple functions suddenly change the order of the code?
I would expect that these functions would be still called in a very specific order.
And sometimes it does not even make sense to keep this order.
But here is a little example (in a made up pseudo code):
function positiveInt calcMeaningOfLife(positiveInt[] values)
positiveInt total = 0
positiveInt max = 0
for (positiveInt i=0; i < values.length; i++)
total = total + values[i]
max = values[i] > max ? values[i] : max
return total - max
===>
function positiveInt max(positiveInt[] values)
positiveInt max = 0
for (positiveInt i=0; i < values.length; i++)
max = values[i] > max ? values[i] : max
return max
function positiveInt total(positiveInt[] values)
positiveInt total = 0
for (positiveInt i=0; i < values.length; i++)
total = total + values[i]
return total
function positiveInt calcMeaningOfLife(positiveInt[] values)
return total(values)-max(values)
There's certainly some difference in priorities between massive 1000-programmer projects where complexity must be aggressively managed and, say, a 3-person team making a simple web app. Different projects will have a different sweet spot in terms of structural complexity versus function complexity. I've seen code that, IMO, misses the sweet spot in either direction.
Sometimes there is too much code in mega-functions, poor separation of concerns and so on. These are easy mistakes to make, especially for beginners, so there are a lot of warnings against them.
Other times you have too many abstractions and too much indirection to serve any useful purpose. The ratio of named things, functional boundaries, and interface definitions to actual instructions can easily get out of hand when people dogmatically apply complexity-managing patterns to things that aren't very complex. Such over-abstraction can fall under YAGNI and waste time/$ as the code becomes slower to navigate, slower to understand in depth, and possibly slower to modify.
I think in software engineering we suffer more from the former problem than the latter problem, but the latter problem is often more frustrating because it's easier to argue for applying nifty patterns and levels of indirection than omitting them.
Just for a tangible example: If I have to iterate over a 3D data structure with an X Y and Z dimension, and use 3 nested loops to do so, is that too complex a function? I'd say no. It's at least as clear without introducing more functional boundaries, which is effort with no benefit.
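For instance, something like this (C++, hypothetical) seems perfectly fine as a single function; splitting the inner loops into their own helpers would add boundaries without adding clarity.

    #include <vector>

    using Grid3D = std::vector<std::vector<std::vector<double>>>;

    double sum_grid(const Grid3D& grid) {
        double total = 0.0;
        for (const auto& plane : grid)      // X
            for (const auto& row : plane)   // Y
                for (double cell : row)     // Z
                    total += cell;
        return total;
    }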
Well named functions are only half (or maybe a quarter) of the battle. Function documentation is paramount in complex codebases, since documentation should describe various parameters in detail and outline any known issues, side-effects, or general points about calling the function. It's also a good idea to document when a parameter is passed to another function/method.
Yeah, it's a lot of work, but working on recent projects have really taught me the value of good documentation. Naming a function send_records_to_database is fine, but it can't tell you how it determines which database to send the records to, or how it deals with failed records (if at all), or various alternative use cases for the function. All of that must come from documentation (or reading the source of that function).
Plus, I've found that forcing myself to write function documentation, and justify my decisions, has resulted in me putting more consideration into design. When you have to say, "this function reads <some value> name from <environmental variable>" then you have to spend some time considering if future users will find that to be a sound decision.
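For the sake of illustration, here's the kind of doc comment I mean, sketched in C++ against a hypothetical send_records_to_database; every behavior described below is invented. The point is how much the name alone can't carry.

    #include <string>
    #include <vector>

    struct Record { std::string payload; };

    // Sends `records` to the database named by the RECORDS_DB_URL environment
    // variable (falls back to the local dev instance if unset).
    //
    // Failed records are NOT retried; they are returned to the caller, which
    // decides whether to re-queue or drop them.
    //
    // Side effects: opens one connection per call and logs one line per failed
    // record. Safe to call concurrently from multiple threads.
    //
    // Returns the records that could not be written (empty on full success).
    std::vector<Record> send_records_to_database(const std::vector<Record>& records);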
> documentation should describe various parameters in detail and outline any known issues, side-effects, or general points about calling the function. It's also a good idea to document when a parameter is passed to another function/method.
I'd argue that writing that much documentation about a single function suggests that the function is a problem and the "send_records_to_database" example is a bad name. It's almost inevitable that the function doing so much and having so much behavior that needs documentation will, at some point, be changed and make the documentation subtly wrong, or at least incomplete.
Yikes, I hope I don't have to read documentation to understand how the code deals with failed records or other use cases. Good code would have the use cases separated from the send_records_to_database so it would be obvious what the records were and how failure conditions are handled.
"Plus, I've found that forcing myself to write function documentation, and justify my decisions, has resulted in me putting more consideration into design."
This, this, and... this.
Sometimes, I step back after writing documentation and realise, this is a bunch of baloney. It could be much simpler, or this is a terrible decision! My point: Writing documentation is about expressing the function a second time -- the first time was code, the second time was natural language. Yeah, it's not a perfect 1:1 (see: the law in any developed country!), but it is a good heuristic.
Documentation is only useful if it is up to date and correct. I ignore documentation because I've never found the above to be true.
There are contract/proof systems that seem like they might help. At least the tool ensures it is correct. However, I'm not sure if such systems are readable. (I've never used one in the real world.)
> The idea of code telling a story is that a unit of work should explain what it does through its use of well named variables, function/object names, and how data flows between function/objects.
Code telling a story is a fallacy that programmers keep telling themselves and which fails to die. Code doesn't tell stories, programmers do. Code can't explain why it exists; it can't tell you about the buggy API it relies on and which makes its implementation weird and not straight-forward; it can't say when it's no longer needed.
Good names are important, but it's false that having well-chosen function and arguments names will tell a programmer everything they need to know.
Code can't tell every relevant story, but it can tell a story about how it does what it does. Code is primarily written for other programmers. Writing code in such a way that other people with some familiarity with the problem space can understand easily should be the goal. But this means telling a story to the next reader, the story of how the inputs to some functional unit are translated into its outputs or changes in state. The best way to explain this to another human is almost never the best way to explain it to a computer. But since we have to communicate with other humans and to the computer from the same code, it takes some effort to bridge the two paradigms. Having the code tell a story at the high level by way of the modules, objects and methods being called is how we bridge this gap. But there are better and worse ways to do this.
Software development is a process of translating the natural language-spec of the system into a code-spec. But you can have the natural language-spec embedded in the structure of the code to a large degree. The more, the better.
Your argument falls apart once you need to actually debug one of these monstrosities, as often the bug itself also gets spread out over half a dozen classes and functions, and it's not obvious where to fix it.
More code, more bugs. More hidden code, more hidden bugs. There's a reason those who have worked in software development longer tend to prefer less abstraction: most of them are those who have learned from their experiences, and those who aren't are "architects" optimising for job security.
If a function is only called once, it should just be inlined; the IDE can collapse it. A descriptive comment can replace the function name. It can be a lambda with an immediate call and explicit captures if you need to prevent the issue of not knowing which local variables it interacts with as the function grows significantly; or, if the concern is others using its leftover variables, its body can go into a plain scope. Making you jump to a different area of the code just to read it breaks up the linear flow for no gain, especially when you often have to read it anyway to make sure it doesn't have global side effects; might as well read it in the single place it is used.
If it is going to be used more than once, and actually is, then make it a function (unless it is so trivial that the explicit inline version is more readable). If you are designing a public API where it may need to be overridden, count it as more than once.
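In C++ that might look something like this (invented example): the explicit capture list makes it obvious which locals the one-off step can touch, and its temporaries can't leak into the rest of the function.

    #include <string>
    #include <vector>

    std::string build_greeting(const std::vector<std::string>& names) {
        // Join the names: only `names` is visible inside the lambda.
        const std::string joined = [&names] {
            std::string out;
            for (const auto& n : names) {
                if (!out.empty()) out += ", ";
                out += n;
            }
            return out;
        }();

        return "Hello, " + joined + "!";
    }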
> The best code is code you don't have to read because of well named functional boundaries.
I don't know which is harder. Explaining this about code, or about tests.
The people with no sense of DevX see nothing wrong with writing tests that fail as:
Expected undefined to be "foo"
If you make me read the tests to modify your code, I'm probably going to modify the tests. Once I modify the tests, you have no idea if the new tests still cover all of the same concerns (especially if you wrote tests like the above).
Make the test red before you make it green, so you know what the errors look like.
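For example (a C++/GoogleTest sketch with an invented config map): the bare assertion fails with an anonymous value mismatch, while the annotated one tells the next reader what actually broke.

    #include <gtest/gtest.h>
    #include <map>
    #include <string>

    TEST(ConfigTest, LoadsServiceName) {
        std::map<std::string, std::string> config;  // pretend this was loaded from a file

        // Fails as: Expected equality of these values... "" vs "foo"
        EXPECT_EQ(config["service_name"], "foo");

        // Fails with context the reader can act on.
        EXPECT_EQ(config["service_name"], "foo")
            << "service_name missing from parsed config; were defaults applied?";
    }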
“There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”
― C. A. R. Hoare
This quote does not scale. Software contains essential complexity because it was built to fulfill a need. You can make all of the beautiful, feature-impoverished designs you want - they won't make it to production, and I won't use them, because they don't do the thing.
If your software does not do the thing, then it's not useful, it's a piece of art - not an artifact of software engineering that is meant to fulfill a purpose.
But not everybody codes “at scale”. If you have a small, stable team, there is a lot less to worry about.
Secondly, it is often better to start with fewer abstractions and boundaries, and add them when the need becomes apparent, rather than trying to remove ill-conceived boundaries and abstractions that were added at earlier times.
Coding at scale is not dependent on the number of people, but on the essential complexity of the problem. One can fail at a one-man project due to lack of proper abstraction with a sufficiently complex problem. Like, try to write a compiler.
> The idea of code telling a story is that a unit of work should explain what it does through its use of well named variables, function/object names, and how data flows between function/objects. If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.
That's fine in theory and I still sort-of believe that, but in practice, I came to believe most programming languages are insufficiently expressive for this vision to be true.
Take, as a random example, this bit of C++:
//...
const auto foo = Frobnicate(bar, Quuxify);
Ok, I know what Frobnification is. I know what Quuxify does, it's defined a few lines above. From that single line, I can guess it Frobs every member of bar via Quuxify. But is bar modified? Gotta check the signature of Frobnicate! That means either getting an IDE help popup, or finding the declaration.
template<typename Stuffs, typename Fn>
auto Frobnicate(const std::vector<Stuffs>&, Fn)
-> std::vector<Stuffs>;
From the signature, I can see that bar full of Bars isn't going to be modified. But then I think, is foo.size() going to be equal to bar.size()? What if bar is empty? Can Frobnicate throw an exception? Are there any special constraints on the function Fn passed to it? Does Fn have to be a funcallable thing? Can't tell that until I pop into definition of Frobnicate.
I'll omit the definition here. But now that I see it, I realize that Fn has to be a function of a very particular signature, that Fn is applied to every other element of the input vector (and not all of them, as I assumed), that the code has a bug and will crash if the input vector has less than 2 elements, and it calls three other functions that may or may not have their own restrictions on arguments, and may or may not throw an exception.
If I don't have a fully-configured IDE, I'll likely just ignore it and bear the risk. If I have, I'll routinely jump-to-definition into all these functions, quickly eye them for any potential issues... and, if I have the time, I'll put a comment on top of Frobnicate declaration, documenting everything I just learned - because holy hell, I don't want to waste my time doing the same thing next week. I would rename the function itself to include extra details, but then the name would be 100+ characters long...
Some languages are better at this than others, but my point is, until we have programming languages that can (and force you to) express the entire function contract in its signature and enforce this at compile-time, it's unsafe to assume a given function does what you think it does. Comments would be a decent workaround, if most programmers could be arsed to write them. As it is, you have to dig into the implementation of your dependencies, at least one level deep, if you want to avoid subtle bugs creeping in.
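Some of that contract can be pushed into the signature today; here's a hedged C++20 sketch using concepts on the hypothetical Frobnicate above. But note how much still stays invisible: the at-least-2-elements precondition, the every-other-element behavior, the exceptions.

    #include <concepts>
    #include <type_traits>
    #include <vector>

    template <typename Fn, typename T>
    concept FrobStep = std::invocable<Fn, const T&> &&
                       std::convertible_to<std::invoke_result_t<Fn, const T&>, T>;

    // Now a caller at least knows what Fn must look like without opening the body.
    template <typename Stuffs, FrobStep<Stuffs> Fn>
    std::vector<Stuffs> Frobnicate(const std::vector<Stuffs>& bar, Fn quuxify);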
This is a good point and I agree. In fact, I think this really touches on why I always had a hard time understanding C++ code. I first learned to program with C/C++ so I have no problem writing C++, but understanding other people's code has always been much more difficult than other languages. Its facilities for abstraction were (historically) subpar, and even things like aliased variables where you have to jump to the function definition just to see if the parameter will be modified really get in the way of easy comprehension. And then the nested template definitions. You're right that how well relying on well named functional boundaries works depends on the language, and languages aren't at the point where it can be completely relied on.
This is true but having good function names will at least help you avoid going two levels deep. Or N levels. Having a vague understanding of a function call’s purpose from its name helps because you have to trim the search tree somewhere.
Though, if you’re in a nest of tiny forwarding functions, who knows how deep you’ll have to go?
> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.
Which is often unavoidable, many functions are insufficiently explained by those alone unless you want four-word camelcase monstrosities for names. The code of the function should be right-sized. Size and complexity need to be balanced there- simpler and easier-to-follow is sometimes larger. I work on compilers, query processors and compute engines, cognitive load from the subject domains are bad enough without making the code arbitrarily shaped.
[edit] oh yes, what jzoch says below. Locality helps with taming the network of complexity between functions and data.
I think we need to recognize the limits of this concept. To reach for an analogy, both Dr. Seuss and Tolstoy wrote well but I'd much rather inherit source code that reads like 10 pages of the former over 10 pages of the latter. You could be a genuine code-naming artist but at the end of the day all I want to do is render the damn HTML.
> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.
This isn't always true in my experience. Often when I need to dig into the details of a function it's because how it works is more important than what it says it's doing. There are implementation concerns you can't fit into a function name.
Additionally, I have found that function names become outdated at about the same rate as comments do. If the common criticism of code commenting is that "comments are code you don't run", function names also fall into that category.
I don't have a universal rule on this, I think that managing code complexity is highly application-dependent, and dependent on the size of the team looking at the code, and dependent on the age of the code, and dependent on how fast the code is being iterated on and rewritten. However, in many cases I've started to find that it makes sense to inline certain logic, because you get rid of the risk of names going out of date just like code comments, and you remove any ambiguity over what the code actually does. There are some other benefits as well, but they're beyond the scope of the current conversation.
Perfect abstractions are relatively rare, so in instances where abstractions are likely to be very leaky (which happens more often than people suspect), it is better to be extremely transparent about what the code is doing, rather than hiding it behind a function name.
> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.
I'll also push back against this line of thought. The sum total of possible interactions do not decrease when you move code out into a separate function. The same number of lines of code still get run, and each line carries the same potential to have a bug. In fact, in many cases, adding additional interfaces between components and generalizing them can increase the number of code paths and potential failure points.
If you define complexity by the sum total of possible interactions (which is itself a problematic definition, but I'll talk about that below), then complexity always increases when you factor out functions, because the interfaces, error-handling, and boilerplate code around those functions increases the number of possible interactions happening during your function call.
> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.
What I've come to understand is that complexity is relative. A solution that makes a codebase less complex for one person in an organization may make a codebase more complex for someone else in the organization who has different responsibilities over the codebase.
If you are building an application with a large team, and there are clear divisions of responsibilities, then functional boundaries are very helpful because they hide the messy details about how low-level parts of the code work.
However, if you are responsible for maintaining both the high-level and low-level parts of the same codebase, then separating that logic can sometimes make the program harder to manage, because you still have to understand how both parts of the codebase work, but now you also have to understand how the interfaces and abstractions between them fit together and what their limitations are.
In single-person projects where I'm the only person touching the codebase I do still use abstractions, but I often opt to limit the number of abstractions, and I inline code more often than I would in a larger project. This is because if I'm the only person working on the code, I need to be able to hold almost the entire codebase in my head at the same time in order to make informed architecture decisions, and managing a large number of abstractions on top of their implementations makes the code harder to reason about and increases the number of things I need to remember. This was a hard-learned lesson for me, but has made (I think) an observable difference in the quality and stability of the code I write.
>> If you have to dig into the details of a function to understand what it does, you have failed to sufficiently explain what the function does through its naming and set of arguments.
> This isn't always true in my experience. Often when I need to dig into the details of a function it's because how it works is more important than what it says it's doing. There are implementation concerns you can't fit into a function name.
Both of these things are not quite right. Yes, if you have to dig into the details of a function to understand what it does, it hasn't been explained well enough. No, the prototype cannot contain enough information to explain it. No, you shouldn't look at the implementation either - that leads to brittle code where you start to rely on the implementation behavior of a function that isn't part of the interface.
The interface and implementation of a function are separate. The former should be clearly-documented - a descriptive name is good, but you'll almost always also need docstrings/comments/other documentation - while you should rarely rely on details of the latter, because if you are, that usually means that the interface isn't defined clearly enough and/or the abstraction boundaries are in the wrong places (modulo things like looking under the hood to refactor, improve performance, etc - all abstractions are somewhat leaky, but you shouldn't be piercing them regularly).
> If you define complexity by the sum total of possible interactions (which is itself a problematic definition, but I'll talk about that below), then complexity always increases when you factor out functions, because the interfaces, error-handling, and boilerplate code around those functions increases the number of possible interactions happening during your function call.
This - this is what everyone who advocates for "small functions" doesn't understand.
Finally! I'm glad to hear I'm not the only one. I've gone against 'Clean Code' zealots that end up writing painfully warped abstractions in the effort to adhere to what is in this book. It's OK to duplicate code in places where the abstractions are far enough apart that the alternative is worse. I've had developers use the 'partial' feature in C# to meet Martin's length restrictions to the point where I have to look through 10-15 files to see the full class.
The examples in this post are excellent examples of the flaws in Martin's absolutism.
"How do you develop good software? First, be a good software developer. Then develop some software."
The problem with all these lists is that they require a sense of judgement that can only be learnt from experience, never from checklists. That's why Uncle Bob's advice is simultaneously so correct, and yet so dangerous with the wrong fingers on the keyboard.
I've also never agreed completely with Uncle Bob. I was an OOP zealot for maybe a decade, and now I'm a Rust convert. The biggest "feature" of Rust is that it probably brought semi-functional concepts to the "OOP masses." I found that, with Rust, I spent far more time solving the problem at hand...
Instead of solving how I am going to solve the problem at hand ("Clean Coding"). What a fucking waste of time, my brain power, and my lifetime keystrokes[1].
I'm starting to see that OOP is more suited to programming literal business logic. The best use for the tool is when you actually have "Person", "Customer", and "Employee" entities that have to follow some form of business rules.
In contradiction to your "Uncle Sassy's" rules, I'm starting to understand where "Uncle Beck" was coming from:
1. Make it work.
2. Make it right.
3. Make it fast.
The amount of understanding that you can garner from make something work leads very strongly into figuring out the best way to make it right. And you shouldn't be making anything fast, unless you have a profiler and other measurements telling you to do so.
"Clean Coding" just perpetuates all the broken promises of OOP.
These two fall under requirements gathering. It's so often forgotten that software has a specific purpose, a specific set of things it needs to do, and that it should be crafted with those in mind.
> 3. Understand not just where you are but where you are headed
And this is the part that breaks down so often. Because software is simultaneously so easy and so hard to change, people fall into traps both left and right, assuming some dimension of extensibility that never turns out to be important, or assuming something is totally constant when it is not.
I think the best advice here is that YAGNI, don't add functionality for extension unless your requirements gathering suggests you are going to need it. If you have experience building a thing, your spider senses will perk up. If you don't have experience building the thing, can you get some people on your team that do? Or at least ask them? If that is not possible, you want to prototype and fail fast. Be prepared to junk some code along the way.
If you start out not knowing any of these things, and also never junking any code along the way, what are the actual odds you got it right?
>Write code that takes the above 3 into account and make sensible decisions. When something feels wrong ... don't do it.
The problem is that people often need specifics to guide them when they're less experienced. Something that "feels wrong" is usually due to vast experience being incorporated into your subconscious aesthetic judgement. But you can't rely on your subconscious until you've had enough trials to hone your senses. Hard rules can and often are overapplied, but its usually better than the opposite case of someone without good judgement attempting to make unguided judgement calls.
My last company was very into Clean Code, to the point where all new hires were expected to do a book club on it.
My personal take away was that there were a few good ideas, all horribly mangled. The most painful one I remember was his treatment of the Law of Demeter, which, as I recall, was so shallow that he didn't even thoroughly explain what the law was trying to accomplish. (Long story short, bounded contexts don't mean much if you're allowed to ignore the boundaries.) So most everyone who read the book came to earnestly believe that the Law of Demeter is about period-to-semicolon ratios, and proceeded to convert something like
val frobnitz = Frobnitz.builder()
.withPixieDust()
.withMayonnaise()
.withTarget(worstTweetEver)
.build();
into
var frobnitzBuilder = Frobnitz.builder();
frobnitzBuilder = frobnitzBuilder.withPixieDust();
frobnitzBuilder = frobnitzBuilder.withMayonnaise();
frobnitzBuilder = frobnitzBuilder.withTarget(worstTweetEver);
val frobnitz = frobnitzBuilder.build();
and somehow convince themselves that doing this was producing tangible business value, and congratulate themselves for substantially improving the long-term maintainability of the code.
Meanwhile, violations of the actual Law of Demeter ran rampant. They just had more semicolons.
On that note, I've never seen an explanation of Law of Demeter that made any kind of sense to me. Both the descriptions I read and the actual uses I've seen boiled down to the type of transformation you just described, which is very much pointless.
> Long story short, bounded contexts don't mean much if you're allowed to ignore the boundaries.
I'd like to read more. Do you know of a source that covers this properly?
I love how this is clearly a contextual recommendation.
I'm not a software developer, but a data scientist. In pandas, writing your manipulations in this chained-methods fashion is highly encouraged IMO. It's even called "pandorable" code.
I think following some ideas in the book, but ignoring others like the ones applicable for the law of demeter can be a recipe for a mess. The book is very opinionated, but if followed well I think it can produce pretty dead simple code. But at the same time, just like with any coding, experience plays massively into how well code is written. Code can be written well when using his methods or when ignoring his methods and it can be written badly when trying to follow some of his methods or when not using his methods at all.
>his treatment of the Law of Demeter, which, as I recall, was so shallow that he didn't even really even thoroughly explain what the law was trying to accomplish.
oof. I mean, yeah, at least explain what the main thing you’re talking about is about, right? This is a pet peeve.
> It's OK to duplicate code in places where the abstractions are far enough apart that the alternative is worse.
I don't recall where I picked it up, but the best advice I've heard on this is a "Rule of 3". You don't have a "pattern" to abstract until you reach (at least) three duplicates. ("Two is a coincidence, three is a pattern. Coincidences happen all the time.") I've found it can be a useful rule of thumb to prevent "premature abstraction" (an understandable relative of "premature optimization"). It is surprising sometimes what you only find out about the abstraction when you reach that third duplicate (variables or control flow decisions that seemed constant in the first two places, for instance; or a higher-level idea of why the code is duplicated that isn't clear from two very far apart points but is clearer when you can "triangulate" what their center is).
I don't hate the rule of 3. But i think it's missing the point.
You want to extract common code if it's the same now, and will always be the same in the future. If it's not going to be the same and you extract it, you now have the pain of making it do two things, or splitting. But if it is going to be the same and you don't extract it, you have the risk of only updating one copy, and then having the other copy do the wrong thing.
For example, i have a program where one component gets data and writes it to files of a certain format in a certain directory, and another component reads those files and processes the data. The code for deciding where the directory is, and what the columns in the files are, must be the same, otherwise the programs cannot do their job. Even though there are only two uses of that code, it makes sense to extract them.
Once you think about it this way, you see that extraction also serves a documentation function. It says that the two call sites of the shared code are related to each other in some fundamental way.
Taking this approach, i might even extract code that is only used once! In my example, if the files contain dates, or other structured data, then it makes sense to have the matching formatting and parsing functions extracted and placed right next to each other, to highlight the fact that they are intimately related.
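Concretely, that kind of extraction can be as small as one shared header that both programs include (a C++ sketch with invented names), so the directory layout and column order physically cannot drift apart:

    #include <array>
    #include <filesystem>

    namespace worklist_files {

    // Both the writer and the reader build paths from this one function.
    inline std::filesystem::path data_directory() {
        return std::filesystem::path("/var/spool/worklist");  // assumed location
    }

    // Column order shared by the writer's formatter and the reader's parser.
    inline constexpr std::array<const char*, 3> columns = {"id", "timestamp", "payload"};

    }  // namespace worklist_files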
Also known as, “AbstractMagicObjectFactoryBuilderImpl” that builds exactly one (1) factory type that generates exactly (1) object type with no more than 2 options passed into the builder and 0 options passed into the factory. :-)
The Go proverb is "A little copying is better than a little dependency." Also don't deduplicate 'text' because it's the same, deduplicate implementations if they match in both mechanism 'what' it does as well as their semantic usage. Sometimes the same thing is done with different intents which can naturally diverge and the premature deduplication is debt.
I'm coming to think that the rule of three is important within a fairly constrained context, but that other principle is worthwhile when you're working across contexts.
For example, when I did work at a microservices shop, I was deeply dissatisfied with the way the shared utility library influenced our code. A lot of what was in there was fairly throw-away and would not have been difficult to copy/paste, even to four or more different locations. And the shared nature of the library meant that any change to it was quite expensive. Technically, maybe, but, more importantly, socially. Any change to some corner of the library needed to be negotiated with every other team that was using that part of the library. The risk of the discussion spiraling away into an interminable series of bikesheddy meetings was always hiding in the shadows. So, if it was possible to leave the library function unchanged and get what you needed with a hack, teams tended to choose the hack. The effects of this phenomenon accumulated, over time, to create quite a mess.
At a previous company, there was a Clean Code OOP zealot. I heard him discussing with another colleague about the need to split up a function because it was too long (it was 10 lines). I said, from the sidelines, "yes, because nothing enhances readability like splitting a 10 line function into 10, 1-line functions". He didn't realize I was being sarcastic and nodded in agreement that it would be much better that way.
There seems to be a lot of overlap between the Clean Coders and the Neo Coders [0]. I wish we could get rid of both.
[0] People who strive for "The One" architecture that will allow any change no matter what. Seriously, abstraction out the wazoo!
Honestly. If you're getting data from a bar code scanner and you think, "we should handle the case where we get data from a hot air balloon!" because ... what if?, you should retire.
The problem is that `partial` in C# should never even have been considered as a "solution" to write small, maintainable classes. AFAIK partial was introduced for code-behind files, not to structure human written code.
Anyways, you are not alone with that experience - a common mistake I see, no matter what language or framework, is that people fall for the fallacy "separation into files" is the same as "separation of concerns".
Seriously? That's an abuse of partial and just a way of following the rules without actually following them. That code must have been fun to navigate...
Many years ago I worked on a project that had a hard “no hard coded values” rule, as requested by the customer. The team routinely wrote the equivalent to
const char_a = "a";
And I couldn’t get my manager to understand why this was a problem.
> where I have to look through 10-15 files to see the full class
The Magento 2 codebase is a good example of this. It's both well written and horrible at the same time. Everything is so spread out into constituent technical components, that the code loses the "narrative" of what's going on.
I started OOP in '96 and I was never able to wrap my head around the code these "Clean Code" zealots produced.
Case in point: Bob Martin's "Video Store" example.
My best guess is that clean code, to them, was as little code on the screen as possible, not necessarily "intention revealing" code either; instead everything is abstracted until it looks like it does nothing.
I have had the experience of trying to understand how a feature in a C++ project worked (both Audacity and Aegisub I think) only to find that I actually could not find where anything was implemented, because everything was just a glue that called another piece of glue.
Also sat in their IRC channel for months and the lead developer was constantly discussing how he'd refactor it to be cleaner but never seemed to add code that did something.
SOLID code is a very misleading name for a technique that seems to shred the code into confetti.
I personally don't feel all that productive spending like half my time just navigating the code rather than actually reading it, but maybe it's just me.
What I find most surprising is that most developers are trying to obey the "rules". Code containing even minuscule duplication must be DRYed; everyone agrees that code must be clean and professional.
Yet it is never enough: bugs keep showing up, and the stuff that was written by others is always bad.
I'm starting to think that 'Uncle Bob' and 'Clean Code' zealots are actually harmful, because they prevent people from taking two steps back and thinking about what they are doing: making microservices/components/classes/functions that end up never reused, and making DRY a holy grail.
Personally I am YAGNI > DRY and a lot of times you are not going to need small functions or magic abstractions.
I think the problem is not the book itself, but people thinking that all the rules apply to all the code, all the time. A length restriction is interesting because it makes you think about whether you should split your function into more than one, as you might be doing too much in one place. Now, if splitting will make things worse, then just don't.
In C# and .NET specifically, we find ourselves having a plethora of services when they are "human-readable" and short.
A service has 3 "helper" services it calls, which may, in turn have helper services, or worse, depend on a shared repo project.
The only solution I have found is to move these helpers into their own project, and mark the helpers as internal. This achieves 2 things:
1. The "sub-services" are not confused as stand-alone and only the "main/parent" service can be called.
2. The "module" can now be deployed independently if micro-services ever become a necessity.
I would like feedback on this approach. I do honestly think files over 100 lines long are unreadable trash, and we have achieved a lot by re-using modular services.
We are 1.5 years into a project and our code re-use is sky-rocketing, which allows us to keep errors low.
Of course, a lot of dependencies also make testing difficult, but allow easier mocks if there are no globals.
>I would like feedback on this approach. I do honestly think files over 100 lines long are unreadable trash
Dunno if this is the feedback you are after, but I would try to not be such an absolutist. There is no reason that a great 100 line long file becomes unreadable trash if you add one line.
What do you say to convince someone? It’s tricky to review a large carefully abstracted PR that introduces a bunch of new logic and config with something like: “just copy paste lol”
Yes, it's the first working implementation, written while good boundaries are not yet known. After a while it becomes familiar and natural conceptual boundaries arise, which leads to 'factoring' and shouldn't require 'refactoring' because you prematurely guessed the wrong boundaries.
I'm all for the 100-200 line working version--can't say I've had a 500. I did once have a single SQL query that was about 2 full pages pushing the limits of DB2 (needed multiple PTFs just to execute it)--the size was largely from heuristic scope reductions. In the end, it did something in about 3 minutes that had no previous solution.
> WET (Write everything twice), figure out the abstraction after you need something a 3rd time
so much this. it is _much_ easier to refactor copy pasta code than to untangle a mess of "clean code abstractions" for things that aren't even needed _once_. Premature Abstraction is the biggest problem in my eyes.
I think where DRY trips people up is when you have what I call "incidental repetition". Basically, two bits of logic seem to do exactly the same thing, but the contexts are slightly different. So you make a nice abstraction that works well until you need to modify the functionality in one context and not the other...
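A tiny C++ sketch of what I mean (invented example): these two checks are textually identical today, but they answer different questions, and merging them couples login rules to display rules right up until the day one of them changes.

    #include <string>

    bool is_valid_login_name(const std::string& name) {
        return !name.empty() && name.size() <= 32;
    }

    // Looks like a duplicate of the above, but it belongs to a different context
    // (what we render in the UI), so it is free to diverge later.
    bool is_valid_display_name(const std::string& name) {
        return !name.empty() && name.size() <= 32;
    }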
So long as it remains identical. Refactoring almost identical code requires lots of extremely detailed staring to determine whether or not two things are subtly different. Especially if you don't have good test coverage to start with.
There's a problem with being overly zealous. It's entirely possible to write bad code, either being overly dry or copy paste galore. I think we are prone to these zealous rules because they are concrete. We want an "objective" measure to judge whether something is good or not.
DRY and WET are terms often used as objective measures of implementations, but that doesn't mean that they are rock solid foundations. What does it mean for something to be "repeated"? Without claiming to have TheGreatAnswer™, some things come to mind.
Chaining methods can be very expressive, easy to follow and maintain. They also lead to a lot of repetition. In an effort to be "DRY", some might embark on a misguided effort to combine them. Maybe start replacing
`map(x => x).reduce((y, z) => v)`
with
`mapReduce(x => x, (y, z) => v)`
This would be a bad idea, also known as Suck™.
But there may equally be situations where consolidation makes sense. For example, if we're in an ORM helper class and we're always querying the database for an object like so
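One way that kind of consolidation can look, sketched with invented names and a stand-in repository API:

```typescript
// Hypothetical repository type standing in for whatever ORM is in use.
interface User { id: string; email: string; verified: boolean; deletedAt: Date | null; }
interface UserRepo { findOne(query: Record<string, unknown>): Promise<User | null>; }

// If every call site repeats the same three-part query...
//   repo.findOne({ email, deletedAt: null, verified: true })
// ...then pulling it into one well-named helper is DRY that pays for itself:
async function findActiveUserByEmail(repo: UserRepo, email: string): Promise<User | null> {
  return repo.findOne({ email, deletedAt: null, verified: true });
}
```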
I totally agree assuming that there will be time to get to the second pass of the "write everything twice" approach...some of my least favorite refactoring work has been on older code that was liberally copy-pasted by well-intentioned developers expecting a chance to come back through later but who never get the chance. All too often the winds of corporate decision making will change and send attention elsewhere at the wrong moment, and all those copy pasted bits will slowly but surely drift apart as unfamiliar new developers come through making small tweaks.
I worked on a small team with a very "code bro" culture. Not toxic, but definitely making non-PC jokes. We would often say "Ask your doctor about Premature Abstractuation" or "Bad news, dr. says this code has abstractual dysfunction" in code reviews when someone would build an AbstractFactoryFactoryTemplateConstructor for a one-off item.
When we got absorbed by a larger team and were going to have to "merge" our code review / git history into a larger org's repos, we learned that a sister team had gotten in big trouble with the language cops in HR when they discovered similar language in their git commit history. This brings back memories of my team panicking over trying to rewrite a huge amount of git history and code review stuff to sanitize our language before we were caught too.
> it is _much_ easier to refactor copy pasta code,
It's easy to refactor if it's non-divergent copy-pasta and you do it everywhere it is used, no later than the third iteration.
If the refactoring gets delayed, the code diverges because different bugs are noticed and fixed (or the same bug is noticed and fixed in different ways) in different iterations, there are dozens of instances across the code base (possibly in different projects, because it was copy-pasted across projects rather than refactored into a reusable library), and the code has in many cases gotten intermixed with code addressing other concerns...
I think this ties in to something I've been thinking, though it might be project specific.
Good code should be written to be easy to delete.
'Clever' abstractions work against this. We should be less precious about our code and realise it will probably need to change beyond all recognition multiple times. Code should do things simply so the consequences of deleting it are immediately obvious. I think your recommendations fit with this.
Aligns with my current meta-principle, which is that good code is malleable (easily modified, which includes deletion). A lot of design principles simply describe this principle from different angles. Readable code is easy to modify because you can understand it. Terse code is more easily modified because there’s less of it (unless you’ve sacrificed readability for terseness). SRP limits the scope of changes and thus enhances modifiability. Code with tests is easier to modify because you can refactor with less fear. Immutability makes code easier to modify because you don’t have to worry about state changes affecting disparate parts of the program.
Etc... etc...
(Not saying that this is the only quality of good code or that you won’t have to trade some of the above for performance or whatnot at times).
The unpleasant implication of this is that code has a tendency towards becoming worse over time. Because the code that is good enough to be easy to delete or change is, and the code that is too bad to be touched remains.
> * WET (Write everything twice), figure out the abstraction after you need something a 3rd time
There are two opposite situations. One is when several things are viewed as one thing while they're actually different (too much abstraction), and another, where a thing is viewed as different things, when it's actually a single one (when code is just copied over).
In my experience, the best way to solve this is to better analyse and understand the requirements. Do these two pieces of code look the same because they actually mean the same thing in the product? Or do they just happen to look the same at this particular moment in time, and can continue to develop in completely different directions as the product grows?
I read Clean Code in 2010 and trying out and applying some of the principles really helped to make my code more maintainable.
Now, over 10 years later, I have come to realize that you should not set up too many rules on how to structure and write code. It is like forcing all authors to apply the same writing style, or all artists to paint with the exact same technique.
With that analogy in mind, I think that one of the biggest contributors to messy code is having a lot of developers, all with different preferences, working in the same code base. Just imagine having 100 different writers trying to write a book; this is the challenge we are trying to address.
I'm not sure that's really true. Any publication with 100 different writers almost certainly has some kind of style guide that they all have to follow.
If it's really abstractable it shouldn't be difficult to refactor. It should literally be a substitution. If it's not, then you have varied cases that you'd have to go back and tinker with the abstraction to support.
It's a similar design and planning principle to building sidewalks. You have buildings but you don't know exactly the best paths between everything and how to correctly path things out. You can come up with your own design but people will end up ignoring them if they don't fit their needs. Ultimately, you put some obvious direct connection side walks and then wait to see the paths people take. You've now established where you need connections and how they need to be formed.
I do a lot of prototyping work, and if I had to sit down and think out a clean abstraction every time I wanted to get to a functional prototype, I'd never have a functional prototype--plus I'd waste a lot of cognitive capacity on an abstraction instead of solving the problem my code is addressing. It's best, from my experience, to save that time and write messy code but tuck in budget to refactor later (the key is you have to actually refactor later, not just say you will).
Once you've built your prototype, iterated on it several times, and had people break designs forcing hacked-out solutions, and now have something you don't touch often, you usually know what most of the product/service needs to look like. You then abstract that out and get 80-90% of what you need if there's real demand.
The expanded features beyond that can be costly if they require significant redesign, but at that point you hopefully have a stable enough product that it can warrant continued investment to refactor. If it doesn't, you saved yourself a lot of time and energy trying to create a good abstract design, which tends to fail multiple times at early stages anyway. There's a balance point of knowing when to build technical debt, when to pay it off, and when to nullify it.
Again, the critical trick is you have to actually pay off the tech debt if that time comes. The product investor can't look bright-eyed and linearly extrapolate progress so far, thinking they saved a boatload of capital; they have to understand shortcuts were taken and the rationale was to fix them if serious money came along, or chuck them in the bin if not.
WET is great until JSON token parsing breaks and a junior dev fixes it in one place and then I am fixing the same exact problem somewhere else and moving it into a shared file. If it's the exact same functionality, move it into a service/helper.
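For instance, a sketch of pulling the duplicated parsing into one shared helper (invented module and field names; only standard Node Buffer/JSON calls):

```typescript
// shared/token.ts -- hypothetical shared helper so a parsing fix happens exactly once.
export interface TokenPayload { sub: string; exp: number; }

export function decodeTokenPayload(token: string): TokenPayload {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("malformed token");
  // base64url-decode the payload segment, then parse it as JSON.
  const json = Buffer.from(parts[1], "base64url").toString("utf8");
  return JSON.parse(json) as TokenPayload;
}
```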
How do you deal with colleagues who have all the energy and time to push for these practices, when I feel they make things worse than the current state?
Explain that the wrong abstraction makes code more complicated than copy-paste and that before you can start factoring out common code you need to be sure the relationship is fundamental and not coincidental.
> figure out the abstraction after you need something a 3rd time
That's still too much of a "rule".
Whenever I feel (or know) two functions are similar, the factors that determine if I should merge them:
- I see significant benefit to doing so, usually the benefit of a utility that saves writing the same thing in the future, or debugging the same/similar code repeatedly.
- How likely the code is to diverge. Sometimes I just mark things for de-duping, but leave them around a while to see if one of the functions changes.
- The function is big enough it cannot just be in-lined where it is called, and the benefit of de-duplication is not outweighed by added complexity to the call stack.
Documentation is rarely adequately maintained, and nothing enforces that it stay accurate and maintained.
Comments in code can lie (they're not functional); can be misplaced (in most languages, they're not attached to the code they document in any enforced way); are most-frequently used to describe things that wouldn't require documenting if they were just named properly; are often little more than noise. Code comments should be exceedingly-rare, and only used to describe exception situations or logic that can't be made more clear through the use of better identifiers or better-composed functions.
External documentation is usually out-of-sight, out-of-mind. Over time, it diverges from reality, to the point that it's usually misleading or wrong. It's not visible in the code (and this isn't an argument in favor of in-code comments). Maintaining it is a burden. There's no agreed-upon standard for how to present or navigate it.
The best way to document things is to name identifiers well, write functions that are well-composed and small enough to understand, stick to single-responsibility principles.
API documentation is important and valuable, especially when your IDE can provide it readily at the point of use. Whenever possible, it should be part of the source code in a formal way, using annotations or other mechanisms tied to the code it describes. I wish more languages would formally include annotation mechanisms for this specific use case.
One of the most difficult to argue comments in code reviews: “let’s make it generic in case we need it some place else”.
First of all, chances that we need it some place else aren't exactly high, unless you are writing a library and code explicitly designed to be shared. And even if such a need arises, the chances of getting it right generalizing from one example are slim.
Regarding the book though, I have participated in one of the workshops with the author and he seemed to be in favor of WET and against "architecting" levels of abstraction before having concrete examples.
You can disagree over what exactly is clean code. But you will learn to distinguish what dirty code is when you try to maintain it.
As a person that has had to maintain dirty code over the years, hearing someone say dirty code doesn't exist is really frustrating. No one wants to clean up your code, but doing it is better than allowing the code to become unmaintainable; that's why people bring up that book. If you do not care about what clean code is, stop making life difficult for people that do.
I think it's more that clean code doesn't exist because there's no objective measure of it (and those services that claim there is one are just as dangerous as Clean Code, the book); anyone can come along and find something about the code that could be tidied up. And legacy is legacy; it's a different problem space to the one a greenfield project exists in.
> As a person that has to maintain dirty code
This is a strange credential to present and then use as a basis to be offended. Are you saying that you have dirty code and have to keep it dirty?
This is what I'm doing even while creating new code. There are a few instances, for example, where the "execution" is down to a single argument - one of "activate", "reactivate" and "deactivate". But I've made them into three distinct, separate code paths so that I can work error and feedback messages into everything without adding complexity via arguments.
I mean yes it's more verbose, BUT it's also super clear and obvious what things do, and they do not leak the underlying implementation.
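Roughly the trade-off being described, with invented names: one function steered by a mode argument versus three explicit entry points that can each own their own feedback.

```typescript
// Steered by an argument: compact, but every new message or edge case adds
// another branch and another thing callers have to know about.
type Mode = "activate" | "reactivate" | "deactivate";
function setAccountState(userId: string, mode: Mode): string {
  if (mode === "activate") return `Account ${userId} activated.`;
  if (mode === "reactivate") return `Welcome back, account ${userId} reactivated.`;
  return `Account ${userId} deactivated.`;
}

// Three explicit code paths: more verbose, but each can grow its own
// validation and feedback without leaking into the others.
function activateAccount(userId: string): string {
  return `Account ${userId} activated.`;
}
function reactivateAccount(userId: string): string {
  return `Welcome back, account ${userId} reactivated.`;
}
function deactivateAccount(userId: string): string {
  return `Account ${userId} deactivated.`;
}
```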
I’ve never heard the term WET before but that’s exactly what I do.
The other key thing, I think, is not to over-engineer abstractions you don't need yet, but to try and leave 'seams' where it's obvious how to tease code apart if you need to start building abstractions.
My recent experience interviewing a number of consultants with only a few years' experience was that the more they mumbled about clean code, the less they knew what they were doing.
That's not what WET means. The GP is saying you shouldn't isolate logic in a function until you've cut-and-pasted the logic in at least two places and plan to do so in a third.
> Put logic closest to where it needs to live (feature folders)
Can you say more about this?
I think I may have stumbled on a similar insight myself. In a side project (a roguelike game), I've been experimenting with a design that treats features as first-class, composable design units. Here is a list of the subfolder called game-features in the source tree:
actions
collision
control
death
destructibility
game-feature.lisp
hearing
kinematics
log
lore
package.lisp
rendering
sight
simulation
transform
An extract from the docstring of the entire game-feature package:
"A Game Feature is responsible for providing components, events,
event handlers, queries and other utilities implementing a given
aspect of the game. It's primarily an organization tool for gameplay code.
Each individual Game Feature is represented by a class inheriting
from `SAAT/GF:GAME-FEATURE'. To make use of a Game Feature,
an object of such class should be created, preferably in a
system description (see `SAAT/DI').
This way, all rules of the game are determined by a collection of
Game Features loaded in a given game.
Game Features may depend on other Game Features; this is represented
through dependencies of their classes."
The project is still very much work-in-progress (procrastinating on HN doesn't leave me much time to work on it), and most of the above features are nowhere near completion, but I found the design to be mostly sound. Each game feature provides code that implements its own concerns, and exports various functions and data structures for other game features to use. This is an inversion of traditional design, and is more similar to the ECS pattern, except I bucket all conceptually related things in one place. ECS Components and Systems, utility code, event definitions, etc. that implement a single conceptual game aspect live in the same folder. Inter-feature dependencies are made explicit, and game "superstructure" is designed to allow GFs to wire themselves into appropriate places in the event loop, datastore, etc. - so in game startup code, I just declare which features I want to have enabled.
(Each feature also gets its set of integration tests that use synthetic scenarios to verify a particular aspect of the game works as I want it to.)
One negative side effect of this design is that the execution order of handlers for any given event is hard to determine from code. That's because, to have game features easily compose, GFs can request particular ordering themselves (e.g. "death" can demand its event handler to be executed after "destructibility" but before "log") - so at startup, I get an ordering preference graph that I reconcile and linearize (via topological sorting). I work around this and related issues by adding debug utilities - e.g. some extra code that can, after game startup, generate a PlantUML/GraphViz picture of all events, event handlers, and their ordering.
(I apologize for a long comment, it's a bit of work I always wanted to talk about with someone, but never got around to. The source of the game isn't public right now because I'm afraid of airing my hot garbage code.)
I'd be interested in how you attempt this. Is it all in lisp?
It might be hard to integrate related things, e.g. physical simulation/kinematics <- related to collisions, and maybe sight/hearing <- related to rendering; which is all great if information flows one way, as a tree, but maybe complicated if it's a graph with intercommunication.
I thought about this before, and figured maybe the design could be initially very loose (and inefficient), but then a constraint-solver could wire things up as needed, i.e. pre-calculate concerns/dependencies.
Another idea, since you mention "logs" as a GF: AOP - using "join points" to declaratively annotate code. This better handles code that is less of a "module" (appropriate for functions and libraries) and more of a cross-cutting "aspect" like logging. This can also get hairy though: could you treat "(bad-path) exception handling" as an aspect? What about "security"?
I've gone down roads similar to this. Long story short - the architecture solves for a lower priority class of problem, w/r to games, so it doesn't pay a great dividend, and you add a combination of boilerplate and dynamism that slows down development.
Your top issue in the runtime game loop is always with concurrency and synchronization logic - e.g. A spawns before B, if A's hitbox overlaps with B, is the first frame that a collision event occurs the frame of spawning or one frame after? That's the kind of issue that is hard to catch, occurs not often, and often has some kind of catastrophic impact if handled wrongly. But the actual effect of the event is usually a one-liner like "set a stun timer" - there is nothing to test with respect to the event itself! The perceived behavior is intimately coupled to when its processing occurs and when the effects are "felt" elsewhere in the loop - everything's tied to some kind of clock, whether it's the CPU clock, the rendered frame, turn-taking, or an abstracted timer. These kinds of bugs are a matter of bad specification, rather than bad implementation, so they resist automated testing mightily.
The most straightforward solution is, failing pure functions, to write more inline code (there is a John Carmack posting on inline code that I often use as a reference point). Enforce a static order of events as often as possible. Then debugging is always a matter of "does A happen before B?" It's there in the source code, and you don't need tooling to spot the issue.
The other part of this is, how do you load and initialize the scene? And that's a data problem that does call for more complex dependency management - but again, most games will aim to solve it statically in the build process of the game's assets, and reduce the amount of game state being serialized to save games, reducing the complexity surface of everything related to saves (versioning, corruption, etc.). With a roguelike there is more of an impetus to build a lot of dynamic assets (dungeon maps, item placements, etc.) which leads to a larger serialization footprint. But ultimately the focus of all of this is on getting the data to a place where you can bring it back up and run queries on it, and that's the kind of thing where you could theoretically use SQLite and have a very flexible runtime data model with a robust query system - but fully exploiting it wouldn't have the level of performance that's expected for a game.
Now, where can your system make sense? Where the game loop is actually dynamic in its function - i.e. modding APIs. But this tends to be a thing you approach gradually and grudgingly, because modders aren't any better at solving concurrency bugs and they are less incentivized to play nice with other mods, so they will always default to hacking in something that stomps the state, creating intermittent race conditions. So in practice you are likely to just have specific feature points where an API can exist (e.g. add a new "on hit" behavior that conditionally changes the one-liner), and those might impose some generalized concurrency logic.
The other thing that might help is to have a language that actually understands that you want to do this decoupling and has the tooling built in to do constraint logic programming and enforce the "musts" and "cannots" at source level. I don't know of a language that really addresses this well for the use case of game loops - it entails having a whole general-purpose language already and then also this other feature. Big project.
I've been taking the approach instead of aiming to develop "little languages" that compose well for certain kinds of features - e.g. instead of programming a finite state machine by hand for each type of NPC, devise a subcategory of state machines that I could describe as a one-liner, with chunks of fixed-function behavior and a bit of programmability. Instead of a universal graphics system, have various programmable painter systems that can manipulate cursors or selections to describe an image. The concurrency stays mostly static, but the little languages drive the dynamic behavior, and because they are small, they are easy to provide some tooling for.
I think one should always be careful not to throw out the baby with the bathwater[0].
Do I force myself to follow every single crazy rule in Clean Code? Heck no. Some of them I don't agree with. But do I find myself to be a better coder because of what I learned from Bob Martin? Heck yes. Most of the points he makes are insightful and I apply them daily in my job.
Being a professional means learning from many sources and knowing that there's something to learn from each of them- and some things to ignore from each of them. It means trying the things the book recommends, and judging the results yourself.
So I'm going to keep recommending Clean Code to new developers, in the hopes that they can learn the good bits, and learn to ignore the bad bits. Because so far, I haven't found a book with more good bits (from my perspective) and fewer bad bits (from my perspective).
I'm completely with you here. Until I read Clean Code, I could never really figure out why my personal projects were so unreadable a year later but the code I read at work was so much better even though it was 8 years old. Sure, I probably took things too far for a while and made my functions too small, or my classes too small, or was too nitpicky on code reviews. But I started to actually think about where I should break a function. I realized that a good name could eliminate almost all the comments I had been writing before, leaving only comments that were actually needed. And as I learned how to break down my code, I was forced to learn how to use my IDE to navigate around. All of a sudden new files weren't a big deal, and that opened up a whole new set of changes that I could start making.
I see a lot of people in here acting like all the advice in Clean Code is obviously true or obviously false, and they claim to know how to write a better book. But, like you, I will continue to recommend Clean Code to new developers on my team. It's the fastest way (that I've found so far, though I see other suggestions in the comments here) to get someone to transition from writing "homework code" (that never has to be revisited) to writing maintainable code. Obviously, there are bad parts of Clean Code, but if that new dev is on my team, I'll talk through why certain parts are less useful than others.
Perfect. It's definitely my personal impression, but while reading the post it looks like the author was looking for a "one size fits all" book and was disappointed they did not find it.
And to be honest, that book will never exist; every piece of knowledge contributes to growing as a professional. Just make sure to understand, discuss, and use it (or not) for a real reason, not just because it's in book A or B.
It's not like people need to choose one book and follow it blindly for the rest of their lives. Read more books :D
In my opinion the problem is not that the rules are not one size fits all, but that they are so misguided that Martin himself couldn't come up with a piece of code where they would lead to a good result.
One mistake I think people like the author make is treating these books as some sort of bible that you must follow to the letter. People who evangelised TDD were the worst offenders of this. "You HAVE to do it like this, it's what the book says!"
You're not supposed to take it literally for every project, these are concepts that you need to adapt to your needs. In that sense I think the book still holds up.
For me this maps so clearly to the Dreyfus model of skill acquisition. Novices need strict rules to guide their behavior. Experts are able to use intuition they have developed. When something new comes along, everyone seems like a novice for a little while.
The Dreyfus model identifies 5 skill levels:
Novice
- Wants to achieve a goal, and not particularly interested in learning.
- Requires context-free rules to follow.
- When something unexpected happens, will get stuck.
Advanced Beginner
- Beginning to break away from fixed rules.
- Can accomplish tasks on their own, but still has difficulty troubleshooting.
- Wants information fast.
Competent
- Has developed a conceptual model of the task environment.
- Able to troubleshoot.
- Beginning to solve novel problems.
- Seeks out and solves problems.
- Shows initiative and resourcefulness.
- May still have trouble determining which details to focus on when solving a problem.
Proficient
- Needs the big picture.
- Able to reflect on their approach in order to perform better next time.
- Learns from the experience of others.
- Applies maxims and patterns.
Expert
- Primary source of knowledge and information in a field.
- Constantly looks for better ways of doing things.
- Writes books and articles and does the lecture circuit.
- Works from intuition.
- Knows the difference between irrelevant and important details.
> Primary source of knowledge and information in a field. Constantly look for better ways of doing things. Write books and articles and does the lecture circuit.
Meh. I'm probably being picky, but it doesn't surprise me that a Thought Leader would put themselves and what they do as Thought Leader in the Expert category. I see them more as running along a parallel track. They write books and run consulting companies and speak at conferences and create a brand, and then there are those of us who get good at writing code because we do it every day, year after year. Kind of exactly the difference between sports commentators and athletes.
The problem is that the book presents things that are at best 60/40 issues as hard rules, which leads novices++ to follow them to the detriment of everything else.
Uncle Bob himself acts like it is a bible, so if you buy into the rest of his crap then you'll likely buy into that too.
If treated as guidelines, you are correct that Clean Code is only eh instead of garbage. But taken in the full context of how it is presented/intended to be taken by the author, it is damaging to the industry.
I've read his blog and watched his videos. While his attitude comes off as evangelical, his actual advice is very often "Do it when it makes sense", "There are exceptions - use engineering judgment", etc.
Yup. I see the book as a guide to a general goal, not a specific objective that can be defined. To actually reach that goal is sometimes completely impossible, and in many other cases it introduces too much complexity.
However, in most cases heading towards that goal is a beneficial thing--you just have to recognize when you're getting too close and bogging down in complying with every detail.
I still consider it the best programming book I've ever read.
I understand that the people that follow Clean Code religiously are annoying, but the author seems to be doing the same thing in reverse: because some advice is nuanced or doesn't apply all the time then we should stop recommending the book and forget it altogether.
I agree with the sentiment that you don't want to over abstract, but Bob doesn't suggest that (as far as I know). He suggests extract till you drop, meaning simplify your functions down to doing one thing and one thing only and then compose them together.
Hands down, one of the best bits I learned from Bob was the "your code should read like well-written prose." That has enabled me to write some seriously easy to maintain code.
That strikes me as being too vague to be of practical use. I suspect the worst programmers can convince themselves their code is akin to poetry, as bad programmers are almost by definition unable to tell the difference. (Thinking back to when I was learning programming, I'm sure that was true of me.) To be valuable, advice needs to be specific.
If you see a pattern of a junior developer committing unacceptably poor quality code, I doubt it would be productive to tell them "Try to make it read more like prose." Instead you'd give more concrete advice, such as choosing good variable names, or the SOLID principles, or judicious use of comments, or sensible indentation.
Perhaps I'm missing something though. In what way was the "code should read like well-written prose" advice helpful to you?
Specifically in relation to naming. I was caught up in the dogma of "things need to be short" (e.g., using silly variable names like getConf instead of getWebpackConfig). The difference is subtle, but that combined with reading my code aloud to see if it reads like a sentence ("prose") is helpful.
"This module is going to generate a password reset token. First, it's going to make sure we have an emailAddress as an input, then it's going to generate a random string which I'll refer to as token, and then I want to set that token on the user with this email address."
I'm in the "code should read like poetry" camp. Poetry is the act of conveying meaning that isn't completely semantic - meter and rhyme being the primary examples. In code, that can mean maintaining a cadence of variable names, use of whitespace that helps illuminate structure, or writing blocks or classes where the appearance of the code itself has some mapping to what it does. You can kludge a solution together, or craft a context in which the suchness of what you are trying to convey becomes clear in a narrative climax.
Code should be simple and tight and small. It should also, however, strive for an eighth grade reading level.
You shouldn't try to make your classes so small that you're abusing something like nested ternary operators which are difficult to read. You shouldn't try to break up your concepts so much that while the sentences are easy, the meaning of the whole class becomes muddled. You should stick with concepts everyone knows and not try to invent your own domain specific language in every class.
Less code is always more, right up until it becomes difficult to read; then you've gone too far. On the other hand, if you extract a helper method from a method which read fine to begin with, then you've made the code harder to read, not easier, because it's now bigger with an extra concept. But if that was a horrible conditional with four clauses which you can express with a "NeedsFrobbing" method and a comment about it, then carry on (generating four methods from that conditional to "simplify" it is usually worse, though, due to the introduction of four concepts that could often be better addressed with just some judicious whitespace to separate them).
And I need to learn how to write in English more like Hemingway, particularly before I've digested coffee. That last paragraph got away from me a bit.
Absolutely this. Code should tell a story, the functions and objects you use are defined by the context of the story at that level of description. If you have to translate between low-level operations to reconstruct the high level behavior of some unit of work, you are missing some opportunities for useful abstractions.
Coding at scale is about managing complexity. The best code is code you don't have to read because of well named functional boundaries. Natural language is our facility for managing complexity generally. It shouldn't be surprising that the two are mutually beneficial.
I tried to write code with small functions and was dissuaded from doing that at both my teams over the past few years. The reason is that it can be hard to follow the logic if it's spread out among several functions. Jumping back and forth breaks your flow of thought.
I think the best compromise is small summary comments at various points of functions that "hold the entire thought".
The point of isolating abstractions is that you don't have to jump back and forth. You look at a function, and from its contract and calling convention you immediately know what it does. The specific details aren't relevant for the layer of abstraction you're looking at.
Because of well structured abstractions, thoughtful naming conventions, documentation where required, and extensive testing you trust that the function does what it says. If I'm looking at a function like commitPending(), I simply see writeToDisk() and move on. I'm in the object representation layer, and jumping down into the details of the I/O layers breaks flow by moving to a different level of abstraction. The point is I trust writeToDisk() behaves reasonably and safely, and I don't need to inspect its contents, and definitely don't want to inline its code.
If you find that you frequently need to jump down the tree from sub-routine to sub-routine to understand the high level code, then that's a definite code smell. Most likely something is fundamentally broken in your abstraction model.
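A minimal sketch of that layering, reusing commitPending() and writeToDisk() from above (everything else invented): the upper layer reads at one level of abstraction and trusts the contract of the lower one.

```typescript
import { promises as fs } from "fs";

// I/O layer: its contract is "persist these bytes safely"; callers don't care how.
async function writeToDisk(path: string, data: string): Promise<void> {
  await fs.writeFile(path, data, "utf8");
}

// Object-representation layer: reads at one level of abstraction.
class Document {
  private pending: string[] = [];
  constructor(private readonly path: string) {}

  edit(line: string): void {
    this.pending.push(line);
  }

  // At this level the story is just "commit whatever is pending";
  // the I/O details live one layer down, behind writeToDisk().
  async commitPending(): Promise<void> {
    if (this.pending.length === 0) return;
    await writeToDisk(this.path, this.pending.join("\n"));
    this.pending = [];
  }
}
```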
Check out the try/catch and logging pattern I use in the linked post. I added that specifically so I could identify where errors were occurring without having to guess.
When I get the error in the console/browser, the path to the error is included for me like "[generatePasswordResetToken.setTokenOnUser] Must pass value to $set to perform an update."
With that, I know exactly where the error is occurring and can jump straight into debugging it.
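The shape of that pattern might be sketched like this (the `step` wrapper is invented; the bracketed-path message format is the part taken from the comment above):

```typescript
// Hypothetical helper: run a named step and, on failure, rethrow with a
// "[parent.step]" prefix so the console points straight at the failing call.
async function step<T>(path: string, fn: () => Promise<T>): Promise<T> {
  try {
    return await fn();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    throw new Error(`[${path}] ${message}`);
  }
}

// A failure inside this step would surface as
// "[generatePasswordResetToken.setTokenOnUser] Must pass value to $set to perform an update."
async function example(): Promise<void> {
  await step("generatePasswordResetToken.setTokenOnUser", async () => {
    // ...database update would go here...
  });
}
```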
Nice! However, none of this is required for this endpoint. Here's why:
1. The connect action could be replaced by doing the connection once on app startup.
2. The validation could be replaced with middleware like express-joi.
3. The stripe/email steps should be asynchronous (ex: simple crons). This way, you create the user and that's it. If Stripe is down, or the email provider is down, you still create the user. If the server restarts while someone calls the endpoint, you don't end up with a user with invalid Stripe config. You just create a user with stripeSetup=false and welcomeEmailSent=false and have some crons that every 5 seconds query for these users and do their work. Also, ensure you query for false and not "not equal to true" here as it's not efficient.
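Point 3, sketched very roughly with invented collection and field names (MongoDB-style query shape): the endpoint only creates the user, and a periodic job sweeps up the unfinished work.

```typescript
// Hypothetical user document and a minimal slice of a Mongo-like collection API.
interface UserDoc { _id: string; email: string; stripeSetup: boolean; welcomeEmailSent: boolean; }
interface UsersCollection {
  find(query: Partial<UserDoc>): Promise<UserDoc[]>;
  updateOne(query: Partial<UserDoc>, update: object): Promise<void>;
}

// Runs every few seconds; queries for `false` explicitly (not "not equal to true").
async function stripeSetupCron(users: UsersCollection, setupStripe: (u: UserDoc) => Promise<void>) {
  const pending = await users.find({ stripeSetup: false });
  for (const user of pending) {
    await setupStripe(user);
    await users.updateOne({ _id: user._id }, { $set: { stripeSetup: true } });
  }
}
```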
Off topic but is connecting to Mongo on every API hit best practice? I abstract my connection to a module and keep that open for the life of the application.
Yes, that one did a lot for me too. Especially when business logic gets complicated, I want to be able to skip parts by roughly reading the meaning of a section without seeing the details.
One long stream of commands is OK to read if you are the author or already know what it should do. But otherwise it forces you to read too many irrelevant details on the way toward what you need.
Robert Martin and his aura always struck me as odd. In part because of how revered he always was at organizations I worked. Senior developers would use his work to end arguments, and many code reviews discussions would be judged by how closely they adhere to Clean Code.
Of course reading Clean Code left me more confused than enlightened due precisely to what he presents as good examples of Code. The author of the article really does hit the nail on the head about Martin's code style - it's borderline unreadable a lot of times.
Who the f. even is Robert Martin?! What has he built? As far as I am able to see he is famous and revered because he is famous and revered.
He ran a consultancy and knew how to pump out books into a world of programmers that wanted books
I was around in the early 90s through to the early 2000s, when a lot of these ideas came about and slowly got morphed by consultants who were selling this stuff to companies as essentially "religion". The nuanced thoughts of a lot of the people who had most of the original core ideas are mostly lost.
It's a tricky situation. At the core of things there are some really good ideas, but the messaging by people like "uncle bob" seems to fail to communicate the mindset in a way that develops thinking programmers. Mainly because he, and people like Ron Jeffries, really didn't actually build anything serious once they became consultants and started giving out all these edicts. If you watched them on forums/blogs at the time, they were really not that good. There were lots of people who were building real things and had great perspectives, but their nuanced perspectives were never really captured into books, and it would be hard to, as it is more about the mentality of using ideas and principles, making good pragmatic choices, adapting things, and not being limited by "rules", but about incorporating the essence of the ideas into your thinking processes.
So many of those people walked away from a lot of those communities when it morphed into "Agile" and started being dominated by the consultants.
10 or so years ago, when I first got into development, I looked to people like Martin for how I should write code.
But I had more and more difficulty reconciling bizarrely optimistic patterns with reality. This from the article perfectly sums it up:
>Martin says that functions should not be large enough to hold nested control structures (conditionals and loops); equivalently, they should not be indented to more than two levels.
Back then as now I could not understand how one person can make such confident and unambiguous statements about business logic across the spectrum of use cases and applications.
It's one thing to say how something should be written in ideal circumstances, it's another to essentially say code is spaghetti garbage because it doesn't precisely align to a very specific dogma.
This is the point that I have the most trouble understanding in critiques of Fowler, Bob, and all writers who write about coding: in my reading, I had always assumed that they were writing about the perfect-world ideal that needs to be balanced with real-world situations. There's a certain level of bluster and over-confidence required in that type of technical writing that I understood to be a necessary evil in order to get points across. After all, a book full of qualifications will fail to inspire confidence in its own advice.
This is true only for people first coming to development. If you're just starting your journey, you are likely looking for quantifiable absolutes as to what is good and what isn't.
After you're a bit more seasoned, I think qualified comments are probably far more welcome than absolutes.
> After all, a book full of qualifications will fail to inspire confidence in its own advice.
I don't think that's true at all. One of the old 'erlang bibles' is "learn you some erlang" and it is full of qualifications titled "don't drink the kool-aid" (notably not there in the haskell inspiration for the book). It does not fail to inspire confidence to have qualifications scattered throughout; to me it actually gives me MORE confidence that the content is applicable and the tradeoffs are worth it.
Now, some people will claim that having 8-character indentations makes the code move too far to the right, and makes it hard to read on a 80-character terminal screen. The answer to that is that if you need more than 3 levels of indentation, you're screwed anyway, and should fix your program.
Software often likens itself to traditional engineering disciplines. Those traditional engineering disciplines manage risk through codified standards built through industry consensus. Somebody may build a pressure system that doesn't conform to standards. They don't get to say "well your idea of 'good' is just an opinion so it's subjective". By "professional" standards they have built something outside the acceptable risk envelope and, if it's a regulated engineering domain, they can't use it.
This isn't to mean a coder would have to follow rigid rules constantly or that it needs a regulatory body, but that the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.
Go back to earlier machines and each version had its own assembly language and instruction set. Nobody would ever go back to that era.
OOP was pitched as a one-size-fits-all solution to all problems, and as a checklist of items that would turn a cheap offshored programmer into a real software engineer thanks to design patterns and abstractions dictated by a "Software Architect". We all know it to be false, and bordering on snake oil, but it still had some good ideas. Having a class encapsulate complexity and defining interfaces is neat. It forces to think in terms of abstractions and helps readability.
> This isn't to mean a coder would have to follow rigid rules constantly or that it needs a regulatory body, but that the practice of deviating from standardized best-practices should be communicated in terms of the risk rather than claiming it's just subjective.
As more and more years pass, I'm less and less against a regulatory body. It would help with getting rid of snake oil salesmen in the industry and limit offshoring to barely qualified coders. And it would simplify hiring too, by having a known certification that tells you someone at least meets a certain bar.
The problem I see with this is that programming could be described as a kind of general problem solving. Other engineering disciplines standardize methods that are far more specific, e.g. how to tighten screws.
It's hard to come up with specific rules for general problems though. Algorithms are just solution descriptions in a language the computer and your colleagues can understand.
When we look at specific domains, e.g. finance and accounting software, we see industry standards have already emerged, like dealing with fixed point numbers instead of floating point to make calculation errors predictable.
If we now start codifying general software engineering, I'm worried we will just codify subjective opinions about general problem solving. And that will stop any kind of improvement.
Instead we have to accept that our discipline is different from the others, and more of a design or craft discipline.
That seems like such a hard problem. Why not tackle a simpler one?
> then mentally model the entire graph of interactions at once
...partially applies even if you have well-named functional boundaries. You said it yourself:
> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows. The cognitive load to understand code grows as the number of possible interactions grow.
Programs have a certain essential complexity. Making a function "simpler" means making it less complex, which means that that complexity has to go somewhere else. If you make all of your functions simple, then you simply need more functions to represent the same program, which increases the total number of possible interactions between nodes and therefore the cognitive load of understanding the whole graph/program.
Allowing more complexity in your functions makes them individually harder to understand, but reduces the total number of functions needed and therefore makes the entire program more comprehensible.
Also note that just because a function's implementation is complex doesn't mean that its interface also has to be complex.
And, functions with complex implementations are only themselves difficult to understand - functions with complex interfaces make the whole system more difficult to understand.
Having hundreds or thousands of simple functions is the opposite of this advice.
You can also consider this in more scientific terms.
Code is a mental model of a set of operations. The best possible model has as few moving parts as possible, there are as few connections between the parts as possible, each part is as simple as possible, and both the parts and the connections between them are as intuitively obvious as possible.
Making parts as simple as possible is just one design goal, and not a very satisfactory or useful one in its own terms.
All of this turns out to be incredibly hard, and is a literal IQ test. Mediocre developers will always, always create overcomplicated solutions. Top developers have a magical ability to combine a 10,000 foot overview with ground level detail, and will tear through complex problems and reduce them to elegant simplicity.
IMO we should spend less time teaching algorithms and testing algorithmic specifics, and more on analysing complex systems and implementing them with minimal, elegant, intuitive models.
The semantics of the language and the structure of the code help hide irrelevant functional units from the global namespace. Methods attached to an object only need to be considered when operating on some object, for example. Private methods do not pollute the global namespace nor do they need to be present in any mental model of the application unless it is relevant to the context.
While I do think you can go too far with adding functions for its own sake, I don't see that they add to the cognitive load in the same way that possible interactions within a functional unit does. If you're just polluting a global namespace with functions and tiny objects, then that does similarly increase cognitive load and should be avoided.
Agreed
> Allowing more complexity in your functions makes them individually harder to understand
I think that that can mostly be avoided, by sometimes creating local scopes {...} to avoid too much state inside a function, combined with whitespace and some section "header" comments (instead of what would have been sub-function names).
Can be quite readable, I think. And nice to not have to jump back and forth between myriads of files and functions.
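A sketch of what that can look like (invented example): block scopes keep temporaries from leaking across sections, and the section comments stand in for would-be sub-function names.

```typescript
function buildReport(orders: { total: number; refunded: boolean }[]): string {
  let revenue = 0;
  let refundCount = 0;

  // --- Tally revenue and refunds ---
  {
    // `order` and any other temporaries stay inside this block.
    for (const order of orders) {
      if (order.refunded) refundCount++;
      else revenue += order.total;
    }
  }

  // --- Format the summary ---
  {
    const lines = [`Revenue: ${revenue}`, `Refunds: ${refundCount}`];
    return lines.join("\n");
  }
}
```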
I've always felt it had a strong correlation to top-down vs bottom-up thinkers in terms of software design. The top-down folks tend to agree with your stance and the bottom-up group do not. If you're naturally going to want to understand all of the nitty gritty details you want to be able to wrap your head around those as quickly as possible. If you're willing to think in terms of the abstractions you want to remove as many of those details from sight as possible to reduce visual noise.
Have you ever seen a codebase with infrastructure and piping taking about 70% of the code, with tiny pieces of business logic thrown here and there? It's impossible to figure out where the actual job is being done (and what it actually is): all you can see is just an endless chain of methods that mostly just delegate the responsibility further and further. What could've been a 100-line loop of "foreach item in worklist, do A, B, C" kind is instead split over seven tightly cooperating classes that devote 45% of their code to multiplexing/load-balancing/messaging/job-spooling/etc, another 45% to building trivial auxiliary structure and instantiating each other, and only 10% actually devoted to the actual data processing, but good luck finding those 10%, because there is a never-ending chain of calling each other: A.do_work() calls B.process_item() which calls A.on_item_processing() which calls B.on_processed()... wait, shouldn't there been some work done between "on_item_processing" and "on_processed"? Yes, it was done by an inconspicuously named "prepare_next_worklist_item" function.
Ah, and the icing on the cake: looping is actually done from the very bottom of this call chain by doing a recursive call to the top-most method which at this point is about 20 layers above the current stack frame. Just so you can walk down this path again, now with the feeling.
Once your codebase gets to somewhere around the 10,000 line mark, it becomes impossible for a single mind to hold the entire program in their head at a single time. The only way to survive past that point is with carefully thought out, water tight layers of abstractions. That almost never happens with bottom-up. Bottom-up is a lot like natural selection. You get a lot of kludges that work great to solve their immediate problem, but behave in undefined and unpredictable ways when you extend them outside their original environment.
Bottom-up can work when you're inside well-encapsulated modular components with bounded scope and size. But there's no way to keep those modules loosely coupled unless you have an elegant top-down architecture imposing order on the large-scale structure.
That's only 1 part of the complexity equation.
When you have 100 lines in 1 function you know exactly the order in which each line will happen and under which conditions by just looking at it.
If you split it into 10 functions of 10 lines each, you now have 10! possible orderings of calling these functions (ignoring loops and branches). And since this ordering is separated into multiple places, you have to keep it in your mind. Good luck inventing naming that will make it obvious which of the 3628800 possible orderings is happening without reading through them.
Short functions are good when they fit the problem. Often they don't.
It's easy to make this argument in the abstract, but harder to demonstrate with a concrete example. Do you happen to have any 100 lines of code that you could provide that would show this as a challenge to compare to the refactored code?
You're likely missing one or more techniques that make this work well:
1. Depth first function ordering, so the execution order of the lines in the function is fairly similar to that of the expanded 100 lines. This makes top to bottom readability reasonable.
2. Explicit naming of the functions to make it clear what they do, not just part1(); part2() etc.
3. Similar levels of abstraction in each function (e.g. not having both a for loop, several if statements based on variables defined in the function, and 3 method calls, but instead having 4-5 method calls doing the same thing).
4. Explicit pre/post conditions in each method are called out due to the passing in of parameters and the return values. This more effectively helps a reader understand the lifecycle of relevant variables etc.
In your example of 100 lines, the counterpoint is that now I have a method that has at least 100 ways it could work / fail. By breaking that up, I have the ability to reason about each use case / failure mode.
How does splitting code into multiple functions suddenly change the order of the code?
I would expect that these functions would be still called in a very specific order.
And sometimes it does not even make sense to keep this order.
But here is a little example (in made-up pseudocode):
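(A sketch of the kind of comparison that fits here, with invented names; the point is that the named-steps version still shows the same fixed order, top to bottom, in one place:)

```typescript
// Invented example: one long function, every step visible in order.
function processSignup(email: string): string {
  // validate
  if (!email.includes("@")) throw new Error("invalid email");
  // normalize
  const normalized = email.trim().toLowerCase();
  // build welcome message
  return `Welcome, ${normalized}!`;
}

// The same thing split into named steps: the order is still right there,
// top to bottom, in a single place; nothing to hunt for.
function validateEmail(email: string): void {
  if (!email.includes("@")) throw new Error("invalid email");
}
function normalizeEmail(email: string): string {
  return email.trim().toLowerCase();
}
function buildWelcome(email: string): string {
  return `Welcome, ${email}!`;
}
function processSignupSplit(email: string): string {
  validateEmail(email);
  const normalized = normalizeEmail(email);
  return buildWelcome(normalized);
}
```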
===> Better? No?
Sometimes there is too much code in mega-functions, poor separation of concerns and so on. These are easy mistakes to make, especially for beginners, so there are a lot of warnings against them.
Other times you have too many abstractions and too much indirection to serve any useful purpose. The ratio of named things, functional boundaries, and interface definitions to actual instructions can easily get out of hand when people dogmatically apply complexity-managing patterns to things that aren't very complex. Such over-abstraction can fall under YAGNI and waste time/$ as the code becomes slower to navigate, slower to understand in depth, and possibly slower to modify.
I think in software engineering we suffer more from the former problem than the latter problem, but the latter problem is often more frustrating because it's easier to argue for applying nifty patterns and levels of indirection than omitting them.
Just for a tangible example: If I have to iterate over a 3D data structure with an X Y and Z dimension, and use 3 nested loops to do so, is that too complex a function? I'd say no. It's at least as clear without introducing more functional boundaries, which is effort with no benefit.
Yeah, it's a lot of work, but working on recent projects have really taught me the value of good documentation. Naming a function send_records_to_database is fine, but it can't tell you how it determines which database to send the records to, or how it deals with failed records (if at all), or various alternative use cases for the function. All of that must come from documentation (or reading the source of that function).
Plus, I've found that forcing myself to write function documentation, and justify my decisions, has resulted in me putting more consideration into design. When you have to say, "this function reads <some value> name from <environmental variable>" then you have to spend some time considering if future users will find that to be a sound decision.
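For example (a hedged sketch; the function body is elided and the env-var detail is invented to match the comment), the doc comment carries exactly the decisions the name can't:

```typescript
/**
 * Sends a batch of records to the reporting database.
 *
 * Which database: read from the REPORTING_DB_URL environment variable at call time.
 * Failed records: retried once, then collected and returned to the caller; the
 * function itself never throws on individual record failures.
 */
async function sendRecordsToDatabase(records: object[]): Promise<object[]> {
  const url = process.env.REPORTING_DB_URL;
  if (!url) throw new Error("REPORTING_DB_URL is not set");
  const failed: object[] = [];
  // ...send each record, retry once, push unrecoverable ones onto `failed`...
  return failed;
}
```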
I'd argue that writing that much documentation about a single function suggests that the function is a problem and the "send_records_to_database" example is a bad name. It's almost inevitable that the function doing so much and having so much behavior that needs documentation will, at some point, be changed and make the documentation subtly wrong, or at least incomplete.
This, this, and... this.
Sometimes, I step back after writing documentation and realise, this is a bunch of baloney. It could be much simpler, or this is a terrible decision! My point: Writing documentation is about expressing the function a second time -- the first time was code, the second time was natural language. Yeah, it's not a perfect 1:1 (see: the law in any developed country!), but it is a good heuristic.
There are contract/proof systems that seem like they might help. At least the tool ensures it is correct. However, I'm not sure if such systems are readable. (I've never used one in the real world.)
Code telling a story is a fallacy that programmers keep telling themselves and which fails to die. Code doesn't tell stories, programmers do. Code can't explain why it exists; it can't tell you about the buggy API it relies on and which makes its implementation weird and not straight-forward; it can't say when it's no longer needed.
Good names are important, but it's false that having well-chosen function and arguments names will tell a programmer everything they need to know.
Code can't tell every relevant story, but it can tell a story about how it does what it does. Code is primarily written for other programmers. Writing code in such a way that other people with some familiarity with the problem space can understand easily should be the goal. But this means telling a story to the next reader, the story of how the inputs to some functional unit are translated into its outputs or changes in state. The best way to explain this to another human is almost never the best way to explain it to a computer. But since we have to communicate with other humans and to the computer from the same code, it takes some effort to bridge the two paradigms. Having the code tell a story at the high level by way of the modules, objects and methods being called is how we bridge this gap. But there are better and worse ways to do this.
Software development is a process of translating the natural language-spec of the system into a code-spec. But you can have the natural language-spec embedded in the structure of the code to a large degree. The more, the better.
It is like saying the books do not tell stories, writers do.
More code, more bugs. More hidden code, more hidden bugs. There's a reason those who have worked in software development longer tend to prefer less abstraction: most of them are those who have learned from their experiences, and those who aren't are "architects" optimising for job security.
If it is going to be used more than once (and actually is), then make a function (unless it is so trivial that the explicit inline code is more readable). If you are designing a public API where it may need to be overridden, count that as more than once.
Some of the above is language dependent.
I don't know which is harder. Explaining this about code, or about tests.
The people with no sense of DevX see nothing wrong with writing tests that fail as:
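Something like this made-up example, where a failure tells you nothing about what actually broke:

    #include <cassert>
    #include <vector>

    // Invented stand-ins for this sketch.
    struct Record { int id; bool valid; };

    bool processAll(const std::vector<Record>& records) {
        for (const auto& r : records)
            if (!r.valid) return false;   // which record? why? the caller never learns
        return true;
    }

    // The unhelpful test: when it fails, all you get is "assertion failed" --
    // nothing about which record, which field, or what the offending value was.
    void test_processes_orders() {
        std::vector<Record> fixture = { {1, true}, {2, false}, {3, true} };
        assert(processAll(fixture));
    }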
If you make me read the tests to modify your code, I'm probably going to modify the tests. Once I modify the tests, you have no idea if the new tests still cover all of the same concerns (especially if you wrote tests like the above).

Make the test red before you make it green, so you know what the errors look like.
Like good god, extract that boilerplate into a function. Use comments and white space to break it up and explain the workflow.
Oh! I like this. I never considered this particular reason why making tests fail first might be a good idea.
this quote scales
If your software does not do the thing, then it's not useful, it's a piece of art - not an artifact of software engineering that is meant to fulfill a purpose.
Secondly, it is often better to start with fewer abstractions and boundaries, and add them when the need becomes apparent, rather than trying to remove ill-conceived boundaries and abstractions that were added at earlier times.
That's fine in theory and I still sort-of believe that, but in practice, I came to believe most programming languages are insufficiently expressive for this vision to be true.
Take, as a random example, this bit of C++:
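For the sake of discussion, assume the line in question looked roughly like this (a reconstruction; Frobnicate and Quuxify are the names used in the description below, and their definitions are elsewhere):

    auto foo = Frobnicate(bar, Quuxify);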
Ok, I know what Frobnification is. I know what Quuxify does, it's defined a few lines above. From that single line, I can guess it Frobs every member of bar via Quuxify. But is bar modified? Gotta check the signature of Frobnicate! That means either getting an IDE help popup, or finding the declaration. From the signature, I can see that bar full of Bars isn't going to be modified. But then I think, is foo.size() going to be equal to bar.size()? What if bar is empty? Can Frobnicate throw an exception? Are there any special constraints on the function Fn passed to it? Does Fn have to be a funcallable thing? Can't tell that until I pop into the definition of Frobnicate.

I'll omit the definition here. But now that I see it, I realize that Fn has to be a function of a very particular signature, that Fn is applied to every other element of the input vector (and not all of them, as I assumed), that the code has a bug and will crash if the input vector has less than 2 elements, and that it calls three other functions that may or may not have their own restrictions on arguments, and may or may not throw an exception.
If I don't have a fully-configured IDE, I'll likely just ignore it and bear the risk. If I have, I'll routinely jump-to-definition into all these functions, quickly eye them for any potential issues... and, if I have the time, I'll put a comment on top of Frobnicate declaration, documenting everything I just learned - because holy hell, I don't want to waste my time doing the same thing next week. I would rename the function itself to include extra details, but then the name would be 100+ characters long...
Some languages are better at this than others, but my point is, until we have programming languages that can (and force you to) express the entire function contract in its signature and enforce this at compile-time, it's unsafe to assume a given function does what you think it does. Comments would be a decent workaround, if most programmers could be arsed to write them. As it is, you have to dig into the implementation of your dependencies, at least one level deep, if you want to avoid subtle bugs creeping in.
Though, if you’re in a nest of tiny forwarding functions, who knows how deep you’ll have to go?
Which is often unavoidable: many functions are insufficiently explained by those alone unless you want four-word camelcase monstrosities for names. The code of the function should be right-sized. Size and complexity need to be balanced there - simpler and easier-to-follow is sometimes larger. I work on compilers, query processors and compute engines; the cognitive load from the subject domains is bad enough without making the code arbitrarily shaped.
[edit] oh yes, what jzoch says below. Locality helps with taming the network of complexity between functions and data.
[edit] oh no, here come the downvotes!
Come now, is four words really all that "monstrously" much?
> The code of the function should be right-sized.
Feels like that should go for its name too.
> Size and complexity need to be balanced there- simpler and easier-to-follow is sometimes larger.
The longer the code, the longer the name?
This isn't always true in my experience. Often when I need to dig into the details of a function it's because how it works is more important than what it says it's doing. There are implementation concerns you can't fit into a function name.
Additionally, I have found that function names become outdated at about the same rate as comments do. If the common criticism of code commenting is that "comments are code you don't run", function names also fall into that category.
I don't have a universal rule on this, I think that managing code complexity is highly application-dependent, and dependent on the size of the team looking at the code, and dependent on the age of the code, and dependent on how fast the code is being iterated on and rewritten. However, in many cases I've started to find that it makes sense to inline certain logic, because you get rid of the risk of names going out of date just like code comments, and you remove any ambiguity over what the code actually does. There are some other benefits as well, but they're beyond the scope of the current conversation.
Perfect abstractions are relatively rare, so in instances where abstractions are likely to be very leaky (which happens more often than people suspect), it is better to be extremely transparent about what the code is doing, rather than hiding it behind a function name.
> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.
I'll also push back against this line of thought. The sum total of possible interactions does not decrease when you move code out into a separate function. The same number of lines of code still get run, and each line carries the same potential to have a bug. In fact, in many cases, adding additional interfaces between components and generalizing them can increase the number of code paths and potential failure points.
If you define complexity by the sum total of possible interactions (which is itself a problematic definition, but I'll talk about that below), then complexity always increases when you factor out functions, because the interfaces, error-handling, and boilerplate code around those functions increases the number of possible interactions happening during your function call.
> The complexity (sum total of possible interactions) grows as the number of lines within a functional boundary grows.
What I've come to understand is that complexity is relative. A solution that makes a codebase less complex for one person in an organization may make a codebase more complex for someone else in the organization who has different responsibilities over the codebase.
If you are building an application with a large team, and there are clear divisions of responsibilities, then functional boundaries are very helpful because they hide the messy details about how low-level parts of the code work.
However, if you are responsible for maintaining both the high-level and low-level parts of the same codebase, then separating that logic can sometimes make the program harder to manage, because you still have to understand how both parts of the codebase work, but now you also have to understand how the interfaces and abstractions between them fit together and what their limitations are.
In single-person projects where I'm the only person touching the codebase I do still use abstractions, but I often opt to limit the number of abstractions, and I inline code more often than I would in a larger project. This is because if I'm the only person working on the code, I need to be able to hold almost the entire codebase in my head at the same time in order to make informed architecture decisions, and managing a large number of abstractions on top of their implementations makes the code harder to reason about and increases the number of things I need to remember. This was a hard-learned lesson for me, but has made (I think) an observable difference in the quality and stability of the code I write.
> This isn't always true in my experience. Often when I need to dig into the details of a function it's because how it works is more important than what it says it's doing. There are implementation concerns you can't fit into a function name.
Both of these things are not quite right. Yes, if you have to dig into the details of a function to understand what it does, it hasn't been explained well enough. No, the prototype cannot contain enough information to explain it. No, you shouldn't look at the implementation either - that leads to brittle code where you start to rely on the implementation behavior of a function that isn't part of the interface.
The interface and implementation of a function are separate. The former should be clearly-documented - a descriptive name is good, but you'll almost always also need docstrings/comments/other documentation - while you should rarely rely on details of the latter, because if you are, that usually means that the interface isn't defined clearly enough and/or the abstraction boundaries are in the wrong places (modulo things like looking under the hood to refactor, improve performance, etc - all abstractions are somewhat leaky, but you shouldn't be piercing them regularly).
> If you define complexity by the sum total of possible interactions (which is itself a problematic definition, but I'll talk about that below), then complexity always increases when you factor out functions, because the interfaces, error-handling, and boilerplate code around those functions increases the number of possible interactions happening during your function call.
This - this is what everyone who advocates for "small functions" doesn't understand.
I have witnessed so many people bend over backwards and do the most insane things in the name of avoiding "Uncle Bob's" baleful stare.
It turns out that following "Uncle Sassy's" rules will get you a lot further.
1. Understand your problem fully
2. Understand your constraints fully
3. Understand not just where you are but where you are headed
4. Write code that takes the above 3 into account and make sensible decisions. When something feels wrong ... don't do it.
Quality issues are far more often planning, product management, or strategic issues than something as easily remedied as the code itself.
The problem with all these lists is that they require a sense of judgement that can only be learnt from experience, never from checklists. That's why Uncle Bob's advice is simultaneously so correct, and yet so dangerous with the wrong fingers on the keyboard.
Instead of solving the problem at hand, I was solving how I was going to solve it ("Clean Coding"). What a fucking waste of time, my brain power, and my lifetime keystrokes [1].
I'm starting to see that OOP is more suited to programming literal business logic. The best use for the tool is when you actually have "Person", "Customer" and "Employee" entities that have to follow some form of business rules.
In contradiction to your "Uncle Sassy's" rules, I'm starting to understand where "Uncle Beck" was coming from:
1. Make it work.
2. Make it right.
3. Make it fast.
The amount of understanding that you can garner from making something work leads very strongly into figuring out the best way to make it right. And you shouldn't be making anything fast unless you have a profiler and other measurements telling you to do so.
"Clean Coding" just perpetuates all the broken promises of OOP.
[1]: https://www.hanselman.com/blog/do-they-deserve-the-gift-of-y...
> 1. Understand your problem fully
> 2. Understand your constraints fully
These two fall under requirements gathering. It's so often forgotten that software has a specific purpose, a specific set of things it needs to do, and that it should be crafted with those in mind.
> 3. Understand not just where you are but where you are headed
And this is the part that breaks down so often. Because software is simultaneously so easy and so hard to change, people fall into traps both left and right, assuming some dimension of extensibility that never turns out to be important, or assuming something is totally constant when it is not.
I think the best advice here is that YAGNI, don't add functionality for extension unless your requirements gathering suggests you are going to need it. If you have experience building a thing, your spider senses will perk up. If you don't have experience building the thing, can you get some people on your team that do? Or at least ask them? If that is not possible, you want to prototype and fail fast. Be prepared to junk some code along the way.
If you start out not knowing any of these things, and also never junking any code along the way, what are the actual odds you got it right?
The problem is that people often need specifics to guide them when they're less experienced. Something that "feels wrong" is usually due to vast experience being incorporated into your subconscious aesthetic judgement. But you can't rely on your subconscious until you've had enough trials to hone your senses. Hard rules can and often are overapplied, but its usually better than the opposite case of someone without good judgement attempting to make unguided judgement calls.
well there goes the entire tech industry
My personal take away was that there were a few good ideas, all horribly mangled. The most painful one I remember was his treatment of the Law of Demeter, which, as I recall, was so shallow that he didn't even really thoroughly explain what the law was trying to accomplish. (Long story short, bounded contexts don't mean much if you're allowed to ignore the boundaries.) So most everyone who read the book came to earnestly believe that the Law of Demeter is about period-to-semicolon ratios, and proceeded to convert chained calls into sequences of intermediate assignments (something like the sketch below),
and somehow convince themselves that doing this was producing tangible business value, and congratulate themselves for substantially improving the long-term maintainability of the code.

Meanwhile, violations of the actual Law of Demeter ran rampant. They just had more semicolons.
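A hypothetical illustration of that makeover, with invented types:

    #include <string>

    // Invented types, purely to illustrate the "period-to-semicolon" makeover.
    struct ZipCode { std::string value; };
    struct Address { ZipCode getZipCode() const { return {}; } };
    struct Customer { Address getAddress() const { return {}; } };
    struct Order { Customer getCustomer() const { return {}; } };

    void demeterMakeover(const Order& order) {
        // Before: one statement, many periods.
        ZipCode before = order.getCustomer().getAddress().getZipCode();

        // After: more semicolons, but the code still reaches through three
        // objects, so the actual Law of Demeter violation is untouched.
        Customer customer = order.getCustomer();
        Address address = customer.getAddress();
        ZipCode after = address.getZipCode();

        (void)before;
        (void)after;
    }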
> Long story short, bounded contexts don't mean much if you're allowed to ignore the boundaries.
I'd like to read more. Do you know of a source that covers this properly?
oof. I mean, yeah, at least explain what the main thing you’re talking about is about, right? This is a pet peeve.
I don't recall where I picked it up from, but the best advice I've heard on this is a "Rule of 3". You don't have a "pattern" to abstract until you reach (at least) three duplicates. ("Two is a coincidence, three is a pattern. Coincidences happen all the time.") I've found it can be a useful rule of thumb to prevent "premature abstraction" (an understandable relative of "premature optimization"). It is surprising sometimes what you find out about the abstraction only when you reach that third duplicate (variables or control flow decisions, for instance, that seem constant in two places; or a higher-level idea of why the code is duplicated that isn't clear from two very far apart points but is clearer when you can "triangulate" what their center is).
You want to extract common code if it's the same now, and will always be the same in the future. If it's not going to be the same and you extract it, you now have the pain of making it do two things, or splitting. But if it is going to be the same and you don't extract it, you have the risk of only updating one copy, and then having the other copy do the wrong thing.
For example, I have a program where one component gets data and writes it to files of a certain format in a certain directory, and another component reads those files and processes the data. The code for deciding where the directory is, and what the columns in the files are, must be the same, otherwise the programs cannot do their job. Even though there are only two uses of that code, it makes sense to extract them.
Once you think about it this way, you see that extraction also serves a documentation function. It says that the two call sites of the shared code are related to each other in some fundamental way.
Taking this approach, I might even extract code that is only used once! In my example, if the files contain dates, or other structured data, then it makes sense to have the matching formatting and parsing functions extracted and placed right next to each other, to highlight the fact that they are intimately related.
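A minimal sketch of that idea, with invented names and paths:

    #include <filesystem>
    #include <string>
    #include <vector>

    // Even with only two call sites (the writer component and the reader
    // component), extracting the shared decisions documents that the two
    // are coupled on purpose.
    namespace record_files {

    // Where the files live -- writer and reader must agree on this.
    inline std::filesystem::path dataDirectory() {
        return "/var/data/records";
    }

    // The column layout -- again, both sides must agree.
    inline const std::vector<std::string>& columns() {
        static const std::vector<std::string> cols = {"timestamp", "sensor_id", "value"};
        return cols;
    }

    }  // namespace record_files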
Also known as, “AbstractMagicObjectFactoryBuilderImpl” that builds exactly one (1) factory type that generates exactly (1) object type with no more than 2 options passed into the builder and 0 options passed into the factory. :-)
For example, when I did work at a microservices shop, I was deeply dissatisfied with the way the shared utility library influenced our code. A lot of what was in there was fairly throw-away and would not have been difficult to copy/paste, even to four or more different locations. And the shared nature of the library meant that any change to it was quite expensive. Technically, maybe, but, more importantly, socially. Any change to some corner of the library needed to be negotiated with every other team that was using that part of the library. The risk of the discussion spiraling away into an interminable series of bikesheddy meetings was always hiding in the shadows. So, if it was possible to leave the library function unchanged and get what you needed with a hack, teams tended to choose the hack. The effects of this phenomenon accumulated, over time, to create quite a mess.
I know this as AHA = Avoid Hasty Abstractions:
- https://kentcdodds.com/blog/aha-programming
- https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction
Something I’ve mentioned to my direct reports during code reviews: Sometimes, code is duplicated because it just so happens to do something similar.
However, these are independent widgets and changes to one should not affect the other; in other words, not suitable for abstraction.
This type of reasoning requires understanding the problem domain (i.e., use-case and the business functionality ).
I like abstraction as much as the next guy but this is closer to obfuscation than abstraction.
[0] People who strive for "The One" architecture that will allow any change no matter what. Seriously, abstraction out the wazoo!
Honestly. If you're getting data from a bar code scanner and you think, "we should handle the case where we get data from a hot air balloon!" because ... what if?, you should retire.
Anyways, you are not alone with that experience - a common mistake I see, no matter what language or framework, is that people fall for the fallacy that "separation into files" is the same as "separation of concerns".
The Magento 2 codebase is a good example of this. It's both well written and horrible at the same time. Everything is so spread out into constituent technical components, that the code loses the "narrative" of what's going on.
Case in point: Bob Martin's "Video Store" example.
My best guess is that clean code, to them, was as little code on the screen as possible, not necessarily "intention revealing code" either; instead everything is abstracted until it looks like it does nothing.
I also sat in their IRC channel for months, and the lead developer was constantly discussing how he'd refactor it to be cleaner but never seemed to add code that did something.
I personally don't feel all that productive spending like half my time just navigating the code rather than actually reading it, but maybe it's just me.
What I find most surprising is that most developers are trying to obey the "rules". Code containing even minuscule duplication must be DRYed; everyone agrees that code must be clean and professional.
Yet it is never enough: bugs keep showing up, and stuff that was written by others is always bad.
I'm starting to think that 'Uncle Bob' and 'Clean Code' zealots are actually harmful, because they prevent people from taking two steps back and thinking about what they are doing - making microservices/components/classes/functions that end up never reused, and making DRY the holy grail.
Personally I am YAGNI > DRY and a lot of times you are not going to need small functions or magic abstractions.
A service has 3 "helper" services it calls, which may, in turn, have helper services, or worse, depend on a shared repo project.
The only solution I have found is to move these helpers into their own project, and mark the helpers as internal. This achieves 2 things:
1. The "sub-services" are not confused as stand-alone and only the "main/parent" service can be called. 2. The "module" can now be deployed independently if micro-services ever become a necessity.
I would like feedback on this approach. I do honestly think files over 100 lines long are unreadable trash, and we have achieved a lot by re-using modular services.
We are 1.5 years into a project and our code re-use is sky-rocketing, which allows us to keep errors low.
Of course, a lot of dependencies also make testing difficult, but allow easier mocks if there are no globals.
Dunno if this is the feedback you are after, but I would try to not be such an absolutist. There is no reason that a great 100 line long file becomes unreadable trash if you add one line.
That is not the fault of this book or any book. The problem is people treating the guidelines as rituals instead of understanding their purpose.
I'm all for the 100-200 line working version--can't say I've had a 500. I did once have a single SQL query that was about 2 full pages pushing the limits of DB2 (needed multiple PTFs just to execute it)--the size was largely from heuristic scope reductions. In the end, it did something in about 3 minutes that had no previous solution.
My clean code book:
* Put logic closest to where it needs to live (feature folders)
* WET (Write everything twice), figure out the abstraction after you need something a 3rd time
* Realize there's no such thing as "clean code"
so much this. it is _much_ easier to refactor copy pasta code than to untangle a mess of "clean code abstractions" for things that aren't even needed _once_. Premature Abstraction is the biggest problem in my eyes.
Write Code. Mostly functions. Not too much.
So long as it remains identical. Refactoring almost identical code requires lots of extremely detailed staring to determine whether or not two things are subtly different. Especially if you don't have good test coverage to start with.
Just wanted to appreciate the nod to good nutritional advice here ("Eat food. Mostly plants. Not too much"), well done
DRY and WET are terms often used as objective measures of implementations, but that doesn't mean that they are rock solid foundations. What does it mean for something to be "repeated"? Without claiming to have TheGreatAnswer™, some things come to mind.
Chaining methods can be very expressive, easy to follow and maintain. They also lead to a lot of repetition. In an effort to be "DRY", some might embark on a misguided effort to combine them, replacing a readable chain of calls with a single do-everything helper. This would be a bad idea, also known as Suck™.
But there may equally be situations where consolidation makes sense. For example, if we're in an ORM helper class and we're always querying the database for an object with the same chained lookup, then it might make sense to consolidate that into one named helper. (Both cases are sketched below.)
My $0.02:

Don't needlessly copy-paste that which is abstractable.
Don't over abstract at the cost of simplicity and flexibility.
Don't be a zealot.
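A rough sketch of both cases, using an invented mini query builder:

    #include <optional>
    #include <string>
    #include <utility>

    // Invented stand-ins so the sketch compiles; the real thing would be an ORM.
    struct User { int id = 0; std::string email; };

    struct Query {
        std::string table_, col_, val_;
        Query& table(std::string t) { table_ = std::move(t); return *this; }
        Query& where(std::string c, std::string v) { col_ = std::move(c); val_ = std::move(v); return *this; }
        std::optional<User> first() { return std::nullopt; /* stub: would hit the database */ }
    };

    // Misguided DRY: folding an expressive chain into one opaque mega-helper
    // just because the chain appears more than once.
    std::optional<User> tableWhereFirst(Query q, std::string t, std::string c, std::string v) {
        return q.table(std::move(t)).where(std::move(c), std::move(v)).first();
    }

    // Sensible DRY: one specific query that every call site means in exactly
    // the same way, so a named helper documents the intent.
    std::optional<User> findUserByEmail(Query q, const std::string& email) {
        return q.table("users").where("email", email).first();
    }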
I totally agree assuming that there will be time to get to the second pass of the "write everything twice" approach...some of my least favorite refactoring work has been on older code that was liberally copy-pasted by well-intentioned developers expecting a chance to come back through later but who never get the chance. All too often the winds of corporate decision making will change and send attention elsewhere at the wrong moment, and all those copy pasted bits will slowly but surely drift apart as unfamiliar new developers come through making small tweaks.
When we got absorbed by a larger team and were going to have to "merge" our code review / git history into a larger org's repos, we learned that a sister team had gotten in big trouble with the language cops in HR when they discovered similar language in their git commit history. This brings back memories of my team panicked over trying to rewrite a huge amount of git history and code review stuff to sanitize our language before we were caught too.
It's easy to refactor if it's non-divergent copypasta and you do it everywhere it is used, not later than the third iteration.
If the refactoring gets delayed, the code diverges because different bugs are noticed and fixed (or the same bug is noticed and fixed in different ways) in different iterations, there end up being dozens of instances across the code base (possibly in different projects, because it was copypasted across projects rather than refactored into a reusable library), the code has in many cases gotten intermixed with code addressing other concerns...
Think about data structures (types) first. Mostly immutable structures. Then add your functions working on those structures. Not too many.
Good code should be written to be easy to delete.
'Clever' abstractions work against this. We should be less precious about our code and realise it will probably need to change beyond all recognition multiple times. Code should do things simply so the consequences of deleting it are immediately obvious. I think your recommendations fit with this.
This article tends to make the rounds on here every once in a while: https://programmingisterrible.com/post/139222674273/how-to-w...
Etc... etc...
(Not saying that this is the only quality of good code or that you won’t have to trade some of the above for performance or whatnot at times).
There are two opposite situations. One is when several things are viewed as one thing while they're actually different (too much abstraction), and another, where a thing is viewed as different things, when it's actually a single one (when code is just copied over).
In my experience, the best way to solve this is to better analyse and understand the requirements. Do these two pieces of code look the same because they actually mean the same thing in the domain of the product? Or do they just happen to look the same at this particular moment in time, and could continue to develop in completely different directions as the product grows?
In practice (time pressure) you might end up duplicating it many times, at which point it becomes difficult to refactor.
It's a similar design and planning principle to building sidewalks. You have buildings but you don't know exactly the best paths between everything and how to correctly path things out. You can come up with your own design but people will end up ignoring them if they don't fit their needs. Ultimately, you put some obvious direct connection side walks and then wait to see the paths people take. You've now established where you need connections and how they need to be formed.
I do a lot of prototyping work, and if I had to sit down and think out a clean abstraction every time I wanted to get to a functional prototype, I'd never have a functional prototype--plus I'd waste a lot of cognitive capacity on an abstraction instead of solving the problem my code is addressing. It's best, from my experience, to save that time and write messy code, but tuck in budget to refactor later (the key is you have to actually refactor later, not just say you will).
Once you've built your prototype, iterated on it several times, had people break designs forcing hacked-out solutions, and now have something you don't touch often, you usually know what most of the product/service needs to look like. You then abstract that out and get 80-90% of what you need, if there's real demand.
The expanded features beyond that can be costly if they require significant redesign but at that point, you hopefully have a stable enough product it can warrant continued investment to refactor. If it doesn't, you saved yourself a lot of time and energy worrying trying to create a good abstract design that tends to fail multiple times at early stages. There's a balance point of knowing when to build technical debt, when to pay it off, and when to nullify it.
Again, the critical trick is you have to actually pay off the tech debt if that time comes. The product investor can't look bright eyed and linearly extrapolate progress so far thinking they saved a boatload of capital, they have to understand shortcuts were taken and the rationale was to fix them if serious money came along or chuck them in the bin if not.
That's still too much of a "rule".
Whenever I feel (or know) two functions are similar, the factors that determine if I should merge them:
- I see significant benefit to doing so, usually the benefit of a utility that saves writing the same thing in the future, or debugging the same/similar code repeatedly.
- How likely the code is to diverge. Sometimes I just mark things for de-duping, but leave it around a while to see if one of the functions change.
- The function is big enough it cannot just be in-lined where it is called, and the benefit of de-duplication is not outweighed by added complexity to the call stack.
Document.
Your.
Shit.
everything else can do one. Just fucking write documentation as if you're the poor bastard trying to maintain this code with no context or time.
Comments in code can lie (they're not functional); can be misplaced (in most languages, they're not attached to the code they document in any enforced way); are most-frequently used to describe things that wouldn't require documenting if they were just named properly; are often little more than noise. Code comments should be exceedingly-rare, and only used to describe exception situations or logic that can't be made more clear through the use of better identifiers or better-composed functions.
External documentation is usually out-of-sight, out-of-mind. Over time, it diverges from reality, to the point that it's usually misleading or wrong. It's not visible in the code (and this isn't an argument in favor of in-code comments). Maintaining it is a burden. There's no agreed-upon standard for how to present or navigate it.
The best way to document things is to name identifiers well, write functions that are well-composed and small enough to understand, stick to single-responsibility principles.
API documentation is important and valuable, especially when your IDE can provide it readily at the point of use. Whenever possible, it should be part of the source code in a formal way, using annotations or other mechanisms tied to the code it describes. I wish more languages would formally include annotation mechanisms for this specific use case.
One of the most difficult comments to argue with in code reviews: “let’s make it generic in case we need it some place else”. First of all, chances that we need it some place else aren’t exactly high, unless you are writing a library or code explicitly designed to be shared. And even if such need arises, chances of getting it right generalizing from one example are slim.
Regarding the book though, I have participated in one of the workshops with the author and he seemed to be in favor of WET and against “architecting” levels of abstraction before having concrete examples.
You can disagree over what exactly is clean code. But you will learn to distinguish what dirty code is when you try to maintain it.
As a person that has had to maintain dirty code over the years, hearing someone say dirty code doesn't exist is really frustrating. No one wants to clean up your code, but doing it is better than allowing the code to become unmaintainable; that's why people bring up that book. If you do not care about what clean code is, stop making life difficult for people that do.
Not sure why you're being downvoted, but as an unrelated aside, the quote you're responding to literally did not say this.
> that's why people bring up that book
I think the point is that following that book does not really lead to Clean Code.
> As a person that has to maintain dirty code
This is a strange credential to present and then use as a basis to be offended. Are you saying that you have dirty code and have to keep it dirty?
I mean yes it's more verbose, BUT it's also super clear and obvious what things do, and they do not leak the underlying implementation.
The other key thing I think is not to over-engineer abstractions you don’t need yet, but to try and leave ‘seams’ where it’s obvious how to tease code apart if you need to start building abstractions.
Can you say more about this?
I think I may have stumbled on a similar insight myself. In a side project (a roguelike game), I've been experimenting with a design that treats features as first-class, composable design units. Here is a list of the subfolder called game-features in the source tree:
An extract from the docstring of the entire game-feature package:

The project is still very much work-in-progress (procrastinating on HN doesn't leave me much time to work on it), and most of the above features are nowhere near completion, but I found the design to be mostly sound. Each game feature provides code that implements its own concerns, and exports various functions and data structures for other game features to use. This is an inversion of traditional design, and is more similar to the ECS pattern, except I bucket all conceptually related things in one place. ECS Components and Systems, utility code, event definitions, etc. that implement a single conceptual game aspect live in the same folder. Inter-feature dependencies are made explicit, and the game "superstructure" is designed to allow GFs to wire themselves into appropriate places in the event loop, datastore, etc. - so in game startup code, I just declare which features I want to have enabled.

(Each feature also gets its set of integration tests that use synthetic scenarios to verify a particular aspect of the game works as I want it to.)
One negative side effect of this design is that the execution order of handlers for any given event is hard to determine from code. That's because, to have game features easily compose, GFs can request particular ordering themselves (e.g. "death" can demand its event handler to be executed after "destructibility" but before "log") - so at startup, I get an ordering preference graph that I reconcile and linearize (via topological sorting). I work around this and related issues by adding debug utilities - e.g. some extra code that can, after game startup, generate a PlantUML/GraphViz picture of all events, event handlers, and their ordering.
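For what it's worth, the reconciliation step can be small; here is a rough sketch (not the project's actual code) of linearizing ordering requests with a topological sort:

    #include <map>
    #include <queue>
    #include <set>
    #include <string>
    #include <utility>
    #include <vector>

    // Reconcile per-feature ordering requests into one linear handler order.
    std::vector<std::string> linearize(
            const std::set<std::string>& handlers,
            const std::vector<std::pair<std::string, std::string>>& runsBefore) {
        std::map<std::string, std::set<std::string>> next;
        std::map<std::string, int> indegree;
        for (const auto& h : handlers) indegree[h] = 0;
        for (const auto& edge : runsBefore)
            if (next[edge.first].insert(edge.second).second) ++indegree[edge.second];

        std::queue<std::string> ready;
        for (const auto& kv : indegree)
            if (kv.second == 0) ready.push(kv.first);

        std::vector<std::string> order;
        while (!ready.empty()) {
            std::string h = ready.front();
            ready.pop();
            order.push_back(h);
            for (const auto& n : next[h])
                if (--indegree[n] == 0) ready.push(n);
        }
        // If order.size() != handlers.size(), the requests contradict each other (a cycle).
        return order;
    }

    // Usage: "death" asks to run after "destructibility" but before "log".
    // linearize({"destructibility", "death", "log"},
    //           {{"destructibility", "death"}, {"death", "log"}});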
(I apologize for a long comment, it's a bit of work I always wanted to talk about with someone, but never got around to. The source of the game isn't public right now because I'm afraid of airing my hot garbage code.)
It might be hard to integrate related things, e.g. physical simulation/kinematics <- related to collisions, and maybe sight/hearing <- related to rendering; which is all great if information flows one way, as a tree, but maybe complicated if it's a graph with intercommunication.
I thought about this before, and figured maybe the design could be initially very loose (and inefficient), but then a constraint-solver could wire things up as needed, i.e. pre-calculate concerns/dependencies.
Another idea, since you mention "logs" as a GF: AOP - using " join points" to declaratively annotate code. This better handles code that is less of a "module" (appropriate for functions and libraries) and more of a cross-cutting "aspect" like logging. This can also get hairy though: could you treated "(bad-path) exception handling" as an aspect? what about "security"?
Your top issue in the runtime game loop is always with concurrency and synchronization logic - e.g. A spawns before B, if A's hitbox overlaps with B, is the first frame that a collision event occurs the frame of spawning or one frame after? That's the kind of issue that is hard to catch, occurs not often, and often has some kind of catastrophic impact if handled wrongly. But the actual effect of the event is usually a one-liner like "set a stun timer" - there is nothing to test with respect to the event itself! The perceived behavior is intimately coupled to when its processing occurs and when the effects are "felt" elsewhere in the loop - everything's tied to some kind of clock, whether it's the CPU clock, the rendered frame, turn-taking, or an abstracted timer. These kinds of bugs are a matter of bad specification, rather than bad implementation, so they resist automated testing mightily.
The most straightforward solution is, failing pure functions, to write more inline code (there is a John Carmack posting on inline code that I often use as a reference point). Enforce a static order of events as often as possible. Then debugging is always a matter of "does A happen before B?" It's there in the source code, and you don't need tooling to spot the issue.
The other part of this is, how do you load and initialize the scene? And that's a data problem that does call for more complex dependency management - but again, most games will aim to solve it statically in the build process of the game's assets, and reduce the amount of game state being serialized to save games, reducing the complexity surface of everything related to saves(versioning, corruption, etc). With a roguelike there is more of an impetus to build a lot of dynamic assets(dungeon maps, item placements etc.) which leads to a larger serialization footprint. But ultimately the focus of all of this is on getting the data to a place where you can bring it back up and run queries on it, and that's the kind of thing where you could theoretically use SQLite and have a very flexible runtime data model with a robust query system - but fully exploiting it wouldn't have the level of performance that's expected for a game.
Now, where can your system make sense? Where the game loop is actually dynamic in its function - i.e. modding APIs. But this tends to be a thing you approach gradually and grudgingly, because modders aren't any better at solving concurrency bugs and they are less incentivized to play nice with other mods, so they will always default to hacking in something that stomps the state, creating intermittent race conditions. So in practice you are likely to just have specific feature points where an API can exist(e.g. add a new "on hit" behavior that conditionally changes the one-liner), and those might impose some generalized concurrency logic.
The other thing that might help is to have a language that actually understands that you want to do this decoupling and has the tooling built in to do constraint logic programming and enforce the "musts" and "cannots" at source level. I don't know of a language that really addresses this well for the use case of game loops - it entails having a whole general-purpose language already and then also this other feature. Big project.
I've been taking the approach instead of aiming to develop "little languages" that compose well for certain kinds of features - e.g. instead of programming a finite state machine by hand for each type of NPC, devise a subcategory of state machines that I could describe as a one-liner, with chunks of fixed-function behavior and a bit of programmability. Instead of a universal graphics system, have various programmable painter systems that can manipulate cursors or selections to describe an image. The concurrency stays mostly static, but the little languages drive the dynamic behavior, and because they are small, they are easy to provide some tooling for.
Do I force myself to follow every single crazy rule in Clean Code? Heck no. Some of them I don't agree with. But do I find myself to be a better coder because of what I learned from Bob Martin? Heck yes. Most of the points he makes are insightful and I apply them daily in my job.
Being a professional means learning from many sources and knowing that there's something to learn from each of them- and some things to ignore from each of them. It means trying the things the book recommends, and judging the results yourself.
So I'm going to keep recommending Clean Code to new developers, in the hopes that they can learn the good bits, and learn to ignore the bad bits. Because so far, I haven't found a book with more good bits (from my perspective) and fewer bad bits (from my perspective).
[0]https://en.wikipedia.org/wiki/Don%27t_throw_the_baby_out_wit...
I see a lot of people in here acting like all the advice in Clean Code is obviously true or obviously false, and they claim to know how to write a better book. But, like you, I will continue to recommend Clean Code to new developers on my team. It's the fastest way (that I've found so far, though I see other suggestions in the comments here) to get someone to transition from writing "homework code" (that never has to be revisited) to writing maintainable code. Obviously, there are bad parts of Clean Code, but if that new dev is on my team, I'll talk through why certain parts are less useful than others.
And to be honest that book will never exist; every bit of knowledge contributes to growing as a professional, just make sure to understand, discuss, and use it (or not) for a real reason, not just because it's in book A or B.
It's not like people need to choose one book and follow it blindly for the rest of their lives, read more books :D
You're not supposed to take it literally for every project, these are concepts that you need to adapt to your needs. In that sense I think the book still holds up.
The Dreyfus model identifies 5 skill levels:
Novice
Wants to achieve a goal, and is not particularly interested in learning. Requires context-free rules to follow. When something unexpected happens, will get stuck.
Advanced Beginner
Beginning to break away from fixed rules. Can accomplish tasks on own, but still has difficulty troubleshooting. Wants information fast.
Competent
Has developed a conceptual model of the task environment. Able to troubleshoot. Beginning to solve novel problems. Seeks out and solves problems. Shows initiative and resourcefulness. May still have trouble determining which details to focus on when solving a problem.
Proficient
Needs the big picture. Able to reflect on approach in order to perform better next time. Learns from experience of others. Applies maxims and patterns.
Expert
Primary source of knowledge and information in a field. Constantly looks for better ways of doing things. Writes books and articles and does the lecture circuit. Works from intuition. Knows the difference between irrelevant and important details.
Meh. I'm probably being picky, but it doesn't surprise me that a Thought Leader would put themselves and what they do as Thought Leader in the Expert category. I see them more as running along a parallel track. They write books and run consulting companies and speak at conferences and create a brand, and then there are those of us who get good at writing code because we do it every day, year after year. Kind of exactly the difference between sports commentators and athletes.
If treated as guidelines you are correct Clean Code is only eh instead of garbage. But taken in the full context of how it is presented/intended to be taken by the author it is damaging to the industry.
In no way is he treating his book as a bible.
However, in most cases heading towards that goal is a beneficial thing--you just have to recognize when you're getting too close and bogging down in complying with every detail.
I still consider it the best programming book I've ever read.
I agree: Clean Coders and TDDers are cut from the same cloth.
I agree with the sentiment that you don't want to over abstract, but Bob doesn't suggest that (as far as I know). He suggests extract till you drop, meaning simplify your functions down to doing one thing and one thing only and then compose them together.
Hands down, one of the best bits I learned from Bob was the "your code should read like well-written prose." That has enabled me to write some seriously easy to maintain code.
That strikes me as being too vague to be of practical use. I suspect the worst programmers can convince themselves their code is akin to poetry, as bad programmers are almost by definition unable to tell the difference. (Thinking back to when I was learning programming, I'm sure that was true of me.) To be valuable, advice needs to be specific.
If you see a pattern of a junior developer committing unacceptably poor quality code, I doubt it would be productive to tell them "try to make it read more like prose." Instead you'd give more concrete advice, such as choosing good variable names, or the SOLID principles, or judicious use of comments, or sensible indentation.
Perhaps I'm missing something though. In what way was the "code should read like well-written prose" advice helpful to you?
For example, using what I learned I read this code (https://github.com/cheatcode/nodejs-server-boilerplate/blob/...) as:
"This module is going to generate a password reset token. First, it's going to make sure we have an emailAddress as an input, then it's going to generate a random string which I'll refer to as token, and then I want to set that token on the user with this email address."
Code should be simple and tight and small. It should also, however, strive for an eighth grade reading level.
You shouldn't try to make your classes so small that you're abusing something like nested ternary operators which are difficult to read. You shouldn't try to break up your concepts so much that while the sentences are easy, the meaning of the whole class becomes muddled. You should stick with concepts everyone knows and not try to invent your own domain specific language in every class.
Less code is always more, right up until it becomes difficult to read, then you've gone too far. On the other hand, if you extract a helper method from a method which read fine to begin with, then you've made the code harder to read, not easier, because it's now bigger with an extra concept. But if that was a horrible conditional with four clauses which you can express with a "NeedsFrobbing" method and a comment about it, then carry on (generating four methods from that conditional to "simplify" it is usually worse, though, due to the introduction of four concepts that could often be better addressed with just some judicious whitespace to separate them).
And I need to learn how to write in English more like Hemingway, particularly before I've digested coffee. That last paragraph got away from me a bit.
Coding at scale is about managing complexity. The best code is code you don't have to read because of well named functional boundaries. Natural language is our facility for managing complexity generally. It shouldn't be surprising that the two are mutually beneficial.
I think the best compromise is small summary comments at various points of functions that "hold the entire thought".
Because of well structured abstractions, thoughtful naming conventions, documentation where required, and extensive testing you trust that the function does what it says. If I'm looking at a function like commitPending(), I simply see writeToDisk() and move on. I'm in the object representation layer, and jumping down into the details of the I/O layers breaks flow by moving to a different level of abstraction. The point is I trust writeToDisk() behaves reasonably and safely, and I don't need to inspect its contents, and definitely don't want to inline its code.
If you find that you frequently need to jump down the tree from sub-routine to sub-routine to understand the high level code, then that's a definite code smell. Most likely something is fundamentally broken in your abstraction model.
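A minimal sketch of that layering, with invented names:

    #include <string>

    // The caller stays at one level of abstraction and trusts the layer below it.
    class Document {
    public:
        void commitPending() {
            // Object-representation layer: all we care about here is that pending
            // edits end up durably committed, not how the bytes reach the disk.
            writeToDisk(serializePending());
        }

    private:
        std::string serializePending() const { return pending_; }
        void writeToDisk(const std::string& bytes) { (void)bytes; /* I/O layer, trusted */ }
        std::string pending_;
    };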
I think this is the best, honestly, it reaps most benefits of small single-use functions, without compromising readability or requiring jumping.
This is also what John Carmack recommends in this popular essay: http://number-none.com/blow/john_carmack_on_inlined_code.htm...
When I get the error in the console/browser, the path to the error is included for me like "[generatePasswordResetToken.setTokenOnUser] Must pass value to $set to perform an update."
With that, I know exactly where the error is occurring and can jump straight into debugging it.
1. The connect action could be replaced by doing the connection once on app startup.
2. The validation could be replaced with middleware like express-joi.
3. The stripe/email steps should be asynchronous (ex: simple crons). This way, you create the user and that's it. If Stripe is down, or the email provider is down, you still create the user. If the server restarts while someone calls the endpoint, you don't end up with a user with invalid Stripe config. You just create a user with stripeSetup=false and welcomeEmailSent=false and have some crons that every 5 seconds query for these users and do their work. Also, ensure you query for false and not "not equal to true" here as it's not efficient.
This is a good assertion but ironically it's not Bob Martin's line. He was quoting Grady Booch.
One long stream of commands is OK to read if you are the author or already know what it should do. But otherwise it forces you to read too many irrelevant details on the way toward what you need.
Of course, reading Clean Code left me more confused than enlightened, due precisely to what he presents as good examples of code. The author of the article really does hit the nail on the head about Martin's code style - it's borderline unreadable a lot of the time.
Who the f. even is Robert Martin?! What has he built? As far as I am able to see he is famous and revered because he is famous and revered.
I was around from the early 90s through to the early 2000s when a lot of the ideas came about and slowly got morphed by consultants who were selling this stuff to companies as essentially "religion". The nuanced thoughts of a lot of the people who had most of the original core ideas are mostly lost.
It's a tricky situation: at the core of things, there are some really good ideas, but the messaging by people like "uncle bob" seems to fail to communicate the mindset in a way that develops thinking programmers. Mainly because he, and people like Ron Jeffries, really didn't actually build anything serious once they became consultants and started giving out all these edicts. If you watched them on forums/blogs at the time, they were really not that good. There were lots of people who were building real things and had great perspectives, but their nuanced perspectives were never really captured into books, and it would be hard to, as it is more about the mentality of using ideas and principles, making good pragmatic choices, adapting things, and not being limited by "rules", but about incorporating the essence of the ideas into your thinking processes.
So many of those people walked away from a lot of those communities when it morphed into "Agile" and started being dominated by the consultants.
But I had more and more difficulty reconciling bizarrely optimistic patterns with reality. This from the article perfectly sums it up:
>Martin says that functions should not be large enough to hold nested control structures (conditionals and loops); equivalently, they should not be indented to more than two levels.
Back then as now I could not understand how one person can make such confident and unambiguous statements about business logic across the spectrum of use cases and applications.
It's one thing to say how something should be written in ideal circumstances, it's another to essentially say code is spaghetti garbage because it doesn't precisely align to a very specific dogma.
After you're a bit more seasoned, I think qualified comments are probably far more welcome than absolutes.
I don't think that's true at all. One of the old 'erlang bibles' is "learn you some erlang" and it is full of qualifications titled "don't drink the kool-aid" (notably absent from the Haskell book that inspired it). Having qualifications scattered throughout does not fail to inspire confidence; to me it actually gives MORE confidence that the content is applicable and the tradeoffs are worth it.
https://learnyousomeerlang.com/introduction#about-this-tutor...
My recollection of the clean code book and Fowler books were very much “I think these are smells, but smells in the code are also fine”
Note: Robert Martin and Martin Fowler are different people. Are you saying Fowler said this?
Fixed the typo, thanks.
Does it say that though?