I think the author is taking general advice and applying it to a niche situation.
> So by violating the first rule of clean code — which is one of its central tenets — we are able to drop from 35 cycles per shape to 24 cycles per shape
Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something. If you're writing a AAA video game, or high performance calculation software then sure, go crazy, get those improvements.
But most of us aren't doing that. Most developers are doing work where the biggest problem is shipping the umpteen features that Product has planned (but hasn't told us about yet). Clean code optimizes for improving time-to-market for those features, not for the CPU doing less work.
100%, I’ve done tonnes of (backend) performance optimization, profiling, etc. on higher level applications, and the perf bottlenecks have never been any of the things discussed in this article. It’s normally things like:
- Slow DB queries
- Lack of concurrency/parallelism
- Lack of caching/memoization for some expensive thing that could be cached
- Excessive serialization/deserialization (things like ORMs that create massive in-memory objects)
- GC tuning/not enough memory
- Programmer doing something dumb, like using an array when they should be using a set, and then doing a huge number of membership checks (a minimal sketch of this one below)
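As promised, a sketch of that last pitfall (hypothetical names; C++ here since that's what the article uses). With thousands of membership checks, the linear scan becomes a visible hot spot:

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Linear scan: every membership check walks the vector, O(n) per lookup.
bool seen_before_slow(const std::vector<std::string>& seen, const std::string& id) {
    return std::find(seen.begin(), seen.end(), id) != seen.end();
}

// Hash lookup: average O(1) per check; usually a one-line fix.
bool seen_before_fast(const std::unordered_set<std::string>& seen, const std::string& id) {
    return seen.count(id) != 0;
}
```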
With that being said, I have worked on the odd performance optimization where we had to get quite low level. For example, when working on vehicle routing problems, they’re super computationally heavy, need to be optimized like crazy and the hot spots can indeed involve pretty low level optimizations. But it’s been rare in the work I’ve done.
This article is probably meaningful for people who work on databases, games, OSes, etc., but for most devs/apps these tips will yield zero noticeable performance improvements. Just write code in a way you find clean/maintainable/readable, and when you have perf issues, profile them and ship the appropriate fix.
Casey Muratori knows a lot about optimizing performance in game engines. He then assumes that all other software must be slow because of the exact same problems.
I think the core problem here is that he assumes that everything is inside a tight loop, because in a game engine that's rendering 60+ times a second (and probably running physics etc at a higher rate than that) that's almost always true.
Also the fact that his example of what "everyone" supposedly calls "clean code" looks like some contrived textbook example from 20 years ago strains his credibility.
Edit: come to think of it, the only person I know of who actually uses the phrase "clean code" as if it's some kind of concrete thing with actual rules is Uncle Bob. Is Casey assuming the entire commercial software industry === Uncle Bob? It's like he talked to one enterprise java dev like 10 years ago and based his opinion of the entire industry on them.
All of these are instances of doing something _wasteful_, which is the #1 issue he mentions in the list of things that cause performance degradation.
Now, your argument seems to be: in the real world, there's so much waste, that virtual function calls pale in comparison.
This does not debunk his main point, which seems to me at least the following: all things being equal, writing code with virtual functions that do a tiny amount of work and "hiding implementation details" makes performance worse, sometimes by an order of magnitude.
Now, there may be situations where you _have_ to use virtual functions, because you are writing a library for other people to use, and you can't dictate ahead of time how they will use it.
This again does not invalidate the point. You need to be _aware_ of the performance implications of this, and mitigate it. He said the following in the comment section on the article:
> Try to make it so that you do very rare virtual function calls, behind which you do a _large_ amount of work, rather than the "clean" code way of using lots of little function calls.
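A hedged sketch of that suggestion (the types and names are mine, not from the article or course): instead of paying one indirect call per element, pay one per batch, so the dispatch cost is amortized over a tight loop the compiler can optimise.

```cpp
#include <cstddef>

// "Clean" style: one virtual call per element.
struct Shape {
    virtual ~Shape() = default;
    virtual float Area() const = 0;  // dispatched once per shape
};

// Coarse-grained style: one virtual call per *batch* of shapes.
struct AreaSummer {
    virtual ~AreaSummer() = default;
    virtual float SumAreas(const float* widths, const float* heights,
                           std::size_t count) const = 0;
};

struct RectangleSummer : AreaSummer {
    float SumAreas(const float* w, const float* h, std::size_t n) const override {
        float sum = 0.0f;
        for (std::size_t i = 0; i < n; ++i)
            sum += w[i] * h[i];  // tight, inlinable inner loop
        return sum;
    }
};
```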
"Write slow code now, profile and optimize later" is how we got all this slow software, because in my experience the second step (optimization) practically never happens.
Along with the heuristic that hardware and electricity are cheap while developers are expensive. That's probably why the managers I've worked with almost never ask developers to optimize slow code: they believe it's cheaper to throw more hardware at the problem if that fixes it (in some areas, like HFT or gamedev, more hardware is not the answer, so optimization does happen). In the rare cases where I've seen optimization done, the initiative always came from an IC (who knew the code could run faster / use fewer resources).
So nowadays when writing code I assume it will never be optimized later, and I try to do less dumb stuff from the beginning.
If you program using design patterns that are 10x slower, your application ends up 10x slower, even after you've optimised the hot spots away, and the profiler will not give you any idea that it could still be 10x faster.
ORMs and slow DB queries kind of go hand in hand. Also, you'd be surprised at how efficient arrays are for membership checks so long as the number of items is even moderately small (as a rough rule of thumb, the only thing that really matters with these kinds of checks is how many cache lines are involved).
Well said. The aphorism "Premature optimization is the root of all evil" is meant to mean "Build it right first, then optimize only what needs to be optimized". There's really no need to start cooking spaghetti right off the bat. Clean code with some performance tweaks will be more maintainable in the long run without sacrificing performance.
To be fair to OP, does anything about your current paradigm even allow you to evaluate his claims? Is your code even in a form that you can for example remove polymorphism and use simple arrays of data you want to work over?
You can of course only optimize what you are looking at. I am not surprised (honestly) that some engineers do not realize they are primed for the kind of things they will find by the mere choice of where they want to look.
Maybe that's valid for low user counts. We run a large multitenant, microservice-based application, and the physical machines where the Kubernetes pods reside have their CPUs at 90%. The application makes such heavy use of "clean coding", "design patterns", and SOLID that it would make Uncle Bob proud. We would have been better off without so many abstractions on top of abstractions.
In a business context, it usually happens that hardware is cheaper than software (licensing) which is cheaper than engineering labor. Tack on the opportunity cost of delaying business advances/features and it's usually cheaper to just throw hardware at it.
There's a tipping point where you have so much hardware there's big savings with optimization. Things like Postgres and the Linux kernel have a lot of optimization put into them and there's an insane amount of hardware out running that code.
I'd actually say this article is generally unhelpful. It's good to be aware, but as someone who works on sorting out performance-critical things, I want the code to be as clean as humanly possible going in. Whether you write clean or dirty code, if you're a junior developer you're probably not going to write performant code, and even senior devs may only be able to sniff out what might be a bottleneck in advance; most of us have learned to avoid premature optimization like the plague.
Maintainability and cleanliness are the best virtues code can have. If you have extremely clearly written code that has performance issues, I can swoop in with analysis tools, figure out where the pain point is, and refactor it out. Sometimes this is a real headache[1], sometimes not. What I can guarantee is that if the code is "dirty", it's going to be a headache and it'll take more time.
I'd personally take issue with this article over the polymorphism claim though - polymorphism is a tool, but it isn't the be-all and end-all tool. A lot of your data can live as structs/blobs in memory with tight internal type definitions but without any OO principles. Personally I am a huge fan of functional programming (but not pure functional programming), so the objects I use are relatively few and far between and exist to fulfill a very specific purpose.
I've had two occasions at work when I needed to break out an asm block - the compiler was being a thick-headed dummy and the code needed to receive incoming signals without exception or delay - but once that critical section was passed? Back to high-level programming and statements favoring expressiveness over raw bare-metal performance.
If you want an interesting experience, talk to your closest non-technical manager type - be that a product team manager or the company owner - and ask them whether they'd prefer you to focus on reducing how long your product takes to execute by 20% over the next five years, or on lowering the growth of the developer labor budget by 20% for the next five years by focusing on maintainability over performance. With the exception of extremely niche cases, maintainability is always the gold standard.
1. For instance, I've dealt with OOM issues that have required transforming all logic on a query result to be lazily evaluated on a data stream after main execution finishes - like the logic goes up and down the stack and only then begins processing results. In this particular case the problem was rather easy to deal with because we essentially swapped out the actual value passing on each layer for a lazy result set being passed around - because the code was clean. Sometimes you'll definitely need to massively re-engineer things though.
What I've seen is slow queries, but a bigger problem is actually too many queries. That's easy to do, especially when using an ORM.
It mostly happens during changes: when someone wants to add something to an existing query, they just add their new query, drop it into a loop, and boom, performance is gone.
One I found with profiling: when the code was written, n was quite small. Many database operations simply iterated over an array to decide where to store an item. The time spent dealing with the data was a tiny fraction of the database round-trip time; there simply was no reason to get fancy.
Over the years things grew, and one bin showed up where jobs would sit around for weeks. For that case, n went from rarely filling a screenful of lines to a few thousand. Oops: now more than 2/3 of the time was spent in those searches. (And I have a sneaking suspicion that a good portion of the remaining time comes from using field names to retrieve values. The profiler doesn't separate that out, though, because it's not my code.)
The author could use a little more memoization in his example, but I suspect that breaks some of the simplicity of his argument.
If a shape's Area is computed often enough that you care about inlining the calculation, why not compute and store it every time the height/width change? That'd be easy enough in an architecture based on information hiding, and it might illustrate a legitimate engineering trade-off between those architectural choices.
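Something like this hypothetical sketch: the fields stay private, but the multiply moves to the mutation path, so Area() becomes a plain load.

```cpp
// Assumed example, not from the article: a rectangle that caches its area.
class Rectangle {
public:
    void SetSize(float width, float height) {
        width_  = width;
        height_ = height;
        area_   = width * height;  // recomputed once, on change
    }
    float Area() const { return area_; }  // now just a field read

private:
    float width_  = 0.0f;
    float height_ = 0.0f;
    float area_   = 0.0f;
};
```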
> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something. If you're writing a AAA video game, or high performance calculation software then sure, go crazy, get those improvements.
That's really not even close to true. Loading random websites frequently costs multiple seconds worth of local processing time, and indeed, that's often because of the exact kind of overabstraction that this article criticizes (e.g. people use React and then design React component hierarchies that seem "conceptually clean" instead of ones that perform a rendering strategy that makes sense.)
> That's really not even close to true. Loading random websites frequently costs multiple seconds worth of (...)
You attempted to present an argument that's a textbook example of a hyperbolic fallacy.
There are worlds of difference between "this code does not sit in a hot path" and "let's spend multiple seconds of local processing time".
This brand of specious reasoning is the reason why the first rule of software optimization is "don't". Proponents of mindlessly chasing the 1% edge cases fail to understand that the world is mostly made up of the 99% of cases where shaving off that millisecond buys you absolutely nothing, with the tradeoff of producing unmaintainable code.
The truth of the matter is that in 99% of cases there is absolutely no good reason to run after these relatively large performance improvements if in the end the user notices absolutely nothing. More so if you're writing async code that stays far away from any sort of hot path.
I don't disagree that the balance is shifting towards "why is this taking so long". There's ebbs and flows in that ecosystem.
But overall, I think you overestimate how much time you spend loading the website and how much time it's just sitting there, mostly idle.
And in the end, as long as it's fast enough that users don't stop using the site/webapp/program/whatever, then it's fine, imho. When it becomes too slow, the developers will be asked to improve performance. Because in the end, economics is the driver, not performance.
> That's really not even close to true. Loading random websites frequently costs multiple seconds worth of local processing time
Unless we’re talking about specific compute-intensive websites, this is almost certainly network loading latency.
Modern web browsers are very fast. Modern CPUs are very fast. Common, random websites aren't churning through "multiple seconds" of CPU time just to render.
Well, you can make things run even faster by hand-coding them in assembler... but performance isn't the reason we use high-level languages. I agree with you that ignoring performance characteristics in favor of speed-to-market is an awful and pervasive practice in modern software development, but the linked article isn't talking about or making that case at all. He's saying that he can make his own custom object-oriented C that runs faster than C++ itself, but that's not news - people were saying that in 1995 (at least). The maintainability hit isn't worth it.
Since you bring up React in your example, which framework should one use to build better performing web apps?
I know React tends to lack in both dev UX and performance (at least in my exp). Personally I've taken a look at Svelte and Solid, and liked them both. I haven't had the chance to build anything larger than a toy app, though.
For what it's worth, less than 4% of websites use React (approximately 4% use any JS framework). If you believe the web is slow because of React, you are wrong. It's not even due to JS.
I would argue that ignoring performance, a lot of "clean" code isn't really that much clearer and more maintainable at all (At least by the Robert Martin definition of "Clean Code"). Things like dependency injection and runtime polymorphism can make it really hard to trace exactly what happens in what sequence in the code, and things like asynchronous callbacks can make things like call stacks a nightmare (granted, you do need the latter a lot). Small functions can make code hard to understand because you need to jump to a lot of places and lose the thread on the sequential state (bonus points if the small functions are only ever used once by one other function). The more I work in code bases the more I find that overusing these "clean" ideas can obscure the code more than if it was just written plainly. I think a lot of times, if a technique confuses compiler optimization and static code analysis, it's probably going to confuse humans also.
There are some videos on the internet claiming object-oriented programming is pretty bad in many situations. And lately I've been wondering if there's a kernel of truth in this statement. As an alternative, procedural programming is often advised instead.
I think a lot of clean code advice in general relates to object-oriented programming.
I've noticed that once my Lua programs (games) grow to a reasonable size, they become kinda hard to maintain. And I tend to use an object-oriented programming style (of course it also doesn't help that Lua is not type-safe). After I finish my current game, I want to try making a game using a procedural approach. I wonder if this would solve some of the issues I see in my current code base.
One of the core ideas of procedural programming is that data and functionality are not mixed in classes as we do in object-oriented programming. Instead, you might have a module that contains some functions and some data objects the functions act upon. This approach would make some other aspects of game programming with Lua easier as well (e.g. serialisation), but perhaps it will also make the code more maintainable as the codebase grows. It's something I want to contemplate.
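A minimal sketch of that shape in C++ terms (Lua would use a table of functions over plain tables; the names here are made up): plain data structs, plus a "module" of free functions that act on them.

```cpp
#include <string>
#include <vector>

namespace inventory {

// Plain data: no methods, trivially serialisable.
struct Item {
    std::string name;
    int count = 0;
};

// Functionality lives beside the data, not inside a class hierarchy.
int total_count(const std::vector<Item>& items) {
    int total = 0;
    for (const auto& item : items) total += item.count;
    return total;
}

void add_item(std::vector<Item>& items, std::string name, int count) {
    items.push_back({std::move(name), count});
}

}  // namespace inventory
```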
> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something. If you're writing a AAA video game, or high performance calculation software then sure, go crazy, get those improvements.
CPU meter when clicking anything on "modern" webpage proves that's a lie.
Also, sure, even if "clicking on things" is maybe 1-5% vs "looking at things" THAT'S THE CRITICAL PATH.
Once the app has rendered a view it's obviously not doing much, but the user also isn't waiting on anything; they're "being productive", parsing whatever is displayed.
The critical path, the wasted time, is the time the app takes to render stuff, and "but 99% of the time it's not doing that" is irrelevant.
>Clean code optimizes for improving time-to-market for those features
Does it though? Where's the evidence for it? The vast majority of people I've worked with over the last couple decades who like to bring up "clean code", tend towards the wrong abstractions and over abstracting.
I almost always prefer working with someone who writes the kind of code Casey does over someone who follows the clean code examples I've spent my career dealing with. I've seen and worked with many examples of Data-Oriented Design that were far from unmaintainable or unreadable.
Completely agree. These rules simply do not lead to better outcomes in all cases. Looking at the rules and playing Devil's advocate for fun:
> Prefer polymorphism to “if/else” and “switch”
Algebraic data types and pattern matching (a more general version of switch), make many types of data transformation far easier to understand and maintain (versus e.g. the visitor pattern which uses adhoc polymorphism).
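For instance, a sketch of the sum-type approach in C++17 (my example, not from the comment; std::variant plus std::visit standing in for pattern matching):

```cpp
#include <type_traits>
#include <variant>

struct Circle    { float radius; };
struct Rectangle { float width, height; };

// A sum type: a Shape is exactly one of these alternatives.
using Shape = std::variant<Circle, Rectangle>;

float area(const Shape& s) {
    // std::visit plays the role of a match expression; forgetting an
    // alternative is a compile error rather than a runtime surprise.
    return std::visit([](const auto& shape) -> float {
        using T = std::decay_t<decltype(shape)>;
        if constexpr (std::is_same_v<T, Circle>)
            return 3.14159265f * shape.radius * shape.radius;
        else
            return shape.width * shape.height;
    }, s);
}
```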
> Code should not know about the internals of objects it’s working with
This is interpreted by many as "don't expose data types". Actually some data types are safe to expose. We have a JSON library at work where the actual JSON data type is kept abstract and pattern matching cannot be used. This is despite the fact that JSON is a published (and stable) spec and therefore already exposed!
> Functions should be small
"Small" is a strange metric to optimise for, which is why I don't like Perl. Functions should be readable and easy to reason about. Let's optimise for "simple" instead.
> Functions should do one thing
This is not always practical or realistic advice. Most functions in OOP languages are procedures that will likely perform side effects in addition to returning a result (e.g. object methods). Should we also not do logging? :)
> “DRY” - Don’t Repeat Yourself
The cost of any abstraction must be weighed up against the repetition on a case-by-case basis. For example, many languages do not abstract the for-loop and effectively encourage users to write it out over and over again, because they have decided that the cost of abstracting it (internal iteration using higher-order functions) is too high.
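Both styles, for illustration (my sketch): the external loop spells out the iteration machinery each time, while the higher-order version abstracts it away.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// External iteration: the loop is written out by hand.
int sum_squares_loop(const std::vector<int>& xs) {
    int total = 0;
    for (int x : xs) total += x * x;
    return total;
}

// Internal iteration: the loop lives inside a higher-order function.
int sum_squares_hof(const std::vector<int>& xs) {
    return std::transform_reduce(xs.begin(), xs.end(), 0, std::plus<>{},
                                 [](int x) { return x * x; });
}
```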
You are conflating three separate skills. Finding the right abstractions is an art, whether you write clean code or not. Writing high-performance code is another art.
A really good developer writes clean code using the right abstractions (finding those tends to take the most time and experience) and drops down to a different level of abstraction for high-performance areas where it makes sense.
The fact that bad developers write bad code whether they use clean code or not does not reflect on the methodology.
Having done a lot of performance work on Gecko (Firefox), we generally knew where cycles mattered a lot (low level graphics like rasterization, js, dom bindings, etc...) and we used every trick there. But for the majority of the millions of LoC of the codebase these details didn't matter like you say.
If we had perf issues that showed up elsewhere, they were higher-level design issues, like 1) trying to take a thumbnail of a page at full resolution for a tab thumbnail while loading another tab (not because the thumbnailing code itself was slow), or 2) running a slow O(tabs) JS teardown during shutdown when we could run an O(~1) cleanup step instead.
What you're basically saying is "modern computers are so much faster than anyone needs them to be, it's okay to make them a little slower."
This works until your computer is old enough to be slower than what a majority of wealthy people (ie desirable customers) are using, at which point you need to buy a newer, faster computer, even though your current one was already "faster than anyone could reasonably need it to be".
This is all harmless enough—a little disrespectful perhaps, to make other people waste their money, but not so terrible—until you consider the environmental impact of all these new computers, which the average spreadsheet absolutely should not need but does anyway. It's also an equity issue—someone on a fixed income can't necessarily afford a new machine.
What would actually happen if Moore's law ended tomorrow, and we were no longer able to make computers any faster than they are today? It would really suck for scientists and hardcore gamers, but I actually think a majority of computer users would benefit. The experience of someone who just writes documents and checks email would be unchanged, except that their current computer would never slow down!
> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something.
Assuming that's right, there is something called multitasking: the CPU, RAM, and most importantly the cache are not all yours. If there are 1000 pieces of software like yours, that's 100%. You may argue that 1000 pieces of software is unreasonable, and you would mostly be right, but it happens, and mostly for the same reason software isn't optimized: quantity over quality.
Another issue is that you have to make a distinction between throughput and latency. You don't have to keep up with a sustained 100 actions per second, people don't go that fast, but you definitely have to respond within tens of milliseconds, because more than that is noticeable. Latency is much harder to optimize and if you are in the critical path, these cycles may matter.
A lot of devices are battery powered these days, and all these wasted cycles reduce the battery life of the entire system. Mobile devices are crazy powerful these days, but this power is meant to be used sparingly. And even with line-powered devices, I think we waste enough energy as it is...
And finally, what is the point of "clean code"? Hopefully not just because it gives software architects boners. The point is usually to make software that will last: easier maintenance, fewer bugs, etc... But performance bugs exist too, and one of the most common ways software evolves is to do more of what it already does. An image editing program will process more and bigger images, a database will store more entries and more details about each entry, documents will get larger, etc... You may even find that your users are using some feature on a scale you never intended (maybe someone is pasting entire books into your note-taking app), and it may turn out to work quite well... if you cared about performance. Not caring about performance is technical debt, and it may negate the advantage of using "clean code" in the first place.
> I think the author is taking general advice and applying it to a niche situation.
I don't. how many programs are running in your OS right now? how much CPU do you need to keep those things plus the things you need running in a performant manner?
how much CPU would you need if things performed better? the answer is "less" every time.
better software performance = less money required for hardware to obtain the responsiveness you require.
it's important, and it's important completely independently of how it is framed here.
just wait till you've seen software get slower for 30 years. to put it another way, watch hardware get faster and faster and faster for 30 years while you observe software continually consume all of the available headroom until it feels slow again. watch that happen for THREE DECADES and wait for someone to tell you that everything is fine and that someone saying "software is unnecessarily slow" is wrong because they aren't framing their argument how you think it should be framed.
>Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something.
All that says is you should focus your energy on increasing the value of that 0.1%. It's not actually an argument for not spending any energy.
It's like saying 'Astronauts only spend .1% of their time in space' or 'tomatoes only spend .1% of their existence being eaten' - that .1% is the whole point.
You can debate how best to maximize that value, more features or more performance. The OP is suggesting folks are just leaving performance on the floor and then making vacuous arguments to excuse it.
I don't know man, my TV has hardware several orders of magnitude faster and more advanced than the hardware that took us to the moon, and it takes dozens of seconds for apps like Netflix or Amazon Prime Video to load dashboards / change profiles or several seconds to do simple navigation or adjust playback. People just don't know how to properly write software these days, universities just churn out code monkeys with a vicious feedback loop occurring at the workplaces afterwards.
yes. I've observed hardware get faster and faster for 30+ years and I've watched software consume all of that headroom the entire time, for no clear reason other than the way we write software is just getting worse and worse and worse.
Not only time to market, but also maintainability.
In non-performance-critical areas, it's pretty important that when the original dev team leaves, new hires can still fix bugs and add features without breaking things.
> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something
It's because of this mentality that almost all desktop software nowadays is bloated garbage that needs 2GB of RAM and a 5GHz CPU to perform even the most basic tasks that could be done with 1/100th of the resources 20 years ago.
No, it's not because of this mentality. Remember the timeless quote:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Yet we should not pass up our opportunities in that critical 3%."
Using Electron and bloated frameworks that "abstract" things away is the biggest problem for most modern software, not the fact that developers aren't optimizing 10 CPU cycles away. It's a fundamental issue of how the software is made, not of what the code is. If you need to run a whole web browser for your application, you have already lost; there is no optimization you can do there.
I think the sibling comment here in this thread shows what some developers think. Electron is not reasonable. "Developers" using Electron should be punished by CBT and chinese torture.
Okay maybe that's too much. But I suggest a new quote:
"We should forget about Electron, say about 100% of the time: electron is the root of all evil.
Yet we should not pass up our opportunities in rewriting everything in Rust."
RAM and GPU are cheap. Most users aren't going to notice. Meanwhile by choosing Electron, the developers were able to roll out the app on Windows, Mac, and Linux at nearly zero marginal cost per additional platform.
But somehow modern software (Outlook, I’m looking at you) has trouble keeping up at my typing speed, and there’s a visible delay before characters appear on the screen. It doesn’t matter what the software does 99.9% of the time, if it’s an utter pig that crucial 0.1% of the time when the user is providing input.
An oft-recited rule of thumb: make it work, make it pretty, make it fast - in that order. That is, performance bottlenecks are easier to find and fix if your code is clean to begin with.
I sometimes wish performance was an issue in the projects I work with, but if it is, it's on a higher level / architectural level - things like a point-and-click API gateway performing many separate queries to the SAP server in a loop with no short-circuiting mechanism. That's billions of lines of code being executed (I'm guessing) but the performance bottleneck is in how some consultant clicked some things together.
Other than school assignments, I've never had a situation where I ran into performance issues. I've had plenty of situations where I had to deal with poorly written ("unclean") code and spent extra brain cycles trying to make sense of it though.
> That is, performance bottlenecks are easier to find and fix if your code is clean to begin with
That is not at all what "make it work, make it pretty, make it fast" is about. That saying is about prioritization. Making it fast doesn't mean anything if it doesn't work.
However, if you are doing performance-sensitive work then this is a very bad strategy. You need to design a performant architecture up front otherwise you'll likely have orders of magnitude worse performance, even after optimizing your code.
Ex: if your "make it work" design has shared mutable state, you're going to have a bad time when you want to scale that horizontally and unlock 100x better throughput/performance.
Here is the thing: the paradigms of your initial design (or, if you're in a better situation, the first refactor) are likely to stay. If performance is what you think about last, you choose different paradigms (e.g. polymorphism). From my experience, a winning strategy is to think about the things that need to scale already in the initial design, and what scales in a backend system is often the amount of data you want to pipe through it. Thus, keeping it data-centric (primitive types and arrays for raw data), while using polymorphism and functional interfaces to keep the methodologies clean, is usually a good idea.
So most code that I write tends to have dead-simple data types, but combines them with classes (and subclasses) that represent methods (strategies) for retrieving, transforming, presenting and storing the data. The 'make it work' phase may do this in a simple script, but the actual data model tends to stay the same.
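A rough sketch of that combination, with names I've made up: the data stays primitive and flat, and polymorphism is reserved for the strategy, which is invoked once per batch rather than once per element.

```cpp
#include <vector>

// Data-centric core: raw values in contiguous arrays.
struct Readings {
    std::vector<float> values;
    std::vector<long>  timestamps;
};

// Polymorphism at the strategy level only.
struct Transform {
    virtual ~Transform() = default;
    virtual void apply(Readings& data) const = 0;  // one call per batch
};

struct Scale : Transform {
    float factor;
    explicit Scale(float f) : factor(f) {}
    void apply(Readings& data) const override {
        for (float& v : data.values) v *= factor;  // tight loop over raw data
    }
};
```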
I think that I and all the non-technical folks around me experience application performance issues daily. I think that sometimes "making it work" should include some particular performance metrics. If it is not fast enough, it doesn't really work. Now, "fast enough" is something to be defined, and it differs depending on the part of the application.
Far too often I see applications that assume low latency and unbreakable Internet connection. They seem to do almost no caching at all. For example thumbnails.
Also, many applications will be almost unusable (or trigger OOM) when you try to work with a big file. Sometimes a big file is merely tens of MB; sometimes problems start with a 3MB file. Those are the issues that occur when performance isn't considered from the start - memory is free, you can copy things around, everything will fit in RAM.
One more thing. When your application consists of a client and a server, it may turn out that you have painted yourself into a corner by not thinking about performance early on. Everything works without any trouble at first, and then it turns out there are latency issues with more data and you can't easily upgrade the client, for example. Or you had an architecture that allows you to spin up more servers and handle the load closer to the client, but it cuts into your margins.
Sure I leave apps sitting around waiting for my input quite often. But then when I do start inputting stuff, they lock up, fail to respond to my inputs, show me loading screens, blank screens, delayed responses, and otherwise waste my time. Pretty silly given how much money I pay for the hardware.
> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something. If you're writing a AAA video game, or high performance calculation software then sure, go crazy, get those improvements.
Value of time is disproportionately weighted by user attention, which is at its highest right around when user input is happening.
>Look, most modern software is spending 99.9% of the time waiting for user input
That's fine, that's as it should be and isn't an interesting metric. The computer should wait for me, not the other way around. Needlessly waiting for the computer is a sign of s%&t software and not as uncommon as we'd like, huh?
Did you read the title? He's addressing performance.
And you're saying for many developers, performance is not the biggest priority.
That's fine. But it doesn't make him even 0.000001% wrong. And he's not applying anything to any niche situations. You just missed his point. Performance.
In my experience, software has been getting slower faster than hardware has been getting faster. Bringing focus back to performance would be a welcome improvement.
Like everything else in the software industry the context does matter: at larger scales small gains in performance translate to large savings in costs (infrastructure, maintenance, etc.).
Also, "clean code" (as in from the "Clean Code" book) is generally not good advice for most programs anyway. Not only does it eat performance, it's not all that great for building maintainable, extensible systems.
What does it matter if it does that 0.1% ten times slower than it could? Then the user has to wait for the software, which slows down the most expensive component of the whole work setup: the human.
if the software ever takes more than 10ms to do something it is stealing time from the human. the human is the slowest part of the system so nothing else should ever make the slowest part even slower.
I think we absolutely should care, because when the devices do something, the user still expects software to be fast.
So if it is not fast enough, he buys a new device, because this is what he can do. He can't just buy the software rewritten. Usually.
>most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something
Bad mindset.
A GFCI breaker spends 99.99999% of the time waiting with zero leakage current. Yet, when it does detect leakage current, you want the breaker to trip as quickly as possible.
See where I'm going?
Imagine if it would take 5 seconds from flicking a light switch until the lights actually turn on. Because the switch is waiting for user input 99.99% of the time, right? Would you install that light switch in your home?
Except when it isn't. I remember an article making the rounds the other day that claimed the whole "most software spends most of its time waiting on I/O" common wisdom is no longer true: most software these days is CPU-bound, and a good chunk of that is parsing JSON.
Eh, that heavily depends on the language and dataset you're working with. I've seen "simple" data with some fat thing like RoR on top of it having 10x the latency of the underlying database after all the ORMing.
I’d like to work in one of these teams/products where the database is the bottleneck. Basically everywhere I’ve worked, I’ve had to work with backends that have been foot-draggingly slow compared to the actual database.
I think there is a relatively comfortable middle ground. High quality, readable and performant code are not mutually exclusive optimization quantities - though they will start to compete against each other in the extreme. Often, O(WTF) algorithms are complicated, full of useless fluff and hard to understand - occasionally attempting to follow Uncle Bob's unresearched and unexamined ideas.
Honestly in many cases you can have your cake and eat it too if you just write in a functional or data-flow style rather than a rigorous OO style. Since this is C++, using std::algorithm or another algorithms library would let you abstract your implementation details while relying on the compiler's ability to optimise/vectorise/inline code as needed.
This applies doubly so if you can rely on templates & structural typing to push your polymorphism to compile time. clang & gcc are surprisingly good at optimisation as long as you don't have to bounce off a vtable and the code is clean / avoids "manual optimisation".
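A small sketch of that compile-time flavour (assumed example): the shape type and the area computation are resolved statically, so the compiler can inline the lambda and vectorise the loop.

```cpp
#include <vector>

struct Rect { float w, h; };

// Structural/compile-time polymorphism: works for any range and any
// callable, with no vtable in sight.
template <typename Range, typename AreaFn>
float total_area(const Range& shapes, AreaFn area) {
    float sum = 0.0f;
    for (const auto& s : shapes) sum += area(s);
    return sum;
}

// Usage:
//   std::vector<Rect> rects = ...;
//   float a = total_area(rects, [](const Rect& r) { return r.w * r.h; });
```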
Also while I'm not saying I don't believe the author here, I wish they would have used https://quick-bench.com/ or https://godbolt.org/ so that readers could trivially verify the results & methodology.
I don’t think the percentage of time waiting for input has anything to do with this. Outside of video games, the way most people will see performance problems is in the latency of their UI interactions. You press a button and want to see the result as fast as possible.
In other words, the user’s entire perception of your program’s performance falls into that 0.1%.
Performance is not an absolute. At the end of the day it is about user experience. From a computer science point of view, we can measure memory and CPU usage, but if the users haven't been complaining then what problems are you actually solving (at least from a product POV)?
Performance for performance's sake is an interesting and appealing challenge to us engineers. I was writing C code in the 90s, and I miss being that close to the hardware, trying to spare every clock cycle I could while working with machines that had scarce resources.
But today I'm building SaaS products for millions of simultaneous active users. When customers complain about performance it is often not what us engineers think of as "performance." They're NOT saying things like "Your app is eating all memory on my phone" or "the rendering of this table looks choppy." It's usually issues related to server-side replication lag causing data inconsistencies or in some cases network timeouts due to slow responding services.
The point is the age old advice that we were giving aspiring game programmers back in the 90s:
Figure out and understand your priorities.
The famous inverse square root function in the Quake III Arena source code is a great example. If memory serves, they needed this calculation as part of their particle physics engine. The problem is that calculating inverse square roots precisely is very expensive, especially at the scale they needed. So they exploited how 32-bit floating point numbers are represented in binary to do a fast, good-enough approximation. This is a good example of a targeted, purposeful optimization.
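For reference, the function roughly as it appears in the released Quake III source (comments paraphrased; the originals are saltier): reinterpret the float's bits as an integer, subtract from a magic constant for a first guess, then refine with one Newton-Raphson step.

```cpp
float Q_rsqrt(float number)
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = *(long*)&y;                       // treat the float's bits as an integer
    i  = 0x5f3759df - (i >> 1);            // magic constant: first approximation
    y  = *(float*)&i;
    y  = y * (threehalfs - (x2 * y * y));  // one Newton-Raphson refinement
    return y;
}
```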
Back in the 90s we were obsessed with getting the most out of our hardware, especially when coding games. So we picked up all sorts of performance hacks and optimizations and learned how to code in assembly so that we could get even closer to the bare metal if we needed to. The result was code that was impossible to understand and maintain, and so experienced engineers taught us young'uns:
Write clean code first, then profile to understand where your bottlenecks are, then make TARGETED optimizations aimed at solving performance issues in order of priority.
That priority always being driven by user experience and/or scalability requirements.
Anything else is premature optimization. You're speculating about where your performance bottlenecks are, and you're throwing out maintainable code for speculative reasons; not actually knowing how much of an impact your optimizations are going to have on user experience.
Clean code (OOP, DRY, etc) is optimized for maintainability and extensibility, not necessarily performance.
In fact, I think it’s pretty well understood that clean code is a tradeoff wrt performance, at least that’s the way I’ve always understood it.
Clean code works well for something like a web app that’ll need to be maintained by scores of different engineers over many years or decades.
At least that’s the theory. In practice, at least some level of abstraction makes it a bit easier to rip and replace parts of the app without a total rewrite.
While I agree with Casey, in some situations it's hard to do. You can't really develop a web app in C# or Java in a simple way. Not only do you have to fight the language design, but all the frameworks and libraries are written with OOP, clean code and design patterns in mind.
So you might write your code in a straightforward way, careful not to lose efficiency, and then end up calling slow code anyway.
So his advice is easier to follow in some scenarios than others.
There is speed and there's the perception of speed. Some code (games) has to run fast. But most of the code we work on has to only have the perception of speed. If you're loading all of your resources and making the user stare at a twirly, you're doing it wrong.
What if you have a growing team of N people, and they need to be able to add shape subclasses / new shape logic all the time? As with anything this is a tradeoff. Often having decoupled code is more important than raw performance.
There is. The first one is that the code with the switch pattern can only process simple shapes. Let's say you want to get the area of a donut now. Well, you need to change all of that code to compute the area of the union. Or imagine that there are other places in the code that need to know the area of a single shape. Do they need to copy/paste the same code?
dude then why does most modern software feel like shit even on high performance systems?
Well written video games, the kind of thing Casey works on usually, beat the hell out of basically any other category of software in terms of user responsiveness. At least of all the software I use regularly.
Yeah, the article might be technically correct but is ultimately pointless. In almost every software engineering environment the priority is always going to be writing readable and composable code over something that runs 5 microseconds faster. All of your clever efficiency gains are anyways going to be wiped out by a single database call.
Not just general advice, but advice that is meant to be applied to TDD specifically. The principles of clean code are meant to help with certain challenges that arise out of TDD. It is often going to seem strange and out of place if you have rejected TDD. Remember, clean code comes from Robert C. Martin, who is a member of the XP/Agile gang.
The author's cherry-picking of one small piece of what Uncle Bob promotes, criticizing it in isolation without even mentioning why it is suggested with respect to the bigger picture, seemed disingenuous. It does highlight that testable code may not be the most performant code, but was anyone believing otherwise? There are always tradeoffs to be made.
This thread is, predictably, another demonstration of conflating optimisation with being aware of performance.
The presented transformation of code away from "clean" code had nothing to do with optimisation. In fact, it made the code more readable, IMO. It then demonstrated that most of those "clean" code commandments are detrimental to performance. But when people saw the word "performance", they immediately jumped to "omg, you're optimising, stop immediately!"
Another irritating reaction here is the straw man of optimising every last instruction: the course so far has been about demonstrating how much performance there even is on the table with reasonable code, to build up an intuition about what orders of magnitude are even possible. Casey repeated several times that what level of performance is right for your situation will depend on you and your situation. But you should be aware of what’s possible, about the multipliers you get from all those decisions.
And of course people bring up profilers: no profiler will tell you whether a function is optimal or not — only what portion of runtime is spent where. And if all your life you’ve been programming in Python, then your intuition about performance often is on the level of “well I guess in C it could be 5-10 times faster; I’ll focus on something else”, which always comes up in response to complaints about Python. Not even close.
Agreed, but most people here haven't paid for the rest of the course, and Casey sometimes forgot that here he's addressing a wider audience.
For instance he fails to explain why he didn't bother addressing the tail of his unrolled loop. He does in the course, but here he's just assuming it's irrelevant, and doesn't address again potential criticism like "he doesn't even bother to write correct code, look at that lazy unrolling!".
Despite being based on science, software development is full of BS and opinions, and lacks measurable metrics. How "clean" is this code, as a number between 0 and 1, for example? What's the metric there? Does using polymorphism automatically make your code clean? We cannot reliably reason about even single responsibility, because there's no real way to count the responsibilities of a function. In such a la-la-land some desperately seek any formal measure, and there is one - performance - where you can actually put a number on a piece of code. But you cannot do this for code "cleanliness". So some people prefer to ignore the unmeasurable trait and replace it with a measurable one.
Just because something is hard to measure, doesn't mean it's not important to do - I think there's even a fallacy for that?
I think it's also important to separate computer science from software engineering. Algorithms and data structures can be reasoned about mathematically, but what about agile VS waterfall or functional VS oop?
Maybe software engineering relates more to philosophy than math. There are theories and some are more sound than others, but there is no objective truth. Most of us agree that it's important for code to be readable, but we still don't have the answer for what's the best readable code. Even first principle such as DRY are challenged, eg by WET.
I recall vaguely that there was a situation where there were two schools of thought with different approaches. One was about hacking away things, the other about having correct programs. Maybe Berkeley vs MIT? Such different opinions at the basic level of a discipline are more likely in philosophy than math, I think.
>what level of performance is right for your situation will depend on you and your situation
A lot of my performance and code quality chops came from projects where the team was comfortable with the performance of the system but the business was not. They wanted to stop when it was right for them but not right for the situation. It ended up negatively affecting my opinion of them because ultimately I started to see it as deflecting. It's fine because I don't know what else to do, not because this is the best that can be done.
So he puts polymorphic function calls into enormous loops to simulate a heavy load with a huge amount of data to conclude "we have 20x loss in performance everywhere"? He is either a huge troll or he has fallen for the typical premature-optimization fallacy: if we call this virtual method 1 billion times we will lose hours per day, but if we optimize, it will take less than a second! The real situation: a virtual method is called only a few hundred times and is barely visible in profiling tools.
No one is working with a huge amount of data in big loops, using virtual methods on every element of a huge dataset, like he is showing. That's a false premise he is trying to debunk. Polymorphic classes/structs are used to represent the business logic of applications, or structured data, with a few hundred such objects that keep some state and a small amount of other data, so they are never involved in intensive computations like he shows. In real projects, such "horrible" polymorphic calls never pop up under profiling and usually occupy a fraction of a percent overall.
> The real situation: a virtual method is called only a few hundred times and is barely visible in profiling tools.
The reality is that the entire Java ecosystem revolves around call stacks hundreds of calls deep where most (if not all) of those are virtual calls through an interface.
Even in web server scenarios where the user might be "5 milliseconds away", I've seen these overheads add up to the point where it is noticeable.
ASP.NET Core for example has been optimised recently to go the opposite route of not using complex nested call paths in the core of the system and has seen dramatic speedups.
For crying out loud, I've seen Java web servers requiring 100% CPU time across 16 cores for half an hour to start up! HALF AN HOUR!
I bet that half an hour startup is not because of nested calls at all. I worked with a ton of Java code and if something is slow, it is usually shitty I/O related code or some algorithmic stupidity, not because of virtual calls and what not.
> So he puts polymorphic function calls into enormous loops to simulate a heavy load with a huge amount of data to conclude "we have 20x loss in performance everywhere"?
You're mistaken; the load size has nothing to do with the end result. The result is normalized to give an estimate of how much faster the simple code is than the polymorphic code regardless of input size. (Kinda like deaths per 100k instead of an absolute number of deaths in statistics about diseases.)
So yes, your code is running 20x slower than it should be all the time.
Especially when you make every class an interface, with... get this, one implementation! This is based on real world experience and is not a joke. There are real companies with real people that write real code where every single class is an interface with exactly one implementation. Which, as Casey has shown, results in upwards of a 20x slowdown in the worst case.
Obviously, you probably won't get a 20x speedup by getting rid of the polymorphic garbage. But it's equally asinine to assume that polymorphic functions are only called a few hundred times. I guarantee you your PC is making millions of polymorphic function calls per minute between the OS, the browser, the Windows anti-malware scanner, Steam running in the background, Oracle running its checks to remind you to update Java, etc. There are hundreds of processes running all the time on a modern device, and these devices are wasting enormous amounts of resources.
> Especially when you make every class an interface, with... get this, one implementation! This is based on real world experience and is not a joke.
And when you run this through a profiler, you will not notice how slow your code is, because everything is slow. Slowness is infused throughout the whole system.
Just because you haven't been exposed to this issue doesn't mean it doesn't exist. "The real situation", "no one", "in real projects", "never pop up"... give me a break lol.
One can reasonably well guess/know the expected input sizes to their programs. You ain't (hopefully) loading your whole database into memory, and unless you are writing a simulation/game engine or another specialized application, your application is unlikely to have a single scorching-hot loop; that's just not what most programs look like. If it does, then you should design for it, which may even mean changing programming languages for that part (e.g. for video codecs not even C et al. cut it, you have to write assembly), but more likely you just use a slightly less ergonomic primitive of your language.
Occasional CPU architect here... probably the worst thing you can do in your code is to load something from memory (the core of method dispatch) and then jump to it. It sort of breaks many of the things we do to optimise our hardware - it causes CPU stalls, branch prediction failures, etc.
There is one thing worse you can do (and I caught a C++ compiler doing it when we were profiling code while building an x86 clone years ago): instead of loading the address and jumping to it, push the address and then return to it. That not only breaks pipelines but also return-stack optimisations.
Yeah. I opened Discord earlier, and it took about 10 seconds to open. My CPU is an Apple M1, running at about 3GHz per core. Assuming it's single-threaded (it wasn't), Discord is taking about 30 billion cycles to open. (Or around 50 network round-trips at a 200ms ping.)
As in like a micro service? Ahahaha. Our CTO just pushed for microservices everywhere and we're not even that far along and we're chasing all kinds of performance problems. Insanity.
Just a quick nudge back on this: people in DSP would disagree with the assertion that nobody is going over big loops and using a virtual method on each element. We often have to process at least 88k elements per second in real time, through many many different processes. If any of those processes are defined using factories that spit out classes with polymorphic inheritance and virtual functions it certainly becomes an issue.
As a result some styles of writing code just don’t work for the audio thread at all, and we’d have to simply avoid or rewrite libraries written this way.
There are just some domains where standard practice for cleanliness is different because of your constraints.
I mean, it's to the point that we've got die-hards in this industry who insist on putting all functions inlined in headers (not that I agree!)
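Roughly the distinction that matters on the audio thread, sketched with invented names: dispatch per buffer, not per sample, so the indirect call happens hundreds of times a second instead of ~88,000.

```cpp
#include <cstddef>

// Per-sample dispatch: one indirect call for every sample.
struct SampleProcessor {
    virtual ~SampleProcessor() = default;
    virtual float processSample(float in) = 0;
};

// Per-block dispatch: one indirect call per buffer (e.g. 512 samples),
// leaving a tight inner loop the compiler can optimise.
struct BlockProcessor {
    virtual ~BlockProcessor() = default;
    virtual void processBlock(float* buffer, std::size_t numSamples) = 0;
};

struct Gain : BlockProcessor {
    float gain = 0.5f;
    void processBlock(float* buffer, std::size_t numSamples) override {
        for (std::size_t i = 0; i < numSamples; ++i)
            buffer[i] *= gain;
    }
};
```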
I agree with your sentiment. But those things exist (not that that validates the author's argument), and I still shake in terror remembering when, during covid, I was asked to take a look at a virus-spread simulation (cellular automaton), written by a university professor and his postdoc team at a large university, that modeled every cell in a 100k x 100k grid as a class which used virtual methods for every computation between cells. Rewrote that in CUDA with normal buffers/arrays... and an epoch ran in milliseconds instead of hours.
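The flat-buffer idea, sketched on the CPU (the actual rewrite was CUDA, and the update rule here is hypothetical): one contiguous array of cell states instead of a grid of heap-allocated objects with virtual calls.

```cpp
#include <cstdint>
#include <vector>

using Grid = std::vector<std::uint8_t>;  // one state byte per cell, row-major

void step(const Grid& src, Grid& dst, int width, int height) {
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            const int i = y * width + x;
            // Hypothetical rule: a cell becomes infected if any 4-neighbour is.
            const bool infectedNeighbour =
                src[i - 1] || src[i + 1] || src[i - width] || src[i + width];
            dst[i] = (src[i] || infectedNeighbour) ? 1 : 0;
        }
    }
}
```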
In all fairness to them, "simulating lots of stuff interacting with each other" is the poster child of OO. It's just that, well, it's not how the CPU works.
Then again, at some point we had "Lisp machines"; maybe some day there will be a computer architecture where memory/computation patterns are adapted to massive simulation, rather than shoehorning it onto existing architectures.
And those will fail just as miserably as Lisp machines.
The same problems happen when you have 1000 requests being dealt with simultaneously each working on small collections. Web Servers for real businesses do not sit idle, they churn at high % and reducing CPU load on them lets you save money, and/or improve latency for users, which can make you money.
So go on all of you, write everything in Python with 90 levels of indirection, my stock will go up.
Reminds me of a joke where a programmer optimized the most frequently used method in an imgur clone from 1s to 0.01s, because a customer complained the UI was slow to respond.
Congratulations. Taps on the back, champagne all around. Customers call. Same complaint.
Programmer asks "Well, did something change at least?". "Loading bar now flickers more", answers customer.
There is no doubting Casey's chops when he talks about performance, but as someone who has spent many hours watching (and enjoying!) his videos, as he stares puzzled at compiler errors, scrolls up and down endlessly at code he no longer remembers writing, and then - when it finally does compile - immediately has to dig into the debugger to work out something else that's gone wrong, I suspect the real answer to programmer happiness is somewhere in the middle.
When working with a larger code base, there will always be parts that you don't remember writing and you'll inevitably have to read the code to understand it. That's just part of the job/task, regardless of the style it's written in.
In shared code particularly with a culture of refactoring, there's no guarantee that the function call you see is doing what you remember it doing a year ago.
When I was coming up I got gifted a bunch of modules at several jobs because the original writer couldn't be arsed to keep up with the many incremental changes I'd been making. They had a mentality that code was meant to be memorized instead of explored, and I was just beginning to understand code exploration from a writer's perspective. So they were both on the wrong side of history and the wrong side of me. Fuck it, if you want it so much, kid, it's yours now. Good luck.
Of course any code requires some refresher at times, but the difficulty and time required to figure it out again is a spectrum that goes all the way down to the seventh circle of hell.
> There is no doubting Casey's chops when he talks about performance,
Remember that everyone has their blind spots!
I follow Casey on twitter, and a couple years ago there was a weird thread where he had hung his browser for 4-5 seconds by running some JS to assign CSS rules to ~50K div elements. And Casey was a million percent confident that the hang was due to JS being slow, and had nothing to do with CSS or DOM rendering.
If you're talking about Handmade Hero, the real answer to programmer happiness is: don't use a language you despise while refusing to leverage its features, don't refuse to use that language's libraries and frameworks, don't re-implement everything from first principles, and actually have your game designed first (not designed while you code).
Casey is a bad example of a game designer and he'll be the first to admit it. However, it is worth noting that Jonathan Blow very much does design while he codes and recommends the practice. He also generally abstains from library dependencies and implements a lot of things himself.
Of course, part of the point of Handmade Hero is to show that you can totally reimplement everything from first principles. Libraries are not magical black boxes, they're code written by human beings like you or me, and you can understand what they're doing.
For instance, he wrote his own PNG decoder[0] live on stream, with hardly any prior knowledge of the spec, even though I'm confident that under normal circumstances he'd just use stb_image. I'm sure he did this just to show how you'd go about doing that sort of thing.
[0] He only implemented the parts necessary to load a non-progressive 24bit color image, but that still involved writing his own DEFLATE implementation.
I've worked in projects where no one seemed to know SQL, where massive speed improvements were made by fixing very low hanging fruits like removing select * queries, adding naive indexes, removing N+1 queries etc.
Likewise, I've worked in code bases where performance had been dreadful, yet there were no obvious bottlenecks. Little by little, replacing iterators with loops, replacing objects/closures with enum-backed structs/tables, adding early exits and so on accumulated to the point where speed-ups ranged from 2x to 50x without changing algorithms (outside of fixing basic mistakes like not pre-allocating vectors).
Always fun to see these videos. I highly recommend his `Performance Aware Programming` course linked in the description. It's concise and to the point, which is a nice break from his more casual videos/streams which tend to be long-winded/ranty.
Just taking the little bit of time to think about what the computer needs to do and making a reasonable effort to not do unnecessary stuff goes a long way. That 2x-50x factor is in fact very familiar. That’s something loading in a second rather than in a minute, or something feeling snappy instead of slightly laggy.
And it matters much more than people say it does. The “premature optimisation...” quote has been grossly misused to a degree that it’s almost comical. It’s not a good excuse for being careless.
One thing I find frustrating in game development, is we often put off optimization until the end of the project when we know we are happy with the game. But that means _I_ have to live with a slow code base for 2 years.
It takes 19 seconds for the main menu to load when you push play on our current game in Unity. It's killing me.
Meanwhile, in my Lua side project, it's less than a second.
> I've worked in projects where no one seemed to know SQL, where massive speed improvements were made by fixing very low hanging fruits like removing select * queries, adding naive indexes, removing N+1 queries etc.
Can you recommend any SQL book with main focus on performance improvements like this?
I've learnt as I've needed it, so I'm afraid I don't have a single source to point at. Of the things that I've listed, a quick google search on each should give you enough info to be useful.
> Select * queries
Sometimes you only want two columns, but you ask for all five. Say you query a million rows and ask for, but throw away, 60% of all the data you get back.
> Naive indexes
As in, just slapping an index on a table that doesn't have one makes such a big difference that sometimes it's all you need.
> N+1 queries
This is more of a problem in ORMs, but any time you call the database N times instead of 1 time. A classic example is writing a for loop that asks for one row at a time, instead of asking for all rows once.
If some field is referenced in a WHERE clause, add an index for it.
If a few fields are referenced in a single WHERE clause, add a single index that includes all of them.
If you have an index on (a, b, c), then it is as if you also had indexes on (a, b) and on (a) alone.
If a field's condition in WHERE is =, put that field at the beginning of the index. If it's < (or similar), put it at the end. You'll get the best results if you have none or only one < in your query.
This guy is so dogmatic about it that it hurts. I would argue that clean code is a spectrum of how flexible vs. how rigid you want your abstractions to be. If your abstractions are too flexible for good performance, dial them back when you see the issue. If your abstractions are too rigid for your software to be extendable, then introduce indirection.
We can all write code that glues a very fixed set of things end to end and squeeze every last CPU cycle of performance out of it, but as we all know, software requirements change, and things like polymorphism allow for much better composition of functionality.
Casey is a bit of a hardcore crusader on the topic, but I'd hardly call dogmatic someone who can provide you evidence and measurements backing their thesis.
The tests he put together here are hardly something I'd call a straw-man argument; they seem like reasonable simplifications of real cases.
Evidence from a microbenchmark of a single page of code.
The focus on performance here ignores the fact that most programs are large systems of many things that interact with each other. That is where good design and abstractions and “clean code” can really help.
Like all things it is about finding a balance and applying the right techniques to the right parts of a larger system.
Performance measurements are only one dimension of code quality. Having a laser focus on it disregards why you would want to sacrifice performance for a different dimension of code quality, such as extensibility for different requirements.
You should check if your code is in the hot path before optimizing, because the more you couple things together the harder it is to change it around. For instance, in Casey's example, if you wanted to add a polygon shape but you've optimized calculating area into multiplying height x width by a coefficient, that requires a significant refactor. If you are sure you don't need polygons, that's a perfectly fine optimization. But if you do, you need to start deoptimizing.
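For context, here is a hedged reconstruction of the kind of table-driven code under discussion (my own sketch, not the article's exact listing; it assumes width/height describe the shape's bounding box). It makes the trade-off concrete: a general polygon has no per-type constant k with area = k * width * height, so adding one forces a redesign:

    enum ShapeType { Shape_Square, Shape_Rectangle, Shape_Triangle, Shape_Circle, Shape_Count };

    struct Shape {
        ShapeType type;
        float width, height;  // bounding box of the shape
    };

    // Per-type constant: square/rectangle = 1, triangle = 1/2, circle = pi/4.
    constexpr float AreaCoeff[Shape_Count] = { 1.0f, 1.0f, 0.5f, 3.14159265f / 4.0f };

    float Area(const Shape& s) {
        return AreaCoeff[s.type] * s.width * s.height;
    }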
> they seem like reasonable simplifications of real cases.
Paraphrasing Russ Ackoff: doing the right thing versus doing a thing right is the difference between wisdom (or effectiveness) and efficiency. What Casey is doing here may be efficient, but calculating a billion rectangles doesn't represent a realistic or general use case.
"Clean Code" or any paradigm of the sort aims to make qualitative, not quantitative improvements to code. What you gain isn't performance but clarity when you build large systems, reduction in errors, reduce complexity, and so on. Nobody denies that you can make a program faster by manually squeezing performance out if it. But that isn't the only thing that matters, even if it's something you can easily benchmark.
Looking at a tiny code example tells you very little about the consequences of programming like this. If we program with disregard for the system in favour of performance of one of its parts, what does that mean three years down the line, in a codebase with millions of lines of code? That's just one question.
Unfortunately, everything in our profession is a tradeoff. Faster and maintainable are two of the many quality metrics you can optimize for that will be at odds at times. What the right balance is for a given piece of code depends on so much context. It's a hard balance to get right.
These examples are absolutely a strawman. He's imagining there's one specific access pattern that's executed thousands of times per second. In a realistic codebase you're accessing the data less often but in multiple different (often subtly so!) ways. Cache efficiency is everything for modern CPUs, so you can't "simplify" the access patterns without making your benchmarks unrepresentative.
Yes... but I'm also seeing dogmatism from the opposing camps here in the comments section.
The reality is that how flexible your interfaces and abstractions are and their design has to be a part of your original design considerations when building something. It's a bad move to just hand wave away performance concerns because you religiously adhere to some design patterns. It's also a bad move to drop down to using intrinsics for everything from the get-go and thinking you know better than the compiler when it's a codepath that isn't even computationally expensive or a bottleneck a priori.
I think part of the problem with supposed "clean" code is that it tends to be a matter of opinion. Is the polymorphic version cleaner than the switch statement version? I would argue the latter is actually easier to read. There's no real reason to think "clean" code is actually clean other than anecdotes and that someone wrote it in a book, but the performance is something that can be objectively measured.
The "Clean Code" that Casey is talking about is a book and a code philosophy that was explained in depth in talks and trainings and seminars, so I would disagree that it is a matter of opinion.
I think it really truly depends. I think it's always good to do the minimal viable thing first instead of being an architecture astronaut, but if you've been asked for three (random ballpark number) different implementations for the same requirement it might be time to start adding some indirection.
The best idea in clean code is to stop coupling domain models to implementation details like databases/the web/etc. Once you grok that, then you're in a better position to work on eliminating unnecessary coupling within the model itself.
There's lots of ways to do this poorly and well. There's no process for it. That's a feature. I feel like a lot of the flak clean code gets boils down to, "I followed it dogmatically and look what it made me do!" It didn't make you do anything; it's trying to teach you aesthetics, not a process. Internalize the aesthetics and you won't need a rigid process.
Obviously when you do this you probably need more code than you'd normally write. That can be viewed as a maintenance burden in some situations, esp. when you don't have product market fit. Again, this shows that treating clean code like some process that always produces better code in every situation is extremely naive.
There was a moment in grade school where I was sat down and it was explained to me that you don't have to take a test in order. You can skip around if you want to, and I ran so far with that notion that at 25 I probably should have written a book on how to take tests, while I could still remember most of it.
One of the few other "lightning bolt out of the blue" experiences I can recall was realizing that some code constructs are easier for both the human and the compiler to understand. You can be sympathetic to both instead of compromising. They both have a fundamental problem around how many concepts they can juggle at the exact same time. For any given commit message or PR you can adjust your reasoning for whichever stick the reviewer has up their butt about justifying code changes.
You have Japan infrastructure, and you have Turkey infrastructure.
6.1 quake in Japan = nothing destroyed.
6.1 quake in Turkey = everything collapses.
The engineers in Turkey probably didn't value performance and efficiency.
It's the same for developers: choose your camp wisely, otherwise people will complain once they can no longer bear your choice.
You act innocent, but your code choices translate into costs: higher server bills for your company, higher energy bills for your customers/users, time wasted for everyone, rare materials depleted at a faster rate, growing tech junk.
Selfishness is high in the software industry.
We are lucky it's not the same for the HW industry, but it's getting hard for them to hide your incompetence, as more things now run on a battery, and battery tech is kinda struggling.
Good thing is they get to sell more HW since the CPU is "becoming slower", lol.
So now we've got smartwatches that need recharging every damn day.
The shapes example is pretty contrived so I don't really have an opinion on it either way. But imagine you have something like a File interface and you have implementations of it e.g. DiskFile, NetworkFile, etc., and you anticipate other implementors. Why would you do anything other than have a polymorphic interface?
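A sketch of that situation (hypothetical interface, purely for illustration): every virtual call here fronts a syscall or a network round-trip, so the nanoseconds spent on the indirect branch are noise:

    #include <cstddef>
    #include <cstdint>

    // Polymorphism is the right tool here: the dispatch cost is dwarfed
    // by the microseconds-to-milliseconds of I/O behind each call.
    class File {
    public:
        virtual ~File() = default;
        virtual std::size_t read(std::uint8_t* buf, std::size_t n) = 0;
        virtual std::size_t write(const std::uint8_t* buf, std::size_t n) = 0;
    };

    class DiskFile : public File { /* read/write via OS file descriptors; body elided */ };
    class NetworkFile : public File { /* read/write via a socket, adds buffering; body elided */ };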
I don't like most of these "principles", as anyone can verify by looking at my previous comments, but this article is cherry-picking to its utmost level of unfairness.
These "clean code" principles should not, and generally are not, ever used at performance critical code, in particular computer graphics. I've never seen anyone seriously try to write computer graphics while "keeping functions small" and "not mixing levels of abstraction". We can go further: you won't be going anywhere in computer graphics by trying to "write pure functions" or "avoiding side effects".
These "clean code principles" are, however, rather useful for large, corporate systems, with loads of business process rules maintained by different teams of people, working for multiple third parties with poor job retaining. You don't need to think about vector performance for processing credit card payments. You don't need to think about input latency for batch processing data warehouse jobs, but you need this types of applications to work reliably. Way more reliably than a videogame or a streaming service.
Right tools for the right jobs, people need to stop trying to hammer everything into the same tools. This is not only a bad practice in software, it's a bad practice in life, the search for an ever elusive silver bullet, a panacea, a miracle. Just drop it and get real.
> you won't be going anywhere in computer graphics by trying to "write pure functions" or "avoiding side effects"
Not sure about this; in my experience (in a different domain, audio processing) you totally can get away with both of these a lot of the time.
Function inlining works well, so you can write small pure functions in a lot of cases (especially if you accept as pure a function that reads from one buffer and writes to another).
As for avoiding side effects, this is normally more about keeping your state updates small and localised (allowing more parts to be pure), which is often not a problem performance-wise.
IME it's much easier to improve the performance of a piece of code which is easy to reason about and change with some level of confidence that your optimisation will not break things.
I know there's some unavoidable global state in computer graphics, but presumably there is lots of code that doesn't directly touch that.
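A sketch of what that can look like (illustrative, not from any particular codebase): the worker is pure in the sense above, reading one buffer and writing another, while the one real side effect is kept small and local:

    #include <cstddef>

    // "Pure" in the practical sense: the output depends only on the inputs,
    // and nothing else is touched. Trivial for the compiler to inline and vectorise.
    inline void mixBuffers(const float* a, const float* b, float* out, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = a[i] + b[i];
    }

    // The side effect (parameter smoothing) is isolated in one tiny struct,
    // so everything downstream of it can stay pure.
    struct SmoothedGain {
        float current = 1.0f;
        float step(float target) {
            current += 0.001f * (target - current);  // one-pole smoother
            return current;
        }
    };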
Exactly! I work with a lot of high-performance code and also a lot of non-high-performance code (think all the plumbing around the core computation) and I definitely use a lot of "clean code patterns" in the non-performance-critical parts. They're the ones that tend to change more, that more people touch, that get done faster... It's just about knowing what to use and when.
> These "clean code" principles should not, and generally are not, ever used at performance critical code, in particular computer graphics.
I agree that this is mostly true, but maybe not for beginners in the field
When I was reading "Ray Tracing in One Weekend" (known as _the_ introductory literature on the topic), I was very surprised to see that the author designed the code so that objects extend a `Hittable` class, and the critical ray-intersection function `hit` is dynamically dispatched through the `virtual` keyword and thus suffers a huge performance penalty.
This is the hottest code path in the program, and a ray-tracer is certainly performance critical, but the author is instructing students/readers to use this "clean code" principle and it drastically slows down the program.
So I agree most computer graphics programmers aren't writing "clean code", but I think a lot of new programmers are being taught them because of introductory literature
Yeah, but why are good and fast opposites rather than orthogonal? Why are languages and compilers not built so that we can at least do both? Or even better, why aren't good and fast the same thing in computer languages?
But I'd point out that languages and compilers are built so we can have both.
The problem is some definitions of good really aren't, and it isn't just because they're slow, it's because they make some things "good" at the detriment of others. Uncle Bob's Clean Code is really good at making some portions of the code "good" and "simple", but when put together they are interconnected in ways that are more difficult to understand.
Yes and no, I think that what can happen if you only focus on "performance critical code" is the rest of your code is slow and incredibly unfriendly to cache, but there's absolutely nothing you can really do about it past a point. Like, if all your code is 5x slower than it possibly could be, then even after you fix the performance critical bits you have this ugly tax everywhere else that you can't do anything about other than a rewrite. And I do think that matters, look at various projects written in a language like python where bits have been written in C, but you still have all this slow interpreted stuff. (I like Python btw, I'm just making a point that if performance matters it doesn't just matter in loops)
I don’t understand why there is still the false dichotomy between performance and speed of development/readability. Arguments on HN and in other software circles suggest performant code cannot be well organized, and that well organized code cannot be performant. That’s false.
In my experience, writing the code with readability, ease of maintenance, and performance all in mind gets you 90% of each of the benefits you’d have gotten focusing on only one of the above. For instance, maybe instead of pretending that an O(n^2) algorithm is any “cleaner” than an O(n log n) algorithm because it was easier for you to write, maybe just use the better algorithm. Or, instead of pretending Python is more readable or easier to develop in than Rust (assuming developers are skilled in both), just write it in Rust. Or, instead of pretending that you had to write raw assembly to eke out the last drop of performance in your function, maybe target the giant mess elsewhere in your application where 80% of the time is spent.
A lot of the “clean” vs “fast” argument is, as I’ve said above, pretending. People on both sides pretend you cannot have both, ever, when in actuality you can have almost all of what is desired in 95% of cases.
I'd even go so far as to say that "clean" code is a requirement for performance optimization. For a loose definition of clean.
Code that is unreadable, tightly coupled, untestable or just messy is much, much harder to work in than code that is readable, loosely coupled, well-tested and clean. This has been proven often and is really a no-brainer. Performance-optimizing is finding the bottleneck, then rewriting that without changing the functional behaviour. For this you need, respectively, readability (to find bottlenecks you must be able to understand flow and code), the ability to rewrite (tightly coupled code cannot be rewritten in isolation), and assurance that the behaviour doesn't change (test coverage).
Ergo: a clean architecture is a requirement for making code more performant in the first place. Even if that architecture is bad for performance in itself, it enables future improvements.
I find it funny that people keep throwing lowercase "clean code" around in this post without knowing the context; now you have "clean architecture" too, and it makes it more difficult to know what you're talking about.
Clean Code is actually a book by Uncle Bob. And Clean Architecture is the name of another book by him.
What Casey is criticizing isn't "good code". He's criticizing Uncle Bob's philosophy.
Now, your argument seems to be: in the real world, there's so much waste, that virtual function calls pale in comparison.
This does not debunk his main point, which seems to me at least the following: all things being equal, writing code with virtual functions that do a tiny amount of work and "hiding implementation details" makes performance worse, sometimes by an order of magnitude.
Now, there may be situations where you _have_ to use virtual functions, because you are writing a library for other people to use, and you can't dictate ahead of time how they will use it.
This again does not invalidate the point. You need to be _aware_ of the performance implications and mitigate them. He said the following in the comment section of the article:
> Try to make it so that you do very rare virtual function calls, behind which you do a _large_ amount of work, rather than the "clean" code way of using lots of little function calls.
Along with that comes the heuristic that hardware and electricity are cheap while developers are expensive. That's probably why, in my experience, managers almost never ask developers to optimize slow code: they believe it is cheaper to use more hardware if that fixes the problem (in some areas, like HFT or GameDev, more hardware is not the answer, so optimization does happen). In the rare cases where I've seen optimization done, the initiative always came from an IC (who knew that the code could run faster / use fewer resources).
So nowadays, writing code, I assume that it will never be optimized later and try to do less dumb stuff from the beginning.
You can of course only optimize what you are looking to optimize. I am not surprised (honestly) that some engineers do not realize they are primed for the kinds of things they will find by the mere choice of where they want to look.
There's a tipping point where you have so much hardware there's big savings with optimization. Things like Postgres and the Linux kernel have a lot of optimization put into them and there's an insane amount of hardware out running that code.
That said, slow software sucks.
Maintainability and cleanliness are the best virtues code can have. If you have extremely clearly written code that has performance issues, I can swoop in with analysis tools, figure out where the pain point is, and refactor it out. Sometimes this is a real headache[1], sometimes not; what I can guarantee is that if the code is "dirty" it's going to be a headache, and it'll take more time.
I'd personally take issue with this article over the polymorphism claim though: polymorphism is a tool, but it isn't the be-all and end-all tool. A lot of your data can live as structs/blobs in memory with tight internal type definitions but without any OO principles. Personally I am a huge fan of functional programming (but not pure functional programming), so the objects I use are relatively few and far between and exist to fulfill a very specific purpose.
I've had two occasions in working when I needed to break out an asm block - the compiler was being a thick headed dummy and this code needed to receive incoming signals without exception or delay - but once that critical section was passed? Back to high level programming and statements favoring expressiveness over raw bare metal performance.
If you want an interesting experience talk to your closest non-technical manager type - be that a product team manager or the company owner - and ask them if they'd prefer if you focus on reducing how long your product takes to execute by 20% over the next five years or if they'd prefer you to lower the growth of the developer labor budget by 20% for the next five years by focusing on maintainability over performance. With the exception of extremely niche cases maintainability is always the golden standard.
1. For instance, I've dealt with OOM issues that have required transforming all logic on a query result to be lazily evaluated on a data stream after main execution finishes - like the logic goes up and down the stack and only then begins processing results. In this particular case the problem was rather easy to deal with because we essentially swapped out the actual value passing on each layer for a lazy result set being passed around - because the code was clean. Sometimes you'll definitely need to massively re-engineer things though.
Definitely get the single-threaded house in order before attempting to speed up by running in parallel.
Don't forget virtual machines and interpreters.
What I've seen is slow queries, but a bigger problem is actually too many queries. It's easy to do, especially when using an ORM.
It mostly happens on change: someone wants to add something to an existing query, so they just add their new query and slop it into a loop, and boom, performance is gone.
Over the years they've grown, and one bin showed up where jobs would sit around for weeks; for that case, n went from rarely a screenful of lines to a few thousand. Oops: now more than 2/3 of the time was spent in those searches. (And I have a sneaking suspicion that a good portion of the remaining time comes from using field names to retrieve values. The profiler doesn't separate that out, though, because it's not my code.)
- shit UX "ideas" that trigger 10 new API calls to show dialogs and popups.
- Logging libs under pressure
- overemphasis on testability
If shape Area is computed often enough that you care about inlining the calculation, why not compute & store it every time the height / width change. That’d be easy enough in an architecture based on information hiding, and might illustrate a legitimate engineering trade-off between those architectural choices.
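A sketch of that trade-off (hypothetical class, not from the article): pay the multiply on the rare write so that the hot read is a plain load, while the fields stay hidden:

    class Rect {
    public:
        void setSize(float w, float h) {
            width_  = w;
            height_ = h;
            area_   = w * h;  // recompute on mutation (rare)...
        }
        float area() const { return area_; }  // ...so the hot read is a plain load
    private:
        float width_ = 0.0f, height_ = 0.0f, area_ = 0.0f;
    };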
That's really not even close to true. Loading random websites frequently costs multiple seconds worth of local processing time, and indeed, that's often because of the exact kind of overabstraction that this article criticizes (e.g. people use React and then design React component hierarchies that seem "conceptually clean" instead of ones that perform a rendering strategy that makes sense.)
You attempted to present an argument that's a textbook example of a hyperbolic fallacy.
There are worlds of difference between "this code does not sit in a hot path" and "let's spend multiple seconds of local processing time".
This blend of specious reasoning is the reason the first rule of software optimization is "don't". Proponents of mindlessly chasing the 1% edge cases fail to understand that the whole world is composed of the 99% of cases where shaving off that millisecond buys you absolutely nothing, at the cost of producing unmaintainable code.
The truth of the matter is that in 99% of cases there is absolutely no good reason to run after these relatively large performance improvements if, in the end, the user notices absolutely nothing. More so if you're writing async code that stays far away from any sort of hot path.
But overall, I think you overestimate how much time you spend loading the website and how much time it's just sitting there, mostly idle.
And in the end, as long as it's fast enough that users don't stop using the site/webapp/program/whatever, then it's fine, imho. When it becomes too slow, the developers will be asked to improve performance. Because in the end, economics is the driver, not performance.
Unless we’re talking about specific compute-intensive websites, this is almost certainly network loading latency.
Modern web browsers are very fast. Modern CPUs are very fast. Common, random websites aren’t churning through “multiple seconds” of CPU time just to render.
I know React tends to lack in both dev UX and performance (at least in my exp). Personally I've taken a look at Svelte and Solid, and liked them both. I haven't had the chance to build anything larger than a toy app, though.
I think a lot of the clean code advice in general relates to object-oriented programming.
I've noticed that once my Lua programs (games) grow to a reasonable size, they become kinda hard to maintain. And I tend to use an object-oriented programming style (of course, it also doesn't help that Lua is not typesafe). After I finish my current game, I want to try making a game using a procedural approach. I wonder if this would solve some of the issues I see in my current code base.
One of the core ideas of procedural programming is that data and functionality are not mixed in classes as we do in object-oriented programming. Instead, you might have a module that contains some functions and some data objects the functions act upon. This approach would make some other aspects of game programming with Lua easier as well (e.g. serialisation), but perhaps it will also make the code easier to maintain as the codebase grows. It's something I want to contemplate.
Oh, it's not. Look at the crap the Java community used to produce. 37 levels of abstraction is not maintainable.
A CPU meter while clicking anything on a "modern" webpage proves that's a lie.
Also, sure, even if "clicking on things" is maybe 1-5% vs "looking at things" THAT'S THE CRITICAL PATH.
Once the app has rendered a view, obviously it is not doing much, but the user is also not waiting on anything and is "being productive", parsing whatever is displayed.
The critical path, the wasted time is the time app takes to render stuff and "but 99% of the time is not doing it" is irrelevant.
It's all a horribly performing turd that needs a $4000 MacBook to be tolerable at best.
Does it though? Where's the evidence for it? The vast majority of people I've worked with over the last couple of decades who like to bring up "clean code" tend towards the wrong abstractions and over-abstracting.
I almost always prefer working with someone who writes the kind of code Casey does over someone who follows the clean-code examples I've spent my career dealing with. I've seen and worked with many examples of Data-Oriented Design that were far from unmaintainable or unreadable.
> Prefer polymorphism to “if/else” and “switch”
Algebraic data types and pattern matching (a more general version of switch) make many kinds of data transformation far easier to understand and maintain, versus e.g. the visitor pattern, which uses ad-hoc polymorphism. (A minimal variant-based sketch follows this list of points.)
> Code should not know about the internals of objects it’s working with
This is interpreted by many as "don't expose data types". Actually some data types are safe to expose. We have a JSON library at work where the actual JSON data type is kept abstract and pattern matching cannot be used. This is despite the fact that JSON is a published (and stable) spec and therefore already exposed!
> Functions should be small
"Small" is a strange metric to optimise for, which is why I don't like Perl. Functions should be readable and easy to reason about. Let's optimise for "simple" instead.
> Functions should do one thing
This is not always practical or realistic advice. Most functions in OOP languages are procedures that will likely perform side effects in addition to returning a result (e.g. object methods). Should we also not do logging? :)
> “DRY” - Don’t Repeat Yourself
The cost of any abstraction must be weighed up against the repetition on a case-by-case basis. For example, many languages do not abstract the for-loop and effectively encourage users to write it out over and over again, because they have decided that the cost of abstracting it (internal iteration using higher-order functions) is too high.
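As promised above, a minimal variant-based sketch in C++17 (my own example, reusing the thread's shape theme): std::variant plays the role of the algebraic data type and std::visit the role of pattern matching, with missing cases caught at compile time:

    #include <variant>

    struct Square    { float side; };
    struct Rectangle { float w, h; };
    struct Circle    { float r; };
    using Shape = std::variant<Square, Rectangle, Circle>;

    // Builds one visitor out of several lambdas (the usual C++17 idiom).
    template <class... Fs> struct Overloaded : Fs... { using Fs::operator()...; };
    template <class... Fs> Overloaded(Fs...) -> Overloaded<Fs...>;

    float area(const Shape& s) {
        return std::visit(Overloaded{
            [](const Square& q)    { return q.side * q.side; },
            [](const Rectangle& r) { return r.w * r.h; },
            [](const Circle& c)    { return 3.14159265f * c.r * c.r; },
        }, s);
    }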
A really good developer writes clean code using the right abstractions (finding those tends to take the most time and experience) and drops down to a different level of abstraction for high-performance areas where it makes sense.
The fact that bad developers suck and write bad code whether or not they use clean code does not reflect on the methodology.
If we had perf issues that showed up externally, they were higher-level design issues, like 1) trying to take a thumbnail of the page at full resolution for a tab thumbnail while loading another tab, not because the thumbnailing code itself was slow, or 2) running a slow O(tabs) JS teardown during shutdown when we could run an O(~1) cleanup step instead.
This works until your computer is old enough to be slower than what a majority of wealthy people (ie desirable customers) are using, at which point you need to buy a newer, faster computer, even though your current one was already "faster than anyone could reasonably need it to be".
This is all harmless enough—a little disrespectful perhaps, to make other people waste their money, but not so terrible—until you consider the environmental impact of all these new computers, which the average spreadsheet absolutely should not need but does anyway. It's also an equity issue—someone on a fixed income can't necessarily afford a new machine.
What would actually happen if Moore's law ended tomorrow, and we were no longer able to make computers any faster than they are today? It would really suck for scientists and hardcore gamers, but I actually think a majority of computer users would benefit. The experience of someone who just writes documents and checks email would be unchanged, except that their current computers would never slow down!
Assuming it is right, there is something called multitasking: the CPU, RAM and, most importantly, the cache are not all yours. If there are 1000 pieces of software like yours, each spending 0.1%, that's 100%. You may argue that 1000 pieces of software is unreasonable, and you would mostly be right, but it happens, and mostly for the same reason software isn't optimized: quantity over quality.
Another issue is that you have to make a distinction between throughput and latency. You don't have to keep up with a sustained 100 actions per second, people don't go that fast, but you definitely have to respond within tens of milliseconds, because more than that is noticeable. Latency is much harder to optimize and if you are in the critical path, these cycles may matter.
A lot of devices are battery-powered these days, and all these wasted cycles reduce the battery life of the entire system. Mobile devices are crazy powerful these days, but this power is meant to be used sparingly. And even with line-powered devices, I think we waste enough energy as it is...
And finally, what is the point of "clean code"? Hopefully not just because it gives software architects boners. The point is usually to make software that will last: easier maintenance, fewer bugs, etc... But performance bugs exist too, and one of the most common software evolutions is to do more of what the software already does. Image editing software will process more and bigger images, a database will store more entries and more details about each entry, documents will get larger, etc... You may even find that your users are using some feature on a scale you never intended, maybe someone is pasting entire books into your note-taking app, and it may turn out to work quite well... if you cared about performance. Not caring about performance is technical debt, and it may negate the advantage of using "clean code" in the first place.
I don't. How many programs are running in your OS right now? How much CPU do you need to keep those things, plus the things you need, running in a performant manner?
How much CPU would you need if things performed better? The answer is "less", every time.
Better software performance = less money required for hardware to obtain the responsiveness you require.
It's important, and it's important completely independently of how it is framed here.
just wait till you've seen software get slower for 30 years. to put it another way, watch hardware get faster and faster and faster for 30 years while you observe software continually consume all of the available headroom until it feels slow again. watch that happen for THREE DECADES and wait for someone to tell you that everything is fine and that someone saying "software is unnecessarily slow" is wrong because they aren't framing their argument how you think it should be framed.
All that says is that you should focus your energy on increasing the value of that 0.1%. It's not actually an argument to not spend any energy.
It's like saying 'Astronauts only spend .1% of their time in space' or 'tomatoes only spend .1% of their existence being eaten' - that .1% is the whole point.
You can debate how best to maximize that value, more features or more performance. The OP is suggesting folks are just leaving performance on the floor and then making vacuous arguments to excuse it.
In non-performance-critical areas, it's pretty important that when the original dev team leaves, new hires can still fix bugs and add features without breaking things.
It's because of this mentality that almost all desktop software nowadays is bloated garbage that needs 2GB of RAM and a 5GHz CPU to perform even the most basic task, one that could be done with 1/100th of the resources 20 years ago.
I think the sibling comment here in this thread shows what some developers think. Electron is not reasonable. "Developers" using Electron should be punished by CBT and Chinese torture.
Okay maybe that's too much. But I suggest a new quote:
I sometimes wish performance was an issue in the projects I work with, but if it is, it's on a higher level / architectural level - things like a point-and-click API gateway performing many separate queries to the SAP server in a loop with no short-circuiting mechanism. That's billions of lines of code being executed (I'm guessing) but the performance bottleneck is in how some consultant clicked some things together.
Other than school assignments, I've never had a situation where I ran into performance issues. I've had plenty of situations where I had to deal with poorly written ("unclean") code and spent extra brain cycles trying to make sense of it though.
That is not at all what "make it work, make it pretty, make it fast" is about. That saying is about prioritization. Making it fast doesn't mean anything if it doesn't work.
However, if you are doing performance-sensitive work then this is a very bad strategy. You need to design a performant architecture up front otherwise you'll likely have orders of magnitude worse performance, even after optimizing your code.
Ex: if your "make it work" design has shared mutable state, you're going to have a bad time when you want to scale that horizontally and unlock 100x better throughput/performance.
So, most code that I write tends to have dead simple data types, but combines them with classes (and subclasses) that represent methods (strategies) for how to retrieve, transform, present and store the data. The 'make it work' phase may do this in a simple script, but the actual data model tends to stay the same.
Far too often I see applications that assume low latency and an unbreakable Internet connection. They seem to do almost no caching at all. For example, thumbnails.
Also, many applications will be almost unusable (or trigger OOM) when you try to work with a big file. Sometimes a big file is merely tens of MB; sometimes problems start with a 3MB file. Those are the issues that occur without thinking about performance from the start: memory is free, you can copy things around, everything will fit in RAM.
One more thing. When your application consists of a client and a server, it may turn out that you have painted yourself into a corner by not thinking about performance early on. Everything works without any trouble at first, and then it turns out there are latency issues with more data and you can't easily upgrade the client, for example. Or you have an architecture that allows you to spin up more servers and handle the load closer to the client, but it cuts into your margins.
make it work
make it work correctly
make it work fast
Pretty was never in the picture. But if anyone wants to add it, it should come last.
If that's true, why does it take forever to load and frequently fail to keep up with my input?
I hear this a lot, but during that 0.1% when I'm actually waiting for it to do its calculation, it had better be fast.
And for the remaining 99.9% of the time, it had better not eat my battery and memory...
Value of time is disproportionately weighted by user attention, which is at its highest right around when user input is happening.
This talk (Preventing the Collapse of Civilization) by Jonathan Blow disagrees with you.
Link: https://www.youtube.com/watch?v=ZSRHeXYDLko
That's fine, that's as it should be and isn't an interesting metric. The computer should wait for me, not the other way around. Needlessly waiting for the computer is a sign of s%&t software and not as uncommon as we'd like, huh?
And you're saying for many developers, performance is not the biggest priority.
That's fine. But it doesn't make him even 0.000001% wrong. And he's not applying anything to any niche situations. You just missed his point. Performance.
If you're scaling to 1000s of users then yes. If you have a GUI for a monthly task that two administrators use, then no.
The less something gets used the longer the payback time on the initial development.
Also, "clean code" (as in from the "Clean Code" book) is generally not good advice for most programs anyway. Not only does it eat performance, it's not all that great for building maintainable, extensible systems.
Wrong focus. Software time doesn't matter, user time does.
Users don't want to wait; even slow typists want instant results as soon as they hit Enter.
This is bad for sustainability
> I think the author is taking general advice and applying it to a niche situation.
I think the author is taking a general situation and applying common sense to it.
So why isn't my browser at 0% CPU when IN THE BACKGROUND then?
Bad mindset.
A GFCI breaker spends 99.99999% of the time waiting with zero leakage current. Yet, when it does detect leakage current, you want the breaker to trip as quickly as possible.
See where I'm going?
Imagine if it would take 5 seconds from flicking a light switch until the lights actually turn on. Because the switch is waiting for user input 99.99% of the time, right? Would you install that light switch in your home?
This applies doubly if you can rely on templates & structural typing to push your polymorphism to compile time. clang & gcc are surprisingly good at optimisation as long as you don't have to bounce off a vtable and the code is clean / avoids "manual optimisation".
Also while I'm not saying I don't believe the author here, I wish they would have used https://quick-bench.com/ or https://godbolt.org/ so that readers could trivially verify the results & methodology.
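A sketch of what pushing polymorphism to compile time looks like (illustrative): the element type is a template parameter, so the call is resolved statically and can inline; there is no vtable to bounce off:

    // Static dispatch: T is known at instantiation time, so area() can
    // inline into the loop; no indirect branch, no vtable load.
    template <typename T>
    float totalArea(const T* shapes, int count) {
        float sum = 0.0f;
        for (int i = 0; i < count; ++i)
            sum += shapes[i].area();  // resolved at compile time
        return sum;
    }

    // Structural requirement: any type with a suitable area() method works.
    struct Rect {
        float w, h;
        float area() const { return w * h; }
    };
    // usage: totalArea(rects, n) stamps out a Rect-specific loop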
In other words, the user’s entire perception of your program’s performance falls into that 0.1%.
Performance for performance's sake is an interesting and appealing challenge to us engineers. I was writing C code in the 90s, and I miss being that close to the hardware, trying to spare every clock cycle I could while working with machines that had sparse resources.
But today I'm building SaaS products for millions of simultaneous active users. When customers complain about performance it is often not what us engineers think of as "performance." They're NOT saying things like "Your app is eating all memory on my phone" or "the rendering of this table looks choppy." It's usually issues related to server-side replication lag causing data inconsistencies or in some cases network timeouts due to slow responding services.
The point is the age old advice that we were giving aspiring game programmers back in the 90s:
Figure out and understand your priorities.
The famous fast inverse square root function in the Quake III Arena source code is a great example. If memory serves me, they needed this calculation as part of their particle physics engine. The problem is that calculating inverse square roots precisely is very expensive, especially at the scale required. So they exploited how 32-bit floating-point numbers are represented in binary in order to do a fast, good-enough approximation. This is a good example of a targeted, purposeful optimization.
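For reference, the widely circulated routine from the released Quake III source, lightly adapted: the original type-punned through pointer casts, which is undefined behaviour in modern C++, so this version uses std::memcpy instead:

    #include <cstdint>
    #include <cstring>

    // Approximates 1/sqrt(x) by exploiting the IEEE-754 bit layout of a float.
    float Q_rsqrt(float number) {
        float x2 = number * 0.5f;
        std::uint32_t i;
        std::memcpy(&i, &number, sizeof i);  // read the float's bit pattern
        i = 0x5f3759df - (i >> 1);           // the famous magic-constant first guess
        float y;
        std::memcpy(&y, &i, sizeof y);       // back to float
        y = y * (1.5f - x2 * y * y);         // one Newton-Raphson refinement step
        return y;
    }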
Back in the 90s we were obsessed with getting the most out of our hardware, especially when coding games. So we picked up all sorts of performance hacks and optimizations and learned how to code in assembly so that we could get even closer to the bare metal if we needed to. The result was impossible-to-understand-and-maintain code, and so experienced engineers taught us young'uns:
Write clean code first, then profile to understand what your bottlenecks are, then make TARGETED optimizations aimed at solving performance issues in order of priority.
That priority always being driven by user experience and/or scalability requirements.
Anything else is premature optimization. You're speculating about where your performance bottlenecks are, and you're throwing out maintainable code for speculative reasons; not actually knowing how much of an impact your optimizations are going to have on user experience.
Clean code (OOP, DRY, etc) is optimized for maintainability and extensibility, not necessarily performance.
In fact, I think it’s pretty well understood that clean code is a tradeoff wrt performance, at least that’s the way I’ve always understood it.
Clean code works well for something like a web app that’ll need to be maintained by scores of different engineers over many years or decades.
At least that’s the theory. In practice, at least some level of abstraction makes it a bit easier to rip and replace parts of the app without a total rewrite.
So you might write your code in a straightforward way, careful not to lose efficiency, and then end up calling slow code anyway.
So his advice is easier to follow in some scenarios than in others.
Is it? What processes are running on your computer right now, and how many are waiting for your interaction?
Further, during the 0.1% of the time when I do interact, I want the results promptly.
Well written video games, the kind of thing Casey works on usually, beat the hell out of basically any other category of software in terms of user responsiveness. At least of all the software I use regularly.
Every place we work at is FAR more likely to scale up how many compute instances it is using than to optimize the application.
The word is TENET. Tenants rent stuff.
You're right: some things are inevitably slow, so we should therefore never care about performance in any situation.
Avoid belittling the efforts here just because they don't apply to all situations.
Not just general advice, but advice that is meant to be applied to TDD specifically. The principles of clean code are meant to help with certain challenges that arise out of TDD. It will often seem strange and out of place if you have rejected TDD. Remember, clean code comes from Robert C. Martin, who is a member of the XP/Agile gang.
The author cherry-picking one small piece out of what Uncle Bob promotes and criticizing it in isolation without even mentioning why it is suggested with respect to the bigger picture seemed disingenuous. It does highlight that testable code may not be the most performant code, but was anyone believing otherwise? There are always tradeoffs to be made.
It reminds me of Twitter and other companies that started changing programming languages for performance reasons.
The presented transformation of code away from “clean” code had nothing to do with optimisation. In fact, it made the code more readable IMO. Then it demonstrated that most of those “clean” code commandments are detrimental to performance. So obviously when people saw the word “performance”, they immediately jumped “omg, you’re optimising, stop immediately!”
Another irritating reaction here is the straw man of optimising every last instruction: the course so far has been about demonstrating how much performance there even is on the table with reasonable code, to build up an intuition about what orders of magnitude are even possible. Casey repeated several times that what level of performance is right for your situation will depend on you and your situation. But you should be aware of what’s possible, about the multipliers you get from all those decisions.
And of course people bring up profilers: no profiler will tell you whether a function is optimal or not — only what portion of runtime is spent where. And if all your life you’ve been programming in Python, then your intuition about performance often is on the level of “well I guess in C it could be 5-10 times faster; I’ll focus on something else”, which always comes up in response to complaints about Python. Not even close.
For instance, he fails to explain here why he didn't bother addressing the tail of his unrolled loop. He does in the course, but here he just assumes it's irrelevant, and doesn't preempt potential criticism like "he doesn't even bother to write correct code, look at that lazy unrolling!".
Thankfully there's a free lecture that explains the broad concept with an example. It's this lecture that convinced me to try out his course, where I hope he'll go into more details: https://www.youtube.com/watch?v=pgoetgxecw8&list=PLEMXAbCVnm...
I think it's also important to separate computer science from software engineering. Algorithms and data structures can be reasoned about mathematically, but what about agile VS waterfall or functional VS oop?
Maybe software engineering relates more to philosophy than math. There are theories and some are more sound than others, but there is no objective truth. Most of us agree that it's important for code to be readable, but we still don't have the answer for what's the best readable code. Even first principle such as DRY are challenged, eg by WET.
I recall vaguely that there was a situation where there were two schools of thought with different approaches. One was about hacking away things, the other about having correct programs. Maybe Berkeley vs MIT? Such different opinions at the basic level of a discipline are more likely in philosophy than math, I think.
A lot of my performance and code quality chops came from projects where the team was comfortable with the performance of the system but the business was not. They wanted to stop when it was right for them but not right for the situation. It ended up negatively affecting my opinion of them because ultimately I started to see it as deflecting. It's fine because I don't know what else to do, not because this is the best that can be done.
No one is working with a huge amount of data in big loops, using virtual methods to take every element out of a huge dataset, like he is showing. That's a false premise he is trying to debunk. Polymorphic classes/structs are used to represent the business logic of applications, or structured data with a few hundred such objects that keep some state and a small amount of other data, so they are never involved in intensive computations as he shows. In real projects, such "horrible" polymorphic calls never pop up under profiling and usually occupy a fraction of a percent overall.
The reality is that the entire Java ecosystem revolves around call stacks hundreds of calls deep where most (if not all) of those are virtual calls through an interface.
Even in web server scenarios where the user might be "5 milliseconds away", I've seen these overheads add up to the point where it is noticeable.
ASP.NET Core for example has been optimised recently to go the opposite route of not using complex nested call paths in the core of the system and has seen dramatic speedups.
For crying out loud, I've seen Java web servers requiring 100% CPU time across 16 cores for half an hour to start up! HALF AN HOUR!
You're mistaken; the load size has nothing to do with the end result. The result is normalized to give an estimate of how much faster the simple code is than the polymorphic code regardless of input size. (Kinda like deaths per 100k instead of an absolute number of deaths in statistics about diseases.)
So yes, your code is running 20x slower than it should be all the time.
Especially when you make every class an interface, with... get this, one implementation! This is based on real world experience and is not a joke. There are real companies with real people that write real code where every single class is an interface with exactly one implementation. Which, as Casey has shown, results in upwards of a 20x slowdown in the worst case.
Obviously, you probably won't get a 20x speedup by getting rid of the polymorphic garbage. But it's equally asinine to assume that polymorphic functions are only called a few hundred times. I guarantee you your PC is making millions of polymorphic function calls per minute between: the OS, the browser, windows Anti-Malware scanner, steam running in the background, oracle running its checks to remind you to update Java, etc. There are hundreds of processes running all the time on a modern device, these devices are wasting enormous amounts of resources.
And when you run this through a profiler, you will not notice how slow your code is, because everything is slow. Slowness is infused throughout the whole system.
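To make the "every class is an interface with exactly one implementation" pattern concrete, here is a minimal C++ sketch (all names invented). The point is that callers only ever see the interface, so every call in a hot loop is an indirect call the compiler usually can't inline across translation units without LTO/devirtualization, even though only one implementation exists:

    #include <cstdint>

    struct IAccumulator {                     // the "interface"
        virtual ~IAccumulator() = default;
        virtual void Add(int32_t x) = 0;
        virtual int64_t Total() const = 0;
    };

    struct Accumulator final : IAccumulator { // ...and its only implementation
        int64_t sum = 0;
        void Add(int32_t x) override { sum += x; }
        int64_t Total() const override { return sum; }
    };

    // Callers hold IAccumulator*, so every Add() goes through the vtable.
    int64_t SumAll(IAccumulator *acc, const int32_t *xs, int n) {
        for (int i = 0; i < n; ++i) acc->Add(xs[i]);
        return acc->Total();
    }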
There is one thing worse you can do (and I caught a C++ compiler doing it when we were profiling code while building an x86 clone years ago): instead of loading the address and jumping to it, push the address and then return to it. That not only breaks pipelines but also return-stack optimisations.
I remember doing that in a code generator ages ago because it was easier than calculating the jump offset :-P
Things way worse than that exist. Replace "virtual method" with "service call."
Yeah. I opened Discord earlier, and it took about 10 seconds to open. My CPU is an Apple M1, running at about 3GHz per core. Assuming it's single-threaded (it wasn't), Discord took about 30 billion cycles to open. (Or around 50 network round-trips at a 200ms ping.)
Crimes against performance are everywhere.
As a result some styles of writing code just don’t work for the audio thread at all, and we’d have to simply avoid or rewrite libraries written this way.
There are just some domains where standard practice for cleanliness is different because of your constraints.
I mean, it's got to the point where we have die-hards in this industry who insist on defining all their functions inline in headers (not that I agree!)
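For readers outside audio: the audio callback has a hard deadline every few milliseconds, so anything that can block (allocation, locks, file I/O) is effectively banned from it, however clean it looks. A minimal sketch of what code under that constraint tends to look like (invented names, not tied to any particular audio API):

    #include <atomic>
    #include <cstddef>

    // Everything the callback touches is preallocated on the main thread;
    // parameters cross over via atomics, never via mutexes or malloc.
    struct Synth {
        std::atomic<float> gain{1.0f};  // written by the UI thread, read here
        float scratch[4096];            // preallocated; no new/malloc below
    };

    void AudioCallback(Synth &s, float *out, std::size_t frames) {
        float g = s.gain.load(std::memory_order_relaxed);
        for (std::size_t i = 0; i < frames; ++i) {
            out[i] = g * s.scratch[i % 4096];  // no locks, no allocation, no I/O
        }
    }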
Then again, at some point we had "Lisp machines". Maybe some day there will be a computer architecture where memory and computation patterns are adapted to massive simulation, rather than shoehorning it onto existing architectures.
And those will fail just as miserably as Lisp machines.
this is exactly how the typical naïve game loop/entity system works.
So go on all of you, write everything in Python with 90 levels of indirection, my stock will go up.
Congratulations. Pats on the back, champagne all around. Customers call. Same complaint.
Programmer asks, "Well, did something change at least?". "The loading bar flickers more now", answers the customer.
The particular example with shapes could be a CAD or BIM app, where it is usually more than a "few hundred" objects.
I guess only in the places you call methods?
Wait? Are they everywhere? Hmmm...
When I was coming up I got gifted a bunch of modules at several jobs because the original writer couldn't be arsed to keep up with the many incremental changes I'd been making. They had a mentality that code was meant to be memorized instead of explored, and I was just beginning to understand code exploration from a writer's perspective. So they were both on the wrong side of history and the wrong side of me. Fuck it, if you want it so much, kid, it's yours now. Good luck.
Remember that everyone has their blind spots!
I follow Casey on twitter, and a couple years ago there was a weird thread where he had hung his browser for 4-5 seconds by running some JS to assign CSS rules to ~50K div elements. And Casey was a million percent confident that the hang was due to JS being slow, and had nothing to do with CSS or DOM rendering.
Of course, part of the point of Handmade Hero is to show that you can totally reimplement everything from first principles. Libraries are not magical black boxes, they're code written by human beings like you or me, and you can understand what they're doing.
For instance, he wrote his own PNG decoder[0] live on stream, with hardly any prior knowledge of the spec, even though I'm confident that under normal circumstances he'd just use stb_image. I'm sure he did this just to show how you'd go about doing that sort of thing.
[0] He only implemented the parts necessary to load a non-progressive 24-bit color image, but that still involved writing his own DEFLATE implementation.
tell me you've never made a game without telling me you've never made a game
Likewise, I've worked in code bases where performance was dreadful, yet there were no obvious bottlenecks. Little by little, replacing iterators with loops, replacing objects/closures with enum-backed structs/tables, adding early exits, and so on accumulated to the point where speedups ranged from 2x to 50x without changing algorithms (outside of fixing basic mistakes like not preallocating vectors).
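For anyone who hasn't seen the "enum-backed structs" style, a rough C++ sketch of the idea (my paraphrase of the approach, not the parent's actual code): the class hierarchy collapses into a tagged struct and one switch, so there is no vtable load and the data can sit contiguously in arrays.

    #include <cstdint>

    enum ShapeType : uint32_t { Square, Rectangle, Triangle, Circle };

    struct Shape {
        ShapeType type;
        float width, height;  // for circles, the radius is stored in width
    };

    float Area(const Shape &s) {
        switch (s.type) {  // one predictable branch instead of a virtual call
        case Square:    return s.width * s.width;
        case Rectangle: return s.width * s.height;
        case Triangle:  return 0.5f * s.width * s.height;
        case Circle:    return 3.14159265f * s.width * s.width;
        }
        return 0.0f;
    }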
Always fun to see these videos. I highly recommend his `Performance Aware Programming` course linked in the description. It's concise and to the point, which is a nice break from his more casual videos/streams which tend to be long-winded/ranty.
Just taking the little bit of time to think about what the computer needs to do and making a reasonable effort to not do unnecessary stuff goes a long way. That 2x-50x factor is in fact very familiar. That’s something loading in a second rather than in a minute, or something feeling snappy instead of slightly laggy.
And it matters much more than people say it does. The “premature optimisation...” quote has been grossly misused to a degree that it’s almost comical. It’s not a good excuse for being careless.
It takes 19 seconds for the main menu to load when you push play on our current game in Unity. It's killing me.
Meanwhile in my Lua side project, it's less than a second.
Can you recommend any SQL book whose main focus is performance improvements like this?
> Select * queries
Sometimes you only want two columns, but you ask for five. Say you query a million rows and fetch, then throw away, 60% of all the data you get back.
> Naive indexes
As in, just slapping an index on a table that doesn't have one makes such a big difference that sometimes it's all you need.
> N+1 queries
This is more of a problem with ORMs, but it's any time you call the database N times instead of once. A classic example is writing a for loop that asks for one row at a time, instead of asking for all the rows at once (see the sketch below).
You get 90% of the improvement just from that.
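A sketch of the N+1 fix in SQL (table and column names invented):

    -- N+1: one query per order, issued from an application-side loop
    SELECT sku, quantity FROM order_items WHERE order_id = 1;
    SELECT sku, quantity FROM order_items WHERE order_id = 2;
    -- ... N round-trips to the database ...

    -- The fix: ask once
    SELECT order_id, sku, quantity
    FROM order_items
    WHERE order_id IN (1, 2, 3 /* ... */);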
Indexes are the whole reason anyone even uses databases. And yet some backend guys think they are optional.
There might be some important columns, like say a “status” or a “date”, which are fundamental to a lot of queries.
Or you have columns X and Y used together frequently, or in important queries, in WHERE clauses; then that's a candidate for a composite index.
Stuff like that.
If there are a few fields referenced in a single WHERE clause, add one index that includes all of them.
If you have an index on (a, b, c), it is as if you also had indexes on (a, b) and on (a) alone.
If a field's condition in the WHERE is an equality (=), put it at the beginning of the index; if it's a range (< or similar), put it at the end. You'll get the best results if a query has no range conditions, or only one.
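Putting those rules together, a sketch (schema invented for illustration):

    -- Equality columns first, the single range column last. This serves:
    --   WHERE status = ? AND customer_id = ? AND created_at < ?
    CREATE INDEX idx_orders_lookup
        ON orders (status, customer_id, created_at);

    -- Prefix rule: the same index also serves
    --   WHERE status = ? AND customer_id = ?
    --   WHERE status = ?
    -- but not  WHERE customer_id = ?  (it skips the leading column)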
We can all write code that glues a very fixed set of things end to end and squeeze every last CPU cycle of performance out of it, but as we all know, software requirements change, and things like polymorphism allow for much better composition of functionality.
The tests he put together here are hardly something I'd call a straw-man argument; they seem like reasonable simplifications of real cases.
The focus on performance here ignores the fact that most programs are large systems of many things that interact with each other. That is where good design and abstractions and “clean code” can really help.
Like all things it is about finding a balance and applying the right techniques to the right parts of a larger system.
You should check if your code is in the hot path before optimizing, because the more you couple things together the harder it is to change it around. For instance, in Casey's example, if you wanted to add a polygon shape but you've optimized calculating area into multiplying height x width by a coefficient, that requires a significant refactor. If you are sure you don't need polygons, that's a perfectly fine optimization. But if you do, you need to start deoptimizing.
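For context, the coefficient trick being referenced looks roughly like this (my sketch, paraphrasing the video's table-driven version): every shape's area collapses to one multiply against a per-type constant, which is exactly why a general polygon, with no fixed coefficient-times-width-times-height form, breaks the scheme.

    enum ShapeType : unsigned { Square, Rectangle, Triangle, Circle, Count };

    // Per-type area coefficients; circles store the radius in both
    // width and height, so pi * w * h gives pi * r^2.
    static const float Coeff[Count] = { 1.0f, 1.0f, 0.5f, 3.14159265f };

    struct Shape { ShapeType type; float width, height; };

    float Area(const Shape &s) {
        return Coeff[s.type] * s.width * s.height;
    }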
Paraphrasing Russ Ackoff: doing the right thing versus doing things right is the difference between wisdom (effectiveness) and efficiency. What Casey is doing here may be efficient, but calculating a billion rectangles doesn't present a realistic or general use case.
"Clean Code" or any paradigm of the sort aims to make qualitative, not quantitative improvements to code. What you gain isn't performance but clarity when you build large systems, reduction in errors, reduce complexity, and so on. Nobody denies that you can make a program faster by manually squeezing performance out if it. But that isn't the only thing that matters, even if it's something you can easily benchmark.
Looking at a tiny code example tells you very little about the consequences of programming like this. If we program with disregard for the system in favour of performance of one of its parts, what does that mean three years down the line, in a codebase with millions of lines of code? That's just one question.
The reality is that how flexible your interfaces and abstractions are and their design has to be a part of your original design considerations when building something. It's a bad move to just hand wave away performance concerns because you religiously adhere to some design patterns. It's also a bad move to drop down to using intrinsics for everything from the get-go and thinking you know better than the compiler when it's a codepath that isn't even computationally expensive or a bottleneck a priori.
There's lots of ways to do this poorly and well. There's no process for it. That's a feature. I feel like a lot of the flak clean code gets boils down to, "I followed it dogmatically and look what it made me do!" It didn't make you do anything; it's trying to teach you aesthetics, not a process. Internalize the aesthetics and you won't need a rigid process.
Obviously when you do this you probably need more code than you'd normally write. That can be viewed as a maintenance burden in some situations, esp. when you don't have product market fit. Again, this shows that treating clean code like some process that always produces better code in every situation is extremely naive.
There was a moment in grade school where I was sat down and it was explained to me that you don't have to take a test in order. You can skip around if you want to, and I ran so far with that notion that at 25 I probably should have written a book on how to take tests, while I could still remember most of it.
One of the few other "lightning bolt out of the blue" experiences I can recall was realizing that some code constructs are easier for both the human and the compiler to understand. You can be sympathetic to both instead of compromising. They both have a fundamental limit on how many concepts they can juggle at the exact same time. And for any given commit message or PR, you can frame your reasoning around whichever stick the reviewer has up their butt about justifying code changes.
Clean code means easy to read and easy to maintain, not full of arbitrary things done "because performance".
you have Japan infrastructure, and you have Turkey infrastructure
6.1 quake in Japan = nothing destroyed
6.1 quake in Turkey = everything collapses
The engineers in Turkey probably didn't value performance and efficiency
It's the same for developers: you choose your camp wisely, otherwise people will complain once they can no longer bear your choices.
You act innocent, but your code choices translate into costs: a higher server bill for your company, higher energy bills for your customers/users, time wasted for everyone, rare materials depleted at a faster rate, growing tech junk.
Selfishness is high in the software industry
We are lucky it's not the same for the HW industry, but it's getting hard for them to hide your incompetence, as more things now run on a battery, and battery tech is kinda struggling.
Good thing is they get to sell more HW since the CPU is "becoming slower" lol
So now we've got smartwatches that you need to recharge every damn day.
Uh, no. It was corruption. There were standards, and they worked, but people didn't follow them, plain and simple.
These "clean code" principles should not, and generally are not, ever used at performance critical code, in particular computer graphics. I've never seen anyone seriously try to write computer graphics while "keeping functions small" and "not mixing levels of abstraction". We can go further: you won't be going anywhere in computer graphics by trying to "write pure functions" or "avoiding side effects".
These "clean code principles" are, however, rather useful for large, corporate systems, with loads of business process rules maintained by different teams of people, working for multiple third parties with poor job retaining. You don't need to think about vector performance for processing credit card payments. You don't need to think about input latency for batch processing data warehouse jobs, but you need this types of applications to work reliably. Way more reliably than a videogame or a streaming service.
Right tools for the right jobs, people need to stop trying to hammer everything into the same tools. This is not only a bad practice in software, it's a bad practice in life, the search for an ever elusive silver bullet, a panacea, a miracle. Just drop it and get real.
Not sure about this; in my experience (in a different domain, audio processing) you totally can get away with both of these a lot of the time.
Function inlining works well, so you can write small pure functions in a lot of cases (especially if you accept as "pure" a function that reads from one buffer and writes to another).
As for avoiding side effects, this is normally more about keeping your state updates small and localised (allowing more parts to be pure), which is often not a problem performance-wise.
IME it's much easier to improve the performance of a piece of code which is easy to reason about and change with some level of confidence that your optimisation will not break things.
I know there's some unavoidable global state in computer graphics, but presumably there is lots of code that doesn't directly touch that.
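As an example of that loose sense of "pure", here is a small invented buffer-in/buffer-out function; it touches no shared state, and compilers will happily inline it into the surrounding processing loop:

    #include <cstddef>

    // Reads one buffer, writes another, no shared state touched.
    inline void ApplyGain(const float *in, float *out, std::size_t n, float gain) {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = in[i] * gain;
    }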
I agree that this is mostly true, but maybe not for beginners in the field
When I was reading "Ray Tracing in One Weekend" (known as _the_ introductory literature on the topic), I was very surprised to see that the author designed the code so that objects extend a `Hittable` class, and the critical ray-intersection function `hit` is dynamically dispatched through the `virtual` keyword, and thus suffers a huge performance penalty.
This is the hottest code path in the program, and a ray-tracer is certainly performance critical, but the author is instructing students/readers to use this "clean code" principle and it drastically slows down the program.
So I agree most computer graphics programmers aren't writing "clean code", but I think a lot of new programmers are being taught them because of introductory literature
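From memory, the interface in question looks roughly like this (paraphrased, not the book's exact code):

    // Every object the ray can hit derives from this abstract class.
    struct ray;         // origin + direction
    struct hit_record;  // intersection point, normal, t, ...

    class hittable {
    public:
        virtual ~hittable() = default;
        // Called for every object, for every ray: the hottest call site
        // in the whole renderer goes through the vtable.
        virtual bool hit(const ray &r, double t_min, double t_max,
                         hit_record &rec) const = 0;
    };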
But I'd point out that languages and compilers are built so we can have both.
The problem is some definitions of good really aren't, and it isn't just because they're slow, it's because they make some things "good" at the detriment of others. Uncle Bob's Clean Code is really good at making some portions of the code "good" and "simple", but when put together they are interconnected in ways that are more difficult to understand.
In my experience, writing the code with readability, ease of maintenance, and performance all in mind gets you 90% of each of the benefits you’d have gotten focusing on only one of the above. For instance, maybe instead of pretending that an O(n^2) algorithm is any “cleaner” than an O(n log n) algorithm because it was easier for you to write, maybe just use the better algorithm. Or, instead of pretending Python is more readable or easier to develop in than Rust (assuming developers are skilled in both), just write it in Rust. Or, instead of pretending that you had to write raw assembly to eke out the last drop of performance in your function, maybe target the giant mess elsewhere in your application where 80% of the time is spent.
A lot of the “clean” vs “fast” argument is, as I’ve said above, pretending. People on both sides pretend you cannot have both, ever, when in actuality you can have almost all of what is desired in 95% of cases.
Code that is unreadable, tightly coupled, untestable or just messy is much, much harder to work in than code that is readable, loosely coupled, well-tested and clean. This has been shown often and is really a no-brainer. Performance optimization is finding the bottleneck, then rewriting it without changing the functional behaviour. For this you need, respectively: readability (to find bottlenecks you must be able to understand the flow and the code), the ability to rewrite (tightly coupled code cannot be rewritten in isolation), and assurance that the behaviour doesn't change (test coverage).
Ergo: a clean architecture is a requirement for making code more performant in the first place. Even if that architecture is bad for performance in itself, it enables future improvements.
Clean Code is actually a book by Uncle Bob. And Clean Architecture is the name of another book by him.
What Casey is criticizing isn't "good code". He's criticizing Uncle Bob's philosophy.