I think the author is taking general advice and applying it to a niche situation.
> So by violating the first rule of clean code — which is one of its central tenets — we are able to drop from 35 cycles per shape to 24 cycles per shape
Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something. If you're writing a AAA video game, or high performance calculation software then sure, go crazy, get those improvements.
But most of us aren't doing that. Most developers are doing work where the biggest problem is shipping the umpteen features that Product has planned (but hasn't told us about yet). Clean code optimizes for improving time-to-market for those features, not for the CPU doing less work.
100%, I’ve done tonnes of (backend) performance optimization, profiling, etc. on higher level applications, and the perf bottlenecks have never been any of the things discussed in this article. It’s normally things like:
- Slow DB queries
- Lack of concurrency/parallelism
- Lack of caching/memoization for some expensive thing that could be cached
- Excessive serialization/deserialization (things like ORMs that create massive in-memory objects)
- GC tuning/not enough memory
- Programmer doing something dumb, like using an array when they should be using a set, and then doing a huge number of membership checks (a minimal sketch of this one below)
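As promised, a sketch of that last pitfall (hypothetical names; C++ here since that's what the article uses). With thousands of membership checks, the linear scan becomes a visible hot spot:

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Linear scan: every membership check walks the vector, O(n) per lookup.
bool seen_before_slow(const std::vector<std::string>& seen, const std::string& id) {
    return std::find(seen.begin(), seen.end(), id) != seen.end();
}

// Hash lookup: average O(1) per check; usually a one-line fix.
bool seen_before_fast(const std::unordered_set<std::string>& seen, const std::string& id) {
    return seen.count(id) != 0;
}
```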
With that being said, I have worked on the odd performance optimization where we had to get quite low level. For example, when working on vehicle routing problems, they’re super computationally heavy, need to be optimized like crazy and the hot spots can indeed involve pretty low level optimizations. But it’s been rare in the work I’ve done.
This article is probably meaningful for people who work on databases, games, OSes, etc., but for most devs/apps these tips will yield zero noticeable performance improvements. Just write code in a way you find clean/maintainable/readable, and when you have perf issues, profile them and ship the appropriate fix.
Casey Muratori knows a lot about optimizing performance in game engines. He then assumes that all other software must be slow because of the exact same problems.
I think the core problem here is that he assumes that everything is inside a tight loop, because in a game engine that's rendering 60+ times a second (and probably running physics etc at a higher rate than that) that's almost always true.
Also the fact that his example of what "everyone" supposedly calls "clean code" looks like some contrived textbook example from 20 years ago strains his credibility.
Edit: come to think of it, the only person I know of who actually uses the phrase "clean code" as if it's some kind of concrete thing with actual rules is Uncle Bob. Is Casey assuming the entire commercial software industry === Uncle Bob? It's like he talked to one enterprise java dev like 10 years ago and based his opinion of the entire industry on them.
All of these are instances of doing something _wasteful_, which is the #1 issue he mentions in the list of things that cause performance degradation.
Now, your argument seems to be: in the real world, there's so much waste, that virtual function calls pale in comparison.
This does not debunk his main point, which seems to me at least the following: all things being equal, writing code with virtual functions that do a tiny amount of work and "hiding implementation details" makes performance worse, sometimes by an order of magnitude.
Now, there may be situations where you _have_ to use virtual functions, because you are writing a library for other people to use, and you can't dictate ahead of time how they will use it.
This again does not invalidate the point. You need to be _aware_ of the performance implications of this, and mitigate it. He said the following in the comment section on the article:
> Try to make it so that you do very rare virtual function calls, behind which you do a _large_ amount of work, rather than the "clean" code way of using lots of little function calls.
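A hedged sketch of that suggestion (the types and names are mine, not from the article or course): instead of paying one indirect call per element, pay one per batch, so the dispatch cost is amortized over a tight loop the compiler can optimise.

```cpp
#include <cstddef>

// "Clean" style: one virtual call per element.
struct Shape {
    virtual ~Shape() = default;
    virtual float Area() const = 0;  // dispatched once per shape
};

// Coarse-grained style: one virtual call per *batch* of shapes.
struct AreaSummer {
    virtual ~AreaSummer() = default;
    virtual float SumAreas(const float* widths, const float* heights,
                           std::size_t count) const = 0;
};

struct RectangleSummer : AreaSummer {
    float SumAreas(const float* w, const float* h, std::size_t n) const override {
        float sum = 0.0f;
        for (std::size_t i = 0; i < n; ++i)
            sum += w[i] * h[i];  // tight, inlinable inner loop
        return sum;
    }
};
```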
"Write slow code now, profile and optimize later" is how we got all this slow software, because in my experience the second step (optimization) practically never happens.
Along with the heuristic that hardware and electricity are cheap while developers are expensive. That's probably why the managers I've worked with almost never ask developers to optimize slow code: they believe it's cheaper to throw more hardware at the problem if that fixes it (in some areas, like HFT or gamedev, more hardware is not the answer, so optimization does happen). In the rare cases where I've seen optimization done, the initiative always came from an IC (who knew the code could run faster / use fewer resources).
So nowadays when writing code I assume it will never be optimized later, and I try to do less dumb stuff from the beginning.
If you program using design patterns that are 10x slower, your application ends up 10x slower, even after you've optimised the hot spots away, and the profiler will not give you any idea that it could still be 10x faster.
ORMs and slow DB queries kind of go hand in hand. Also, you'd be surprised at how efficient arrays are for membership checks so long as the number of items is even moderately small (as a rough rule of thumb, the only thing that really matters with these kinds of checks is how many cache lines are involved).
Well said. The aphorism "Premature optimization is the root of all evil" is meant to mean "Build it right first, then optimize only what needs to be optimized". There's really no need to start cooking spaghetti right off the bat. Clean code with some performance tweaks will be more maintainable in the long run without sacrificing performance.
To be fair to OP, does anything about your current paradigm even allow you to evaluate his claims? Is your code even in a form that you can for example remove polymorphism and use simple arrays of data you want to work over?
You can of course only optimize what you are looking at. I am not surprised (honestly) that some engineers do not realize they are primed for the kind of things they will find by the mere choice of where they want to look.
Maybe that's valid for low user counts. We run a large multitenant, microservice-based application, and the physical machines where the Kubernetes pods reside have their CPUs at 90%. The application makes such heavy use of "clean coding", "design patterns", and SOLID that it would make Uncle Bob proud. We would have been better off without so many abstractions on top of abstractions.
In a business context, it usually happens that hardware is cheaper than software (licensing) which is cheaper than engineering labor. Tack on the opportunity cost of delaying business advances/features and it's usually cheaper to just throw hardware at it.
There's a tipping point where you have so much hardware there's big savings with optimization. Things like Postgres and the Linux kernel have a lot of optimization put into them and there's an insane amount of hardware out running that code.
I'd actually say this article is generally unhelpful. It's good to be aware, but as someone who works on sorting out performance-critical things, I want the code to be as clean as humanly possible going in. Whether you write clean or dirty code, if you're a junior developer you're probably not going to write performant code, and even senior devs may only be able to sniff out what might be a bottleneck in advance; most of us have learned to avoid premature optimization like the plague.
Maintainability and cleanliness are the best virtues code can have. If you have extremely clearly written code that has performance issues, I can swoop in with analysis tools, figure out where the pain point is, and refactor it out. Sometimes this is a real headache[1], sometimes not. What I can guarantee is that if the code is "dirty", it's going to be a headache and it'll take more time.
I'd personally take issue with this article over the polymorphism claim though - polymorphism is a tool, but it isn't the be-all and end-all tool. A lot of your data can live as structs/blobs in memory with tight internal type definitions but without any OO principles. Personally I am a huge fan of functional programming (but not pure functional programming), so the objects I use are relatively few and far between and exist to fulfill a very specific purpose.
I've had two occasions at work when I needed to break out an asm block - the compiler was being a thick-headed dummy and the code needed to receive incoming signals without exception or delay - but once that critical section was passed? Back to high-level programming and statements favoring expressiveness over raw bare-metal performance.
If you want an interesting experience, talk to your closest non-technical manager type - be that a product team manager or the company owner - and ask them whether they'd prefer you to focus on reducing how long your product takes to execute by 20% over the next five years, or on lowering the growth of the developer labor budget by 20% for the next five years by focusing on maintainability over performance. With the exception of extremely niche cases, maintainability is always the gold standard.
1. For instance, I've dealt with OOM issues that have required transforming all logic on a query result to be lazily evaluated on a data stream after main execution finishes - like the logic goes up and down the stack and only then begins processing results. In this particular case the problem was rather easy to deal with because we essentially swapped out the actual value passing on each layer for a lazy result set being passed around - because the code was clean. Sometimes you'll definitely need to massively re-engineer things though.
What I've seen is slow queries, but a bigger problem is actually too many queries. That's easy to do, especially when using an ORM.
It mostly happens during changes: when someone wants to add something to an existing query, they just add their new query, drop it into a loop, and boom, performance is gone.
One I found with profiling: when the code was written, n was quite small. Many database operations simply iterated over an array to decide where to store an item. The time spent dealing with the data was a tiny fraction of the database round-trip time; there simply was no reason to get fancy.
Over the years things grew, and one bin showed up where jobs would sit around for weeks. For that case, n went from rarely filling a screenful of lines to a few thousand. Oops: now more than 2/3 of the time was spent in those searches. (And I have a sneaking suspicion that a good portion of the remaining time comes from using field names to retrieve values. The profiler doesn't separate that out, though, because it's not my code.)
The author could use a little more memoization in his example, but I suspect that breaks some of the simplicity of his argument.
If a shape's Area is computed often enough that you care about inlining the calculation, why not compute and store it every time the height/width change? That'd be easy enough in an architecture based on information hiding, and it might illustrate a legitimate engineering trade-off between those architectural choices.
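Something like this hypothetical sketch: the fields stay private, but the multiply moves to the mutation path, so Area() becomes a plain load.

```cpp
// Assumed example, not from the article: a rectangle that caches its area.
class Rectangle {
public:
    void SetSize(float width, float height) {
        width_  = width;
        height_ = height;
        area_   = width * height;  // recomputed once, on change
    }
    float Area() const { return area_; }  // now just a field read

private:
    float width_  = 0.0f;
    float height_ = 0.0f;
    float area_   = 0.0f;
};
```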
> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something. If you're writing a AAA video game, or high performance calculation software then sure, go crazy, get those improvements.
That's really not even close to true. Loading random websites frequently costs multiple seconds worth of local processing time, and indeed, that's often because of the exact kind of overabstraction that this article criticizes (e.g. people use React and then design React component hierarchies that seem "conceptually clean" instead of ones that perform a rendering strategy that makes sense.)
> That's really not even close to true. Loading random websites frequently costs multiple seconds worth of (...)
You attempted to present an argument that's a textbook example of a hyperbolic fallacy.
There are worlds of difference between "this code does not sit in a hot path" and "let's spend multiple seconds of local processing time".
This brand of specious reasoning is the reason why the first rule of software optimization is "don't". Proponents of mindlessly chasing the 1% edge cases fail to understand that the world is mostly made up of the 99% of cases where shaving off that millisecond buys you absolutely nothing, with the tradeoff of producing unmaintainable code.
The truth of the matter is that in 99% of cases there is absolutely no good reason to run after these relatively large performance improvements if in the end the user notices absolutely nothing. More so if you're writing async code that stays far away from any sort of hot path.
I don't disagree that the balance is shifting towards "why is this taking so long". There's ebbs and flows in that ecosystem.
But overall, I think you overestimate how much time you spend loading the website and how much time it's just sitting there, mostly idle.
And in the end, as long as it's fast enough that users don't stop using the site/webapp/program/whatever, then it's fine, imho. When it becomes too slow, the developers will be asked to improve performance. Because in the end, economics is the driver, not performance.
> That's really not even close to true. Loading random websites frequently costs multiple seconds worth of local processing time
Unless we’re talking about specific compute-intensive websites, this is almost certainly network loading latency.
Modern web browsers are very fast. Modern CPUs are very fast. Common, random websites aren't churning through "multiple seconds" of CPU time just to render.
Well, you can make things run even faster by hand-coding them in assembler... but performance isn't the reason we use high-level languages. I agree with you that ignoring performance characteristics in favor of speed-to-market is an awful and pervasive practice in modern software development, but the linked article isn't talking about or making that case at all. He's saying that he can make his own custom object-oriented C that runs faster than C++ itself, but that's not news - people were saying that in 1995 (at least). The maintainability hit isn't worth it.
Since you bring up React in your example, which framework should one use to build better performing web apps?
I know React tends to lack in both dev UX and performance (at least in my exp). Personally I've taken a look at Svelte and Solid, and liked them both. I haven't had the chance to build anything larger than a toy app, though.
For what it's worth, less than 4% of websites use React (approximately 4% use any JS framework). If you believe the web is slow because of React, you are wrong. It's not even due to JS.
I would argue that ignoring performance, a lot of "clean" code isn't really that much clearer and more maintainable at all (At least by the Robert Martin definition of "Clean Code"). Things like dependency injection and runtime polymorphism can make it really hard to trace exactly what happens in what sequence in the code, and things like asynchronous callbacks can make things like call stacks a nightmare (granted, you do need the latter a lot). Small functions can make code hard to understand because you need to jump to a lot of places and lose the thread on the sequential state (bonus points if the small functions are only ever used once by one other function). The more I work in code bases the more I find that overusing these "clean" ideas can obscure the code more than if it was just written plainly. I think a lot of times, if a technique confuses compiler optimization and static code analysis, it's probably going to confuse humans also.
There are some videos on the internet claiming object-oriented programming is pretty bad in many situations. And lately I've been wondering if there's a kernel of truth in this statement. As an alternative, procedural programming is often advised instead.
I think a lot of clean code advice in general relates to object-oriented programming.
I've noticed that once my Lua programs (games) grow to a reasonable size, they become kinda hard to maintain. And I tend to use an object-oriented programming style (of course it also doesn't help that Lua is not type-safe). After I finish my current game, I want to try making a game using a procedural approach. I wonder if this would solve some of the issues I see in my current code base.
One of the core ideas of procedural programming is that data and functionality are not mixed in classes as we do in object-oriented programming. Instead, you might have a module that contains some functions and some data objects the functions act upon. This approach would make some other aspects of game programming with Lua easier as well (e.g. serialisation), but perhaps it will also make the code more maintainable as the codebase grows. It's something I want to contemplate.
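A minimal sketch of that shape in C++ terms (Lua would use a table of functions over plain tables; the names here are made up): plain data structs, plus a "module" of free functions that act on them.

```cpp
#include <string>
#include <vector>

namespace inventory {

// Plain data: no methods, trivially serialisable.
struct Item {
    std::string name;
    int count = 0;
};

// Functionality lives beside the data, not inside a class hierarchy.
int total_count(const std::vector<Item>& items) {
    int total = 0;
    for (const auto& item : items) total += item.count;
    return total;
}

void add_item(std::vector<Item>& items, std::string name, int count) {
    items.push_back({std::move(name), count});
}

}  // namespace inventory
```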
> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something. If you're writing a AAA video game, or high performance calculation software then sure, go crazy, get those improvements.
CPU meter when clicking anything on "modern" webpage proves that's a lie.
Also, sure, even if "clicking on things" is maybe 1-5% vs "looking at things" THAT'S THE CRITICAL PATH.
Once the app has rendered a view it's obviously not doing much, but the user also isn't waiting on anything; they're "being productive", parsing whatever is displayed.
The critical path, the wasted time, is the time the app takes to render stuff, and "but 99% of the time it's not doing that" is irrelevant.
>Clean code optimizes for improving time-to-market for those features
Does it though? Where's the evidence for it? The vast majority of people I've worked with over the last couple decades who like to bring up "clean code", tend towards the wrong abstractions and over abstracting.
I almost always prefer working with someone who writes the kind of code Casey does over someone who follows the clean code examples I've spent my career dealing with. I've seen and worked with many examples of Data-Oriented Design that were far from unmaintainable or unreadable.
Completely agree. These rules simply do not lead to better outcomes in all cases. Looking at the rules and playing Devil's advocate for fun:
> Prefer polymorphism to “if/else” and “switch”
Algebraic data types and pattern matching (a more general version of switch), make many types of data transformation far easier to understand and maintain (versus e.g. the visitor pattern which uses adhoc polymorphism).
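For instance, a sketch of the sum-type approach in C++17 (my example, not from the comment; std::variant plus std::visit standing in for pattern matching):

```cpp
#include <type_traits>
#include <variant>

struct Circle    { float radius; };
struct Rectangle { float width, height; };

// A sum type: a Shape is exactly one of these alternatives.
using Shape = std::variant<Circle, Rectangle>;

float area(const Shape& s) {
    // std::visit plays the role of a match expression; forgetting an
    // alternative is a compile error rather than a runtime surprise.
    return std::visit([](const auto& shape) -> float {
        using T = std::decay_t<decltype(shape)>;
        if constexpr (std::is_same_v<T, Circle>)
            return 3.14159265f * shape.radius * shape.radius;
        else
            return shape.width * shape.height;
    }, s);
}
```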
> Code should not know about the internals of objects it’s working with
This is interpreted by many as "don't expose data types". Actually some data types are safe to expose. We have a JSON library at work where the actual JSON data type is kept abstract and pattern matching cannot be used. This is despite the fact that JSON is a published (and stable) spec and therefore already exposed!
> Functions should be small
"Small" is a strange metric to optimise for, which is why I don't like Perl. Functions should be readable and easy to reason about. Let's optimise for "simple" instead.
> Functions should do one thing
This is not always practical or realistic advice. Most functions in OOP languages are procedures that will likely perform side effects in addition to returning a result (e.g. object methods). Should we also not do logging? :)
> “DRY” - Don’t Repeat Yourself
The cost of any abstraction must be weighed up against the repetition on a case-by-case basis. For example, many languages do not abstract the for-loop and effectively encourage users to write it out over and over again, because they have decided that the cost of abstracting it (internal iteration using higher-order functions) is too high.
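Both styles, for illustration (my sketch): the external loop spells out the iteration machinery each time, while the higher-order version abstracts it away.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// External iteration: the loop is written out by hand.
int sum_squares_loop(const std::vector<int>& xs) {
    int total = 0;
    for (int x : xs) total += x * x;
    return total;
}

// Internal iteration: the loop lives inside a higher-order function.
int sum_squares_hof(const std::vector<int>& xs) {
    return std::transform_reduce(xs.begin(), xs.end(), 0, std::plus<>{},
                                 [](int x) { return x * x; });
}
```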
You are conflating three separate skills. Finding the right abstractions is an art, whether you write clean code or not. Writing high-performance code is another art.
A really good developer writes clean code using the right abstractions (finding those tends to take the most time and experience) and drops down to a different level of abstraction for high-performance areas where it makes sense.
The fact that bad developers write bad code whether they use clean code or not does not reflect on the methodology.
Having done a lot of performance work on Gecko (Firefox), we generally knew where cycles mattered a lot (low level graphics like rasterization, js, dom bindings, etc...) and we used every trick there. But for the majority of the millions of LoC of the codebase these details didn't matter like you say.
If we had perf issues that showed up elsewhere, they were higher-level design issues, like 1) trying to take a thumbnail of a page at full resolution for a tab thumbnail while loading another tab (not because the thumbnailing code itself was slow), or 2) running a slow O(tabs) JS teardown during shutdown when we could run an O(~1) cleanup step instead.
What you're basically saying is "modern computers are so much faster than anyone needs them to be, it's okay to make them a little slower."
This works until your computer is old enough to be slower than what a majority of wealthy people (ie desirable customers) are using, at which point you need to buy a newer, faster computer, even though your current one was already "faster than anyone could reasonably need it to be".
This is all harmless enough—a little disrespectful perhaps, to make other people waste their money, but not so terrible—until you consider the environmental impact of all these new computers, which the average spreadsheet absolutely should not need but does anyway. It's also an equity issue—someone on a fixed income can't necessarily afford a new machine.
What would actually happen if Moore's law ended tomorrow, and we were no longer able to make computers any faster than they are today? It would really suck for scientists and hardcore gamers, but I actually think a majority of computer users would benefit. The experience of someone who just writes documents and checks email would be unchanged, except that their current computer would never slow down!
> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something.
Assuming that's right, there is something called multitasking: the CPU, RAM, and most importantly the cache are not all yours. If there are 1000 pieces of software like yours, that's 100%. You may argue that 1000 pieces of software is unreasonable, and you would mostly be right, but it happens, and mostly for the same reason software isn't optimized: quantity over quality.
Another issue is that you have to make a distinction between throughput and latency. You don't have to keep up with a sustained 100 actions per second, people don't go that fast, but you definitely have to respond within tens of milliseconds, because more than that is noticeable. Latency is much harder to optimize and if you are in the critical path, these cycles may matter.
A lot of devices are battery powered these days, and all these wasted cycles reduce the battery life of the entire system. Mobile devices are crazy powerful these days, but this power is meant to be used sparingly. And even with line-powered devices, I think we waste enough energy as it is...
And finally, what is the point of "clean code"? Hopefully not just because it gives software architects boners. The point is usually to make software that will last: easier maintenance, fewer bugs, etc... But performance bugs exist too, and one of the most common ways software evolves is to do more of what it already does. An image editing program will process more and bigger images, a database will store more entries and more details about each entry, documents will get larger, etc... You may even find that your users are using some feature on a scale you never intended (maybe someone is pasting entire books into your note-taking app), and it may turn out to work quite well... if you cared about performance. Not caring about performance is technical debt, and it may negate the advantage of using "clean code" in the first place.
> I think the author is taking general advice and applying it to a niche situation.
I don't. how many programs are running in your OS right now? how much CPU do you need to keep those things plus the things you need running in a performant manner?
how much CPU would you need if things performed better? the answer is "less" every time.
better software performance = less money required for hardware to obtain the responsiveness you require.
it's important, and it's important completely independently of how it is framed here.
just wait till you've seen software get slower for 30 years. to put it another way, watch hardware get faster and faster and faster for 30 years while you observe software continually consume all of the available headroom until it feels slow again. watch that happen for THREE DECADES and wait for someone to tell you that everything is fine and that someone saying "software is unnecessarily slow" is wrong because they aren't framing their argument how you think it should be framed.
>Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something.
All that says is you should focus your energy on increasing the value of that 0.1%. It's not actually an argument for not spending any energy.
It's like saying 'Astronauts only spend .1% of their time in space' or 'tomatoes only spend .1% of their existence being eaten' - that .1% is the whole point.
You can debate how best to maximize that value, more features or more performance. The OP is suggesting folks are just leaving performance on the floor and then making vacuous arguments to excuse it.
I don't know man, my TV has hardware several orders of magnitude faster and more advanced than the hardware that took us to the moon, and it takes dozens of seconds for apps like Netflix or Amazon Prime Video to load dashboards / change profiles or several seconds to do simple navigation or adjust playback. People just don't know how to properly write software these days, universities just churn out code monkeys with a vicious feedback loop occurring at the workplaces afterwards.
yes. I've observed hardware get faster and faster for 30+ years and I've watched software consume all of that headroom the entire time, for no clear reason other than the way we write software is just getting worse and worse and worse.
Not only time to market, but also maintainability.
In non-performance-critical areas, it's pretty important that when the original dev team leaves, new hires can still fix bugs and add features without breaking things.
> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something
It's because of this mentality that almost all desktop software nowadays is bloated garbage that needs 2GB of RAM and a 5GHz CPU to perform even the most basic tasks that could be done with 1/100th of the resources 20 years ago.
No, it's not because of this mentality. Remember the timeless quote:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Yet we should not pass up our opportunities in that critical 3%."
Using Electron and bloated frameworks that "abstract" things away is the biggest problem for most modern software, not the fact that developers aren't optimizing 10 CPU cycles away. It's a fundamental issue of how the software is made, not of what the code is. If you need to run a whole web browser for your application, you have already lost; there is no optimization you can do there.
I think the sibling comment here in this thread shows what some developers think. Electron is not reasonable. "Developers" using Electron should be punished by CBT and chinese torture.
Okay maybe that's too much. But I suggest a new quote:
"We should forget about Electron, say about 100% of the time: electron is the root of all evil.
Yet we should not pass up our opportunities in rewriting everything in Rust."
RAM and GPU are cheap. Most users aren't going to notice. Meanwhile by choosing Electron, the developers were able to roll out the app on Windows, Mac, and Linux at nearly zero marginal cost per additional platform.
But somehow modern software (Outlook, I’m looking at you) has trouble keeping up at my typing speed, and there’s a visible delay before characters appear on the screen. It doesn’t matter what the software does 99.9% of the time, if it’s an utter pig that crucial 0.1% of the time when the user is providing input.
An oft-recited rule of thumb: make it work, make it pretty, make it fast - in that order. That is, performance bottlenecks are easier to find and fix if your code is clean to begin with.
I sometimes wish performance was an issue in the projects I work with, but if it is, it's on a higher level / architectural level - things like a point-and-click API gateway performing many separate queries to the SAP server in a loop with no short-circuiting mechanism. That's billions of lines of code being executed (I'm guessing) but the performance bottleneck is in how some consultant clicked some things together.
Other than school assignments, I've never had a situation where I ran into performance issues. I've had plenty of situations where I had to deal with poorly written ("unclean") code and spent extra brain cycles trying to make sense of it though.
> That is, performance bottlenecks are easier to find and fix if your code is clean to begin with
That is not at all what "make it work, make it pretty, make it fast" is about. That saying is about prioritization. Making it fast doesn't mean anything if it doesn't work.
However, if you are doing performance-sensitive work then this is a very bad strategy. You need to design a performant architecture up front otherwise you'll likely have orders of magnitude worse performance, even after optimizing your code.
Ex: if your "make it work" design has shared mutable state, you're going to have a bad time when you want to scale that horizontally and unlock 100x better throughput/performance.
Here is the thing: the paradigms of your initial design (or, if you're in a better situation, the first refactor) are likely to stay. If performance is what you think about last, you choose different paradigms (e.g. polymorphism). From my experience, a winning strategy is to think about the things that need to scale already in the initial design, and what scales in a backend system is often the amount of data you want to pipe through it. Thus, keeping it data-centric (primitive types and arrays for raw data), while using polymorphism and functional interfaces to keep the methodologies clean, is usually a good idea.
So most code that I write tends to have dead-simple data types, but combines them with classes (and subclasses) that represent methods (strategies) for retrieving, transforming, presenting and storing the data. The 'make it work' phase may do this in a simple script, but the actual data model tends to stay the same.
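A rough sketch of that combination, with names I've made up: the data stays primitive and flat, and polymorphism is reserved for the strategy, which is invoked once per batch rather than once per element.

```cpp
#include <vector>

// Data-centric core: raw values in contiguous arrays.
struct Readings {
    std::vector<float> values;
    std::vector<long>  timestamps;
};

// Polymorphism at the strategy level only.
struct Transform {
    virtual ~Transform() = default;
    virtual void apply(Readings& data) const = 0;  // one call per batch
};

struct Scale : Transform {
    float factor;
    explicit Scale(float f) : factor(f) {}
    void apply(Readings& data) const override {
        for (float& v : data.values) v *= factor;  // tight loop over raw data
    }
};
```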
I think that I and all the non-technical folks around me experience application performance issues daily. I think that sometimes "making it work" should include some particular performance metrics. If it is not fast enough, it doesn't really work. Now, "fast enough" is something to be defined, and it differs depending on the part of the application.
Far too often I see applications that assume low latency and unbreakable Internet connection. They seem to do almost no caching at all. For example thumbnails.
Also, many applications will be almost unusable (or trigger OOM) when you try to work with a big file. Sometimes a big file is merely tens of MB; sometimes problems start with a 3MB file. Those are the issues that occur when performance isn't considered from the start - memory is free, you can copy things around, everything will fit in RAM.
One more thing. When your application consists of a client and a server, it may turn out that you have painted yourself into a corner by not thinking about performance early on. Everything works without any trouble at first, and then it turns out there are latency issues with more data and you can't easily upgrade the client, for example. Or you had an architecture that allows you to spin up more servers and handle the load closer to the client, but it cuts into your margins.
Sure I leave apps sitting around waiting for my input quite often. But then when I do start inputting stuff, they lock up, fail to respond to my inputs, show me loading screens, blank screens, delayed responses, and otherwise waste my time. Pretty silly given how much money I pay for the hardware.
> Look, most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something. If you're writing a AAA video game, or high performance calculation software then sure, go crazy, get those improvements.
Value of time is disproportionately weighted by user attention, which is at its highest right around when user input is happening.
>Look, most modern software is spending 99.9% of the time waiting for user input
That's fine, that's as it should be and isn't an interesting metric. The computer should wait for me, not the other way around. Needlessly waiting for the computer is a sign of s%&t software and not as uncommon as we'd like, huh?
Did you read the title? He's addressing performance.
And you're saying for many developers, performance is not the biggest priority.
That's fine. But it doesn't make him even 0.000001% wrong. And he's not applying anything to any niche situations. You just missed his point. Performance.
In my experience, software has been getting slower faster than hardware has been getting faster. Bringing focus back to performance would be a welcome improvement.
Like everything else in the software industry the context does matter: at larger scales small gains in performance translate to large savings in costs (infrastructure, maintenance, etc.).
Also, "clean code" (as in from the "Clean Code" book) is generally not good advice for most programs anyway. Not only does it eat performance, it's not all that great for building maintainable, extensible systems.
What does it matter if it does that 0.1% ten times slower than it could? Then the user has to wait for the software, which slows down the most expensive component of the whole work setup: the human.
if the software ever takes more than 10ms to do something it is stealing time from the human. the human is the slowest part of the system so nothing else should ever make the slowest part even slower.
I think we absolutely should care, because when the devices do something, the user still expects software to be fast.
So if it is not fast enough, he buys a new device, because this is what he can do. He can't just buy the software rewritten. Usually.
>most modern software is spending 99.9% of the time waiting for user input, and 0.1% of the time actually calculating something
Bad mindset.
A GFCI breaker spends 99.99999% of the time waiting with zero leakage current. Yet, when it does detect leakage current, you want the breaker to trip as quickly as possible.
See where I'm going?
Imagine if it would take 5 seconds from flicking a light switch until the lights actually turn on. Because the switch is waiting for user input 99.99% of the time, right? Would you install that light switch in your home?
Except when it isn't. I remember an article making the rounds the other day that claimed the whole "most software spends most of its time waiting on I/O" common wisdom is no longer true: most software these days is CPU-bound, and a good chunk of that is parsing JSON.
Eh, that heavily depends on the language and dataset you're working with. I've seen "simple" data with some fat thing like RoR on top of it having 10x the latency of the underlying database after all the ORMing.
I’d like to work in one of these teams/products where the database is the bottleneck. Basically everywhere I’ve worked, I’ve had to work with backends that have been foot-draggingly slow compared to the actual database.
I think there is a relatively comfortable middle ground. High quality, readable and performant code are not mutually exclusive optimization quantities - though they will start to compete against each other in the extreme. Often, O(WTF) algorithms are complicated, full of useless fluff and hard to understand - occasionally attempting to follow Uncle Bob's unresearched and unexamined ideas.
Honestly in many cases you can have your cake and eat it too if you just write in a functional or data-flow style rather than a rigorous OO style. Since this is C++, using std::algorithm or another algorithms library would let you abstract your implementation details while relying on the compiler's ability to optimise/vectorise/inline code as needed.
This applies doubly so if you can rely on templates & structural typing to push your polymorphism to compile time. clang & gcc are surprisingly good at optimisation as long as you don't have to bounce off a vtable and the code is clean / avoids "manual optimisation".
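A small sketch of that compile-time flavour (assumed example): the shape type and the area computation are resolved statically, so the compiler can inline the lambda and vectorise the loop.

```cpp
#include <vector>

struct Rect { float w, h; };

// Structural/compile-time polymorphism: works for any range and any
// callable, with no vtable in sight.
template <typename Range, typename AreaFn>
float total_area(const Range& shapes, AreaFn area) {
    float sum = 0.0f;
    for (const auto& s : shapes) sum += area(s);
    return sum;
}

// Usage:
//   std::vector<Rect> rects = ...;
//   float a = total_area(rects, [](const Rect& r) { return r.w * r.h; });
```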
Also while I'm not saying I don't believe the author here, I wish they would have used https://quick-bench.com/ or https://godbolt.org/ so that readers could trivially verify the results & methodology.
I don’t think the percentage of time waiting for input has anything to do with this. Outside of video games, the way most people will see performance problems is in the latency of their UI interactions. You press a button and want to see the result as fast as possible.
In other words, the user’s entire perception of your program’s performance falls into that 0.1%.
Performance is not an absolute. At the end of the day it is about user experience. From a computer science point of view, we can measure memory and CPU usage, but if the users haven't been complaining then what problems are you actually solving (at least from a product POV)?
Performance for performance's sake is an interesting and appealing challenge to us engineers. I was writing C code in the 90s, and I miss being that close to the hardware, trying to spare every clock cycle I could while working with machines that had scarce resources.
But today I'm building SaaS products for millions of simultaneous active users. When customers complain about performance it is often not what us engineers think of as "performance." They're NOT saying things like "Your app is eating all memory on my phone" or "the rendering of this table looks choppy." It's usually issues related to server-side replication lag causing data inconsistencies or in some cases network timeouts due to slow responding services.
The point is the age old advice that we were giving aspiring game programmers back in the 90s:
Figure out and understand your priorities.
The famous inverse square root function in the Quake III Arena source code is a great example. If memory serves, they needed this calculation as part of their particle physics engine. The problem is that calculating inverse square roots precisely is very expensive, especially at the scale they needed. So they exploited how 32-bit floating point numbers are represented in binary to do a fast, good-enough approximation. This is a good example of a targeted, purposeful optimization.
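For reference, the function roughly as it appears in the released Quake III source (comments paraphrased; the originals are saltier): reinterpret the float's bits as an integer, subtract from a magic constant for a first guess, then refine with one Newton-Raphson step.

```cpp
float Q_rsqrt(float number)
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = *(long*)&y;                       // treat the float's bits as an integer
    i  = 0x5f3759df - (i >> 1);            // magic constant: first approximation
    y  = *(float*)&i;
    y  = y * (threehalfs - (x2 * y * y));  // one Newton-Raphson refinement
    return y;
}
```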
Back in the 90s we were obsessed with getting the most out of our hardware, especially when coding games. So we picked up all sorts of performance hacks and optimizations and learned how to code in assembly so that we could get even closer to the bare metal if we needed to. The result was code that was impossible to understand and maintain, and so experienced engineers taught us young'uns:
Write clean code first, then profile to understand where your bottlenecks are, then make TARGETED optimizations aimed at solving performance issues in order of priority.
That priority always being driven by user experience and/or scalability requirements.
Anything else is premature optimization. You're speculating about where your performance bottlenecks are, and you're throwing out maintainable code for speculative reasons; not actually knowing how much of an impact your optimizations are going to have on user experience.
Clean code (OOP, DRY, etc) is optimized for maintainability and extensibility, not necessarily performance.
In fact, I think it’s pretty well understood that clean code is a tradeoff wrt performance, at least that’s the way I’ve always understood it.
Clean code works well for something like a web app that’ll need to be maintained by scores of different engineers over many years or decades.
At least that’s the theory. In practice, at least some level of abstraction makes it a bit easier to rip and replace parts of the app without a total rewrite.
While I agree with Casey, in some situations it's hard to do. You can't really develop a web app in C# or Java in a simple way. Not only do you have to fight the language design, but all the frameworks and libraries are written with OOP, clean code and design patterns in mind.
So you might write your code in a straightforward way, careful not to lose efficiency, and then end up calling slow code anyway.
So his advice is easier to follow in some scenarios than others.
There is speed and there's the perception of speed. Some code (games) has to run fast. But most of the code we work on has to only have the perception of speed. If you're loading all of your resources and making the user stare at a twirly, you're doing it wrong.
What if you have a growing team of N people, and they need to be able to add shape subclasses / new shape logic all the time? As with anything this is a tradeoff. Often having decoupled code is more important than raw performance.
There is. The first one is that the code with the switch pattern can only process simple shapes. Let's say you want to get the area of a donut now. Well, you need to change all of that code to compute the area of the union. Or imagine that there are other places in the code that need to know the area of a single shape. Do they need to copy/paste the same code?
dude then why does most modern software feel like shit even on high performance systems?
Well written video games, the kind of thing Casey works on usually, beat the hell out of basically any other category of software in terms of user responsiveness. At least of all the software I use regularly.
Yeah, the article might be technically correct but is ultimately pointless. In almost every software engineering environment the priority is always going to be writing readable and composable code over something that runs 5 microseconds faster. All of your clever efficiency gains are anyways going to be wiped out by a single database call.
Not just general advice, but advice that is meant to be applied to TDD specifically. The principles of clean code are meant to help with certain challenges that arise out of TDD. It is often going to seem strange and out of place if you have rejected TDD. Remember, clean code comes from Robert C. Martin, who is a member of the XP/Agile gang.
The author's cherry-picking of one small piece of what Uncle Bob promotes, criticizing it in isolation without even mentioning why it is suggested with respect to the bigger picture, seemed disingenuous. It does highlight that testable code may not be the most performant code, but was anyone believing otherwise? There are always tradeoffs to be made.
This thread is, predictably, another demonstration of conflating optimisation with being aware of performance.
The presented transformation of code away from "clean" code had nothing to do with optimisation. In fact, it made the code more readable, IMO. It then demonstrated that most of those "clean" code commandments are detrimental to performance. But when people saw the word "performance", they immediately jumped to "omg, you're optimising, stop immediately!"
Another irritating reaction here is the straw man of optimising every last instruction: the course so far has been about demonstrating how much performance there even is on the table with reasonable code, to build up an intuition about what orders of magnitude are even possible. Casey repeated several times that what level of performance is right for your situation will depend on you and your situation. But you should be aware of what’s possible, about the multipliers you get from all those decisions.
And of course people bring up profilers: no profiler will tell you whether a function is optimal or not — only what portion of runtime is spent where. And if all your life you’ve been programming in Python, then your intuition about performance often is on the level of “well I guess in C it could be 5-10 times faster; I’ll focus on something else”, which always comes up in response to complaints about Python. Not even close.
Agreed, but most people here haven't paid for the rest of the course, and Casey sometimes forgot that here he's addressing a wider audience.
For instance he fails to explain why he didn't bother addressing the tail of his unrolled loop. He does in the course, but here he's just assuming it's irrelevant, and doesn't address again potential criticism like "he doesn't even bother to write correct code, look at that lazy unrolling!".
Despite being based on science, software development is full of BS and opinions, and lacks measurable metrics. How "clean" is this code, as a number between 0 and 1, for example? What's the metric there? Does using polymorphism automatically make your code clean? We cannot reliably reason about even single responsibility, because there's no real way to count the responsibilities of a function. In such a la-la-land some desperately seek any formal measure, and there is one - performance - where you can actually put a number on a piece of code. But you cannot do this for code "cleanliness". So some people prefer to ignore the unmeasurable trait and replace it with a measurable one.
Just because something is hard to measure, doesn't mean it's not important to do - I think there's even a fallacy for that?
I think it's also important to separate computer science from software engineering. Algorithms and data structures can be reasoned about mathematically, but what about agile VS waterfall or functional VS oop?
Maybe software engineering relates more to philosophy than math. There are theories and some are more sound than others, but there is no objective truth. Most of us agree that it's important for code to be readable, but we still don't have the answer for what's the best readable code. Even first principle such as DRY are challenged, eg by WET.
I recall vaguely that there was a situation where there were two schools of thought with different approaches. One was about hacking away things, the other about having correct programs. Maybe Berkeley vs MIT? Such different opinions at the basic level of a discipline are more likely in philosophy than math, I think.
>what level of performance is right for your situation will depend on you and your situation
A lot of my performance and code quality chops came from projects where the team was comfortable with the performance of the system but the business was not. They wanted to stop when it was right for them but not right for the situation. It ended up negatively affecting my opinion of them because ultimately I started to see it as deflecting. It's fine because I don't know what else to do, not because this is the best that can be done.
So he puts polymorphic function calls into enormous loops to simulate a heavy load with a huge amount of data to conclude "we have 20x loss in performance everywhere"? He is either a huge troll or he has fallen for the typical premature-optimization fallacy: if we call this virtual method 1 billion times we will lose hours per day, but if we optimize, it will take less than a second! The real situation: a virtual method is called only a few hundred times and is barely visible in profiling tools.
No one is working with a huge amount of data in big loops, using virtual methods on every element of a huge dataset, like he is showing. That's a false premise he is trying to debunk. Polymorphic classes/structs are used to represent the business logic of applications, or structured data, with a few hundred such objects that keep some state and a small amount of other data, so they are never involved in intensive computations like he shows. In real projects, such "horrible" polymorphic calls never pop up under profiling and usually occupy a fraction of a percent overall.
> The real situation: a virtual method is called only a few hundred times and is barely visible in profiling tools.
The reality is that the entire Java ecosystem revolves around call stacks hundreds of calls deep where most (if not all) of those are virtual calls through an interface.
Even in web server scenarios where the user might be "5 milliseconds away", I've seen these overheads add up to the point where it is noticeable.
ASP.NET Core for example has been optimised recently to go the opposite route of not using complex nested call paths in the core of the system and has seen dramatic speedups.
For crying out loud, I've seen Java web servers requiring 100% CPU time across 16 cores for half an hour to start up! HALF AN HOUR!
I bet that half an hour startup is not because of nested calls at all. I worked with a ton of Java code and if something is slow, it is usually shitty I/O related code or some algorithmic stupidity, not because of virtual calls and what not.
> So he puts polymorphic function calls into enormous loops to simulate a heavy load with a huge amount of data to conclude "we have 20x loss in performance everywhere"?
You're mistaken; the load size has nothing to do with the end result. The result is normalized to give an estimate of how much faster the simple code is than the polymorphic code regardless of input size. (Kinda like deaths per 100k instead of an absolute number of deaths in statistics about diseases.)
So yes, your code is running 20x slower than it should be all the time.
Especially when you make every class an interface, with... get this, one implementation! This is based on real world experience and is not a joke. There are real companies with real people that write real code where every single class is an interface with exactly one implementation. Which, as Casey has shown, results in upwards of a 20x slowdown in the worst case.
Obviously, you probably won't get a 20x speedup by getting rid of the polymorphic garbage. But it's equally asinine to assume that polymorphic functions are only called a few hundred times. I guarantee you your PC is making millions of polymorphic function calls per minute between the OS, the browser, the Windows anti-malware scanner, Steam running in the background, Oracle running its checks to remind you to update Java, etc. There are hundreds of processes running all the time on a modern device, and these devices are wasting enormous amounts of resources.
> Especially when you make every class an interface, with... get this, one implementation! This is based on real world experience and is not a joke.
And when you run this through a profiler, you will not notice how slow your code is, because everything is slow. Slowness is infused throughout the whole system.
Just because you haven't been exposed to this issue doesn't mean it doesn't exist. "The real situation", "no one", "in real projects", "never pop up"... give me a break lol.
One can reasonably well guess/know the expected input sizes to their programs. You ain't (hopefully) loading your whole database into memory, and unless you are writing a simulation/game engine or another specialized application, your application is unlikely to have a single scorching-hot loop; that's just not what most programs look like. If it does, then you should design for it, which may even mean changing programming languages for that part (e.g. for video codecs not even C et al. cut it, you have to write assembly), but more likely you just use a slightly less ergonomic primitive of your language.
Occasional CPU architect here... probably the worst thing you can do in your code is to load something from memory (the core of method dispatch) and then jump to it. It sort of breaks many of the things we do to optimise our hardware - it causes CPU stalls, branch prediction failures, etc.
There is one thing worse you can do (and I caught a C++ compiler doing it when we were profiling code while building an x86 clone years ago): instead of loading the address and jumping to it, push the address and then return to it. That not only breaks pipelines but also return-stack optimisations.
Yeah. I opened Discord earlier, and it took about 10 seconds to open. My CPU is an Apple M1, running at about 3GHz per core. Assuming it's single-threaded (it wasn't), Discord is taking about 30 billion cycles to open. (Or around 50 network round-trips at a 200ms ping.)
As in like a micro service? Ahahaha. Our CTO just pushed for microservices everywhere and we're not even that far along and we're chasing all kinds of performance problems. Insanity.
Just a quick nudge back on this: people in DSP would disagree with the assertion that nobody is going over big loops and using a virtual method on each element. We often have to process at least 88k elements per second in real time, through many many different processes. If any of those processes are defined using factories that spit out classes with polymorphic inheritance and virtual functions it certainly becomes an issue.
As a result some styles of writing code just don’t work for the audio thread at all, and we’d have to simply avoid or rewrite libraries written this way.
There are just some domains where standard practice for cleanliness is different because of your constraints.
I mean, it's to the point that we've got die-hards in this industry who insist on putting all functions inlined in headers (not that I agree!)
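Roughly the distinction that matters on the audio thread, sketched with invented names: dispatch per buffer, not per sample, so the indirect call happens hundreds of times a second instead of ~88,000.

```cpp
#include <cstddef>

// Per-sample dispatch: one indirect call for every sample.
struct SampleProcessor {
    virtual ~SampleProcessor() = default;
    virtual float processSample(float in) = 0;
};

// Per-block dispatch: one indirect call per buffer (e.g. 512 samples),
// leaving a tight inner loop the compiler can optimise.
struct BlockProcessor {
    virtual ~BlockProcessor() = default;
    virtual void processBlock(float* buffer, std::size_t numSamples) = 0;
};

struct Gain : BlockProcessor {
    float gain = 0.5f;
    void processBlock(float* buffer, std::size_t numSamples) override {
        for (std::size_t i = 0; i < numSamples; ++i)
            buffer[i] *= gain;
    }
};
```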
I agree with your sentiment. But those things exist (not that that validates the author's argument), and I still shake in terror remembering when, during covid, I was asked to take a look at a virus-spread simulation (cellular automaton), written by a university professor and his postdoc team at a large university, that modeled every cell in a 100k x 100k grid as a class which used virtual methods for every computation between cells. Rewrote that in CUDA with normal buffers/arrays... and an epoch ran in milliseconds instead of hours.
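The flat-buffer idea, sketched on the CPU (the actual rewrite was CUDA, and the update rule here is hypothetical): one contiguous array of cell states instead of a grid of heap-allocated objects with virtual calls.

```cpp
#include <cstdint>
#include <vector>

using Grid = std::vector<std::uint8_t>;  // one state byte per cell, row-major

void step(const Grid& src, Grid& dst, int width, int height) {
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            const int i = y * width + x;
            // Hypothetical rule: a cell becomes infected if any 4-neighbour is.
            const bool infectedNeighbour =
                src[i - 1] || src[i + 1] || src[i - width] || src[i + width];
            dst[i] = (src[i] || infectedNeighbour) ? 1 : 0;
        }
    }
}
```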
In all fairness to them, "simulating lots of stuff interacting with each other" is the poster child of OO. It's just that, well, it's not how the CPU works.
Then again, at some point we had "Lisp machines"; maybe some day there will be a computer architecture where memory/computation patterns are adapted to massive simulation, rather than shoehorning it onto existing architectures.
And those will fail just as miserably as Lisp machines.
The same problems happen when you have 1000 requests being dealt with simultaneously each working on small collections. Web Servers for real businesses do not sit idle, they churn at high % and reducing CPU load on them lets you save money, and/or improve latency for users, which can make you money.
So go on all of you, write everything in Python with 90 levels of indirection, my stock will go up.
Reminds me of a joke where a programmer optimized the most frequently used method in an imgur clone from 1s to 0.01s, because a customer complained the UI was slow to respond.
Congratulations. Taps on the back, champagne all around. Customers call. Same complaint.
Programmer asks "Well, did something change at least?". "Loading bar now flickers more", answers customer.
There is no doubting Casey's chops when he talks about performance, but as someone who has spent many hours watching (and enjoying!) his videos, as he stares puzzled at compiler errors, scrolls up and down endlessly at code he no longer remembers writing, and then - when it finally does compile - immediately has to dig into the debugger to work out something else that's gone wrong, I suspect the real answer to programmer happiness is somewhere in the middle.
When working with a larger code base, there will always be parts that you don't remember writing and you'll inevitably have to read the code to understand it. That's just part of the job/task, regardless of the style it's written in.
In shared code particularly with a culture of refactoring, there's no guarantee that the function call you see is doing what you remember it doing a year ago.
When I was coming up I got gifted a bunch of modules at several jobs because the original writer couldn't be arsed to keep up with the many incremental changes I'd been making. They had a mentality that code was meant to be memorized instead of explored, and I was just beginning to understand code exploration from a writer's perspective. So they were both on the wrong side of history and the wrong side of me. Fuck it, if you want it so much, kid, it's yours now. Good luck.
Of course any code requires some refresher at times, but the difficulty and time required to figure it out again is a spectrum that goes all the way down to the seventh circle of hell.
> There is no doubting Casey's chops when he talks about performance,
Remember that everyone has their blind spots!
I follow Casey on twitter, and a couple years ago there was a weird thread where he had hung his browser for 4-5 seconds by running some JS to assign CSS rules to ~50K div elements. And Casey was a million percent confident that the hang was due to JS being slow, and had nothing to do with CSS or DOM rendering.
If you're talking about Handmade Hero, the real answer to programmer happiness is: don't use a language you despise while refusing to leverage its features, don't refuse to use that language's libraries and frameworks, don't re-implement everything from first principles, and actually have your game designed first (not designed while you code).
Casey is a bad example of a game designer and he'll be the first to admit it. However, it is worth noting that Jonathan Blow very much does design while he codes and recommends the practice. He also generally abstains from library dependencies and implements a lot of things himself.
Of course, part of the point of Handmade Hero is to show that you can totally reimplement everything from first principles. Libraries are not magical black boxes, they're code written by human beings like you or me, and you can understand what they're doing.
For instance, he wrote his own PNG decoder[0] live on stream, with hardly any prior knowledge of the spec, even though I'm confident that under normal circumstances he'd just use stb_image. I'm sure he did this just to show how you'd go about doing that sort of thing.
[0] He only implemented the parts necessary to load a non-progressive 24bit color image, but that still involved writing his own DEFLATE implementation.
I've worked in projects where no one seemed to know SQL, where massive speed improvements were made by fixing very low hanging fruits like removing select * queries, adding naive indexes, removing N+1 queries etc.
Likewise, I've worked in code bases where performance had been dreadful, yet there were no obvious bottlenecks. Little by little, replacing iterators with loops, replacing objects/closures with enum-backed structs/tables, adding early exits and so on accumulated to the point where speed-ups ranged from 2x to 50x without changing algorithms (outside of fixing basic mistakes like not pre-allocating vectors).
Always fun to see these videos. I highly recommend his `Performance Aware Programming` course linked in the description. It's concise and to the point, which is a nice break from his more casual videos/streams which tend to be long-winded/ranty.
Just taking the little bit of time to think about what the computer needs to do and making a reasonable effort to not do unnecessary stuff goes a long way. That 2x-50x factor is in fact very familiar. That’s something loading in a second rather than in a minute, or something feeling snappy instead of slightly laggy.
And it matters much more than people say it does. The “premature optimisation...” quote has been grossly misused to a degree that it’s almost comical. It’s not a good excuse for being careless.
One thing I find frustrating in game development, is we often put off optimization until the end of the project when we know we are happy with the game. But that means _I_ have to live with a slow code base for 2 years.
It takes 19 seconds for the main menu to load when you push play on our current game in Unity. It's killing me.
Meanwhile, in my Lua side project, it's less than a second.
> I've worked in projects where no one seemed to know SQL, where massive speed improvements were made by fixing very low hanging fruits like removing select * queries, adding naive indexes, removing N+1 queries etc.
Can you recommend any SQL book with main focus on performance improvements like this?
I've learnt as I've needed it, so I'm afraid I don't have a single source to point at. Of the things that I've listed, a quick google search on each should give you enough info to be useful.
> Select * queries
Sometimes you only want two columns, but you ask for all five. Say you query a million rows and ask for, but throw away, 60% of all the data you get back.
> Naive indexes
As in, just slapping an index on a table that doesn't have one makes such a big difference that sometimes it's all you need.
> N+1 queries
This is more of a problem in ORMs, but any time you call the database N times instead of 1 time. A classic example is writing a for loop that asks for one row at a time, instead of asking for all rows once.
If some field is referenced in a WHERE clause, add an index for it.
If a few fields are referenced in a single WHERE clause, add a single index that includes all of them.
If you have an index on (a, b, c), then it is as if you also had indexes on (a, b) and on (a) alone.
If a field's condition in WHERE is =, put that field at the beginning of the index. If it's < (or similar), put it at the end. You'll get the best results if you have none or only one < in your query.
This guy is so dogmatic about it that it hurts. I would argue that clean code is a spectrum of how flexible vs. how rigid you want your abstractions to be. If your abstractions are too flexible for good performance, dial them back when you see the issue. If your abstractions are too rigid for your software to be extendable, then introduce indirection.
We can all write code that glues a very fixed set of things end to end and squeeze every last CPU cycle of performance out of it, but as we all know, software requirements change, and things like polymorphism allow for much better composition of functionality.
Casey is a bit of a hardcore crusader on the topic, but I'd hardly call dogmatic someone who can provide you evidence and measurements backing their thesis.
The tests he put together here are hardly something I'd call a straw-man argument; they seem like reasonable simplifications of real cases.
Evidence from a microbenchmark of a single page of code.
The focus on performance here ignores the fact that most programs are large systems of many things that interact with each other. That is where good design and abstractions and “clean code” can really help.
Like all things it is about finding a balance and applying the right techniques to the right parts of a larger system.
Performance measurements are only one dimension of code quality. Having a laser focus on it disregards why you would want to sacrifice performance for a different dimension of code quality, such as extensibility for different requirements.
You should check if your code is in the hot path before optimizing, because the more you couple things together the harder it is to change it around. For instance, in Casey's example, if you wanted to add a polygon shape but you've optimized calculating area into multiplying height x width by a coefficient, that requires a significant refactor. If you are sure you don't need polygons, that's a perfectly fine optimization. But if you do, you need to start deoptimizing.
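For context, here is a hedged reconstruction of the kind of table-driven code under discussion (my own sketch, not the article's exact listing; it assumes width/height describe the shape's bounding box). It makes the trade-off concrete: a general polygon has no per-type constant k with area = k * width * height, so adding one forces a redesign:

    enum ShapeType { Shape_Square, Shape_Rectangle, Shape_Triangle, Shape_Circle, Shape_Count };

    struct Shape {
        ShapeType type;
        float width, height;  // bounding box of the shape
    };

    // Per-type constant: square/rectangle = 1, triangle = 1/2, circle = pi/4.
    constexpr float AreaCoeff[Shape_Count] = { 1.0f, 1.0f, 0.5f, 3.14159265f / 4.0f };

    float Area(const Shape& s) {
        return AreaCoeff[s.type] * s.width * s.height;
    }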
> they seem like reasonable simplifications of real cases.
Paraphrasing Russ Ackoff: doing the right thing versus doing a thing right is the difference between wisdom (or effectiveness) and efficiency. What Casey is doing here may be efficient, but calculating a billion rectangles doesn't represent a realistic or general use case.
"Clean Code" or any paradigm of the sort aims to make qualitative, not quantitative improvements to code. What you gain isn't performance but clarity when you build large systems, reduction in errors, reduce complexity, and so on. Nobody denies that you can make a program faster by manually squeezing performance out if it. But that isn't the only thing that matters, even if it's something you can easily benchmark.
Looking at a tiny code example tells you very little about the consequences of programming like this. If we program with disregard for the system in favour of performance of one of its parts, what does that mean three years down the line, in a codebase with millions of lines of code? That's just one question.
Unfortunately, everything in our profession is a tradeoff. Faster and maintainable are two of the many quality metrics you can optimize for that will be at odds at times. What the right balance is for a given piece of code depends on so much context. It's a hard balance to get right.
These examples are absolutely a strawman. He's imagining there's one specific access pattern that's executed thousands of times per second. In a realistic codebase you're accessing the data less often but in multiple different (often subtly so!) ways. Cache efficiency is everything for modern CPUs, so you can't "simplify" the access patterns without making your benchmarks unrepresentative.
Yes... but I'm also seeing dogmatism from the opposing camps here in the comments section.
The reality is that how flexible your interfaces and abstractions are and their design has to be a part of your original design considerations when building something. It's a bad move to just hand wave away performance concerns because you religiously adhere to some design patterns. It's also a bad move to drop down to using intrinsics for everything from the get-go and thinking you know better than the compiler when it's a codepath that isn't even computationally expensive or a bottleneck a priori.
I think part of the problem with supposed "clean" code is that it tends to be a matter of opinion. Is the polymorphic version cleaner than the switch statement version? I would argue the latter is actually easier to read. There's no real reason to think "clean" code is actually clean other than anecdotes and that someone wrote it in a book, but the performance is something that can be objectively measured.
The "Clean Code" that Casey is talking about is a book and a code philosophy that was explained in depth in talks and trainings and seminars, so I would disagree that it is a matter of opinion.
I think it really truly depends. I think it's always good to do the minimal viable thing first instead of being an architecture astronaut, but if you've been asked for three (random ballpark number) different implementations for the same requirement it might be time to start adding some indirection.
The best idea in clean code is to stop coupling domain models to implementation details like databases/the web/etc. Once you grok that, then you're in a better position to work on eliminating unnecessary coupling within the model itself.
There's lots of ways to do this poorly and well. There's no process for it. That's a feature. I feel like a lot of the flak clean code gets boils down to, "I followed it dogmatically and look what it made me do!" It didn't make you do anything; it's trying to teach you aesthetics, not a process. Internalize the aesthetics and you won't need a rigid process.
Obviously when you do this you probably need more code than you'd normally write. That can be viewed as a maintenance burden in some situations, esp. when you don't have product market fit. Again, this shows that treating clean code like some process that always produces better code in every situation is extremely naive.
There was a moment in grade school where I was sat down and it was explained to me that you don't have to take a test in order. You can skip around if you want to, and I ran so far with that notion that at 25 I probably should have written a book on how to take tests, while I could still remember most of it.
One of the few other "lightning bolt out of the blue" experiences I can recall was realizing that some code constructs are easier for both the human and the compiler to understand. You can be sympathetic to both instead of compromising. They both have a fundamental problem around how many concepts they can juggle at the exact same time. For any given commit message or PR you can adjust your reasoning for whichever stick the reviewer has up their butt about justifying code changes.
You have Japan infrastructure, and you have Turkey infrastructure.
6.1 quake in Japan = nothing destroyed.
6.1 quake in Turkey = everything collapses.
The engineers in Turkey probably didn't value performance and efficiency.
It's the same for developers: choose your camp wisely, otherwise people will complain once they can no longer bear your choice.
You act innocent, but your code choices translate into costs: higher server bills for your company, higher energy bills for your customers/users, time wasted for everyone, rare materials depleted at a faster rate, growing tech junk.
Selfishness is high in the software industry.
We are lucky it's not the same for the HW industry, but it's getting hard for them to hide your incompetence, as more things now run on a battery, and battery tech is kinda struggling.
Good thing is they get to sell more HW since the CPU is "becoming slower", lol.
So now we've got smartwatches that need recharging every damn day.
The shapes example is pretty contrived so I don't really have an opinion on it either way. But imagine you have something like a File interface and you have implementations of it e.g. DiskFile, NetworkFile, etc., and you anticipate other implementors. Why would you do anything other than have a polymorphic interface?
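A sketch of that situation (hypothetical interface, purely for illustration): every virtual call here fronts a syscall or a network round-trip, so the nanoseconds spent on the indirect branch are noise:

    #include <cstddef>
    #include <cstdint>

    // Polymorphism is the right tool here: the dispatch cost is dwarfed
    // by the microseconds-to-milliseconds of I/O behind each call.
    class File {
    public:
        virtual ~File() = default;
        virtual std::size_t read(std::uint8_t* buf, std::size_t n) = 0;
        virtual std::size_t write(const std::uint8_t* buf, std::size_t n) = 0;
    };

    class DiskFile : public File { /* read/write via OS file descriptors; body elided */ };
    class NetworkFile : public File { /* read/write via a socket, adds buffering; body elided */ };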
I don't like most of these "principles", as anyone can verify by looking at my previous comments, but this article is cherry-picking to its utmost level of unfairness.
These "clean code" principles should not, and generally are not, ever used at performance critical code, in particular computer graphics. I've never seen anyone seriously try to write computer graphics while "keeping functions small" and "not mixing levels of abstraction". We can go further: you won't be going anywhere in computer graphics by trying to "write pure functions" or "avoiding side effects".
These "clean code principles" are, however, rather useful for large, corporate systems, with loads of business process rules maintained by different teams of people, working for multiple third parties with poor job retaining. You don't need to think about vector performance for processing credit card payments. You don't need to think about input latency for batch processing data warehouse jobs, but you need this types of applications to work reliably. Way more reliably than a videogame or a streaming service.
Right tools for the right jobs, people need to stop trying to hammer everything into the same tools. This is not only a bad practice in software, it's a bad practice in life, the search for an ever elusive silver bullet, a panacea, a miracle. Just drop it and get real.
> you won't be going anywhere in computer graphics by trying to "write pure functions" or "avoiding side effects"
Not sure about this; in my experience (in a different domain, audio processing) you totally can get away with both of these a lot of the time.
Function inlining works well, so you can write small pure functions in a lot of cases (especially if you accept as pure a function that reads from one buffer and writes to another).
As for avoiding side effects, this is normally more about keeping your state updates small and localised (allowing more parts to be pure), which is often not a problem performance-wise.
IME it's much easier to improve the performance of a piece of code which is easy to reason about and change with some level of confidence that your optimisation will not break things.
I know there's some unavoidable global state in computer graphics, but presumably there is lots of code that doesn't directly touch that.
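A sketch of what that can look like (illustrative, not from any particular codebase): the worker is pure in the sense above, reading one buffer and writing another, while the one real side effect is kept small and local:

    #include <cstddef>

    // "Pure" in the practical sense: the output depends only on the inputs,
    // and nothing else is touched. Trivial for the compiler to inline and vectorise.
    inline void mixBuffers(const float* a, const float* b, float* out, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = a[i] + b[i];
    }

    // The side effect (parameter smoothing) is isolated in one tiny struct,
    // so everything downstream of it can stay pure.
    struct SmoothedGain {
        float current = 1.0f;
        float step(float target) {
            current += 0.001f * (target - current);  // one-pole smoother
            return current;
        }
    };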
Exactly! I work with a lot of high-performance code and also a lot of non-high-performance code (think all the plumbing around the core computation) and I definitely use a lot of "clean code patterns" in the non-performance-critical parts. They're the ones that tend to change more, that more people touch, that get done faster... It's just about knowing what to use and when.
> These "clean code" principles should not, and generally are not, ever used at performance critical code, in particular computer graphics.
I agree that this is mostly true, but maybe not for beginners in the field
When I was reading "Ray Tracing in One Weekend" (known as _the_ introductory literature on the topic), I was very surprised to see that the author designed the code so that objects extend a `Hittable` class, and the critical ray-intersection function `hit` is dynamically dispatched through the `virtual` keyword and thus suffers a huge performance penalty.
This is the hottest code path in the program, and a ray-tracer is certainly performance critical, but the author is instructing students/readers to use this "clean code" principle and it drastically slows down the program.
So I agree most computer graphics programmers aren't writing "clean code", but I think a lot of new programmers are being taught them because of introductory literature
Yeah, but why are good and fast opposites rather than orthogonal? Why are languages and compilers not built so that we can at least do both? Or even better, why aren't good and fast the same thing in computer languages?
But I'd point out that languages and compilers are built so we can have both.
The problem is some definitions of good really aren't, and it isn't just because they're slow, it's because they make some things "good" at the detriment of others. Uncle Bob's Clean Code is really good at making some portions of the code "good" and "simple", but when put together they are interconnected in ways that are more difficult to understand.
Yes and no, I think that what can happen if you only focus on "performance critical code" is the rest of your code is slow and incredibly unfriendly to cache, but there's absolutely nothing you can really do about it past a point. Like, if all your code is 5x slower than it possibly could be, then even after you fix the performance critical bits you have this ugly tax everywhere else that you can't do anything about other than a rewrite. And I do think that matters, look at various projects written in a language like python where bits have been written in C, but you still have all this slow interpreted stuff. (I like Python btw, I'm just making a point that if performance matters it doesn't just matter in loops)
I don’t understand why there is still the false dichotomy between performance and speed of development/readability. Arguments on HN and in other software circles suggest performant code cannot be well organized, and that well organized code cannot be performant. That’s false.
In my experience, writing the code with readability, ease of maintenance, and performance all in mind gets you 90% of each of the benefits you’d have gotten focusing on only one of the above. For instance, maybe instead of pretending that an O(n^2) algorithm is any “cleaner” than an O(n log n) algorithm because it was easier for you to write, maybe just use the better algorithm. Or, instead of pretending Python is more readable or easier to develop in than Rust (assuming developers are skilled in both), just write it in Rust. Or, instead of pretending that you had to write raw assembly to eke out the last drop of performance in your function, maybe target the giant mess elsewhere in your application where 80% of the time is spent.
A lot of the “clean” vs “fast” argument is, as I’ve said above, pretending. People on both sides pretend you cannot have both, ever, when in actuality you can have almost all of what is desired in 95% of cases.
I'd even go so far as to say that "clean" code is a requirement for performance optimization. For a loose definition of clean.
Code that is unreadable, tightly coupled, untestable or just messy is much, much harder to work in than code that is readable, loosely coupled, well-tested and clean. This has been proven often and is really a no-brainer. Performance-optimizing is finding the bottleneck, then rewriting that without changing the functional behaviour. For this you need, respectively, readability (to find bottlenecks you must be able to understand flow and code), the ability to rewrite (tightly coupled code cannot be rewritten in isolation), and assurance that the behaviour doesn't change (test coverage).
Ergo: a clean architecture is a requirement for making code more performant in the first place. Even if that architecture is bad for performance in itself, it enables future improvements.
I find it funny that people keep throwing lowercase "clean code" around in this post without knowing the context; now you have "clean architecture" too, and it makes it more difficult to know what you're talking about.
Clean Code is actually a book by Uncle Bob. And Clean Architecture is the name of another book by him.
What Casey is criticizing isn't "good code". He's criticizing Uncle Bob's philosophy.
Now, your argument seems to be: in the real world, there's so much waste, that virtual function calls pale in comparison.
This does not debunk his main point, which seems to me at least the following: all things being equal, writing code with virtual functions that do a tiny amount of work and "hiding implementation details" makes performance worse, sometimes by an order of magnitude.
Now, there may be situations where you _have_ to use virtual functions, because you are writing a library for other people to use, and you can't dictate ahead of time how they will use it.
This again does not invalidate the point. You need to be _aware_ of the performance implications and mitigate them. He said the following in the comment section of the article:
> Try to make it so that you do very rare virtual function calls, behind which you do a _large_ amount of work, rather than the "clean" code way of using lots of little function calls.
Along with that comes the heuristic that hardware and electricity are cheap while developers are expensive. That's probably why, in my experience, managers almost never ask developers to optimize slow code: they believe it is cheaper to use more hardware if that fixes the problem (in some areas, like HFT or GameDev, more hardware is not the answer, so optimization does happen). In the rare cases where I've seen optimization done, the initiative always came from an IC (who knew that the code could run faster / use fewer resources).
So nowadays, writing code, I assume that it will never be optimized later and try to do less dumb stuff from the beginning.
You can of course only optimize what you are looking to optimize. I am not surprised (honestly) that some engineers do not realize they are primed for the kinds of things they will find by the mere choice of where they want to look.
There's a tipping point where you have so much hardware there's big savings with optimization. Things like Postgres and the Linux kernel have a lot of optimization put into them and there's an insane amount of hardware out running that code.
That said, slow software sucks.
Maintainability and cleanliness are the best virtues code can have. If you have extremely clearly written code that has performance issues, I can swoop in with analysis tools, figure out where the pain point is, and refactor it out. Sometimes this is a real headache[1], sometimes not; what I can guarantee is that if the code is "dirty" it's going to be a headache, and it'll take more time.
I'd personally take issue with this article over the polymorphism claim though: polymorphism is a tool, but it isn't the be-all and end-all tool. A lot of your data can live as structs/blobs in memory with tight internal type definitions but without any OO principles. Personally I am a huge fan of functional programming (but not pure functional programming), so the objects I use are relatively few and far between and exist to fulfill a very specific purpose.
I've had two occasions in working when I needed to break out an asm block - the compiler was being a thick headed dummy and this code needed to receive incoming signals without exception or delay - but once that critical section was passed? Back to high level programming and statements favoring expressiveness over raw bare metal performance.
If you want an interesting experience talk to your closest non-technical manager type - be that a product team manager or the company owner - and ask them if they'd prefer if you focus on reducing how long your product takes to execute by 20% over the next five years or if they'd prefer you to lower the growth of the developer labor budget by 20% for the next five years by focusing on maintainability over performance. With the exception of extremely niche cases maintainability is always the golden standard.
1. For instance, I've dealt with OOM issues that have required transforming all logic on a query result to be lazily evaluated on a data stream after main execution finishes - like the logic goes up and down the stack and only then begins processing results. In this particular case the problem was rather easy to deal with because we essentially swapped out the actual value passing on each layer for a lazy result set being passed around - because the code was clean. Sometimes you'll definitely need to massively re-engineer things though.
Definitely get the single-threaded house in order before attempting to speed up by running in parallel.
Don't forget virtual machines and interpreters.
What I've seen is slow queries, but a bigger problem is actually too many queries. It's easy to do, especially when using an ORM.
It mostly happens on change: someone wants to add something to an existing query, so they just add their new query and slop it into a loop, and boom, performance is gone.
Over the years they've grown, and one bin showed up where jobs would sit around for weeks; for that case, n went from rarely a screenful of lines to a few thousand. Oops: now more than 2/3 of the time was spent in those searches. (And I have a sneaking suspicion that a good portion of the remaining time comes from using field names to retrieve values. The profiler doesn't separate that out, though, because it's not my code.)
- shit UX "ideas" that trigger 10 new API calls to show dialogs and popups.
- Logging libs under pressure
- overemphasis on testability
If shape Area is computed often enough that you care about inlining the calculation, why not compute & store it every time the height / width change. That’d be easy enough in an architecture based on information hiding, and might illustrate a legitimate engineering trade-off between those architectural choices.
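A sketch of that trade-off (hypothetical class, not from the article): pay the multiply on the rare write so that the hot read is a plain load, while the fields stay hidden:

    class Rect {
    public:
        void setSize(float w, float h) {
            width_  = w;
            height_ = h;
            area_   = w * h;  // recompute on mutation (rare)...
        }
        float area() const { return area_; }  // ...so the hot read is a plain load
    private:
        float width_ = 0.0f, height_ = 0.0f, area_ = 0.0f;
    };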
That's really not even close to true. Loading random websites frequently costs multiple seconds worth of local processing time, and indeed, that's often because of the exact kind of overabstraction that this article criticizes (e.g. people use React and then design React component hierarchies that seem "conceptually clean" instead of ones that perform a rendering strategy that makes sense.)
You attempted to present an argument that's a textbook example of a hyperbolic fallacy.
There are worlds of difference between "this code does not sit in a hot path" and "let's spend multiple seconds of local processing time".
This blend of specious reasoning is the reason the first rule of software optimization is "don't". Proponents of mindlessly chasing the 1% edge cases fail to understand that the whole world is composed of the 99% of cases where shaving off that millisecond buys you absolutely nothing, at the cost of producing unmaintainable code.
The truth of the matter is that in 99% of cases there is absolutely no good reason to run after these relatively large performance improvements if, in the end, the user notices absolutely nothing. More so if you're writing async code that stays far away from any sort of hot path.
But overall, I think you overestimate how much time you spend loading the website and how much time it's just sitting there, mostly idle.
And in the end, as long as it's fast enough that users don't stop using the site/webapp/program/whatever, then it's fine, imho. When it becomes too slow, the developers will be asked to improve performance. Because in the end, economics is the driver, not performance.
Unless we’re talking about specific compute-intensive websites, this is almost certainly network loading latency.
Modern web browsers are very fast. Modern CPUs are very fast. Common, random websites aren’t churning through “multiple seconds” of CPU time just to render.
I know React tends to lack in both dev UX and performance (at least in my exp). Personally I've taken a look at Svelte and Solid, and liked them both. I haven't had the chance to build anything larger than a toy app, though.
I think a lot of the clean code advice in general relates to object-oriented programming.
I've noticed that once my Lua programs (games) grow to a reasonable size, they become kinda hard to maintain. And I tend to use an object-oriented programming style (of course, it also doesn't help that Lua is not typesafe). After I finish my current game, I want to try making a game using a procedural approach. I wonder if this would solve some of the issues I see in my current code base.
One of the core ideas of procedural programming is that data and functionality are not mixed in classes as we do in object-oriented programming. Instead, you might have a module that contains some functions and some data objects the functions act upon. This approach would make some other aspects of game programming with Lua easier as well (e.g. serialisation), but perhaps it will also make the code easier to maintain as the codebase grows. It's something I want to contemplate.
Oh, it's not. Look at the crap the Java community used to produce. 37 levels of abstraction is not maintainable.
A CPU meter while clicking anything on a "modern" webpage proves that's a lie.
Also, sure, even if "clicking on things" is maybe 1-5% vs "looking at things" THAT'S THE CRITICAL PATH.
Once the app has rendered a view, obviously it is not doing much, but the user is also not waiting on anything and is "being productive", parsing whatever is displayed.
The critical path, the wasted time is the time app takes to render stuff and "but 99% of the time is not doing it" is irrelevant.
It's all a horribly performing turd that needs a $4000 MacBook to be tolerable at best.
Does it though? Where's the evidence for it? The vast majority of people I've worked with over the last couple of decades who like to bring up "clean code" tend towards the wrong abstractions and over-abstracting.
I almost always prefer working with someone who writes the kind of code Casey does over someone who follows the clean-code examples I've spent my career dealing with. I've seen and worked with many examples of Data-Oriented Design that were far from unmaintainable or unreadable.
> Prefer polymorphism to “if/else” and “switch”
Algebraic data types and pattern matching (a more general version of switch) make many kinds of data transformation far easier to understand and maintain, versus e.g. the visitor pattern, which uses ad-hoc polymorphism. (A minimal variant-based sketch follows this list of points.)
> Code should not know about the internals of objects it’s working with
This is interpreted by many as "don't expose data types". Actually some data types are safe to expose. We have a JSON library at work where the actual JSON data type is kept abstract and pattern matching cannot be used. This is despite the fact that JSON is a published (and stable) spec and therefore already exposed!
> Functions should be small
"Small" is a strange metric to optimise for, which is why I don't like Perl. Functions should be readable and easy to reason about. Let's optimise for "simple" instead.
> Functions should do one thing
This is not always practical or realistic advice. Most functions in OOP languages are procedures that will likely perform side effects in addition to returning a result (e.g. object methods). Should we also not do logging? :)
> “DRY” - Don’t Repeat Yourself
The cost of any abstraction must be weighed up against the repetition on a case-by-case basis. For example, many languages do not abstract the for-loop and effectively encourage users to write it out over and over again, because they have decided that the cost of abstracting it (internal iteration using higher-order functions) is too high.
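As promised above, a minimal variant-based sketch in C++17 (my own example, reusing the thread's shape theme): std::variant plays the role of the algebraic data type and std::visit the role of pattern matching, with missing cases caught at compile time:

    #include <variant>

    struct Square    { float side; };
    struct Rectangle { float w, h; };
    struct Circle    { float r; };
    using Shape = std::variant<Square, Rectangle, Circle>;

    // Builds one visitor out of several lambdas (the usual C++17 idiom).
    template <class... Fs> struct Overloaded : Fs... { using Fs::operator()...; };
    template <class... Fs> Overloaded(Fs...) -> Overloaded<Fs...>;

    float area(const Shape& s) {
        return std::visit(Overloaded{
            [](const Square& q)    { return q.side * q.side; },
            [](const Rectangle& r) { return r.w * r.h; },
            [](const Circle& c)    { return 3.14159265f * c.r * c.r; },
        }, s);
    }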
A really good developer writes clean code using the right abstractions (finding those tends to take the most time and experience) and drops down to a different level of abstraction for high-performance areas where it makes sense.
The fact that bad developers suck and write bad code whether or not they use clean code does not reflect on the methodology.
If we had perf issues that showed up externally, they were higher-level design issues, like 1) trying to take a thumbnail of the page at full resolution for a tab thumbnail while loading another tab, not because the thumbnailing code itself was slow, or 2) running a slow O(tabs) JS teardown during shutdown when we could run an O(~1) cleanup step instead.
This works until your computer is old enough to be slower than what a majority of wealthy people (ie desirable customers) are using, at which point you need to buy a newer, faster computer, even though your current one was already "faster than anyone could reasonably need it to be".
This is all harmless enough—a little disrespectful perhaps, to make other people waste their money, but not so terrible—until you consider the environmental impact of all these new computers, which the average spreadsheet absolutely should not need but does anyway. It's also an equity issue—someone on a fixed income can't necessarily afford a new machine.
What would actually happen if Moore's law ended tomorrow, and we were no longer able to make computers any faster than they are today? It would really suck for scientists and hardcore gamers, but I actually think a majority of computer users would benefit. The experience of someone who just writes documents and checks email would be unchanged, except that their current computers would never slow down!
Assuming it is right, there is something called multitasking: the CPU, RAM and, most importantly, the cache are not all yours. If there are 1000 pieces of software like yours, each spending 0.1%, that's 100%. You may argue that 1000 pieces of software is unreasonable, and you would mostly be right, but it happens, and mostly for the same reason software isn't optimized: quantity over quality.
Another issue is that you have to make a distinction between throughput and latency. You don't have to keep up with a sustained 100 actions per second, people don't go that fast, but you definitely have to respond within tens of milliseconds, because more than that is noticeable. Latency is much harder to optimize and if you are in the critical path, these cycles may matter.
A lot of devices are battery-powered these days, and all these wasted cycles reduce the battery life of the entire system. Mobile devices are crazy powerful these days, but this power is meant to be used sparingly. And even with line-powered devices, I think we waste enough energy as it is...
And finally, what is the point of "clean code"? Hopefully not just because it gives software architects boners. The point is usually to make software that will last: easier maintenance, fewer bugs, etc... But performance bugs exist too, and one of the most common software evolutions is to do more of what the software already does. Image editing software will process more and bigger images, a database will store more entries and more details about each entry, documents will get larger, etc... You may even find that your users are using some feature on a scale you never intended, maybe someone is pasting entire books into your note-taking app, and it may turn out to work quite well... if you cared about performance. Not caring about performance is technical debt, and it may negate the advantage of using "clean code" in the first place.
I don't. How many programs are running in your OS right now? How much CPU do you need to keep those things, plus the things you need, running in a performant manner?
How much CPU would you need if things performed better? The answer is "less", every time.
Better software performance = less money required for hardware to obtain the responsiveness you require.
It's important, and it's important completely independently of how it is framed here.
just wait till you've seen software get slower for 30 years. to put it another way, watch hardware get faster and faster and faster for 30 years while you observe software continually consume all of the available headroom until it feels slow again. watch that happen for THREE DECADES and wait for someone to tell you that everything is fine and that someone saying "software is unnecessarily slow" is wrong because they aren't framing their argument how you think it should be framed.
All that says is that you should focus your energy on increasing the value of that 0.1%. It's not actually an argument to not spend any energy.
It's like saying 'Astronauts only spend .1% of their time in space' or 'tomatoes only spend .1% of their existence being eaten' - that .1% is the whole point.
You can debate how best to maximize that value, more features or more performance. The OP is suggesting folks are just leaving performance on the floor and then making vacuous arguments to excuse it.
In non-performance-critical areas, it's pretty important that when the original dev team leaves, new hires can still fix bugs and add features without breaking things.
It's because of this mentality that almost all desktop software nowadays is bloated garbage that needs 2GB of RAM and a 5GHz CPU to perform even the most basic task, one that could be done with 1/100th of the resources 20 years ago.
I think the sibling comment here in this thread shows what some developers think. Electron is not reasonable. "Developers" using Electron should be punished by CBT and Chinese torture.
Okay maybe that's too much. But I suggest a new quote:
I sometimes wish performance was an issue in the projects I work with, but if it is, it's on a higher level / architectural level - things like a point-and-click API gateway performing many separate queries to the SAP server in a loop with no short-circuiting mechanism. That's billions of lines of code being executed (I'm guessing) but the performance bottleneck is in how some consultant clicked some things together.
Other than school assignments, I've never had a situation where I ran into performance issues. I've had plenty of situations where I had to deal with poorly written ("unclean") code and spent extra brain cycles trying to make sense of it though.
That is not at all what "make it work, make it pretty, make it fast" is about. That saying is about prioritization. Making it fast doesn't mean anything if it doesn't work.
However, if you are doing performance-sensitive work then this is a very bad strategy. You need to design a performant architecture up front otherwise you'll likely have orders of magnitude worse performance, even after optimizing your code.
Ex: if your "make it work" design has shared mutable state, you're going to have a bad time when you want to scale that horizontally and unlock 100x better throughput/performance.
So, most code that I write tends to have dead simple data types, but combines them with classes (and subclasses) that represent methods (strategies) for how to retrieve, transform, present and store the data. The 'make it work' phase may do this in a simple script, but the actual data model tends to stay the same.
Far too often I see applications that assume low latency and an unbreakable Internet connection. They seem to do almost no caching at all. For example, thumbnails.
Also, many applications will be almost unusable (or trigger OOM) when you try to work with a big file. Sometimes a big file is merely tens of MB; sometimes problems start with a 3MB file. Those are the issues that occur without thinking about performance from the start: memory is free, you can copy things around, everything will fit in RAM.
One more thing. When your application consists of a client and a server, it may turn out that you have painted yourself into a corner by not thinking about performance early on. Everything works without any trouble at first, and then it turns out there are latency issues with more data and you can't easily upgrade the client, for example. Or you have an architecture that allows you to spin up more servers and handle the load closer to the client, but it cuts into your margins.
make it work
make it work correctly
make it work fast
Pretty was never in the picture. But if anyone wants to add it, it should come last.
If that's true, why does it take forever to load and frequently fail to keep up with my input?
I hear this a lot, but during that 0.1% when I'm actually waiting for it to do its calculation, it had better be fast.
And for the remaining 99.9% of the time, it had better not eat my battery and memory...
Value of time is disproportionately weighted by user attention, which is at its highest right around when user input is happening.
This talk (Preventing the Collapse of Civilization) by Jonathan Blow disagrees with you.
Link: https://www.youtube.com/watch?v=ZSRHeXYDLko
That's fine, that's as it should be and isn't an interesting metric. The computer should wait for me, not the other way around. Needlessly waiting for the computer is a sign of s%&t software and not as uncommon as we'd like, huh?
And you're saying for many developers, performance is not the biggest priority.
That's fine. But it doesn't make him even 0.000001% wrong. And he's not applying anything to any niche situations. You just missed his point. Performance.
If you're scaling to 1000s of users then yes. If you have a GUI for a monthly task that two administrators use, then no.
The less something gets used the longer the payback time on the initial development.
Also, "clean code" (as in from the "Clean Code" book) is generally not good advice for most programs anyway. Not only does it eat performance, it's not all that great for building maintainable, extensible systems.
Wrong focus. Software time doesn't matter, user time does.
Users don't want to wait; even slow typists want instant results as soon as they hit Enter.
This is bad for sustainability
> I think the author is taking general advice and applying it to a niche situation.
I think the author is taking a general situation and applying common sense to it.
So why isn't my browser at 0% CPU when IN THE BACKGROUND then?
Bad mindset.
A GFCI breaker spends 99.99999% of the time waiting with zero leakage current. Yet, when it does detect leakage current, you want the breaker to trip as quickly as possible.
See where I'm going?
Imagine if it would take 5 seconds from flicking a light switch until the lights actually turn on. Because the switch is waiting for user input 99.99% of the time, right? Would you install that light switch in your home?
This applies doubly if you can rely on templates & structural typing to push your polymorphism to compile time. clang & gcc are surprisingly good at optimisation as long as you don't have to bounce off a vtable and the code is clean / avoids "manual optimisation".
Also while I'm not saying I don't believe the author here, I wish they would have used https://quick-bench.com/ or https://godbolt.org/ so that readers could trivially verify the results & methodology.
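A sketch of what pushing polymorphism to compile time looks like (illustrative): the element type is a template parameter, so the call is resolved statically and can inline; there is no vtable to bounce off:

    // Static dispatch: T is known at instantiation time, so area() can
    // inline into the loop; no indirect branch, no vtable load.
    template <typename T>
    float totalArea(const T* shapes, int count) {
        float sum = 0.0f;
        for (int i = 0; i < count; ++i)
            sum += shapes[i].area();  // resolved at compile time
        return sum;
    }

    // Structural requirement: any type with a suitable area() method works.
    struct Rect {
        float w, h;
        float area() const { return w * h; }
    };
    // usage: totalArea(rects, n) stamps out a Rect-specific loop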
In other words, the user’s entire perception of your program’s performance falls into that 0.1%.
Performance for performance's sake is an interesting and appealing challenge to us engineers. I was writing C code in the 90s, and I miss being that close to the hardware, trying to spare every clock cycle I could while working with machines that had sparse resources.
But today I'm building SaaS products for millions of simultaneous active users. When customers complain about performance it is often not what us engineers think of as "performance." They're NOT saying things like "Your app is eating all memory on my phone" or "the rendering of this table looks choppy." It's usually issues related to server-side replication lag causing data inconsistencies or in some cases network timeouts due to slow responding services.
The point is the age old advice that we were giving aspiring game programmers back in the 90s:
Figure out and understand your priorities.
The famous fast inverse square root function in the Quake III Arena source code is a great example. If memory serves me, they needed this calculation as part of their particle physics engine. The problem is that calculating inverse square roots precisely is very expensive, especially at the scale required. So they exploited how 32-bit floating-point numbers are represented in binary in order to do a fast, good-enough approximation. This is a good example of a targeted, purposeful optimization.
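For reference, the widely circulated routine from the released Quake III source, lightly adapted: the original type-punned through pointer casts, which is undefined behaviour in modern C++, so this version uses std::memcpy instead:

    #include <cstdint>
    #include <cstring>

    // Approximates 1/sqrt(x) by exploiting the IEEE-754 bit layout of a float.
    float Q_rsqrt(float number) {
        float x2 = number * 0.5f;
        std::uint32_t i;
        std::memcpy(&i, &number, sizeof i);  // read the float's bit pattern
        i = 0x5f3759df - (i >> 1);           // the famous magic-constant first guess
        float y;
        std::memcpy(&y, &i, sizeof y);       // back to float
        y = y * (1.5f - x2 * y * y);         // one Newton-Raphson refinement step
        return y;
    }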
Back in the 90s we were obsessed with getting the most out of our hardware, especially when coding games. So we picked up all sorts of performance hacks and optimizations and learned how to code in assembly so that we could get even closer to the bare metal if we needed to. The result was impossible-to-understand-and-maintain code, and so experienced engineers taught us young'uns:
Write clean code first, then profile to understand what your bottlenecks are, then make TARGETED optimizations aimed at solving performance issues in order of priority.
That priority always being driven by user experience and/or scalability requirements.
Anything else is premature optimization. You're speculating about where your performance bottlenecks are, and you're throwing out maintainable code for speculative reasons; not actually knowing how much of an impact your optimizations are going to have on user experience.
Clean code (OOP, DRY, etc) is optimized for maintainability and extensibility, not necessarily performance.
In fact, I think it’s pretty well understood that clean code is a tradeoff wrt performance, at least that’s the way I’ve always understood it.
Clean code works well for something like a web app that’ll need to be maintained by scores of different engineers over many years or decades.
At least that’s the theory. In practice, at least some level of abstraction makes it a bit easier to rip and replace parts of the app without a total rewrite.
So you might write your code in a straightforward way, careful not to lose efficiency, and then end up calling slow code anyway.
So his advice is easier to follow in some scenarios than in others.
Is it? What processes are running on your computer right now, and how many are waiting for your interaction?
Further, during the 0.1% of the time when I do interact, I want the results promptly.
Well written video games, the kind of thing Casey works on usually, beat the hell out of basically any other category of software in terms of user responsiveness. At least of all the software I use regularly.
Every place we work at is FAR more likely to scale up how many compute instances it is using than to optimize the application.
The word is TENET. Tenants rent stuff.
You're right: some things are inevitably slow, so we should therefore never care about performance in any situation.
Avoid belittling the efforts here just because they don't apply to all situations.
Not just general advice, but advice that is meant to be applied to TDD specifically. The principles of clean code are meant to help with certain challenges that arise out of TDD. It will often seem strange and out of place if you have rejected TDD. Remember, clean code comes from Robert C. Martin, who is a member of the XP/Agile gang.
The author cherry-picking one small piece out of what Uncle Bob promotes and criticizing it in isolation without even mentioning why it is suggested with respect to the bigger picture seemed disingenuous. It does highlight that testable code may not be the most performant code, but was anyone believing otherwise? There are always tradeoffs to be made.
It reminds me of Twitter and other companies that started changing programming languages for performance reasons.
The presented transformation of code away from “clean” code had nothing to do with optimisation. In fact, it made the code more readable IMO. Then it demonstrated that most of those “clean” code commandments are detrimental to performance. So obviously when people saw the word “performance”, they immediately jumped “omg, you’re optimising, stop immediately!”
Another irritating reaction here is the straw man of optimising every last instruction: the course so far has been about demonstrating how much performance there even is on the table with reasonable code, to build up an intuition about what orders of magnitude are even possible. Casey repeated several times that what level of performance is right for your situation will depend on you and your situation. But you should be aware of what’s possible, about the multipliers you get from all those decisions.
And of course people bring up profilers: no profiler will tell you whether a function is optimal or not — only what portion of runtime is spent where. And if all your life you’ve been programming in Python, then your intuition about performance often is on the level of “well I guess in C it could be 5-10 times faster; I’ll focus on something else”, which always comes up in response to complaints about Python. Not even close.
For instance, he fails to explain here why he didn't bother addressing the tail of his unrolled loop. He does in the course, but here he just assumes it's irrelevant, and doesn't preempt potential criticism like "he doesn't even bother to write correct code, look at that lazy unrolling!".
Thankfully there's a free lecture that explains the broad concept with an example. It's this lecture that convinced me to try out his course, where I hope he'll go into more details: https://www.youtube.com/watch?v=pgoetgxecw8&list=PLEMXAbCVnm...
I think it's also important to separate computer science from software engineering. Algorithms and data structures can be reasoned about mathematically, but what about agile VS waterfall or functional VS oop?
Maybe software engineering relates more to philosophy than math. There are theories and some are more sound than others, but there is no objective truth. Most of us agree that it's important for code to be readable, but we still don't have the answer for what's the best readable code. Even first principle such as DRY are challenged, eg by WET.
I recall vaguely that there was a situation where there were two schools of thought with different approaches. One was about hacking away things, the other about having correct programs. Maybe Berkeley vs MIT? Such different opinions at the basic level of a discipline are more likely in philosophy than math, I think.
A lot of my performance and code quality chops came from projects where the team was comfortable with the performance of the system but the business was not. They wanted to stop when it was right for them but not right for the situation. It ended up negatively affecting my opinion of them because ultimately I started to see it as deflecting. It's fine because I don't know what else to do, not because this is the best that can be done.
No one is working with a huge amount of data in big loops, using virtual methods to take every element out of a huge dataset, like he is showing. That's a false premise he is trying to debunk. Polymorphic classes/structs are used to represent the business logic of applications, or structured data with a few hundred such objects that keep some state and a small amount of other data, so they are never involved in intensive computations as he shows. In real projects, such "horrible" polymorphic calls never pop up under profiling and usually occupy a fraction of a percent overall.
The reality is that the entire Java ecosystem revolves around call stacks hundreds of calls deep where most (if not all) of those are virtual calls through an interface.
Even in web server scenarios where the user might be "5 milliseconds away", I've seen these overheads add up to the point where it is noticeable.
ASP.NET Core for example has been optimised recently to go the opposite route of not using complex nested call paths in the core of the system and has seen dramatic speedups.
For crying out loud, I've seen Java web servers requiring 100% CPU time across 16 cores for half an hour to start up! HALF AN HOUR!
You're mistaken; the load size has nothing to do with the end result. The result is normalized to give an estimate of how much faster the simple code is than the polymorphic code regardless of input size. (Kinda like deaths per 100k instead of an absolute number of deaths in statistics about diseases.)
So yes, your code is running 20x slower than it should be all the time.
Especially when you make every class an interface, with... get this, one implementation! This is based on real world experience and is not a joke. There are real companies with real people that write real code where every single class is an interface with exactly one implementation. Which, as Casey has shown, results in upwards of a 20x slowdown in the worst case.
Obviously, you probably won't get a 20x speedup by getting rid of the polymorphic garbage. But it's equally asinine to assume that polymorphic functions are only called a few hundred times. I guarantee you your PC is making millions of polymorphic function calls per minute between: the OS, the browser, windows Anti-Malware scanner, steam running in the background, oracle running its checks to remind you to update Java, etc. There are hundreds of processes running all the time on a modern device, these devices are wasting enormous amounts of resources.
And when you run this through a profiler, you will not notice how slow your code is, because everything is slow. Slowness is infused throughout the whole system.
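To make the "every class is an interface with exactly one implementation" pattern concrete, here is a minimal C++ sketch (all names invented). The point is that callers only ever see the interface, so every call in a hot loop is an indirect call the compiler usually can't inline across translation units without LTO/devirtualization, even though only one implementation exists:

    #include <cstdint>

    struct IAccumulator {                     // the "interface"
        virtual ~IAccumulator() = default;
        virtual void Add(int32_t x) = 0;
        virtual int64_t Total() const = 0;
    };

    struct Accumulator final : IAccumulator { // ...and its only implementation
        int64_t sum = 0;
        void Add(int32_t x) override { sum += x; }
        int64_t Total() const override { return sum; }
    };

    // Callers hold IAccumulator*, so every Add() goes through the vtable.
    int64_t SumAll(IAccumulator *acc, const int32_t *xs, int n) {
        for (int i = 0; i < n; ++i) acc->Add(xs[i]);
        return acc->Total();
    }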
There is one thing worse you can do (and I caught a C++ compiler doing it when we were profiling code while building an x86 clone years ago): instead of loading the address and jumping to it, push the address and then return to it. That not only breaks pipelines but also return-stack optimisations.
I remember doing that in a code generator ages ago because it was easier than calculating the jump offset :-P
Things way worse than that exist. Replace "virtual method" with "service call."
Yeah. I opened Discord earlier, and it took about 10 seconds to open. My CPU is an Apple M1, running at about 3GHz per core. Assuming it's single-threaded (it wasn't), Discord took about 30 billion cycles to open. (Or around 50 network round-trips at a 200ms ping.)
Crimes against performance are everywhere.
As a result some styles of writing code just don’t work for the audio thread at all, and we’d have to simply avoid or rewrite libraries written this way.
There are just some domains where standard practice for cleanliness is different because of your constraints.
I mean, it's got to the point where we have die-hards in this industry who insist on defining all their functions inline in headers (not that I agree!)
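For readers outside audio: the audio callback has a hard deadline every few milliseconds, so anything that can block (allocation, locks, file I/O) is effectively banned from it, however clean it looks. A minimal sketch of what code under that constraint tends to look like (invented names, not tied to any particular audio API):

    #include <atomic>
    #include <cstddef>

    // Everything the callback touches is preallocated on the main thread;
    // parameters cross over via atomics, never via mutexes or malloc.
    struct Synth {
        std::atomic<float> gain{1.0f};  // written by the UI thread, read here
        float scratch[4096];            // preallocated; no new/malloc below
    };

    void AudioCallback(Synth &s, float *out, std::size_t frames) {
        float g = s.gain.load(std::memory_order_relaxed);
        for (std::size_t i = 0; i < frames; ++i) {
            out[i] = g * s.scratch[i % 4096];  // no locks, no allocation, no I/O
        }
    }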
Then again, at some point we had "Lisp machines". Maybe some day there will be a computer architecture where memory and computation patterns are adapted to massive simulation, rather than shoehorning it onto existing architectures.
And those will fail just as miserably as Lisp machines.
this is exactly how the typical naïve game loop/entity system works.
So go on all of you, write everything in Python with 90 levels of indirection, my stock will go up.
Congratulations. Pats on the back, champagne all around. Customers call. Same complaint.
Programmer asks, "Well, did something change at least?". "The loading bar flickers more now", answers the customer.
The particular example with shapes could be a CAD or BIM app, where it is usually more than a "few hundred" objects.
I guess only in the places you call methods?
Wait? Are they everywhere? Hmmm...
When I was coming up I got gifted a bunch of modules at several jobs because the original writer couldn't be arsed to keep up with the many incremental changes I'd been making. They had a mentality that code was meant to be memorized instead of explored, and I was just beginning to understand code exploration from a writer's perspective. So they were both on the wrong side of history and the wrong side of me. Fuck it, if you want it so much, kid, it's yours now. Good luck.
Remember that everyone has their blind spots!
I follow Casey on twitter, and a couple years ago there was a weird thread where he had hung his browser for 4-5 seconds by running some JS to assign CSS rules to ~50K div elements. And Casey was a million percent confident that the hang was due to JS being slow, and had nothing to do with CSS or DOM rendering.
Of course, part of the point of Handmade Hero is to show that you can totally reimplement everything from first principles. Libraries are not magical black boxes, they're code written by human beings like you or me, and you can understand what they're doing.
For instance, he wrote his own PNG decoder[0] live on stream, with hardly any prior knowledge of the spec, even though I'm confident that under normal circumstances he'd just use stb_image. I'm sure he did this just to show how you'd go about doing that sort of thing.
[0] He only implemented the parts necessary to load a non-progressive 24-bit color image, but that still involved writing his own DEFLATE implementation.
tell me you've never made a game without telling me you've never made a game
Likewise, I've worked in code bases where performance was dreadful, yet there were no obvious bottlenecks. Little by little, replacing iterators with loops, replacing objects/closures with enum-backed structs/tables, adding early exits, and so on accumulated to the point where speedups ranged from 2x to 50x without changing algorithms (outside of fixing basic mistakes like not preallocating vectors).
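For anyone who hasn't seen the "enum-backed structs" style, a rough C++ sketch of the idea (my paraphrase of the approach, not the parent's actual code): the class hierarchy collapses into a tagged struct and one switch, so there is no vtable load and the data can sit contiguously in arrays.

    #include <cstdint>

    enum ShapeType : uint32_t { Square, Rectangle, Triangle, Circle };

    struct Shape {
        ShapeType type;
        float width, height;  // for circles, the radius is stored in width
    };

    float Area(const Shape &s) {
        switch (s.type) {  // one predictable branch instead of a virtual call
        case Square:    return s.width * s.width;
        case Rectangle: return s.width * s.height;
        case Triangle:  return 0.5f * s.width * s.height;
        case Circle:    return 3.14159265f * s.width * s.width;
        }
        return 0.0f;
    }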
Always fun to see these videos. I highly recommend his `Performance Aware Programming` course linked in the description. It's concise and to the point, which is a nice break from his more casual videos/streams which tend to be long-winded/ranty.
Just taking the little bit of time to think about what the computer needs to do and making a reasonable effort to not do unnecessary stuff goes a long way. That 2x-50x factor is in fact very familiar. That’s something loading in a second rather than in a minute, or something feeling snappy instead of slightly laggy.
And it matters much more than people say it does. The “premature optimisation...” quote has been grossly misused to a degree that it’s almost comical. It’s not a good excuse for being careless.
It takes 19 seconds for the main menu to load when you push play on our current game in Unity. It's killing me.
Meanwhile in my Lua side project, it's less than a second.
Can you recommend any SQL book whose main focus is performance improvements like this?
> Select * queries
Sometimes you only want two columns, but you ask for five. Say you query a million rows and fetch, then throw away, 60% of all the data you get back.
> Naive indexes
As in, just slapping an index on a table that doesn't have one makes such a big difference that sometimes it's all you need.
> N+1 queries
This is more of a problem with ORMs, but it's any time you call the database N times instead of once. A classic example is writing a for loop that asks for one row at a time, instead of asking for all the rows at once (see the sketch below).
You get 90% of the improvement just from that.
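A sketch of the N+1 fix in SQL (table and column names invented):

    -- N+1: one query per order, issued from an application-side loop
    SELECT sku, quantity FROM order_items WHERE order_id = 1;
    SELECT sku, quantity FROM order_items WHERE order_id = 2;
    -- ... N round-trips to the database ...

    -- The fix: ask once
    SELECT order_id, sku, quantity
    FROM order_items
    WHERE order_id IN (1, 2, 3 /* ... */);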
Indexes are the whole reason anyone even uses databases. And yet some backend guys think they are optional.
There might be some important columns, like say a “status” or a “date”, which are fundamental to a lot of queries.
Or you have columns X and Y used together frequently, or in important queries, in WHERE clauses; then that's a candidate for a composite index.
Stuff like that.
If there are a few fields referenced in a single WHERE clause, add one index that includes all of them.
If you have an index on (a, b, c), it is as if you also had indexes on (a, b) and on (a) alone.
If a field's condition in the WHERE is an equality (=), put it at the beginning of the index; if it's a range (< or similar), put it at the end. You'll get the best results if a query has no range conditions, or only one.
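Putting those rules together, a sketch (schema invented for illustration):

    -- Equality columns first, the single range column last. This serves:
    --   WHERE status = ? AND customer_id = ? AND created_at < ?
    CREATE INDEX idx_orders_lookup
        ON orders (status, customer_id, created_at);

    -- Prefix rule: the same index also serves
    --   WHERE status = ? AND customer_id = ?
    --   WHERE status = ?
    -- but not  WHERE customer_id = ?  (it skips the leading column)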
We can all write code that glues a very fixed set of things end to end and squeeze every last CPU cycle of performance out of it, but as we all know, software requirements change, and things like polymorphism allow for much better composition of functionality.
The tests he put together here are hardly something I'd call a straw-man argument; they seem like reasonable simplifications of real cases.
The focus on performance here ignores the fact that most programs are large systems of many things that interact with each other. That is where good design and abstractions and “clean code” can really help.
Like all things it is about finding a balance and applying the right techniques to the right parts of a larger system.
You should check if your code is in the hot path before optimizing, because the more you couple things together the harder it is to change it around. For instance, in Casey's example, if you wanted to add a polygon shape but you've optimized calculating area into multiplying height x width by a coefficient, that requires a significant refactor. If you are sure you don't need polygons, that's a perfectly fine optimization. But if you do, you need to start deoptimizing.
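For context, the coefficient trick being referenced looks roughly like this (my sketch, paraphrasing the video's table-driven version): every shape's area collapses to one multiply against a per-type constant, which is exactly why a general polygon, with no fixed coefficient-times-width-times-height form, breaks the scheme.

    enum ShapeType : unsigned { Square, Rectangle, Triangle, Circle, Count };

    // Per-type area coefficients; circles store the radius in both
    // width and height, so pi * w * h gives pi * r^2.
    static const float Coeff[Count] = { 1.0f, 1.0f, 0.5f, 3.14159265f };

    struct Shape { ShapeType type; float width, height; };

    float Area(const Shape &s) {
        return Coeff[s.type] * s.width * s.height;
    }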
Paraphrasing Russ Ackoff: doing the right thing versus doing things right is the difference between wisdom (effectiveness) and efficiency. What Casey is doing here may be efficient, but calculating a billion rectangles doesn't present a realistic or general use case.
"Clean Code" or any paradigm of the sort aims to make qualitative, not quantitative improvements to code. What you gain isn't performance but clarity when you build large systems, reduction in errors, reduce complexity, and so on. Nobody denies that you can make a program faster by manually squeezing performance out if it. But that isn't the only thing that matters, even if it's something you can easily benchmark.
Looking at a tiny code example tells you very little about the consequences of programming like this. If we program with disregard for the system in favour of performance of one of its parts, what does that mean three years down the line, in a codebase with millions of lines of code? That's just one question.
The reality is that how flexible your interfaces and abstractions are and their design has to be a part of your original design considerations when building something. It's a bad move to just hand wave away performance concerns because you religiously adhere to some design patterns. It's also a bad move to drop down to using intrinsics for everything from the get-go and thinking you know better than the compiler when it's a codepath that isn't even computationally expensive or a bottleneck a priori.
There's lots of ways to do this poorly and well. There's no process for it. That's a feature. I feel like a lot of the flak clean code gets boils down to, "I followed it dogmatically and look what it made me do!" It didn't make you do anything; it's trying to teach you aesthetics, not a process. Internalize the aesthetics and you won't need a rigid process.
Obviously when you do this you probably need more code than you'd normally write. That can be viewed as a maintenance burden in some situations, esp. when you don't have product market fit. Again, this shows that treating clean code like some process that always produces better code in every situation is extremely naive.
There was a moment in grade school where I was sat down and it was explained to me that you don't have to take a test in order. You can skip around if you want to, and I ran so far with that notion that at 25 I probably should have written a book on how to take tests, while I could still remember most of it.
One of the few other "lightning bolt out of the blue" experiences I can recall was realizing that some code constructs are easier for both the human and the compiler to understand. You can be sympathetic to both instead of compromising. They both have a fundamental limit on how many concepts they can juggle at the exact same time. And for any given commit message or PR, you can frame your reasoning around whichever stick the reviewer has up their butt about justifying code changes.
Clean code means easy to read and easy to maintain, not full of arbitrary things done "because performance".
you have Japan infrastructure, and you have Turkey infrastructure
6.1 quake in Japan = nothing destroyed
6.1 quake in Turkey = everything collapses
The engineers in Turkey probably didn't value performance and efficiency
It's the same for developers: you choose your camp wisely, otherwise people will complain once they can no longer bear your choices.
You act innocent, but your code choices translate into costs: a higher server bill for your company, higher energy bills for your customers/users, time wasted for everyone, rare materials depleted at a faster rate, growing tech junk.
Selfishness is high in the software industry
We are lucky it's not the same for the HW industry, but it's getting hard for them to hide your incompetence, as more things now run on a battery, and battery tech is kinda struggling.
Good thing is they get to sell more HW since the CPU is "becoming slower" lol
So now we've got smartwatches that you need to recharge every damn day.
Uh, no. It was corruption. There were standards, and they worked, but people didn't follow them, plain and simple.
These "clean code" principles should not, and generally are not, ever used at performance critical code, in particular computer graphics. I've never seen anyone seriously try to write computer graphics while "keeping functions small" and "not mixing levels of abstraction". We can go further: you won't be going anywhere in computer graphics by trying to "write pure functions" or "avoiding side effects".
These "clean code principles" are, however, rather useful for large, corporate systems, with loads of business process rules maintained by different teams of people, working for multiple third parties with poor job retaining. You don't need to think about vector performance for processing credit card payments. You don't need to think about input latency for batch processing data warehouse jobs, but you need this types of applications to work reliably. Way more reliably than a videogame or a streaming service.
Right tools for the right jobs, people need to stop trying to hammer everything into the same tools. This is not only a bad practice in software, it's a bad practice in life, the search for an ever elusive silver bullet, a panacea, a miracle. Just drop it and get real.
Not sure about this; in my experience (in a different domain, audio processing) you totally can get away with both of these a lot of the time.
Function inlining works well, so you can write small pure functions in a lot of cases (especially if you accept as "pure" a function that reads from one buffer and writes to another).
As for avoiding side effects, this is normally more about keeping your state updates small and localised (allowing more parts to be pure), which is often not a problem performance-wise.
IME it's much easier to improve the performance of a piece of code which is easy to reason about and change with some level of confidence that your optimisation will not break things.
I know there's some unavoidable global state in computer graphics, but presumably there is lots of code that doesn't directly touch that.
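As an example of that loose sense of "pure", here is a small invented buffer-in/buffer-out function; it touches no shared state, and compilers will happily inline it into the surrounding processing loop:

    #include <cstddef>

    // Reads one buffer, writes another, no shared state touched.
    inline void ApplyGain(const float *in, float *out, std::size_t n, float gain) {
        for (std::size_t i = 0; i < n; ++i)
            out[i] = in[i] * gain;
    }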
I agree that this is mostly true, but maybe not for beginners in the field
When I was reading "Ray Tracing in One Weekend" (known as _the_ introductory literature on the topic), I was very surprised to see that the author designed the code so that objects extend a `Hittable` class, and the critical ray-intersection function `hit` is dynamically dispatched through the `virtual` keyword, and thus suffers a huge performance penalty.
This is the hottest code path in the program, and a ray-tracer is certainly performance critical, but the author is instructing students/readers to use this "clean code" principle and it drastically slows down the program.
So I agree most computer graphics programmers aren't writing "clean code", but I think a lot of new programmers are being taught them because of introductory literature
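From memory, the interface in question looks roughly like this (paraphrased, not the book's exact code):

    // Every object the ray can hit derives from this abstract class.
    struct ray;         // origin + direction
    struct hit_record;  // intersection point, normal, t, ...

    class hittable {
    public:
        virtual ~hittable() = default;
        // Called for every object, for every ray: the hottest call site
        // in the whole renderer goes through the vtable.
        virtual bool hit(const ray &r, double t_min, double t_max,
                         hit_record &rec) const = 0;
    };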
But I'd point out that languages and compilers are built so we can have both.
The problem is some definitions of good really aren't, and it isn't just because they're slow, it's because they make some things "good" at the detriment of others. Uncle Bob's Clean Code is really good at making some portions of the code "good" and "simple", but when put together they are interconnected in ways that are more difficult to understand.
In my experience, writing the code with readability, ease of maintenance, and performance all in mind gets you 90% of each of the benefits you’d have gotten focusing on only one of the above. For instance, maybe instead of pretending that an O(n^2) algorithm is any “cleaner” than an O(n log n) algorithm because it was easier for you to write, maybe just use the better algorithm. Or, instead of pretending Python is more readable or easier to develop in than Rust (assuming developers are skilled in both), just write it in Rust. Or, instead of pretending that you had to write raw assembly to eke out the last drop of performance in your function, maybe target the giant mess elsewhere in your application where 80% of the time is spent.
A lot of the “clean” vs “fast” argument is, as I’ve said above, pretending. People on both sides pretend you cannot have both, ever, when in actuality you can have almost all of what is desired in 95% of cases.
Code that is unreadable, tightly coupled, untestable or just messy is much, much harder to work in than code that is readable, loosely coupled, well-tested and clean. This has been shown often and is really a no-brainer. Performance optimization is finding the bottleneck, then rewriting it without changing the functional behaviour. For this you need, respectively: readability (to find bottlenecks you must be able to understand the flow and the code), the ability to rewrite (tightly coupled code cannot be rewritten in isolation), and assurance that the behaviour doesn't change (test coverage).
Ergo: a clean architecture is a requirement for making code more performant in the first place. Even if that architecture is bad for performance in itself, it enables future improvements.
Clean Code is actually a book by Uncle Bob. And Clean Architecture is the name of another book by him.
What Casey is criticizing isn't "good code". He's criticizing Uncle Bob's philosophy.