Revisiting Knuth's “Premature Optimization” Paper

I think the problem with the quote is that everyone forgets the line that comes after it.

  We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

  vvvvvvvvvv
  Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.
  ^^^^^^^^^^

This makes it clear, in context, that Knuth defines "Premature Optimization" as "optimizing before you profile your code".

@OP, I think you should lead with this. I think it gets lost by the time you actually reference it. If I can suggest, place the second paragraph after

  > People always use this quote wrong, and to get a feeling for that we just have to look at the original paper, and the context in which it was written.

The optimization part gets lost in the middle, and I think this could provide a better hook for those who aren't going to read the whole thing. The way you wrote it works well for that, but the point (IMO) will be missed by less attentive readers. The post is good, so this is just a minor critique because I want to see it do better.

https://dl.acm.org/doi/10.1145/356635.356640 (alt) https://sci-hub.se/10.1145/356635.356640

Swizec · 2 months ago

Amdahl’s Law is the single best thing I learned in 4 years of university. It sounds obvious when spelled out but it blew my mind.

No amount of parallelization will make your program faster than the slowest non-parallelizable path. You can be as clever as you want and it won’t matter squat unless you fix the bottleneck.

This extends to all types of optimization and even teamwork. Just make the slowest part faster. Really.

https://en.wikipedia.org/wiki/Amdahl%27s_law
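
For reference, the usual statement of Amdahl's Law is speedup = 1 / ((1 - p) + p / n), where p is the fraction of the runtime that can be parallelized and n is the number of workers. A minimal sketch in Python; the function name and the 95% figure are illustrative assumptions, not something from the comment above:

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Overall speedup when `parallel_fraction` of the runtime is spread over `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical program: 95% of the runtime parallelizes, 5% does not.
    for workers in (1, 8, 64, 1024):
        print(workers, round(amdahl_speedup(0.95, workers), 2))
```

However many workers you add, the result stays below 1 / 0.05 = 20, which is the "slowest non-parallelizable path" limit in numbers.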

adrian_b · 2 months ago

While Amdahl’s Law is very important, its practical effects are very frequently overestimated, at least as frequently as Knuth is misquoted.

Simple problems, e.g. solving a system of equations, will usually include some non-negligible sequential part, which, according to Amdahl’s Law will limit the amount of speed-up provided by hardware parallelism.

On the other hand, complex problems, e.g. designing an integrated circuit, can usually be decomposed in a very great number of simpler subproblems that have weaker dependencies between them, than between the parts of a subproblem, so that by distributing the execution of the simple subproblems over parallel hardware that executes sequentially each subproblem you can obtain much greater acceleration factors than when attempting to parallelize the execution of each simple subproblem.

With clever decomposition of a complex problem and with good execution planning for its subtasks, it is much easier to approach the performance of an embarrassingly parallel problem, than when trying to find parallel versions of simple algorithms, whose performance is frequently limited to low values by Amdahl’s Law.

Amdahl’s Law frequently prevents you from reducing the execution time of some task from 1 minute to 1 second, but it normally does not prevent you from reducing the execution time of some task from 1 year to 1 week, because a task so complex as to have required weeks, months or years before parallelization normally contains a great enough number of weakly-coupled subproblems.
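
A rough way to check that arithmetic is to invert Amdahl's formula and ask how large a strictly sequential fraction a target speedup can tolerate. This is only a sketch: the helper name and the worker counts are made up for illustration.

```python
def max_serial_fraction(target_speedup: float, workers: int) -> float:
    """Largest serial fraction that still allows `target_speedup` on `workers`.

    Derived from 1 / S = s + (1 - s) / n, solved for s.
    A negative result means the target is unreachable with that many workers.
    """
    if workers < 2:
        raise ValueError("need at least 2 workers to parallelize anything")
    inverse_workers = 1.0 / workers
    return (1.0 / target_speedup - inverse_workers) / (1.0 - inverse_workers)


if __name__ == "__main__":
    minute_to_second = 60.0      # 1 minute -> 1 second
    year_to_week = 365.0 / 7.0   # 1 year -> 1 week, roughly 52x
    for workers in (64, 1024):
        print(workers,
              max_serial_fraction(minute_to_second, workers),
              max_serial_fraction(year_to_week, workers))
```

Both targets need a similar factor (60x versus roughly 52x), so the formula alone does not separate them. The difference described above is that a year-long job is far more likely to have a serial part that small than a one-minute job is.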

hinkley · 2 months ago

> faster than the slowest non-parallelizable path

Rather, than the slowest non-parallelized path. Ultimately you may reach maximum speed on that path but the assumptions that we are close to it often turn out to be poorly considered, or considered before 8 other engineers added bug fixes and features to that code.

From a performance standpoint you need to challenge all of those assumptions. Re-ask all of those questions. Why is this part single threaded? Does the whole thing need to be single threaded? What about in the middle here? Can we rearrange this work? Maybe by adding an intermediate state?

godelski · 2 months ago

  > It sounds obvious when spelled out but it blew my mind.

I think there's a weird thing that happens with stuff like this. Cliches are a good example, and I'll propose an alternative definition to them.

  A cliche is a phrase that's so obvious everyone innately knows or understands it; yet, it is so obvious no one internalizes it, forcing the phrase to be used ad nauseam

At least, it works for a subset of cliches. Like "road to hell," "read between the lines," Goodhart's Law, and I think even Amdahl's Law fits (though certainly not others, e.g. some are bastardized, like Premature Optimization or "blood is thicker than water"). Essentially they are "easier said than done," so they require system 2 thinking to resolve but we act like system 1 will catch them.

Like Amdahl's Law, I think many of these take a surprising amount of work to prove despite the result sounding so obvious. The big question is if it was obvious a priori or only post hoc. We often confuse the two, getting us into trouble. I don't think the genius of the statement hits unless you really dig down into proving it and trying to make your measurements in a nontrivially complex parallel program. I think that's true about a lot of things we take for granted

nurettin · 2 months ago

There is also the "death by a thousand cuts" kind of slowness that accrues over time, where it doesn't really matter where you start peeling the onion and the part you start with is rarely the best.

ilc · 2 months ago

Interestingly, you didn't learn the full lesson:

When optimizing, always consider the cost of doing the optimization vs. its impact.

In a project where you are looking at a 45/30/25 type split, the 45 may actually be well optimized, so the real gains may be in the 30 or 25.

The key is to understand the impact you CAN have, and what the business value of that impact is. :)

The other rule I've learned is: There is always a slowest path.
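
To put numbers on the 45/30/25 point, here is a small sketch. The part names and the speedup factors (10% on the tuned part, 3x on the naive one) are hypothetical; only the split itself comes from the comment.

```python
def total_after(parts: dict[str, float], speedups: dict[str, float]) -> float:
    """Total runtime after dividing each named part by its speedup factor."""
    return sum(cost / speedups.get(name, 1.0) for name, cost in parts.items())


if __name__ == "__main__":
    # The 45/30/25 split, as percentages of total runtime.
    parts = {"a": 45.0, "b": 30.0, "c": 25.0}

    # Hypothetical: "a" is already well tuned, so more work buys only 10%.
    tuned_a = total_after(parts, {"a": 1.10})
    # Hypothetical: "b" is naive, and a simple fix makes it 3x faster.
    fixed_b = total_after(parts, {"b": 3.0})

    print(f"optimize a: {tuned_a:.1f}% of original runtime")
    print(f"optimize b: {fixed_b:.1f}% of original runtime")
```

Under those assumptions the smaller slice gives the bigger win, which is why the achievable impact matters more than the raw share of runtime.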

wiz21c · 2 months ago

Before optimizing, I always balance the time I'll need to code the optimization and the time I (or the users of my code) will effectively gain once the optimization is there (that is, real-life time, not CPU time).

If I need three weeks to optimize a code that will run for 2 hours per month, it's not worth it.
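
The break-even arithmetic, as a sketch. The 40-hour week is my assumption, and the best case assumes the optimization removes the runtime entirely:

```python
def break_even_months(dev_hours: float, hours_saved_per_month: float) -> float:
    """Months until the time saved equals the time spent writing the optimization."""
    if hours_saved_per_month <= 0:
        return float("inf")
    return dev_hours / hours_saved_per_month


if __name__ == "__main__":
    dev_hours = 3 * 40.0     # three weeks, assuming 40-hour weeks
    monthly_runtime = 2.0    # the code runs 2 hours per month
    # Best case: the optimization makes the job instantaneous.
    print(break_even_months(dev_hours, monthly_runtime))  # 60.0
```

Five years to break even in the best case, and longer for any realistic speedup.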

sethammons · 2 months ago

A former boss: an optimization made at a non-bottleneck is not an optimization.

chinchilla2020 · 2 months ago

There is more to it than that.

1. Decide if optimization is even necessary.

2. Then optimize the slowest path

hinkley · 2 months ago

It is exactly this "lulled into complacency" that I rail against when most people cite that line. Far too many people are trying to shut down dialog on improving code (not just performance) and they're not above Appeal to Authority in order to deflect.

"Curiosity killed the cat, but satisfaction brought it back." Is practically on the same level.

If you're careful to exclude creeping featurism and architectural astronautics from the definition of 'optimization', then very few people I've seen be warned off of digging into that sort of work actually needed to be reined in. YAGNI covers a lot of those situations, and generally with fewer false positives. Still false positives though, in large part because people disagree on what "the last responsible moment" is, in part because our estimates are always off by 2x, so by the time we agree to work on things we've waited about twice as long as we should have and now it's all half assed. Irresponsible.

godelski · 2 months ago

I'm with you, and been on a bit of a rampage about it lately. Honestly, just too much broken shit, though I'm not sure what the last straw was.

A big thing I rail against is the meaning of an engineer and that our job is about making the best product, not making the most profitable product (most times I bring this up someone will act like there's no difference between these. That itself is concerning). The contention between us engineers and the business people is what creates balance, but I'm afraid we've turned into yesmen instead. Woz needs Jobs, but Jobs also needs Woz (probably more than the other way around). The "magic" happens at the intersection of different expertise.

There's just a lot of weird but subtle ways these things express themselves. Like how a question like "but what about x problem" is interpreted as "no" instead of "yes, but". Or like how people quote Knuth and use it as a thought terminating cliche. In ML we see it with "scale is all you need."

In effect, by choosing to do things the easy way we are choosing to do things the hard way. This really confuses me, because for so long in CS the internalization was to "be lazy." Not in the way that you put off doing the dishes now, but in the way that you recognize that doing the dishes now is easier than doing them tomorrow, when 1) you have more dishes and 2) the dishes you left out are harder to clean as the food hardens on the plate. What happened to that "efficient lazy" mindset and how did we turn into "typical lazy"?[0]

[0] (I'm pretty sure I need to add this) https://en.wikipedia.org/wiki/Rhetorical_question

motorest · 2 months ago

> It is exactly this "lulled into complacency" that I rail against when most people cite that line. Far too many people are trying to shut down dialog on improving code (not just performance) and they're not above Appeal to Authority in order to deflect.

Your comment reads like a strawman argument. No one is arguing against "improving code". What are you talking about? It reads like you are misrepresenting any comment that goes against your ideas, no matter how misguided they are, by framing your ideas as obvious improvements that can only be conceivably criticized by anyone who is against good code and in favor of bad code.

It's a rehash of the old tired software developer cliche of "I cannot do wrong vs everyone around me cannot do right".

Ironically, you are the type of people Knuth's quote defends software from: those who fail to understand that using claims of "optimization" as a cheat code to push through unjustifiable changes are not improving software, and are actually making it worse.

> "Curiosity killed the cat, but satisfaction brought it back." Is practically on the same level.

This is the same strawman. It's perfectly fine to be curious. No one wants to take the magic out of you. But your sense of wonder is not a cheat code that grants you the right to push nonsense into production code. Engineers propose changes based on sound reasoning. If the engineers in your team reject a change it's unlikely you're surrounded by incompetent fools who are muffling your brilliant sense of wonder.

> If you're careful to exclude creeping featurism and architectural astronautics from the definition of 'optimization', (...)

Your comment has a strong theme of accusing anything not aligned with your personal taste as extremely bad and everything aligned with your personal taste as unquestionably good that can only possibly be opposed if there's an almost conspiratorial persecution. The changes you like are "improving code" whereas the ones proposed by third parties you don't like suddenly receive blanket accusations such as "creeping featurism and architectural astronautics".

Perhaps the problems you experience, and create, have nothing to do with optimization? Food for thought.

foobiekr · 2 months ago

A lot of people are still thinking of Knuth's comment as being about just finding the slow path or function and making it faster. What Knuth has talked about, however, and what any senior engineer who cares about performance has either been taught or discovered, is that most real optimization is in the semantics of the system and, frankly, not in optimizing things but in finding ways not to do them at all.

Tuning the JSON parser is not nearly as effective as replacing it with something less stupid, which is, in turn, not nearly as effective as finding ways to not do the RPC at all.

Most really performant systems of a certain age are also full of layer violations for this reason. If you care about performance (as in you are paid for it), you learn these realities pretty quickly. You also focus on retaining optionality for future optimizations by not designing the semantics of your system in a way that locks you permanently into low performance, which is unfortunately very common.

sfn42 · 2 months ago

Well said.

Another example of this is using the right algorithm for the job. I see a lot of code like this: say you have two lists and you need to go through the first one and find the corresponding element by id in the second. The naive implementation uses linear search of the second list, making the algorithm O(n^2). This will stall and take hours, days or longer to run when the lists get large, say into the tens to hundreds of thousands of elements.

If both lists come from your own database then you should have used a foreign key constraint and had the database join them for you. If you can't do that, let's say one or both lists come from a third-party API, you can create a dictionary (hashmap) of the second list to allow O(1) lookup rather than O(n), which brings your full algorithm's complexity to O(n). Now you can have millions of elements and it'll still run in milliseconds or seconds. And if this is something you need to do very often then consider storing the second list in your database so you can use a join instead. Avoid doing the work, as you say.

These are the most common mistakes I see people make. That and situations where you make one request, wait for a response, then use that to make another request, wait for response and so on.

You don't need a profiler to avoid this type of problem, you just need to understand what you're doing at a fundamental level. Just use a dictionary by default, it doesn't matter if it might be slightly slower for very small inputs - it'll be fast enough and it'll scale.
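
A minimal sketch of the two-list case in Python. The record shape (dicts with an "id" key) and the function names are assumptions for illustration:

```python
def join_linear(left: list[dict], right: list[dict]) -> list[tuple[dict, dict]]:
    """O(n * m): scans the second list for every element of the first."""
    pairs = []
    for item in left:
        for other in right:
            if other["id"] == item["id"]:
                pairs.append((item, other))
                break
    return pairs


def join_indexed(left: list[dict], right: list[dict]) -> list[tuple[dict, dict]]:
    """O(n + m): build a dict keyed by id once, then do O(1) average lookups."""
    by_id: dict = {}
    for other in right:
        by_id.setdefault(other["id"], other)  # keep the first match, like the scan
    return [(item, by_id[item["id"]]) for item in left if item["id"] in by_id]


if __name__ == "__main__":
    orders = [{"id": i, "total": i * 10} for i in range(5)]
    customers = [{"id": i, "name": f"customer-{i}"} for i in range(3, 8)]
    print(join_indexed(orders, customers))
```

Both functions return the same pairs; only the second one stays usable once the lists reach hundreds of thousands of elements.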

jerf · 2 months ago

You can almost simplify it to simply observing that PREMATURE optimization is not the same as OPTIMIZATION.

Most people I see who get offended are reacting to the claim that optimization is never useful. But it's pretty easy to knock that claim over.

I don't deny that plenty of people use adjectives very sloppily, and that much writing is improved by just ignoring them, but Knuth is not one of them.

godelski · 2 months ago

A pet peeve of mine is how common it is for people to ignore qualifying words. Like you can say "most people do 'x'" and you can guarantee if you post that online people will come out of the woodwork saying "I don't do 'x'". Sure, the "most" claims can often be inaccurate, but those types of responses aren't helpful and aren't meaningfully responsive even if true. "Most" is just a very different word from "all". It implies a distribution, but I don't know why we often think things are discrete or binary.

I also see ignoring qualifiers as a frequent cause for online arguments. They're critical words when communicating and dropping them can dramatically change what's being communicated. And of course people are fighting because they end up talking about very different things and don't even realize it.

While I agree with you, that it's common for people to use these words sloppily, I think it is best to default to not ignore them. IME, in the worst case, it aids in communication as you get clarification.

psychoslave · 2 months ago

Preprofiling optimisation is a stab in the dark.

Isn't that clearer and at least as concise as the original one, also using some idiomatic metaphor while avoiding the reference to mythical dark forces?

Mawr · 2 months ago

By definition anything premature is going to be bad. It's not any kind of an insight.

lisper · 2 months ago

The operative word was always "premature". As with everything, whether or not a particular optimization is worthwhile depends on its ROI, which in the case of software depends on how often your code is going to be run. If you are building a prototype that's only going to run once, then the only optimizations that are worthwhile are the ones that speed things up by times comparable to the time it takes to write them. On the other hand, if you're writing the inner loop of an LLM training algorithm, a 1% speedup can be worth millions of dollars, so it's probably worth it even if it takes months of effort.

tialaramex · 2 months ago

> This makes it clear, in context, that Knuth defines "Premature Optimization" as "optimizing before you profile your code"

As with that Google testing talk ("We said all your tests are bad, but that's not entirely true because most of you don't have any tests"), the reality is that most people don't have any profiling.

If you don't have any profiling then in fact every optimisation is premature.

sitkack · 2 months ago

You have joined a noble line of Knuth Quote Expanders, https://hn.algolia.com/?dateRange=all&page=3&prefix=true&que...

They are all worth reading.

godelski · 2 months ago

Twice in the last 2 months. I'm not sure if that's a good thing or a bad thing hahaha

jayd16 · 2 months ago

Yes but the second half of the second half is "only after that code has been identified." So the advice is still 'don't waste your time until you profile.'

neves · 2 months ago

I'm appalled by the number of developers that don't know that profilers exist.

BTW, my junior developers don't know what a debugger is.

renecito · 2 months ago

too late, all the slackers got promoted and now are demanding to keep pushing features no one is asking about.

godelski · 2 months ago

Same is true for lots of things. Classic example is we are so leetcode obsessed because all the people that were leetcode obsessed got hired and promoted. We've embraced p-hacking, forgetting Goodhart's Law (and the original intention of leetcode style interviewing. It's just the traditional engineering interview, where you get to see how the interviewee problem solves. It is less about the answer and more about the thought process. It's easy to educate people on answers, but it is hard to teach someone how to think in a different framework... (how much money do we waste through this and by doing so many rounds of interviewing?))

I like this article. It’s easy to forget what these classic CS papers were about, and I think that leads to poorly applying them today. Premature optimisation of the kind of code discussed by the paper (counting instructions for some small loop) does indeed seem like a bad place to put optimisation efforts without a good reason, but I often see this premature optimisation quote used to:

- argue against thinking about any kind of design choice for performance reasons, eg the data structure decisions suggested in this article

- argue for a ‘fix it later’ approach to systems design. I think for lots of systems you have some ideas for how you would like them to perform, and you could, if you thought about it, often tell that some designs would never meet them, but instead you go ahead with some simple idea that handles the semantics without the performance only to discover that it is very hard to ‘optimise’ later.

godelski · 2 months ago

  > a ‘fix it later’ approach

Oh man, I hate how often this is used. Everyone knows there's nothing more permanent than a temporary fix lol.

But what I think people don't realize is that this is exactly what tech debt is. You're moving fast but doing so makes you slow once we are no longer working in a very short timeline. That's because these issues compound. Not only do we repeat that same mistake, but we're building on top of shaky ground. So to go back and fix things ends up requiring far more effort than it would have taken to fix it early. Which by fixing early your efforts similarly compound, but this time benefiting you.

I think a good example of this is when you see people rewrite a codebase. You'll see headlines like "by switching to rust we got a 500% improvement!" Most of that isn't rust, most of that is better algorithms and design.

Of course, you can't always write your best code. There's practical constraints and no code can be perfect. But I think Knuth's advice still fits today, despite a very different audience. He was talking to people who were too obsessed with optimization, while today we're overly obsessed with quickly getting to some checkpoint. But the advice is the same: "use a fucking profiler". That's how you find the balance and know what actually can be put off till later. It's the only way you can do this in an informed way. Yet, when was the last time you saw someone pull out a profiler? I'm betting the vast majority of HN users can't remember and I'd wager a good number never have.
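
For anyone who has never pulled one out, a minimal sketch with Python's built-in cProfile and pstats modules. The workload and function names are invented for illustration; the point is only how little code it takes to get a ranked list of where the time goes:

```python
import cProfile
import io
import pstats


def slow_lookup(items: list[int], wanted: list[int]) -> int:
    """Deliberately naive: `in` on a list is a linear scan."""
    return len([w for w in wanted if w in items])


def fast_lookup(items: list[int], wanted: list[int]) -> int:
    """Same result using a set for O(1) average membership tests."""
    item_set = set(items)
    return len([w for w in wanted if w in item_set])


def workload() -> int:
    items = list(range(2_000))
    wanted = list(range(0, 4_000, 2))
    return slow_lookup(items, wanted) + fast_lookup(items, wanted)


def profile_report(func, top: int = 5) -> str:
    """Run `func` under cProfile and return the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

For a whole script, `python -m cProfile -s cumulative script.py` gives the same kind of report without touching the code.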

sfn42 · 2 months ago

I completely agree with most of what you've said, but personally I rarely use a profiler. I don't need it, I just think about what I'm doing and design things to be fast. I consider the time complexity of the code I'm writing. I consider the amount of data I'm working with. I try to set up the database in a way that allows me to send efficient queries. I try to avoid fetching more data than I need. I try to avoid excessive processing.

I realize this is a very personal preference and it obviously can't be applied to everyone. Someone with less understanding might find a profiler very useful and I think those people will learn the same things I'm talking about - as you find the slow code and learn how to make it fast you'll stop making the same mistakes.

A profiler might be useful if I was specifically working to optimize some code, especially code I hadn't written myself. But for my daily work it's almost always good enough to keep performance in mind and design the system to be fast enough from the bottom up.

Most code doesn't have to be anywhere near optimal, it just has to be reasonably fast so that users don't have to sit and stare at loading spinners for seconds at a time. Sometimes that's unavoidable, sometimes you're crunching huge amounts of data or something like that. But most of the time, slow systems are slow because the people who designed and implemented them didn't understand what they were doing.

Joker_vD · 2 months ago

You also need to remember that back when those classic CS papers were written, the CPU/RAM speed ratios were entirely different from what they are today. Take, for instance, a Honeywell 316 from 1969 [0]: "Memory cycle time is 1.6 microseconds; an integer register-to-register "add" instruction takes 3.2 microseconds". Yep, back in those days, memory fetches could be twice as fast as the most simple arithmetic instruction. Nowadays, even the L1 fetch is 4 times as slow as addition (which takes a single cycle).

No wonder the classical complexity analysis of algorithms generally took memory access to be instantaneous: because it, essentially, was instantaneous.

[0] https://en.wikipedia.org/wiki/Honeywell_316#Hardware_descrip...

naniwaduni · 2 months ago

> It’s easy to forget what these classic CS papers were about, and I think that leads to poorly applying them today.

Notably, pretty much the entire body of discourse around structured programming is totally lost on modern programmers failing to even imagine the contrasts.

IshKebab · 2 months ago

I 100% agree. I could have written the same comment.

The biggest way I see this is picking an architecture or programming language (cough Python) that is inherently slow. "We'll worry about performance later" they say, or frequently "it's not performance critical".

Cut to two years later, you have 200k lines of Python that spends 20 minutes loading a 10GB JSON file.

dan-robertson · 2 months ago

I failed to put an important point in the above comment, which is that spending too much time designing the perfect thing can lead to a system that is either never finished, or one that meets the goals but doesn’t do the work it was originally intended to do, and then it’s too late to pivot to doing the right thing.

If you develop in an environment where you have high velocity (eg python) you can much sooner learn that you are building the wrong thing and iterate.

Most systems do not have very high performance requirements. A typical python web application is not going to need to service many requests and the slow stuff it does is likely going to be waiting on responses from a database or other system. The thing to make such a program faster is to send better db queries rather than optimising the O(1 web page) work that is done in python.
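
As a sketch of what "better db queries" can mean, here is the common one-query-per-row pattern next to a single join, using the stdlib sqlite3 module. The schema and data are invented for illustration. With an in-memory database the difference is small; against a real server each extra query is a network round trip.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with hypothetical authors and posts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            title TEXT NOT NULL
        );
        """
    )
    conn.executemany("INSERT INTO authors VALUES (?, ?)",
                     [(1, "ada"), (2, "grace")])
    conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                     [(1, 1, "notes"), (2, 2, "compilers"), (3, 1, "engines")])
    return conn


def titles_n_plus_one(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """One query for the posts, then one more query per post."""
    result = []
    posts = conn.execute(
        "SELECT title, author_id FROM posts ORDER BY id").fetchall()
    for title, author_id in posts:
        (name,) = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()
        result.append((title, name))
    return result


def titles_joined(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """A single query; the database does the matching."""
    return conn.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id").fetchall()


if __name__ == "__main__":
    db = make_db()
    print(titles_joined(db))
```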

sgarland · 2 months ago

I know this was an example, but if you get to that point and haven’t swapped out Python’s stdlib json library for orjson or some other performance-oriented library, that’s on you.
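
A sketch of what that swap can look like, assuming orjson is installed (`pip install orjson`) and falling back to the stdlib otherwise. The wrapper names are hypothetical. Note that `orjson.dumps` returns bytes rather than str, so the fallback encodes to match:

```python
import json

try:
    import orjson  # third-party; assumed to be installed separately
except ImportError:
    orjson = None


def load_json(raw: bytes):
    """Parse JSON bytes with orjson when available, else the stdlib parser."""
    if orjson is not None:
        return orjson.loads(raw)
    return json.loads(raw)


def dump_json(obj) -> bytes:
    """Serialize to UTF-8 bytes; orjson.dumps already returns bytes."""
    if orjson is not None:
        return orjson.dumps(obj)
    return json.dumps(obj).encode("utf-8")


if __name__ == "__main__":
    payload = dump_json({"ids": [1, 2, 3], "ok": True})
    print(load_json(payload))
```

Whether it actually helps on your data is something to measure rather than assume.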

runevault · 2 months ago

It is interesting that there's so much discourse about the effort people have had to put into data structure and algorithm stuff for interviews, but then people refuse to take advantage of the knowledge studying that gives you towards trivial effort optimizations (aka your code can look pretty similar, just using a different data structure under the hood for example).

sfn42 · 2 months ago

That's because people don't get it. You see people saying things like "It's pointless to memorize these algorithms I'll never use" - you're not supposed to memorize the specific algorithms. You're supposed to study them and learn from them, understand what makes them faster than the others and then be able to apply that understanding to your own custom algorithms that you're writing every day.

I have the uncomfortable feeling that others are making a religion out of it, as if the conceptual problems of programming could be solved by a single trick, by a simple form of coding discipline!