This article and its associated HN comment section continue the long tradition of Big O notation explainers [0] and of comment kerfuffles over the finer technical points of such notation versus its practical usage [1]. The wheel turns...
From what I read in the comments of the first post, the Pyon guy seems very toxic and pedantic, but the rebuttal by Ned isn't great. For example, nowhere in the rebuttal is the pedantic technical detail ever actually described. In fact, the prose reads very awkwardly as it circumlocutes around describing it, repeatedly calling it a "particular detail". In my view, the author overreaches: he dismisses Pyon not only for the delivery of his criticism (which was toxic) but also for the content of his criticism (why?).
Ultimately Ned is in the right about empathy and communication online. But since he is an educator, it would have been nice to hear, even briefly, why he thought Pyon's point was unnecessarily technical and pedantic. Instead he just says "I've worked for decades and didn't know it". No one is too experienced to learn.
EDIT: I just skimmed the original comment section between Pyon and Ned and it seems that Ned is rather diplomatic and intellectually engages with Pyon's critique. Why is this level of analysis completely missing from the follow-up blogpost? I admit I don't personally grasp the technical details or their importance; it would be nice to hear a summarized analysis...
Such is the curse of blogging: when writing a series of posts, authors naturally assume that the readers are as engaged and as familiar with the previous discussion as they are.
In reality, even regular subscribers probably aren't, and if you're a random visitor years later, the whole thing may be impossible to parse.
> Why is this level of analysis completely missing from the follow-up blogpost
Because that's not what the article is about. It's not about whether Pyon was right or wrong, it's that they were a dick about it. Their opening words were "you should be ashamed". They doubled and tripled down on their dickery later on.
And no matter how good your point is or how right you are: if you're a dick about it people will dislike you.
I think the lesson of those articles is not that people should stop trying to correct a misleading or incorrect explanation, but rather that some people on the internet (like the "expert" described there) are more interested in picking and winning "fights" than in gently helping the author correct his article. If you look at Pyon's comments, he was very aggressive and very internet-troll-like.
The wrong take from this would be "... and therefore, technical details don't matter and it's ok to be inaccurate."
One caveat here is that the author of the article posted it here on HN for comments -- it's not as if someone else did and the scrutiny is unfair because HN was never supposed to take a look. They expected a review, otherwise they wouldn't have posted it here.
HN is not an audience of laypeople (mostly) and will critique the article with a different mindset than a novice that might be impressed by the visuals (which are impressive).
So I think the reaction is both to be expected and reasonable: HN will critique how correct the explanation is, and point out the mistakes. And there were a couple of fundamental mistakes due to the author not being a subject matter expert.
Good intro! I first learned Big O from Cracking the Coding Interview since many universities in Europe notoriously skip complexity notations even in basic programming classes. This definitely explains it in a much simpler way.
Toxic expert here! I hate when blog posts try to teach complex subjects. It's almost always a non-expert doing the teaching, and they fail to do it accurately. This then causes 1) the entire internet to repeat the inaccuracies, and 2) readers to make no attempt to learn beyond the blog post, reinforcing their ignorance.
I'll double down on my toxicity by saying I didn't like the page layout. As someone with ADHD (and a declining memory), I need to be led through formatting/sub-headings/bullets/colored sections/etc into each detail or it all blends together into a wall of text. The longer it takes to make a point (visually and conceptually), the more lost I am. I couldn't easily follow it. The Simple Wikipedia page was more straight to the point (https://simple.wikipedia.org/wiki/Big_O_notation), but reading the "full" Wikipedia page thrusts you headlong into a lot of math, which to me signifies that this shit is more complex than it seems and simplifying it is probably a bad idea.
> Toxic expert here! I hate when blog posts try to teach complex subjects. It's almost always a non-expert doing the teaching, and they fail to do it accurately. This then causes 1) the entire internet to repeat the inaccuracies, and 2) readers to make no attempt to learn beyond the blog post, reinforcing their ignorance.
Ask yourself why. The usual answer is that top experts either can't be bothered to create better content, or they actively gatekeep, believing that their field must remain hard to learn and the riff-raff must be kept out.
I think the first step is to accept that laypeople can have legitimate interest in certain topics and deserve accessible content. The remedy to oversimplified explanations is to write something better - or begrudgingly accept the status quo and not put people down for attempts that don't meet your bar.
It's also good to ponder whether the details we get worked up about actually matter. Outside academia, approximately no one needs a precise, CS-theoretical definition of big-O notation. Practitioners use it in a looser sense.
Writing is one of the best ways to learn something. Maybe non-experts learn something by writing about it?
I don't think the entire internet is repeating inaccuracies. :) I also believe there are readers who attempt to learn further than the blog post; a blog post can inspire you to learn more about a topic, speaking from personal experience.
If there were no blog posts, I suspect there would be no HN.
There should be a place for non-experts. One can remain skeptical while reading blog posts without hating blog posts about complex topics written by non-experts.
My experience is the opposite. I hate the colorful books with many little boxes, pictures with captions in floaters, several different font sizes on the page, cute mascots etc., where even the order of reading is unclear.
Instead I found it much easier to learn from old books made before the 70s-80s, sometimes back into the 40s. It's single column, black on white, but the text is written by a real character and is far from dry. I had such a book on probability, and it had a chapter on the philosophical interpretations of probability which took the reader seriously, not by heaping up dry definitions but with an opinionated throughline through the history of it. I like that much better than the mealy-mouthed, committee-written monstrosities that mask any passion for the subject.

I'd rather take a half-page definition or precise statement of a theorem where I have to check every word but can then trust every word, over vague, handwavy explanations. Often these modern books entirely withhold the formal definitions, so you're left with a vague, uneasy feeling where you kind of get it and have questions but can't find out "is it this way, or that way, precisely?". And I'm not so sure that these irrelevant-cute-pics-everywhere books are really better for ADHD at the end of the day, as opposed to distraction-free black-on-white paragraphs written by a single author with something to say.
Late to the party but as a layperson who knows just enough about programming to be dangerous, I found the article to be enlightening.
Never assumed that reading it would turn me into an expert at it. Never heard of big O till now but find it a fascinating way to look at algorithm construction.
One more thing: if every article about everything has to be a flawless exhaustive explanation then I suppose only 13 people would even know about relativity. There is plenty of room for content that simply introduces an idea and leaves filling in the gaps to the reader's discretion.
I think the criticism here is not that the article is incomplete ("not exhaustive") but rather that it makes several fundamental mistakes and doesn't actually describe Big O. It describes a "pop culture" version of Big O, which is not Big O, but calls it Big O.
It's not useless, and a lot of care seems to have gone into it, but ultimately it's a lay person explaining something they don't firmly grasp, and making a couple of serious mistakes in the process.
O(1) in many cases involves a hashing function, which is a non-trivial but constant cost. For smaller values of N it can be outperformed in terms of wall-clock time by O(n^2) worst-case algorithms.
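For what it's worth, a minimal sketch of that trade-off (hypothetical helper names; which variant wins for a given N is something you would have to measure on your own data):

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Membership test two ways. For a handful of elements, the O(n) scan often
// wins on wall-clock time: no hash function to evaluate, and the data is
// contiguous in memory.
bool contains_scan(const std::vector<int>& small_values, int key) {
    // O(n), but with a tiny constant for small n.
    return std::find(small_values.begin(), small_values.end(), key)
           != small_values.end();
}

bool contains_hash(const std::unordered_set<int>& values, int key) {
    // O(1) expected, but pays the hashing cost on every call.
    return values.count(key) != 0;
}
```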
For example, in applications where the sorted form can be readily maintained, a decent B+-tree tends to massively outperform a hashmap as soon as you get the streamed/non-indexed side (of what's in a way a lookup join) to opportunistically batch items:
when you sort your lookups, you can use exponential forward search in your index for each next key (compare at exponentially increasing distances from the cursor; once you're past the target, run a binary search between this new upper bound and the previous probe point as the lower bound). This reduces the per-lookup cost to logarithmic in the distance from the last-looked-up key; asymptotically that is always better than single isolated lookups, and in practice it tends to cap out at about 2x the cost in pathological cases if you respect the page locality of B+-tree structures.
If you also stay aware of your LRU cache set while doing this, you'll get by with significantly fewer memory accesses overall to e.g. fresh DRAM pages (let alone NAND pages) than with a hashmap setup.
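A rough sketch of the exponential-search part of that scheme over a plain sorted array (B+-tree pages and LRU awareness left out; `galloping_lower_bound` is a made-up name), assuming the query keys arrive in sorted order:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Exponential ("galloping") forward search: starting from the cursor left by
// the previous lookup, probe at distances 1, 2, 4, 8, ... until we pass the
// target, then binary-search the bracketed range. The cost per lookup is
// logarithmic in the distance from the previously looked-up key.
// Assumes query keys are processed in non-decreasing order, so we only ever
// need to look forward from `cursor`.
std::size_t galloping_lower_bound(const std::vector<int>& sorted,
                                  std::size_t cursor, int key) {
    std::size_t lo = cursor;
    std::size_t hi = cursor;
    std::size_t step = 1;
    while (hi < sorted.size() && sorted[hi] < key) {
        lo = hi;
        hi = (step <= sorted.size() - hi) ? hi + step : sorted.size();
        step *= 2;
    }
    // The first element >= key lies in (lo, hi]; binary search narrows it down.
    return static_cast<std::size_t>(
        std::lower_bound(sorted.begin() + lo, sorted.begin() + hi, key)
        - sorted.begin());
}
```

To batch lookups you would sort the query keys, feed them through in order, and reuse each result as the next call's `cursor`.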
I've severely sped up a C++ program by replacing an `std::unordered_map` with a `std::vector` (IIRC technically a pair of them, for SoA purposes) after realizing that I could collect the items unordered, sort the structure, and then use binary search instead of hashmap lookups. The function I modified ran something like 3x faster as a result, and that's without anything like vectorization-friendly structures.
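A minimal sketch of that kind of refactor, with made-up names and value types (not the commenter's actual code):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <string>
#include <vector>

// Collect into two parallel vectors (SoA), sort both by key once, then answer
// lookups with binary search instead of hashing.
struct SortedTable {
    std::vector<std::uint64_t> keys;
    std::vector<std::string>   values;  // values[i] belongs to keys[i]

    // Call once after all insertions: O(n log n).
    void finalize() {
        std::vector<std::size_t> order(keys.size());
        std::iota(order.begin(), order.end(), std::size_t{0});
        std::sort(order.begin(), order.end(),
                  [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
        std::vector<std::uint64_t> k(keys.size());
        std::vector<std::string>   v(values.size());
        for (std::size_t i = 0; i < order.size(); ++i) {
            k[i] = keys[order[i]];
            v[i] = std::move(values[order[i]]);
        }
        keys.swap(k);
        values.swap(v);
    }

    // O(log n) per lookup, over contiguous memory.
    const std::string* find(std::uint64_t key) const {
        auto it = std::lower_bound(keys.begin(), keys.end(), key);
        if (it == keys.end() || *it != key) return nullptr;
        return &values[static_cast<std::size_t>(it - keys.begin())];
    }
};
```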
I mean, true obviously, but don't say that too loud lest people get the wrong ideas. For most practical purposes n^2 means computer stops working here. Getting people to understand that is hard enough already ;)
Besides, often you're lucky and there's a trivial perfect hash like modulo.
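A toy sketch of what a "trivial perfect hash like modulo" can look like, under the load-bearing assumption that the keys you insert are distinct modulo `M` (all names here are hypothetical):

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <utility>

// If every inserted key has a distinct residue key % M, then key % M is a
// perfect hash: a direct-indexed array, O(1), no collision handling needed.
// Storing the key alongside the value lets get() reject keys that merely
// share a residue with something that was inserted.
constexpr std::size_t M = 1024;
std::array<std::optional<std::pair<std::size_t, int>>, M> table;

void put(std::size_t key, int value) { table[key % M].emplace(key, value); }

std::optional<int> get(std::size_t key) {
    const auto& slot = table[key % M];
    if (slot && slot->first == key) return slot->second;
    return std::nullopt;
}
```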
Well, it doesn't seem that obvious, at least not to everyone, because several times I've seen people rewrite things to make it O(1) when it was actually slower for the vast majority of cases (or even all cases).
Having gone through electrical engineering, I always felt that Big O notation was never properly explained, unless I happened to miss every time it was introduced. It always felt like it was just hand-waved as something we should already know. I'm curious what level of math or computer science is usually responsible for introducing it.
My university had an "algorithm analysis" class (required for CS majors) which covered it along with various kinds of proofs. That was a junior/senior year class though; I think it was really expected that you were already somewhat familiar with the concept by the end of your first year (via osmosis I guess).
What an asshole way of explaining it. For practical purposes (don't @ me if this is technically wrong, I don't give a shit) it's a description of how the cost of an algorithm grows with the size of the input. For example, if the cost quadruples every time the input doubles, then it's O(n^2). If the cost increases in step with the input size, then it's O(n). Simples.
Comp sci 101. Learned it my first year. There really isn't anything to it. It just describes how the number of operations grows as the number of inputs to your algorithm increases. It looks scary, but it's very simple and straightforward.
It can get a lot more complicated though. At times there are more parameters to an algorithm's complexity than just `n`. E.g., the parameters for big-O might be `n`, `K`, and `ρ`, and some expressions in the big-O might involve combinations of those parameters.
In such cases, one might also know for a specific application of the algorithm that, for example, `ρ` is bounded, and so the big-O becomes more influenced by `n` and `K`.
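A made-up illustration of that point (not any specific algorithm from the thread): the cost below has two parameters, and which term dominates depends on what your application guarantees about them.

```cpp
#include <string>
#include <vector>

// O(n * k) comparisons, where n = rows.size() and k = query_keys.size().
// If your workload guarantees k is bounded by a small constant, this behaves
// like O(n); if k grows with n, it degenerates to O(n^2).
int count_matches(const std::vector<std::string>& rows,
                  const std::vector<std::string>& query_keys) {
    int matches = 0;
    for (const auto& row : rows)            // n iterations
        for (const auto& key : query_keys)  // k iterations each
            if (row == key) ++matches;
    return matches;
}
```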
Big O was introduced in my Data Structures and Algorithms (DSA) class which was nominally a second year undergrad class. In practice, most Computer Science (CS) students were far enough advanced to take DSA in their first year.
I took DSA as a third-year Electrical and Computer Engineering student. I recall maybe 15 minutes of exposition on the topic and exam questions of a near-throwaway nature (e.g.: write this tree-parsing algorithm; 95 points for your code's correctness, 5 points for explaining your algorithm's Big O complexity). It did seem like something that CS professors and teaching assistants considered either trivial or self-explanatory.
That’s because you could define it in a lot of different ways. For example, if you define it as the number of steps it takes some Turing machine representing some algorithm to halt, then there is no such thing as a logarithmic time algorithm; O(lg n) = O(1)
Every time I see an introduction to Big-O notation, I immediately go look for a section about the worst case to find the most common, fundamental misconception that is present in nearly every such text.
Lo and behold:
> Because it's common for an algorithm's performance to depend not just on the size of the input, but also its arrangement, big O notation always describes the worst-case scenario.
This is not true. I'm on my phone, so I can't be bothered to explain this in more detail, but there is plenty of information about this online.
O() doesn't necessarily describe worst-case behaviour. It just provides an asymptotic upper bound. So a quadratic sorting algorithm can still be said to be O(n^3), although that might be a little misleading.
In math that’s true, but a practicing software engineer uses it to mean the common "tightest" upper bound we can easily prove for the case we care about (worst, amortized, average, etc.), not the actual tightest possible bound.
Yes. The notation can be used to describe how fast any function grows. That function may be the worst-case runtime wrt. input length, the average-case runtime wrt. input length, or anything else, really.
But the most common usage is indeed worst-case analysis, especially in intro courses.
This is also wrong in the article:
> You may sometimes see the Greek letter "theta", Θ, instead of the letter O in big O notation. This is used when the best and worst case have the same order of growth. We can't use it for bubble sort because the best case is O(n) and the worst case is O(n^2).
It conflates two different, orthogonal concepts: upper vs lower bounds on the one hand, and best vs worst case analysis on the other.
The phrase "the best case is O(n)" already contradicts "big O notation always describes the worst-case scenario". They clearly use it for best case right there in the article.
> But the most common usage is indeed worst-case analysis, especially in intro courses.
It is also the most sensible one. Best-case analysis is basically useless: if you are lucky you may end up with O(1), but that doesn't tell you anything. Average case could be good, but it requires you to estimate the input distribution.
Big O notation can be used to describe any function, whether worst case runtime of an algorithm, average case runtime of an algorithm, the price of eggs in China as a function of time, or anything else.
In order to write f(n) is O(g(n)), you already had to smuggle in the idea that you were talking about the worst case before even talking about O(*). What is f? Typically it is the max of "steps taken" over all input problems of size n, i.e. the worst case.
The O(g(n)) part says f is asymptotically bounded by g(n) up to some constant factor.
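For reference, a sketch of the textbook definitions the thread keeps circling (standard material, not specific to the article):

```latex
% Standard asymptotic bounds, for some constant c > 0 and all sufficiently large n:
\begin{align*}
f(n) = O(g(n))      &\iff \exists\, c > 0,\ n_0 : \forall n \ge n_0,\ f(n) \le c \cdot g(n) \\
f(n) = \Omega(g(n)) &\iff \exists\, c > 0,\ n_0 : \forall n \ge n_0,\ f(n) \ge c \cdot g(n) \\
f(n) = \Theta(g(n)) &\iff f(n) = O(g(n)) \text{ and } f(n) = \Omega(g(n))
\end{align*}
```

Which function f you plug in (worst-case step count, average-case step count, memory, anything else) is a choice you make before any of these bounds enter the picture; that is the orthogonality mentioned above.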
I would just say "almost always" to cut off the pedants. You do use the same notation to describe the best-case and average-case as well; people just don't care about those as often.
TLDR: Different contexts, different analyses. You can do a best-case, average-case and worst-case analysis. A linear search is O(1) in the best case, for example.
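A minimal sketch of that example, nothing beyond the standard textbook case analysis:

```cpp
#include <cstddef>
#include <vector>

// One algorithm, three different analyses of the same function:
//   best case:    key sits at index 0    -> O(1)
//   worst case:   key is last or absent  -> O(n)
//   average case: depends on the assumed input distribution
// Each of those cases can then be stated as an O (upper), Omega (lower),
// or Theta (tight) bound.
std::ptrdiff_t linear_search(const std::vector<int>& v, int key) {
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] == key) return static_cast<std::ptrdiff_t>(i);
    return -1;  // not found
}
```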
Maybe this comes across as nitpicky to some, but that doesn't make the text any less incorrect, unfortunately. It's an extremely common misconception, however, and I would say that a huge proportion of students that go through an introductory course on this subject come out thinking the same thing.
> It’s okay, you’re not the first to point it out to me!
It would be nice if you corrected it - there are already too many people making this mistake online.
What blows me away is that there are quantum calculations which grow as O(N**7) with respect to the number of atoms, and scientists are fine with running them because N is small, computers keep getting faster and gaining more memory, and the results are valuable.
(I'm not a computer science expert, so if I used the O(n) terminology differently from the standard, sorry).
"Fine with running them" is a bit of a stretch, though the gist is correct.
There's a whole hierarchy of quantum chemistry algorithms where accuracy is traded with asymptotic runtime, I think starting at N^3 and going all the way to N!.
[0] https://nedbatchelder.com/text/bigo.html
[1] https://nedbatchelder.com/blog/201711/toxic_experts.html
> It's almost always a non-expert doing the teaching, and they fail to do it accurately.
I bet you also have subjects where you use and spread inaccuracies, unless you are a universal expert.
> Writing is one of the best ways to learn something. Maybe non-experts learn something by writing about it?
Where it falls short is the usefulness to others.
> Toxic expert here!
I hate that this declaration is both hilarious and one that many might suggest I need to prefix my own comments with regularly.
:-D
> ultimately it's a lay person explaining something they don't firmly grasp
I love the presentation itself, though!
> For most practical purposes n^2 means computer stops working here.
I also don't understand your first point. We can run n^2 algorithms on massive inputs, given it's just a polynomial. Are you thinking of 2^n, perhaps?
Do I really need to make my code spaghetti if the array sizes are small enough not to impact my speed?
In computer science, f(x) is often some complexity function, like the number of some specific operation performed when running an algorithm to completion.
Otherwise, we cannot say that 1 is O(x), for example.
> It looks scary, but it's very simple and straightforward.
Examples:
- an algorithm with O(2^n) complexity might be faster for your problem than an O(1) algorithm
- the same algorithm may be O(2^n) or O(n³) depending on how you define the size function (a sketch of this follows below)
This is not straightforward.
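A standard illustration of that second bullet (not from the thread): the classic 0/1 knapsack DP runs in O(n·W) time. Measured against the numeric capacity W that looks polynomial, but W only takes about log2(W) bits to write down, so measured against the input length the very same algorithm is exponential, i.e. pseudo-polynomial.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// 0/1 knapsack, classic dynamic program: O(n * W) time, O(W) space.
// Polynomial in the numeric value W, exponential in the number of bits of W.
// Assumes non-negative weights and W >= 0.
int knapsack_best_value(const std::vector<int>& weight,
                        const std::vector<int>& value, int W) {
    std::vector<int> best(W + 1, 0);  // best[c] = max value achievable with capacity c
    for (std::size_t i = 0; i < weight.size(); ++i)
        for (int c = W; c >= weight[i]; --c)
            best[c] = std::max(best[c], best[c - weight[i]] + value[i]);
    return best[W];
}
```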
I learned about Big O notation while learning calculus, along with little-o.
> I'm on my phone, so I can't be bothered to explain this in more detail
That section in particular has already seen several revisions based on other people's comments.
I'd be interested to hear how you would explain it when you're at a more appropriate device.
> there are quantum calculations which grow as O(N**7) with respect to the number of atoms
What you should say, if you want to be correct, is that it grows as Ω(n⁷).