This article and its associated HN comment section continue the long tradition of Big O notation explainers [0] and of comment kerfuffles over the finer technical points of such notation versus its practical usage [1]. The wheel turns...
From what I read in the comments of the first post, the Pyon guy seems very toxic and pedantic, but the rebuttal by Ned isn't great. For example, nowhere in the rebuttal is the pedantic technical detail ever actually described. In fact, the prose reads very awkwardly as it circumlocutes around describing it, repeatedly calling it a "particular detail". In my view, the author overreaches: he dismisses Pyon not only for the delivery of his criticism (which was toxic) but also for the content of his criticism (why?).
Ultimately Ned is in the right about empathy and communication online. But since he is an educator, it would have been nice to hear, even briefly, why he thought Pyon's point was unnecessarily technical and pedantic. Instead he just says "I've worked for decades and didn't know it". No one is too experienced to learn.
EDIT: I just skimmed the original comment section between Pyon and Ned and it seems that Ned is rather diplomatic and intellectually engages with Pyon's critique. Why is this level of analysis completely missing from the follow-up blogpost? I admit I don't personally grasp the technical details or their importance; it would be nice to hear a summarized analysis...
Such is the curse of blogging: when writing a series of posts, authors naturally assume that the readers are as engaged and as familiar with the previous discussion as they are.
In reality, even regular subscribers probably aren't, and if you're a random visitor years later, the whole thing may be impossible to parse.
> Why is this level of analysis completely missing from the follow-up blogpost
Because that's not what the article is about. It's not about whether Pyon was right or wrong, it's that they were a dick about it. Their opening words were "you should be ashamed". They doubled and tripled down on their dickery later on.
And no matter how good your point is or how right you are: if you're a dick about it people will dislike you.
I think the lesson of those articles is not that people should stop trying to correct a misleading or incorrect explanation, but rather that some people on the internet (like the "expert" described there) are more interested in picking and winning "fights" than in gently helping the author correct his article. If you look at Pyon's comments, he was very aggressive and very internet-troll-like.
The wrong take from this would be "... and therefore, technical details don't matter and it's ok to be inaccurate."
One caveat here is that the author of the article posted it here on HN for comments -- it's not as if someone else did and the scrutiny is unfair because HN was never supposed to take a look. They expected a review, otherwise they wouldn't have posted it here.
HN is not an audience of laypeople (mostly) and will critique the article with a different mindset than a novice that might be impressed by the visuals (which are impressive).
So I think the reaction is both to be expected and reasonable: HN will critique how correct the explanation is, and point out the mistakes. And there were a couple of fundamental mistakes due to the author not being a subject matter expert.
Good intro! I first learned Big O from Cracking the Coding Interview since many universities in Europe notoriously skip complexity notations even in basic programming classes. This definitely explains it in a much simpler way.
Toxic expert here! I hate when blog posts try to teach complex subjects. It's almost always a non-expert doing the teaching, and they fail to do it accurately. This then causes 1) the entire internet to repeat the inaccuracies, and 2) readers to make no attempt to learn beyond the blog post, reinforcing their ignorance.
I'll double down on my toxicity by saying I didn't like the page layout. As someone with ADHD (and a declining memory), I need to be led through formatting/sub-headings/bullets/colored sections/etc into each detail or it all blends together into a wall of text. The longer it takes to make a point (visually and conceptually), the more lost I am. I couldn't easily follow it. The Simple Wikipedia page was more straight to the point (https://simple.wikipedia.org/wiki/Big_O_notation), but reading the "full" Wikipedia page thrusts you headlong into a lot of math, which to me signifies that this shit is more complex than it seems and simplifying it is probably a bad idea.
> Toxic expert here! I hate when blog posts try to teach complex subjects. It's almost always a non-expert doing the teaching, and they fail to do it accurately. This then causes 1) the entire internet to repeat the inaccuracies, and 2) readers to make no attempt to learn beyond the blog post, reinforcing their ignorance.
Ask yourself why. The usual answer is that top experts either can't be bothered to create better content, or they actively gatekeep, believing that their field must remain hard to learn and the riff-raff must be kept out.
I think the first step is to accept that laypeople can have legitimate interest in certain topics and deserve accessible content. The remedy to oversimplified explanations is to write something better - or begrudgingly accept the status quo and not put people down for attempts that don't meet your bar.
It's also good to ponder whether the details we get worked up about actually matter. Outside academia, approximately no one needs a precise, CS-theoretical definition of big-O notation. Practitioners use it in a looser sense.
Writing is one of the best ways to learn something. Maybe non-experts learn something by writing about it?
I don't think the entire internet is repeating inaccuracies. :) I also believe there are readers who attempt to learn further than the blog post; a blog post can inspire you to learn more about a topic, speaking from personal experience.
If there were no blog posts, I suspect there would be no HN.
There should be a place for non-experts. One can remain skeptical while reading blog posts without hating blog posts about complex topics written by non-experts.
My experience is the opposite. I hate the colorful books with many little boxes, pictures with captions in floaters, several different font sizes on the page, cute mascots etc., where even the order of reading is unclear.
Instead I found it much easier to learn from old books made before the 70s-80s, sometimes back into the 40s. It's single column, black on white, but the text is written by a real character and is far from dry. I had such a book on probability, and it had a chapter on the philosophical interpretations of probability which took the reader seriously, not by heaping up dry definitions but with an opinionated throughline through the history of it. I like that much better than the mealy-mouthed, committee-written monstrosities that mask any passion for the subject.

I'd rather take a half-page definition or precise statement of a theorem where I have to check every word but can then trust every word, over vague, handwavy explanations. Often these modern books entirely withhold the formal definitions, so you're left with a vague, uneasy feeling where you kind of get it and have questions but can't find out "is it this way, or that way, precisely?". And I'm not so sure that these irrelevant-cute-pics-everywhere books are really better for ADHD at the end of the day, as opposed to distraction-free black-on-white paragraphs written by a single author with something to say.
Late to the party but as a layperson who knows just enough about programming to be dangerous, I found the article to be enlightening.
Never assumed that reading it would turn me into an expert at it. Never heard of big O till now but find it a fascinating way to look at algorithm construction.
One more thing: if every article about everything has to be a flawless exhaustive explanation then I suppose only 13 people would even know about relativity. There is plenty of room for content that simply introduces an idea and leaves filling in the gaps to the reader's discretion.
I think the criticism here is not that the article is incomplete ("not exhaustive") but rather that it makes several fundamental mistakes and doesn't actually describe Big O. It describes a "pop culture" version of Big O, which is not Big O, but calls it Big O.
It's not useless, and a lot of care seems to have gone into it, but ultimately it's a lay person explaining something they don't firmly grasp, and making a couple of serious mistakes in the process.
O(1) in many cases involves a hashing function, which is a non-trivial but constant cost. For smaller values of N it can be outperformed in terms of wall-clock time by O(n^2) worst-case algorithms.
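For what it's worth, a minimal sketch of that trade-off (hypothetical helper names; which variant wins for a given N is something you would have to measure on your own data):

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Membership test two ways. For a handful of elements, the O(n) scan often
// wins on wall-clock time: no hash function to evaluate, and the data is
// contiguous in memory.
bool contains_scan(const std::vector<int>& small_values, int key) {
    // O(n), but with a tiny constant for small n.
    return std::find(small_values.begin(), small_values.end(), key)
           != small_values.end();
}

bool contains_hash(const std::unordered_set<int>& values, int key) {
    // O(1) expected, but pays the hashing cost on every call.
    return values.count(key) != 0;
}
```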
For example, in applications where the sorted form can be readily maintained, a decent B+-tree tends to massively outperform a hashmap as soon as you get the streamed/non-indexed side (of what's in a way a lookup join) to opportunistically batch items:
when you sort your lookups, you can use exponential forward search in your index for each next key (compare at exponentially increasing distances from the cursor; once you're past the target, run a binary search between this new upper bound and the previous probe point as the lower bound). This reduces the per-lookup cost to logarithmic in the distance from the last-looked-up key; asymptotically that is always better than single isolated lookups, and in practice it tends to cap out at about 2x the cost in pathological cases if you respect the page locality of B+-tree structures.
If you also stay aware of your LRU cache set while doing this, you'll get by with significantly fewer memory accesses overall to e.g. fresh DRAM pages (let alone NAND pages) than with a hashmap setup.
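A rough sketch of the exponential-search part of that scheme over a plain sorted array (B+-tree pages and LRU awareness left out; `galloping_lower_bound` is a made-up name), assuming the query keys arrive in sorted order:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Exponential ("galloping") forward search: starting from the cursor left by
// the previous lookup, probe at distances 1, 2, 4, 8, ... until we pass the
// target, then binary-search the bracketed range. The cost per lookup is
// logarithmic in the distance from the previously looked-up key.
// Assumes query keys are processed in non-decreasing order, so we only ever
// need to look forward from `cursor`.
std::size_t galloping_lower_bound(const std::vector<int>& sorted,
                                  std::size_t cursor, int key) {
    std::size_t lo = cursor;
    std::size_t hi = cursor;
    std::size_t step = 1;
    while (hi < sorted.size() && sorted[hi] < key) {
        lo = hi;
        hi = (step <= sorted.size() - hi) ? hi + step : sorted.size();
        step *= 2;
    }
    // The first element >= key lies in (lo, hi]; binary search narrows it down.
    return static_cast<std::size_t>(
        std::lower_bound(sorted.begin() + lo, sorted.begin() + hi, key)
        - sorted.begin());
}
```

To batch lookups you would sort the query keys, feed them through in order, and reuse each result as the next call's `cursor`.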
I've severely sped up a C++ program by replacing an `std::unordered_map` with a `std::vector` (IIRC technically a pair of them, for SoA purposes) after realizing that I could collect the items unordered, sort the structure, and then use binary search instead of hashmap lookups. The function I modified ran something like 3x faster as a result, and that's without anything like vectorization-friendly structures.
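A minimal sketch of that kind of refactor, with made-up names and value types (not the commenter's actual code):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <string>
#include <vector>

// Collect into two parallel vectors (SoA), sort both by key once, then answer
// lookups with binary search instead of hashing.
struct SortedTable {
    std::vector<std::uint64_t> keys;
    std::vector<std::string>   values;  // values[i] belongs to keys[i]

    // Call once after all insertions: O(n log n).
    void finalize() {
        std::vector<std::size_t> order(keys.size());
        std::iota(order.begin(), order.end(), std::size_t{0});
        std::sort(order.begin(), order.end(),
                  [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
        std::vector<std::uint64_t> k(keys.size());
        std::vector<std::string>   v(values.size());
        for (std::size_t i = 0; i < order.size(); ++i) {
            k[i] = keys[order[i]];
            v[i] = std::move(values[order[i]]);
        }
        keys.swap(k);
        values.swap(v);
    }

    // O(log n) per lookup, over contiguous memory.
    const std::string* find(std::uint64_t key) const {
        auto it = std::lower_bound(keys.begin(), keys.end(), key);
        if (it == keys.end() || *it != key) return nullptr;
        return &values[static_cast<std::size_t>(it - keys.begin())];
    }
};
```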
I mean, true obviously, but don't say that too loud lest people get the wrong ideas. For most practical purposes n^2 means computer stops working here. Getting people to understand that is hard enough already ;)
Besides, often you're lucky and there's a trivial perfect hash like modulo.
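A toy sketch of what a "trivial perfect hash like modulo" can look like, under the load-bearing assumption that the keys you insert are distinct modulo `M` (all names here are hypothetical):

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <utility>

// If every inserted key has a distinct residue key % M, then key % M is a
// perfect hash: a direct-indexed array, O(1), no collision handling needed.
// Storing the key alongside the value lets get() reject keys that merely
// share a residue with something that was inserted.
constexpr std::size_t M = 1024;
std::array<std::optional<std::pair<std::size_t, int>>, M> table;

void put(std::size_t key, int value) { table[key % M].emplace(key, value); }

std::optional<int> get(std::size_t key) {
    const auto& slot = table[key % M];
    if (slot && slot->first == key) return slot->second;
    return std::nullopt;
}
```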
Well, it doesn't seem that obvious, at least not to everyone, because several times I've seen people rewrite things to make it O(1) when it was actually slower for the vast majority of cases (or even all cases).
Having gone through electrical engineering, I always felt that Big O notation was never properly explained, unless I happened to miss every time it was introduced. It always felt like it was just hand-waved as something we should already know. I'm curious what level of math or computer science is usually responsible for introducing it.
My university had an "algorithm analysis" class (required for CS majors) which covered it along with various kinds of proofs. That was a junior/senior year class though; I think it was really expected that you were already somewhat familiar with the concept by the end of your first year (via osmosis I guess).
What an asshole way of explaining it. For practical purposes (don't @ me if this is technically wrong, I don't give a shit) it's a description of how the cost of an algorithm grows with the size of the input. For example, if the cost quadruples every time the input doubles, then it's O(n^2). If the cost increases in step with the input size, then it's O(n). Simples.
Comp sci 101. Learned it my first year. There really isn't anything to it. It just describes how the number of operations grows as the number of inputs to your algorithm increases. It looks scary, but it's very simple and straightforward.
It can get a lot more complicated though. At times there are more parameters to an algorithm's complexity than just `n`. E.g., the parameters for big-O might be `n`, `K`, and `ρ`, and some expressions in the big-O might involve combinations of those parameters.
In such cases, one might also know for a specific application of the algorithm that, for example, `ρ` is bounded, and so the big-O becomes more influenced by `n` and `K`.
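A made-up illustration of that point (not any specific algorithm from the thread): the cost below has two parameters, and which term dominates depends on what your application guarantees about them.

```cpp
#include <string>
#include <vector>

// O(n * k) comparisons, where n = rows.size() and k = query_keys.size().
// If your workload guarantees k is bounded by a small constant, this behaves
// like O(n); if k grows with n, it degenerates to O(n^2).
int count_matches(const std::vector<std::string>& rows,
                  const std::vector<std::string>& query_keys) {
    int matches = 0;
    for (const auto& row : rows)            // n iterations
        for (const auto& key : query_keys)  // k iterations each
            if (row == key) ++matches;
    return matches;
}
```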
Big O was introduced in my Data Structures and Algorithms (DSA) class which was nominally a second year undergrad class. In practice, most Computer Science (CS) students were far enough advanced to take DSA in their first year.
I took DSA as a third-year Electrical and Computer Engineering student. I recall maybe 15 minutes of exposition on the topic and exam questions of a near-throwaway nature (e.g.: write this tree-parsing algorithm; 95 points for your code's correctness, 5 points for explaining your algorithm's Big O complexity). It did seem like something that CS professors and teaching assistants considered either trivial or self-explanatory.
That’s because you could define it in a lot of different ways. For example, if you define it as the number of steps it takes some Turing machine representing some algorithm to halt, then there is no such thing as a logarithmic time algorithm; O(lg n) = O(1)
Every time I see an introduction to Big-O notation, I immediately go look for a section about the worst case to find the most common, fundamental misconception that is present in nearly every such text.
Lo and behold:
> Because it's common for an algorithm's performance to depend not just on the size of the input, but also its arrangement, big O notation always describes the worst-case scenario.
This is not true. I'm on my phone, so I can't be bothered to explain this in more detail, but there is plenty of information about this online.
O() doesn't necessarily describe worst-case behaviour. It just provides an asymptotic upper bound. So a quadratic sorting algorithm can still be said to be O(n^3), although that might be a little misleading.
In math that’s true, but a practicing software engineer uses it to mean the common "tightest" upper bound we can easily prove for the case we care about (worst, amortized, average, etc.), not the actual tightest possible bound.
Yes. The notation can be used to describe how fast any function grows. That function may be the worst-case runtime wrt. input length, the average-case runtime wrt. input length, or anything else, really.
But the most common usage is indeed worst-case analysis, especially in intro courses.
This is also wrong in the article:
> You may sometimes see the Greek letter "theta", Θ, instead of the letter O in big O notation. This is used when the best and worst case have the same order of growth. We can't use it for bubble sort because the best case is O(n) and the worst case is O(n^2).
It conflates two different, orthogonal concepts: upper vs lower bounds on the one hand, and best vs worst case analysis on the other.
The phrase "the best case is O(n)" already contradicts "big O notation always describes the worst-case scenario". They clearly use it for best case right there in the article.
> But the most common usage is indeed worst-case analysis, especially in intro courses.
It is also the most sensible one. Best-case analysis is basically useless: if you are lucky you may end up with O(1), but that doesn't tell you anything. Average case could be good, but it requires you to estimate the input distribution.
Big O notation can be used to describe any function, whether worst case runtime of an algorithm, average case runtime of an algorithm, the price of eggs in China as a function of time, or anything else.
In order to write f(n) is O(g(n)), you already had to smuggle in the idea that you were talking about the worst case before even talking about O(*). What is f? Typically it is the max of "steps taken" over all input problems of size n, i.e. the worst case.
The O(g(n)) part says f is asymptotically bounded by g(n) up to some constant factor.
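For reference, a sketch of the textbook definitions the thread keeps circling (standard material, not specific to the article):

```latex
% Standard asymptotic bounds, for some constant c > 0 and all sufficiently large n:
\begin{align*}
f(n) = O(g(n))      &\iff \exists\, c > 0,\ n_0 : \forall n \ge n_0,\ f(n) \le c \cdot g(n) \\
f(n) = \Omega(g(n)) &\iff \exists\, c > 0,\ n_0 : \forall n \ge n_0,\ f(n) \ge c \cdot g(n) \\
f(n) = \Theta(g(n)) &\iff f(n) = O(g(n)) \text{ and } f(n) = \Omega(g(n))
\end{align*}
```

Which function f you plug in (worst-case step count, average-case step count, memory, anything else) is a choice you make before any of these bounds enter the picture; that is the orthogonality mentioned above.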
I would just say "almost always" to cut off the pedants. You do use the same notation to describe the best-case and average-case as well; people just don't care about those as often.
TLDR: Different contexts, different analyses. You can do a best-case, average-case and worst-case analysis. A linear search is O(1) in the best case, for example.
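A minimal sketch of that example, nothing beyond the standard textbook case analysis:

```cpp
#include <cstddef>
#include <vector>

// One algorithm, three different analyses of the same function:
//   best case:    key sits at index 0    -> O(1)
//   worst case:   key is last or absent  -> O(n)
//   average case: depends on the assumed input distribution
// Each of those cases can then be stated as an O (upper), Omega (lower),
// or Theta (tight) bound.
std::ptrdiff_t linear_search(const std::vector<int>& v, int key) {
    for (std::size_t i = 0; i < v.size(); ++i)
        if (v[i] == key) return static_cast<std::ptrdiff_t>(i);
    return -1;  // not found
}
```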
Maybe this comes across as nitpicky to some, but that doesn't make the text any less incorrect, unfortunately. It's an extremely common misconception, however, and I would say that a huge proportion of students that go through an introductory course on this subject come out thinking the same thing.
> It’s okay, you’re not the first to point it out to me!
It would be nice if you corrected it - there are already too many people making this mistake online.
What blows me away is that there are quantum calculations which grow as O(N**7) with respect to the number of atoms, and scientists are fine with running them because N is small, computers keep getting faster and gaining more memory, and the results are valuable.
(I'm not a computer science expert, so if I used the O(n) terminology differently from the standard, sorry).
"Fine with running them" is a bit of a stretch, though the gist is correct.
There's a whole hierarchy of quantum chemistry algorithms where accuracy is traded with asymptotic runtime, I think starting at N^3 and going all the way to N!.
[0] https://nedbatchelder.com/text/bigo.html
[1] https://nedbatchelder.com/blog/201711/toxic_experts.html
> It's almost always a non-expert doing the teaching, and they fail to do it accurately.
I bet you also have subjects where you use and spread inaccuracies, unless you are a universal expert.
> Writing is one of the best ways to learn something. Maybe non-experts learn something by writing about it?
Where it falls short is the usefulness to others.
> Toxic expert here!
I hate that this declaration is both hilarious and one that many might suggest I need to prefix my own comments with regularly.
:-D
> ultimately it's a lay person explaining something they don't firmly grasp
I love the presentation itself, though!
> For most practical purposes n^2 means computer stops working here.
I also don't understand your first point. We can run n^2 algorithms on massive inputs, given it's just a polynomial. Are you thinking of 2^n, perhaps?
Do I really need to make my code spaghetti if the array sizes are small enough not to impact my speed?
In computer science, f(x) is often some complexity function, like the number of some specific operation performed when running an algorithm to completion.
Otherwise, we cannot say that 1 is O(x), for example.
> It looks scary, but it's very simple and straightforward.
Examples:
- an algorithm with O(2^n) complexity might be faster for your problem than an O(1) algorithm
- the same algorithm may be O(2^n) or O(n³) depending on how you define the size function (a sketch of this follows below)
This is not straightforward.
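A standard illustration of that second bullet (not from the thread): the classic 0/1 knapsack DP runs in O(n·W) time. Measured against the numeric capacity W that looks polynomial, but W only takes about log2(W) bits to write down, so measured against the input length the very same algorithm is exponential, i.e. pseudo-polynomial.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// 0/1 knapsack, classic dynamic program: O(n * W) time, O(W) space.
// Polynomial in the numeric value W, exponential in the number of bits of W.
// Assumes non-negative weights and W >= 0.
int knapsack_best_value(const std::vector<int>& weight,
                        const std::vector<int>& value, int W) {
    std::vector<int> best(W + 1, 0);  // best[c] = max value achievable with capacity c
    for (std::size_t i = 0; i < weight.size(); ++i)
        for (int c = W; c >= weight[i]; --c)
            best[c] = std::max(best[c], best[c - weight[i]] + value[i]);
    return best[W];
}
```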
I learned about Big O notation while learning calculus, along with little-o.
> I'm on my phone, so I can't be bothered to explain this in more detail
That section in particular has already seen several revisions based on other people's comments.
I'd be interested to hear how you would explain it when you're at a more appropriate device.
> there are quantum calculations which grow as O(N**7) with respect to the number of atoms
What you should say, if you want to be correct, is that it grows as Ω(n⁷).