bjeds · 5 years ago
If you are the kind of person who wants to read an article titled "explained as easily as possible", I think you should avoid saying the phrase "big oh" altogether and instead talk about algorithm runtime more informally, like "quicksort has a worst-case quadratic but average-case n log n runtime".

Otherwise you risk shooting yourself in the foot, maybe during an interview or some other situation, because Big O is just one member of a family of notations with very specific mathematical definitions, other common ones being small-o and big-Theta.

Analysis of algorithms is more difficult than it appears. If you implement an algorithm in a high level language like Python you may get much worse runtime than you thought because some inner loop does arithmetic with bignum-style performance instead of hardware integer performance, for example. In such a case you could talk of big-omega (your analysis is bounded below instead of bounded above, asymptotically).
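
A quick sketch of the bignum point, assuming CPython (illustrative timing, not a rigorous benchmark): the cost of a single addition grows with the bit width of the operands, which silently breaks the "addition is one step" assumption.

    # Timing integer addition as operand width grows (CPython assumed).
    # With hardware integers this would be flat; with Python's
    # arbitrary-precision ints the cost grows with the bit width.
    import timeit

    for bits in (64, 6_400, 640_000):
        a = 1 << bits   # an integer roughly `bits` bits wide
        b = a - 1
        t = timeit.timeit(lambda: a + b, number=10_000)
        print(f"{bits:>7} bits: {t:.4f}s for 10k additions")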

Al-Khwarizmi · 5 years ago
I know the rest of the notations in the family but, to be honest, even in most algorithmics textbooks they tend to use Big O like 90% of the time, even in contexts where Big Theta would be more precise (e.g. "mergesort is O(n log n) in all cases"). Let alone in more informal contexts.

I don't especially like it, but it's OK because it's not a lie. And I do think for people who just want the gist of the concept, like readers of this piece, it's enough and it's not worth being fussy about it.

What irks me is people who use the equal sign as in T(n)=O(n log n), though. Why would a function equal a class? Set membership notation does the job just fine.

thaumasiotes · 5 years ago
> What irks me is people who use the equal sign as in T(n)=O(n log n), though. Why would a function equal a class? Set membership notation does the job just fine.

It's strange but it's fully standard. I would guess it develops from the use in analysis, where o(n) is more common. You derive your formula, it has a term in it that you don't want, you observe that "complicated term's numerator = o(n)", you take your limit, and the term vanishes away. Use of the = sign makes more sense there, because conceptually you're claiming something about the value of the term. (Specifically, that in the limit, it's equal to zero.)

repsilat · 5 years ago
> even in contexts where Big Theta would be more precise (e.g. "mergesort is O(n log n) in all cases")

Just to be careful here: the difference between big/little oh/theta/omega is orthogonal to best/worst/average case.

A pedant could say that merge sort makes O(n^3) comparisons in both the best and worst case, ω(1) in both the best and worst case, etc. Colloquially, the former means "as fast as", and the latter means "slower than".

indymike · 5 years ago
I didn't really read that article as being for a developer. I read it and thought, hey, this one would be good to share with a few project managers & business unit managers. Any time people who are not programmers (especially ones we have to work with) start to better understand what we're really doing, it is a good thing. Articles like this are superb for helping them understand that developers do have a disciplined and rigorous way of solving problems.

I do agree with you that these articles do leave a lot of detail and precision out. They tend to give the reader a superficial understanding of the subject... but a superficial understanding may be enough to help.

Edmond · 5 years ago
Agreed.

The use of "Big O Notation" itself as a way of referring to algorithmic complexity seems like a misnomer, considering that the topic is about analysis rather than the notation used to express the results of such analysis.

Unfortunately academic textbooks have terrible "UX", so students end up dealing with confusing presentation of topics, hence we're stuck with labels such as "Big O Notation".

bjeds · 5 years ago
I hear you.

Whether I like it or not, by now big o notation has fallen into the category of "folklore" that working engineers use and abuse informally without being very precise about it.

It's like the "proof by engineers induction": if some statement P(n) is true for P(0), P(1) and P(2), then P(n) is true for all n \in Z. :-)

Similarly if an engineer states that algorithm has a runtime of O(f(n)) that should probably be read as "as n grows very large (whatever that means) the runtime approximates (whatever that means) some bound (below, above, whatever) f(n). yolo.".

But people should at least be _aware_ that they are being imprecise about it.

If I read a blog post or StackOverflow post or whatever and I see big-theta notation I know that the person is probably precise with her definition. If I see big-o then it may be correct, or accidentally correct (happens often due to the nature of the definition) or mistaken.

nicoburns · 5 years ago
The problem with that is that informal use of big-O-like notation is a lot more intuitive than the fancy language in your explanation.

Most people who can program can grasp the informal meaning of O(n^2) pretty easily. They may not connect the word "quadratic" to that same concept.

sabas123 · 5 years ago
Why wouldn't you be able to just say "worst case n^2?"
tester756 · 5 years ago
>Analysis of algorithms is more difficult than it appears. If you implement an algorithm in a high level language like Python you may get much worse runtime than you thought because some inner loop does arithmetic with bignum-style performance instead of hardware integer performance, for example. In such a case you could talk of big-omega (your analysis is bounded below instead of bounded above, asymptotically).

Of course; that's why Big O says one thing while the compiler, CPU, and caches may say another.

cortesoft · 5 years ago
I think the main purpose of an “explained as easily as possible” article is to help the reader understand when SOMEONE ELSE uses the term.

Sure, I can choose to use informal language, but I can't stop someone else from using big O notation when speaking to me or writing something I want to read.

proverbialbunny · 5 years ago
>Otherwise you risk shooting yourself in the foot, maybe during an interview or some other situation

If the point is to identify the speed (or RAM consumption) of an algorithm, then why not check for that itself instead of the vocabulary in an interview? Why be pedantic when you can instead measure how well they would do as a developer? In an interview you can ask followup questions to see how precise their ability to explain their thought process is.

If someone is so pedantic that they would consider the interviewee to have shot themselves in the foot because they said "The big O is n squared." without any followup questions from the interviewer, that doesn't sound healthy to me. I would worry this kind of culture would extend past the interview and it wouldn't be an enjoyable place to work.

Can you imagine working in a place where people regularly argue over terminology instead of just making sure everyone is on the same page?

(Full warning: I'm not a dev, so I'm coming in from the view of another industry.)

pvg · 5 years ago
> If the point is to identify the speed (or RAM consumption) of an algorithm, then why not check for that itself instead of the vocabulary

To some extent because the "that itself" is a deep field in its own right with its own specialized vocabulary.

bakuninsbart · 5 years ago
Exactly. The link above starts well, but fails in some regards. If you use big O, you should give the mathematical definition and at least briefly explain it. The idea of an upper bound isn't very hard, and it is actually important to understand that an algorithm running in O(n) is also running in O(n log(n)) and also in O(n^2). It is at least necessary to understand the other Greek letters, which really do come in handy for a deeper understanding of algorithms.

The list of "common" O's is also kinda bad. In particular, and I see this all the time, I think it is a mistake to jump straight from O(n^2) to O(c^n), as this is the step that leaves polynomial time complexity, and it glosses over the fact that each level of exponent constitutes a different time complexity. Here the mathematical notation of O(n^2), O(n^3), ..., O(n^l) is indispensable. Nesting loops is probably one of the most commonly relevant applications of O notation, so this actually has an influence on real-world implementations.
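
A minimal sketch of the nesting point (toy functions, names made up for illustration): each extra loop over the same input multiplies the step count by n, landing you in a different polynomial class.

    # Two nested loops do ~n^2 "steps"; three do ~n^3.
    def steps_squared(n):
        count = 0
        for i in range(n):
            for j in range(n):
                count += 1
        return count

    def steps_cubed(n):
        count = 0
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    count += 1
        return count

    print(steps_squared(100), steps_cubed(100))  # 10000 1000000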

kristopolous · 5 years ago
Last summer I was interviewing at a FAANG company for a supposed senior level position and I pointed this out.

Instead I was treated as if I fundamentally had no understanding of algorithms whatsoever. It was enormously frustrating, especially when I demonstrated real world runtime to the interviewer of two implementations.

If they need someone to actually make things work well on real physical hardware, they need to know how the claims map to the physical reality and the fundamental limitations of chalkboard optimization. The real world actually matters.

If he knew this maybe he wouldn't be over budget, past deadline, and frantically hiring people like some parody of The Mythical Man-Month...

8 months later it still rubs me the wrong way - that is, a bad-faith read of new information as obviously, objectively wrong, and of the speaker (me) as misinformed, even after it had been demonstrated to be accurate. Assuming everyone is stupid is a great way to hire, just fantastic.

I'd bet thousands the project is either still off the rails or they've overhauled the org chart. The product hasn't been publicly announced yet btw.

It's really all for the best. This way I only wasted one day instead of, say, 6 additional months spinning my wheels against a stone wall.

leonidasv · 5 years ago
Big O is not about actual hardware, nor should it be.

We can argue about whether it is useful or not, but that doesn't change this fact.

iujjkfjdkkdkf · 5 years ago
> Instead I was treated as if I fundamentally had no understanding of algorithms whatsoever.

This and similar experiences are pretty common: basically discovering that the interviewer, potential boss, or worse, actual boss, is not a colleague but a shallow copy of what one would expect from someone in their role. I've seen it especially in orgs with a clear divide between a "manager" class mostly composed of people with less experience than those they are managing, and the people doing the work.

It's best, as you say, to just write off the time wasted on the discussion, move on, and be happy you don't have to work with them.

melenaboija · 5 years ago
I agree with what you say, and I'm not a Python fanatic, but these high-level-language stigmas catch my attention.

If you try to implement your own, let's say, set intersection, you will probably get at most the same performance as Python's built-in, given a reasonable amount of implementation time.

I guess my point is that seeing the comment of Python in a thread like this can also be confusing. Bad (or maybe I should say "not appropriate for your use") implementations can exist in any language.

zachrose · 5 years ago
I appreciate that the author was specific about “if you count addition as 1 operation.” Without saying this, it’s not obvious that the notation is such a simplified abstraction over operations and their costs, with all the limitations that come with that simplification.
sixstringtheory · 5 years ago
> If you implement an algorithm in a high level language like Python you may get much worse runtime than you thought because some inner loop does arithmetic with bignum-style performance instead of hardware integer performance

If a language/library can change the dominant asymptotic term of an algorithm like that, instead of just the constant factor, that is a problem. Does that really happen with Python or is this an exaggeration? I'm inclined to accept another reason to dislike these super high level interpreted langs, but I've never seen something that appears to be written as, e.g., logarithmic become quadratic because of library implementation details.

Brian_K_White · 5 years ago
The language doesn't matter, nor even whether it's high- or low-level.

You can write a short function in any language that looks O(1) if you only look at the surface or high level of it. Even assembly. Meanwhile in the middle of that routine is a single call to another function or macro which may be O(1) or O(n!).

Python was just an example.

cgriswald · 5 years ago
For Python code, I usually just ask critics if they’ve tested it (because I have). Frequently using a built-in will be faster than a custom loop even if at the surface level you’re going over the data more than once.
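
A rough illustration of what I mean (timings will vary by machine and CPython version): two passes with C-level built-ins often beat one hand-written pass, even though both are O(n).

    import timeit

    xs = list(range(1_000_000))

    def spread_builtins(xs):
        return max(xs) - min(xs)      # two traversals, both in C

    def spread_loop(xs):
        lo = hi = xs[0]               # one traversal, pure Python
        for x in xs:
            if x < lo:
                lo = x
            elif x > hi:
                hi = x
        return hi - lo

    print(timeit.timeit(lambda: spread_builtins(xs), number=10))
    print(timeit.timeit(lambda: spread_loop(xs), number=10))
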
mhh__ · 5 years ago
Anecdotally (it was a contrived sorting benchmark so the exact numbers don't really matter), Python started off about 3 times slower than D and grew at what would be considered the same O(), but the coefficient was enormous. To the point where D was taking 3s for n-million-element arrays, and Python closer to one minute.

Node was actually very impressive, roughly as fast as D's reference compiler (cutting edge optimisations maybe 2 decades ago) in release mode.

d0mine · 5 years ago
> If you implement an algorithm in a high level language like Python you may get much worse runtime than you thought because some inner loop does arithmetic with bignum-style performance instead of hardware integer performance

How does it change big O? Typically, you implement "bignum" on top of "hardware integer" regardless of the language. Or do you mean that some common integer operations are implemented with suboptimal big O in CPython?

crashocaster · 5 years ago
For some large n, integers in the algorithm may be so large that operations on them cease to be constant time.
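
A classic sketch of this is iterative Fibonacci: counting additions says O(n), but fib(n) has on the order of n bits, so each late addition costs O(n) bit operations and the whole loop is closer to O(n^2) at the bit level.

    def fib(n):
        a, b = 0, 1
        for _ in range(n):      # n additions...
            a, b = b, a + b     # ...but the operands grow to ~n bits
        return a

    print(fib(100_000).bit_length())  # ~69424 bits, far beyond 64
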
bitexploder · 5 years ago
It is still a useful conceptual framework. Maybe this sparks someone’s interest. Agree it is much harder than it appears :)
polishdude20 · 5 years ago
Can you recommend any good algorithms books?
bhrgunatha · 5 years ago
I'm assuming you want something rigorous - based on the comment you replied to.

Many people will recommend CLRS [1] but I prefer it as a reference, rather than a learning resource. I feel it's very dry and academic.

Instead I'd recommend Tim Roughgarden's series of books Algorithms Illuminated [2] for learning about analysis and algorithms. He also has courses on Coursera and edX to cover the material. It's thorough, rigorous, and shows algorithms that apply to different paradigms - like divide and conquer and graph theory.

Sedgewick and Wayne's Algorithms [3] has a companion website with lots of additional material (heavily Java based) - and courses on Coursera too. Again I think it's more approachable than CLRS while still being detailed and covering the theory.

[1] https://mitpress.mit.edu/books/introduction-algorithms-third...

[2] http://timroughgarden.org/books.html

[3] https://algs4.cs.princeton.edu/home/

codesparkle · 5 years ago
> A bad programmer solves their problems inefficiently and a really bad programmer doesn't even know why their solution is inefficient

To any beginners reading this: Solving problems inefficiently does not make you a bad programmer. Most of the time, an "inefficient" solution will be good enough, and optimising for performance comes at a cost.

So sit back, relax, and enjoy the journey.

volkk · 5 years ago
Should also add that an asymptotically inefficient solution over a mostly fixed input size is still going to perform fine. If you have to write a double for loop but the outer loop is iterating over a 1000-element array and the inner one over a 26-element array (e.g. the alphabet), it's still a fast and probably good-enough solution.
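
A toy sketch of that shape (the function is made up for illustration): the inner loop is pinned at 26 iterations, so the whole thing is effectively O(n) with a constant factor of 26.

    import string

    def first_letter_histogram(words):
        hist = {c: 0 for c in string.ascii_lowercase}
        for word in words:                    # outer: n items
            for c in string.ascii_lowercase:  # inner: always 26
                if word.startswith(c):
                    hist[c] += 1
        return hist
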
gvx · 5 years ago
Operating on a fixed size datatype is trivially O(1), too (because O(c) = O(1) when c is a constant).
kaba0 · 5 years ago
Inefficiency and algorithmic complexity are imo not necessarily the same thing all the time.

For a beginner, inefficiencies like allocating in loops and the like (which can often be optimized away) are not that big of a problem, since they only change a constant factor. On the other hand, O(n^2) and its supersets can be problematic when applied blindly. I don't remember the exact situation but I recall a GUI app that listed some options in a drop-down menu. On each click, they managed to call a function with O(n^2) complexity, and you don't need many elements to get a big number that way, so the drop-down visibly froze the UI (I guess it was an older framework with no separate thread, or just bad code that worked on the main thread).

Of course relax and enjoy programming, but I think reading up on algorithms can be fun and useful for the long term!

KnobbleMcKnees · 5 years ago
I wanted to add something to this from recent personal experience.

I've worked with a few engineers recently who were keen - occasionally insistent - on coming up with an O(n log n) solution, or better, at design stages for a specific project. We were working on different parts of the platform (and in different dev environments) but essentially implementing the same thing.

For the implementation I tried to talk them into going with a simpler implementation that was O(n²) for the initial release, but they were adamant not to.

When it came to writing automated tests for the feature, I became aware of some edge cases that hadn't been considered during the design stages. We had another design meeting, updated the requirements, yadda yadda.

A day or so later I put the changes in for review and had them merged reasonably quickly. I later found out that the other group had to significantly rewrite their algorithm and write new tests from scratch, ultimately leading them to miss out on launching the feature at the same time as ours had been released.

The moral of the story? A good programmer knows _what_ the best algorithm is, but a good engineer knows _when_ a given algorithm is called for. Premature optimisation is, after all, the root of all evil.

I've since updated my version to coincide with their more performant version and rewritten a lot of my tests.

deburo · 5 years ago
That's a popular sentiment, to always coddle new players in a field, but knowing the performance of your algos is part of the job. Performance comes into play in many scenarios. Your users may not "care", but you might be wasting a lot of their time.
brailsafe · 5 years ago
Sometimes it is, a lot of the time it's not, or at least not so much so that you're a bad programmer if you do solve a problem but it's not as efficient as it could be. If there's one thing I've realized over a number of jobs where I tried to do things right, it's that most of the time all the work you do doesn't matter, won't last, and your bosses only care that they can list some feature on the product page. They define good programmer as someone who gets their tickets in on time, and if it matters, they can budget for you to improve it with another one.

It's also totally fine to waste some of your user's time if you first create value for them that they didn't have before. That's the nature of iteration and MVPs

chii · 5 years ago
And great programmers would knowingly solve a problem inefficiently, because it's easier to write and ship it to the customer, and thus prove that the problem being solved is valuable.
blackbear_ · 5 years ago
Until, two years later, the program needs one minute to start, every operation takes ten seconds to execute, and nobody knows why the program needs gigs of RAM to stay idle, or what to do about it.
rramadass · 5 years ago
Good caveat; beginners should not bother with any of this stuff. Focus on good modularity, clear code structure, and language idioms, and in general on readability.
pmiller2 · 5 years ago
Every one of these "Big-O explainers" says pretty much the same thing: count (or bound) the number of steps, then take the most significant term, and drop the constant associated with it. None of them explain why you take the most significant term or drop constant factors.

I get why that is. You need the mathematical definition to demonstrate why that is, and most "Big-O explainers" don't want to assume any significant amount of mathematical background. But, that definition isn't that hard. It's simply:

f(x) is O(g(x)) iff there exists a positive number M and an x_0 such that for all x > x_0, |f(x)| <= Mg(x).

And, if you're in an analysis of algorithms context, it's even easier, because you typically don't have to worry about this absolute value business.

Well, that M is essentially the reason you get to drop constant multiples of f(x). And, you drop the least significant terms of f(x) because g(x) dominates them, i.e. lim_{x -> \infty} g(x)/f(x) = 0. (No need to prove this, because this is what makes the "less significant" terms less significant.)

I would also like to add that the equals sign in f(x) = O(g(x)) is one of the most useful abuses of notation that I know of, but it can be misleading. It doesn't behave at all like a real equality because it's not symmetric, but it is transitive and reflexive. It actually acts more like set membership than equality.
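
To make the definition concrete, here's a quick numeric sanity check (the witnesses M and x_0 here are just one valid choice): with f(x) = 3x^2 + 5x and g(x) = x^2, taking M = 4 and x_0 = 5 works, which is exactly why both the constant 3 and the 5x term get dropped.

    # Check |f(x)| <= M*g(x) for all x > x_0 over a large range.
    f = lambda x: 3 * x**2 + 5 * x
    g = lambda x: x**2
    M, x0 = 4, 5

    assert all(abs(f(x)) <= M * g(x) for x in range(x0 + 1, 10_000))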

casion · 5 years ago
You say it isn't hard, but I have 2 graduate degrees and didn't understand your explanation at all.

"Hard" is relative to prerequisite knowledge, which can vary significantly.

roywiggins · 5 years ago
The short version is that the fastest-growing term dominates all the others and for large xs the smaller terms round down to zero. Since big-O notation is about how the complexity grows for large inputs, you can assume the input is arbitrarily large, and you'll notice that the complexity is completely determined by the fastest-growing term.

You drop the constant because it doesn't alter how the complexity grows as the input increases.

pmiller2 · 5 years ago
Of course "hard" is relative. Because I was writing a HN post and not a "Big-O explainer," I didn't provide you with any of that prerequisite knowledge. But, the amount of prerequisite knowledge one needs to understand this is very, very little, and would easily fit in a digestible web page, provided you have some basic fluency with functions of the real numbers. And, I think that's a reasonable level of prerequisite to assume for anyone who wants to be throwing around terms like "Big-O."

If it's not too intrusive, may I ask what your graduate degrees are?

kaba0 · 5 years ago
A function f(n) is O(g(n)) if the graph of f will be underneath the graph of g(n) for a big enough n. (If we want to be more correct, I would have to add: there must exist a positive number c such that the graph of c*g is eventually above the graph of f.)

So f(n) := 3n+28 will be O(n^2), because choosing c as 3, for every n greater than or equal to 4, 3n^2 will be greater than f(n).

It would be easier if I could draw some graphs, but hopefully this helps.

andi999 · 5 years ago
Had such students as well. If I asked anything, they said, oh, I learned that as an undergrad (implying that it is too long ago to remember). I am sad about such a waste.
mhh__ · 5 years ago
Admittedly I am studying theoretical physics so I am supposed to be able to, but it made sense to me?
kergonath · 5 years ago
This bit:

>> f(x) is O(g(x)) iff there exists a positive number M and an x_0 such that for all x > x_0, |f(x)| <= Mg(x).

is fairly similar to the so-called epsilon-delta definition of limits of functions. This way of reasoning is quite common. I know I bumped into it quite a lot: I learnt it in high school, even though I only really understood it a couple of years later (I did an MSc in Physics). So the explanation above makes sense even if I never saw this exact formulation. Now I appreciate that not everyone is a Physics graduate, but I'd expect this to be understandable for people with degrees in applied Maths, Physics, or some related engineering discipline.

Al-Khwarizmi · 5 years ago
I'd say that the = is one of the most annoying abuses of notation that I know of, if not the most. Apart from not being symmetric, which is a problem in its own right, for small o and small omega it's not even reflexive, which does not prevent the = users from using it for those as well. Which amounts to using an equals sign to highlight how two functions differ.

And what do people gain with that? A set membership symbol works just fine, since big O, small o, the omegas and their ilk define sets of functions, and it takes the same space without being confusing.

edflsafoiewq · 5 years ago
The notation is from math; the point of it is that you can manipulate it like a normal expression, but it's "anonymous"; o(1) means some function that goes to zero, but we don't bother with a name, just recording the asymptotic behavior.

For example, f'(x) = lim_{h -> 0} (f(x+h)-f(x))/h can be rewritten with an error term as f'(x) = (f(x+h)-f(x))/h + o(1), and then you can manipulate it more freely, say like f(x+h) = f(x) + hf'(x) + o(h). There's no need to drag out a bunch of useless names, each qualified by a set membership, to do this. I mean e(h) in o(1), e2(h) := h e(h) in o(h), etc.

The failure of "reflexivity", e.g. O(f) = O(f), is because the anonymity hides whether the two O(f)s are referring to the same function (i.e. exactly what a name would tell us).

Tade0 · 5 years ago
I understood your definition (all in all I had calculus 101) but to me it only describes why that is, it doesn't explain it.

Most Big-O explainers don't assume a mathematical background because to non-mathematicians parsing your definition feels like being told you're in a hot air balloon. They see the what, but they don't understand the why.

uh_uh · 5 years ago
> And, you drop the least significant terms of f(x) because g(x) dominates them, i.e. lim_{x -> \infty} g(x)/f(x) = 0.

If g(x) dominates all the terms in f(x), then wouldn't lim_{x -> \infty} g(x)/f(x) go to infinity?

edflsafoiewq · 5 years ago
In most practical cases f=O(g) is the same as saying f/g is bounded.
jakear · 5 years ago
It acts exactly like set membership, because that’s what it is.
otabdeveloper4 · 5 years ago
> f(x) is O(g(x)) iff there exists a positive number M and an x_0 such that for all x > x_0, |f(x)| <= Mg(x).

Correct.

Corollary: x is O(x^2), for example.

tromp · 5 years ago
Note that f(n) = O(g(n)) denotes an asymptotic upper bound. So it would be correct (if confusing) to say that 2n+3 = O(n^2). This is important for use in algorithmic complexity because we may not always be able to determine the running time even up to a constant, but can more easily infer an upper bound.

Using Omega instead denotes an asymptotic lower bound. To denote both, you use Theta: f(n) = Theta(g(n)) means that for two constants 0 < c1 < c2 and large enough n, we have c1 < f(n)/g(n) < c2.

Finally, little o denotes vanishing behaviour. f(n) = o(g(n)) when f(n)/g(n) goes to 0 in the limit.
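
A quick numeric illustration of the ratio view (toy numbers): f(n) = 2n+3 over n settles between two constants, so it's Theta(n); over n^2 it goes to 0, so it's o(n^2).

    for n in (10, 1_000, 100_000):
        f = 2 * n + 3
        print(n, f / n, f / n**2)
    # f/n   -> 2    (2n+3 = Theta(n))
    # f/n^2 -> 0    (2n+3 = o(n^2))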

kaba0 · 5 years ago
A nitpick, because this is an accepted notation, but as others mentioned in the thread: when someone writes 2n+3 = O(n) they mean 2n+3 \in O(n) (\in is the "element of" set membership symbol in LaTeX), since O(f) is the set of all functions that have f as an asymptotic upper bound.
lordnacho · 5 years ago
The thing that didn't click for me was precisely the "try to count the operations" thing that the author mentions. In fact it's the wrong road to go down, it isn't the point of big-o, and yet that's how you're invited to think about it when the issue is presented in college. It's only natural to think "oh let's look at all the operations and add them up".

I think of it pretty simply, but not in that formal big-theta/big-omega kind of way, because you want to just quickly have an idea of whether you'll write a really slow piece of code in general, not some best or worst case.

The question is simply what growth model dominates the increase of time/space for the algo for each of the inputs? Imagine the algo is already processing millions of input A, and you now increase A by a factor of 10, 100, etc.

This melts away all the setup costs, nothing machine specific like cache size matters, anything that isn't the dominating factor is swamped, and all you're left thinking about is probably how some loop expands. You also don't need to think about what coefficient that dominating term has, which you would if you tried to write an equation that took all the operations into account.

kaba0 · 5 years ago
It was the contrary for me. I think it helps to understand that we actually put a value on each line of code that gets executed, but instead of microbenchmarking, we do it in an abstract way: say print gets a constant c1, addition gets c2, and the like. For loops will multiply the sum of the instructions inside them by the number of times they get executed. And basically that's it. You sum the whole thing and get something like (c1+c2)n+c3 for a for loop over an n-element list, with two instructions inside and one other outside the loop. Since these were arbitrary constants, c1 and c2 can be replaced by another one, so you've got cn+c3, and since (I'm not gonna be mathematically rigorous here) as n grows it will be much larger than the others, we are only interested in it; hence it was an O(n) algorithm.

The eye-opening thing about it was that for simple algorithms, I only need high-school math to analyze them for different measurements. Like, memory allocation is costly for this sort of application and I want to measure that, just count each malloc instead! (But do note that it is quite hard/impossible to rigorously analyze programs for modern CPUs with cache misses and the like)
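
A sketch of the counting exercise I described, with the made-up cost constants attached to each line:

    def total(xs):
        s = 0             # runs once                -> c3
        for x in xs:      # body runs n times
            s = s + x     #   addition               -> c2 each
            print(s)      #   print                  -> c1 each
        return s
    # total cost: (c1 + c2)*n + c3, which is O(n)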

kodah · 5 years ago
I learned to code as a kid and only met mathematicians who consider themselves programmers as an adult.

Some opinion, maybe unpopular:

Big O notation can be quite informally understood by normal people. It is academic people that make it and keep it challenging because it is how they understand the world. This is why interviews have stayed materially gruesome despite loud voices wishing it weren't so. It's the language of the people that rule this industry and you either learn it or leave. That said, we can change it too, if we want.

rramadass · 5 years ago
>Big O notation can be quite informally understood by normal people.

Right; the concept is easily understood. It is the rigor of deriving and proving that is made difficult by the mathematicians, and it need not be that way.

As an example, here is a neat communication from Faraday to Maxwell on receiving one of Maxwell's papers:

“Maxwell sent this paper to Faraday, who replied: "I was at first almost frightened when I saw so much mathematical force made to bear upon the subject, and then wondered to see that the subject stood it so well." Faraday to Maxwell, March 25, 1857. Campbell, Life, p. 200.

In a later letter, Faraday elaborated:

I hang on to your words because they are to me weighty.... There is one thing I would be glad to ask you. When a mathematician engaged in investigating physical actions and results has arrived at his conclusions, may they not be expressed in common language as fully, clearly, and definitely as in mathematical formulae? If so, would it not be a great boon to such as I to express them so? translating them out of their hieroglyphics ... I have always found that you could convey to me a perfectly clear idea of your conclusions ... neither above nor below the truth, and so clear in character that I can think and work from them. [Faraday to Maxwell, November 13, 1857. Life, p. 206]”

asperous · 5 years ago
I have met a lot of programmers that don't know about the concept nor do they proactively think to apply it.

I do understand the disconnect between knowing snobby language and doing good work. Certainly you can be an amazing programmer and apply these ideas possibly without ever even being trained on them or knowing the jargon.

In industry at least, a lot of work is communication, so you have to know what things are commonly called to explain your thoughts to other people. And along those lines, mathematicians are the ones studying this concept in the abstract, so it's useful to use their lingo, because then you know where to find all the abstract knowledge on the subject.

Finally I'll say that of all the obscure terminology for things intuitively applied, Big O has to be one of the most common, followed by the Gang of Four's design patterns.

kodah · 5 years ago
> In industry at least, a lot of work is communication, so you have to know what things are commonly called to explain your thoughts to other people. And along those lines, mathematicians are the ones studying this concept in the abstract, so it's useful to use their lingo, because then you know where to find all the abstract knowledge on the subject.

I think this is what I'm getting at. Mathematicians can adjust their language to communicate with a wider audience, especially on things as so commonly understood as Big O. It's a two way street, because you need to know and understand mathematical principles to be a good programmer, but if this is your only mode of understanding you are equally useless. There needs to be hiring gates for both.

maweki · 5 years ago
What I always find a bit missing in such articles is what the n is. The author writes "If you consider "addition" to be 1 operation" and this seems kinda intuitive and is correct if we talk about normal machine integers.

But adding two arbitrary integers might be somewhat linear in bit-width. And there we have it: with a fixed bit-width, this becomes a constant term.

So you might not want to talk only about the number of input terms as n, but also about the width of your terms (and Python integers are arbitrary precision, btw).

So yeah, this is an upper bound on how many "steps" you do for every input, but often enough it's not really clear what a step is, especially if you have several "steps" that relate to different forms of data retrieval and organization (which often are culprits for bad performance if you're done optimizing the number of loops). Sometimes you can hide behind some guarantees that your hashset has constant lookup. But did you factor in whether the hash function is actually constant (or even fast, for that matter)?
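
One concrete instance of that last question, assuming CPython (tuples are used because, unlike strings, they don't cache their hash): dict lookup is "O(1)" in the number of entries, but hashing the key is linear in the key's size, so the "constant" depends on what you call the input.

    import timeit

    for n in (10, 100_000):
        key = tuple(range(n))
        t = timeit.timeit(lambda: hash(key), number=1_000)
        print(f"{n:>7}-element tuple key: {t:.4f}s")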

bnegreve · 5 years ago
> often enough it's not really clear what a step is

The steps are basic CPU operations such as load/store and basic arithmetic operations performed on fixed-width types, i.e. things a CPU can do.

> And there we have it: with a fixed bit-width, this becomes a constant term.

It makes sense because that is how the CPU works: arithmetic operations on fixed-width types are performed in constant time in the CPU's ALU. Adding anything below 2^32 requires the same number of cycles.

It is only confusing when you confuse basic CPU operations with basic Python instructions (an arbitrary-precision addition is certainly not a simple operation from the CPU's perspective).

maweki · 5 years ago
> The steps are basic CPU operations such as load/store

Only if you consider the result of a comparison as a "store" into a register. But again, comparing two objects might not be constant. The compiler I develop for my PhD (written in Haskell), for example, uses somewhat nested sets, and quite a few operations are dominated by the equality check.

> It is only confusing when you confuse basic CPU operations with basic Python instructions

But the author did exactly that when they considered addition to be constant, which is a good approximation for small integers, but wrong in Python. And really, you do have to consider those intricacies; that's why I chose this example. As sets and maps in Python are basically hashmaps, you can assume constant lookup, for example, if your hash function works in constant time. And again, this raises the question of what the n is. Usually we would set n as the number of items in a collection for its retrieval operation. But you actually also have to consider the item being retrieved.

analog31 · 5 years ago
I learned a couple of things during a brief teaching stint. First, no matter how much math your students learned in their high school and college courses, you should expect to re-teach the concepts that are needed for your lesson. It needn't be extensive, but your students will thank you for it.

Second, don't introduce more than one hard concept at once. Asymptotes can be reviewed with pure math functions such as polynomials, but that's as far as they got in high school math. Then there are the other "interesting" orders such as log(n) that can be introduced, and you can show graphically why they're useful.

Now you're ready to discuss the order of algorithms.

I'm not a computer scientist, but that's how I learned it, and while I don't remember all of the algorithms today, I still understand the derivations of their orders when I see them.