One interesting note on this approach: the author suggests using a "fast" sorting algorithm like mergesort or quicksort for best performance. But in practice, you may get better performance from a "worse" sorting algorithm: insertion sort.
The reason is that objects in collision detection systems tend to move relatively small steps between frames, meaning you can keep lists from the previous frame that are already mostly sorted. For sorting these mostly sorted lists, insertion sort tends towards O(n) while Quicksort tends towards O(n^2)!
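To make that concrete, here's a rough Rust sketch (not the article's code; the names are made up) of re-sorting a persistent axis ordering each frame with plain insertion sort. When the order barely changed since the last frame, the inner loop exits almost immediately, so the pass is effectively linear:

```rust
/// Re-sort `order` (object indices ordered by left edge on the x axis).
/// `order` persists between frames, so it is usually already almost sorted
/// and this pass does close to n comparisons and very few swaps.
fn sort_by_left_edge(order: &mut [usize], left_edges: &[f32]) {
    for i in 1..order.len() {
        let mut j = i;
        // Shift the element at position i left until it is in order.
        while j > 0 && left_edges[order[j - 1]] > left_edges[order[j]] {
            order.swap(j - 1, j);
            j -= 1;
        }
    }
}

fn main() {
    // Hypothetical left edges after one frame of small movements.
    let left_edges = [0.0_f32, 1.5, 3.0, 2.9, 4.2];
    let mut order: Vec<usize> = (0..left_edges.len()).collect();
    sort_by_left_edge(&mut order, &left_edges);
    println!("{:?}", order); // [0, 1, 3, 2, 4] after a single local swap
}
```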
If you use a modern sort, it'll just go "Oh, that's sorted, we're done". Pattern Defeating Quicksort will do that, and so would Glidesort, Driftsort, IPNsort and so on. Most of these algorithms will also correctly shortcut trivial "almost sorted" cases like 123457689 (only longer; if you really only have nine elements you want a custom sort for that, and there's probably a correct algorithm for it written down somewhere already).
Rust's old sorts, both of them, get this right. There's a fair chance the current version of your C++ stdlib even gets this right in its unstable sort. In 1965 it's understandable that you'd reach for an insertion sort for this case; in 2024 it's embarrassing if your language doesn't provide a built-in sort where this just works.
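In Rust terms that just means calling the standard sort every frame and letting it notice the existing run; a throwaway sketch of what "just works" means here:

```rust
fn main() {
    // An axis list that is sorted except for one adjacent swap, which is
    // roughly what sweep-and-prune sees after a frame of small movements.
    let mut xs = vec![1, 2, 3, 4, 5, 7, 6, 8, 9];
    // The std sorts are adaptive, so fully or nearly sorted input like this
    // is handled cheaply rather than paying for a full re-sort.
    xs.sort_unstable();
    assert_eq!(xs, vec![1, 2, 3, 4, 5, 6, 7, 8, 9]);
}
```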
There is an alternative to sorting every step: build your indexing structure a little looser, so that you still catch the candidate collisions when objects have moved by less than epsilon.
For example, that can be done by increasing the radius of the spheres by epsilon. As long as a sphere has not moved by more than epsilon, you don't need to recompute the index.
To avoid latency peaks when you do need to recompute the index, you can start building a lagging index by sorting 10% each frame (aka amortizing the computation cost). After 10 frames you have an index that is valid as long as the position is within epsilon of the position 10 frames ago.
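A minimal sketch of the validity check under that scheme (hypothetical names, Rust just for illustration): the index is built with every radius inflated by epsilon, and stays usable until something has drifted farther than epsilon from the position it was indexed at:

```rust
// Hypothetical slack added to every sphere's radius when the index is built.
const EPSILON: f32 = 0.1;

#[derive(Clone, Copy)]
struct Sphere {
    center: [f32; 3],
    radius: f32,
}

fn dist(a: [f32; 3], b: [f32; 3]) -> f32 {
    let d = [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
    (d[0] * d[0] + d[1] * d[1] + d[2] * d[2]).sqrt()
}

/// Because every entry was inflated by EPSILON, the index still reports a
/// superset of the true candidate pairs as long as nothing has moved more
/// than EPSILON since it was indexed.
fn index_still_valid(spheres: &[Sphere], indexed_centers: &[[f32; 3]]) -> bool {
    spheres
        .iter()
        .zip(indexed_centers)
        .all(|(s, &c)| dist(s.center, c) <= EPSILON)
}

fn main() {
    let spheres = [Sphere { center: [0.05, 0.0, 0.0], radius: 1.0 }];
    let indexed_centers = [[0.0_f32, 0.0, 0.0]];
    // Moved only 0.05 < EPSILON, so last frame's index (or the lagging,
    // amortized one) can still be used as-is.
    assert!(index_still_valid(&spheres, &indexed_centers));
}
```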
> For sorting these mostly sorted lists, insertion sort tends towards O(n) while Quicksort tends towards O(n^2)!
Only if you pick your pivot really, really badly. You can randomise your pivot to get O(n log n), or in the case of already almost sorted lists, you can pick a pivot by just picking the element in the middle of the list.
But you are right that even with optimal pivots, QuickSort is still O(n log n) in the best case. There are simple variants of merge sort that give you O(n log k) behaviour, where k is the number of runs (both ascending and descending) in the data. The 'sort' you get by default in e.g. Haskell's standard library uses such an algorithm. I think Python does, too.
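The shape of such a run-based merge sort is roughly this (a sketch handling ascending runs only for brevity; the real implementations in GHC's library sort and CPython's Timsort are more involved):

```rust
/// Split the input into maximal ascending runs, then repeatedly merge
/// adjacent runs. With k runs this does O(n log k) work overall.
fn natural_merge_sort(xs: &[i32]) -> Vec<i32> {
    // 1. Collect maximal ascending runs.
    let mut runs: Vec<Vec<i32>> = Vec::new();
    for &x in xs {
        match runs.last_mut() {
            Some(run) if *run.last().unwrap() <= x => run.push(x),
            _ => runs.push(vec![x]),
        }
    }
    // 2. Merge adjacent runs pairwise until one run remains.
    while runs.len() > 1 {
        let mut next = Vec::with_capacity(runs.len() / 2 + 1);
        for pair in runs.chunks(2) {
            match pair {
                [a, b] => next.push(merge(a, b)),
                [a] => next.push(a.clone()),
                _ => unreachable!(),
            }
        }
        runs = next;
    }
    runs.pop().unwrap_or_default()
}

/// Standard two-way merge of two sorted slices.
fn merge(a: &[i32], b: &[i32]) -> Vec<i32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::with_capacity(a.len() + b.len());
    while i < a.len() && j < b.len() {
        if a[i] <= b[j] { out.push(a[i]); i += 1; } else { out.push(b[j]); j += 1; }
    }
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}

fn main() {
    // Two runs, [1, 2, 3, 4, 5, 7] and [6, 8, 9], so a single merge pass.
    let sorted = natural_merge_sort(&[1, 2, 3, 4, 5, 7, 6, 8, 9]);
    assert_eq!(sorted, vec![1, 2, 3, 4, 5, 6, 7, 8, 9]);
}
```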
What’s funny is that I’ve been doing some form of gamedev since the late 90s, and most of this is abstracted away by engines at this point, but this is essential to understanding how complex system simulations work.
Thanks to the author for making a very accessible article
You're right that it's not exactly n^2. For the i-th element we perform (n - i - 1) comparisons (indexing from 0). This adds up to a total of (n - 1) * n / 2 comparisons. (see https://en.wikipedia.org/wiki/Triangular_number)
In the end it doesn't make a difference for big-O analysis because it's used to describe the behavior when n approaches infinity, where the quadratic factor takes over.
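Spelling that sum out:

```latex
\sum_{i=0}^{n-1} (n - i - 1)
  = (n-1) + (n-2) + \cdots + 1 + 0
  = \frac{n(n-1)}{2}
  = \tfrac{1}{2}n^2 - \tfrac{1}{2}n
  \in O(n^2)
```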
Big O is just complexity classes, describing how the number of abstract computations scales (that's the key word) with input size (here, the length of the input list). Generally speaking, if you could analytically express the number of computations as a function of input size, Big O takes the largest term and drops all constant factors.
It does not necessarily describe performance of the algorithm. `20n^2 + 5n` and `2n^2 + 9001n` are both O(n^2).
> It does not necessarily describe performance of the algorithm.
Not necessarily true. It does indeed describe performance of the algorithm; it just compares scenarios at a coarser granularity. You can tell from the very start that an O(1) algorithm is expected to outperform an O(N²) alternative.
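To see both points at once, you can just evaluate the two expressions from upthread at a few sizes (a throwaway sketch):

```rust
fn main() {
    for n in [10u64, 100, 1_000, 100_000] {
        let a = 20 * n * n + 5 * n;    // 20n^2 + 5n
        let b = 2 * n * n + 9_001 * n; // 2n^2 + 9001n
        // Same O(n^2) class, different constant factors; once n is large
        // the quadratic term dominates both.
        println!("n = {:>6}: {:>16} vs {:>16}", n, a, b);
    }
}
```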
It's the summation of the numbers from 1 to n - 1, which is n(n - 1)/2. This reduces to quadratic complexity because big-O notation ignores all coefficients and the terms that scale slower.
I enjoyed the use of illustration. Seems like an appropriate usage.
Sometimes I feel like these articles with interactive illustrations are more like an excuse to put together a bunch of cool demos, like there's a lot of fluff with not much substance (a bit like a TED talk), but this one didn't let the illustrations take over.
A long time ago I did something similar, but rather than sorting I just kept index lists for each direction and the objects sorted themselves. Meaning there are 4 lists, `objectIndicesSortedByLeftEdge/RightEdge/TopEdge/BottomEdge`. If an object moves horizontally then it updates its own indices in the leftEdge and rightEdge arrays, because as it moves it likely only has to swap 1 or 2 indices at most to move itself.
Objects are unlikely to jump from one side of the screen to the other, so comparing them to stuff on the other side of the screen seems like a waste (which is what a generic sort will do).
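Roughly, for one axis it could look like this (hypothetical names, Rust just to make it concrete; only the left-edge list is shown):

```rust
/// One axis list from the scheme described above: `order` holds object
/// indices sorted by left edge, and `slot_of[obj]` is that object's current
/// position in `order`, so an object can find its own entry in O(1).
fn update_after_move(order: &mut [usize], slot_of: &mut [usize], left: &[f32], obj: usize) {
    let mut s = slot_of[obj];
    // Bubble left while our edge is now smaller than the neighbour's.
    while s > 0 && left[order[s - 1]] > left[obj] {
        order.swap(s - 1, s);
        slot_of[order[s]] = s;
        s -= 1;
    }
    // Bubble right while our edge is now larger than the neighbour's.
    while s + 1 < order.len() && left[order[s + 1]] < left[obj] {
        order.swap(s, s + 1);
        slot_of[order[s]] = s;
        s += 1;
    }
    slot_of[obj] = s;
}

fn main() {
    let mut left = vec![0.0_f32, 1.0, 2.0, 3.0];
    let mut order: Vec<usize> = vec![0, 1, 2, 3];
    let mut slot_of: Vec<usize> = vec![0, 1, 2, 3];
    // Object 1 drifts a little to the right, past object 2's left edge.
    left[1] = 2.5;
    update_after_move(&mut order, &mut slot_of, &left, 1);
    assert_eq!(order, vec![0, 2, 1, 3]); // one local swap, no full sort
}
```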
> Let’s look at the sort step, which is the bottleneck of the algorithm according to the analysis.
> You can see that most of the time, the sort does nothing at all! The list is almost always already sorted from the previous frame.
> Even when it becomes unsorted, it usually just takes a couple of swaps to be sorted again.
> Here’s insertion sort in action:
https://github.com/bepu/bepuphysics2/blob/master/Documentati...
The library itself is amazing in terms of performance. It is a bit challenging to integrate with though due to the amount of optimization involved.
Is this true? The naive algorithm's outer loop (i) counts n - 1, while the inner loop (j) begins at i + 1 (so counts progressively less than n - 1).
Not a CS major; is this roughly equivalent (for large values of n) to O(n^2), or is it, as it appears, something less?
[Ed: I should add that it also prevents testing an object against itself.]
The algorithm is O(n^2) because every pair will be checked once.
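For reference, the loop being discussed has roughly this shape (a generic sketch, not the article's exact code):

```rust
fn main() {
    let objects = ["a", "b", "c", "d"];
    let n = objects.len();
    let mut checks = 0;
    // The outer loop runs n - 1 times; the inner loop starts at i + 1, so
    // every unordered pair is tested exactly once and nothing is tested
    // against itself.
    for i in 0..n - 1 {
        for j in (i + 1)..n {
            checks += 1;
            println!("check {} vs {}", objects[i], objects[j]);
        }
    }
    assert_eq!(checks, n * (n - 1) / 2); // 6 checks for 4 objects
}
```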
Check out his other goodies https://leanrada.com/