He starts from the wrong axiom that it's hard to prove and builds a lot of nonsense on top of that.
It requires just two induction proofs (see the sketch of the algorithm below for reference):
- One showing that for I=1, after N comparisons the largest number is at position 1 (proven by induction); this is the base case.
- The other showing that for any I=n+1, if we assume the first n slots are ordered, we can treat position n+1 onward as a new array of length N-n and apply the base-case proof.
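For reference, here is the algorithm being discussed, as I understand it from the article: two full nested loops over the whole array, swapping whenever A[i] < A[j]. This is a rough Python sketch with my own names, not the article's Ada code, so treat it as illustrative only.

```python
def cant_believe_it_can_sort(a):
    """Sketch of the double-loop sort discussed here (my reading of the
    article): both loops run over the ENTIRE array, and we swap whenever
    a[i] < a[j]."""
    n = len(a)
    for i in range(n):
        for j in range(n):
            if a[i] < a[j]:
                a[i], a[j] = a[j], a[i]
    return a

print(cant_believe_it_can_sort([4, 3, 6, 9, 1]))  # -> [1, 3, 4, 6, 9]
```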
Talking about computer science and not doing the bare minimum of computer science is peak software engineering :)
I thought his goal was to get the prover to prove it without understanding it himself.
By realizing the low-indexed portion is always sorted, you've already proved the algorithm yourself and the prover is just checking for bugs in your logic.
I'm not saying the proof isn't valuable, just that it's not magical and actually requires the user to understand the majority of the proof already.
This algorithm is surprising and interesting because it doesn't work at all the way it seems to at first.
The low-indexed portion is sorted, but isn't guaranteed to contain the lowest or the highest i elements of the list (except when i=1), and the list is ultimately sorted in decreasing, not increasing order. The final sort doesn't occur until the last iteration of the outer loop when the inequality is reversed (the interesting variable, j, is on the right).
Because of that, the proof outline discussed here doesn't work.
Consider what happens if the unique smallest element starts at position n. It is placed at the start of the list (the correct final position) in the final iteration of the outer loop (i=n), and not before.
Proof (for simplicity the list is indexed 1 to n):
Let A[n] < A[k] for all k != n in the initial list.
Elements are only swapped when A[i] < A[j]. If i != n and j = n, then A[i] > A[n] = A[j], so A[n] is not swapped.
Then, when i = n and j = 1, A[i] = A[n] < A[1] = A[j], so A[1] and A[n] are swapped, placing the smallest element in position 1 at last.
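Here is a quick empirical check of that claim, as a hedged sketch (the same double-loop rendering as the Python snippet above, with my own names): it prints where the unique smallest element sits after each outer iteration.

```python
def trace_smallest(a):
    """Run the double-loop sort and report, after each outer iteration,
    the (1-indexed) position of the unique smallest element."""
    n = len(a)
    smallest = min(a)
    for i in range(n):
        for j in range(n):
            if a[i] < a[j]:
                a[i], a[j] = a[j], a[i]
        print(f"after i={i + 1}: smallest at position {a.index(smallest) + 1}, a={a}")

# Unique smallest element placed last, as in the scenario above:
# it stays in the last position until the final outer iteration,
# when it is finally swapped into position 1.
trace_smallest([5, 3, 8, 6, 1])
```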
That is an elegant proof... for a totally different algorithm. J starts at 1, not at I, so every element of the array is subject to being moved in every single outer iteration. Besides, it gets the comparison backwards.
You are absolutely correct; I was gung-ho and got punished with internet shame. Below is the corrected proof, but the point still stands.
The solution is even simpler:
Empty case: after 1 iteration (I=1), the largest number is at position 1.
Base case: after 2 iterations (I=2), the first 2 elements are ordered, and the largest number is at position 2.
Inductive hypothesis (N case): after N iterations, the first N numbers are ordered (as a sub-list, not the entire array) and the largest number in the whole array is at position N.
N+1 case (I=N+1):
- For every J<N+1 with A[J] <= A[I], nothing happens.
- From the first J<N+1 with A[I] < A[J] (call it x), each cell gets swapped in turn, since A[J] < A[J+1] in the ordered prefix.
- By the time J reaches I, the largest number has moved from A[N] to A[I].
- For every J>N+1, nothing happens, since the largest number is now at A[I].
Not part of the proof, but to make the end state clear (writing A[I] for the value that started this pass at position I), we get:
- for J<x: A[J] <= A[I]
- for J=x: A[J] = A[I]
- for J>x: A[J] >= A[I]
and the first I elements of the list are ordered. (A quick sanity check of this invariant follows below.)
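Not a substitute for the induction, but here is a quick randomized sanity check of the claimed invariant (after I outer iterations, the first I elements are ordered and the largest element of the whole array sits at position I), again as a hedged Python sketch of the algorithm rather than the article's code:

```python
import random

def check_invariant(a):
    """Run the double-loop sort, asserting after each outer iteration I that
    a[0..I-1] is sorted and a[I-1] holds the largest value in the array."""
    n = len(a)
    biggest = max(a)
    for i in range(n):
        for j in range(n):
            if a[i] < a[j]:
                a[i], a[j] = a[j], a[i]
        assert a[:i + 1] == sorted(a[:i + 1]), f"prefix unsorted after I={i + 1}"
        assert a[i] == biggest, f"largest not at position {i + 1}"
    return a

for _ in range(1000):
    arr = [random.randint(0, 99) for _ in range(random.randint(1, 20))]
    assert check_invariant(arr) == sorted(arr)
print("invariant held on all random tests")
```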
No, it works, there's just a bit of an unstated step in the proof.
After j ranges from 1 to i, it will still be the case that the values from 1 to i are sorted. So you can assume that j starts at i without disturbing the induction condition.
The reason this is true is... If A[i] is >= every element A[j] for j < i, then it is clearly true. Otherwise, there is some smallest j in 1 to i-1 inclusive such that A[i] < A[j]. Then A[i] will swap with that A[j], then it will swap again with the next A[j+1] (because the original A[j] was < A[j+1] by the induction hypothesis), then with the next element, ... swapping with every element until j reaches i.
After this is done we have maintained the condition that A[1] .. A[i] are sorted after j ranges from 1 to i.
If the (i+1)'st outer loop starts with the first i elements in sorted order, then it ends with the first i+1 elements in sorted order.
In fact if k of the first i elements are <= A[i+1], then the first k iterations of the inner loop will not swap, and the next i-k iterations will swap. The latter can be seen to maintain order.
The last n-i iterations of the inner loop (where j>=i) do not affect this reasoning and can be omitted, as can the first iteration of the outer loop.
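To make that last point concrete, here is a hedged sketch (my own Python rendering, not the article's code) checking two things on random inputs: that once i >= 2 the inner iterations with j >= i never actually swap, and that dropping them, along with the first outer iteration, leaves plain insertion-by-swapping that produces the same sorted result.

```python
import random

def full_sort_counting_late_swaps(a):
    """The full double loop, counting swaps that fire with j >= i
    during outer iterations i >= 2 (1-indexed)."""
    n = len(a)
    late_swaps = 0
    for i in range(n):                 # i is 0-indexed here
        for j in range(n):
            if a[i] < a[j]:
                if i >= 1 and j >= i:  # 1-indexed: i >= 2 and j >= i
                    late_swaps += 1
                a[i], a[j] = a[j], a[i]
    return a, late_swaps

def truncated_sort(a):
    """Drop the first outer iteration and all inner iterations with j >= i:
    what remains is insertion sort done with swaps."""
    n = len(a)
    for i in range(1, n):
        for j in range(i):
            if a[i] < a[j]:
                a[i], a[j] = a[j], a[i]
    return a

for _ in range(1000):
    arr = [random.randint(0, 99) for _ in range(random.randint(1, 20))]
    full, late = full_sort_counting_late_swaps(list(arr))
    assert late == 0  # the j >= i iterations never swap once i >= 2
    assert full == truncated_sort(list(arr)) == sorted(arr)
print("truncated variant matches on all random tests")
```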
I guess it can be thought of as an unoptimized insertion or bubble sort.
I think it is very possible to write this algorithm by mistake in intro compsci classes when you try to code a bubble sort by heart. I would think TAs may have many such instances in their students' homework.
The video from the article (https://www.youtube.com/watch?app=desktop&v=bydMm4cJDeU) is much better because it highlights the index of the outer loop, which is unclear from the cascading visualization there. By seeing the indexes it becomes clear that (1) in the area before the outer index, every value gets swapped in and out of the outer loop location to put them in order, and (2) at the end of the first outer loop iteration, the largest element will be at the location of the outer loop index, and so everything to the right is unchanged in each iteration.
This is nice. I had a way to highlight indexes at one point in this tool. Got rid of it when I was trying to add a way to visualize any custom algo. I should add it back. It adds more value/info to the visualization.
I actually used this algorithm a decade ago to implement a log-based, transactional database system for an embedded system with a very small amount of memory and a requirement that all memory be statically allocated.
To the frustration of the rest of the development team, who first called me an idiot (I was new) and then could not make quicksort run as fast on inputs that were capped at something like 500 items. Apparently, algorithmic complexity isn't everything.
You sure it was this algorithm and not Bubble Sort?
> algorithmic complexity isn’t everything
Yeah very true. Or at least hopefully everyone knows that complexity analysis only applies to large n, and small inputs can change everything. In console video games it was very common to avoid dynamic allocation and to use bubble sort on small arrays. Also extremely common to avoid a sort completely and just do a linear search on the (small) array while querying, that can end up being much faster than sorting and binary searching, especially when the total number of queries or the number of queries per frame is also low.
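If anyone wants to poke at that trade-off, here is a rough, illustrative Python sketch (the constants have nothing to do with a game engine, and the sizes and query counts are made up for the example); the point is just to vary the array size and the number of queries and watch where the crossover lands.

```python
import bisect
import random
import timeit

data = [random.randint(0, 1_000_000) for _ in range(32)]   # small array
queries = [random.choice(data) for _ in range(4)]          # few queries per "frame"

def linear(data, queries):
    # Just scan the unsorted array once per query.
    return [q in data for q in queries]

def sort_then_bisect(data, queries):
    # Pay to sort up front, then answer each query with a binary search.
    s = sorted(data)
    out = []
    for q in queries:
        k = bisect.bisect_left(s, q)
        out.append(k < len(s) and s[k] == q)
    return out

print("linear      :", timeit.timeit(lambda: linear(data, queries), number=100_000))
print("sort+bisect :", timeit.timeit(lambda: sort_then_bisect(data, queries), number=100_000))
```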
Oh yeah, I did a lot of linear search too, to the same consternation of other devs.
The issue was that the device was so low on memory (2MB of unified NV and flash for code, files and operating memory) that there simply wasn't enough space to hold enough data for it to ever be a problem for the 20MHz ARM7 controller, as long as you did not do anything stupid. 600kB of it was used by OpenSSL and a further 300kB by the operating system anyway (I am aware of how funny it sounds that OpenSSL takes twice as much space as the OS).
I feel like anyone who was surprised that algorithmic complexity isn't everything probably didn't totally understand it. The assumptions (like ignoring constants) are straight out of calculus limits. That (+10000) on the end doesn't mean anything if you're sorting an infinite list, but it means a lot if you're sorting 15 (or in your case 500) entries.
Well, it actually is kinda worse (or better, depends how you look at it).
It is not necessarily +10000, it can also be something like x5000.
Because CPUs really, really, really like running short, simple, predictable loops that traverse data in a simple pattern, and hate when that is interrupted by something like dereferencing a pointer or looking up a missing page.
So your super complex and super intelligent algorithm might only be good on paper, while doing more harm than good to your TLB, caches, prefetcher, branch predictor, instruction pipeline, memory density, etc.
So there is this fun question:
"You are generating k random 64 bit integers. Each time you generate the integer, you have to insert it in a sorted collection. You implement two versions of the algorithm, one with a singly-linked list and one with a flat array. In both cases you are not allowed to cheat by using any external storage (indexes, caches, fingers, etc.)
The question: in your estimation, both algorithms being implemented optimally, what magnitude k needs to be for the linked list to start being more efficient than array list."
The fun part of this question is that the answer is: NEVER.
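For anyone who would rather see it than take it on faith, here is a rough sketch of the experiment (in Python, so the constant factors punish the pointer-chasing version even harder than C would, but the direction of the result is the same): k sorted insertions into a hand-rolled singly-linked list versus into a flat array.

```python
import bisect
import random
import time

class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def insert_sorted_linked(head, value):
    # Walk the singly-linked list to the insertion point (no external index allowed).
    if head is None or value <= head.value:
        return Node(value, head)
    cur = head
    while cur.next is not None and cur.next.value < value:
        cur = cur.next
    cur.next = Node(value, cur.next)
    return head

def bench(k):
    values = [random.getrandbits(64) for _ in range(k)]

    t0 = time.perf_counter()
    arr = []
    for v in values:
        bisect.insort(arr, v)                 # binary search + shift of the tail
    t1 = time.perf_counter()

    head = None
    for v in values:
        head = insert_sorted_linked(head, v)  # linear walk through the nodes
    t2 = time.perf_counter()

    print(f"k={k}: flat array {t1 - t0:.3f}s, linked list {t2 - t1:.3f}s")

for k in (1_000, 5_000, 10_000):
    bench(k)
```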
If it looks "obviously wrong" the quick proof that it works ... the j loop will move the largest unsorted element in the array into a sorted position. Since the i loop executes the j loop n times, the array must be sorted (since the n largest elements will be in the correct order).
EDIT
^W^W^W
Nope, I'm wrong. I did a worked example on paper. I think the devious thing here is the algorithm looks simple at a lazy glance.
To sort: 4-3-6-9-1, next round pivot for the i loop in [].
I can see that it sorts everything to the left of a pivot, then because it does that n times it comes to a sorted list. A reasonable proof will be more complicated than I thought.
It is just a quirky insertion sort. In each outer cycle, it inserts the value originally at position A[i] into the already-sorted sequence A[1]..A[i-1], while using A[i] as a temporary variable to shift the higher part of the sorted sequence one position up.
The list on the left (index less than i) is always sorted. The ith element is inserted by the j loop and the rest of the list is shifted right by one element by repeated swapping with the ith position. Nothing to the right changes because the ith element is the max of the entire list, which seems to be a red herring for analysis.
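Here is the worked example from a few comments up (4-3-6-9-1), printed after each outer pass, which makes the always-sorted left part and the maximum parked at position i easy to see. A hedged Python sketch of the algorithm as discussed, not the article's code:

```python
def sort_with_trace(a):
    """Double-loop sort; print the array after each outer pass, marking the
    boundary of the left part (positions 1..i), which stays sorted."""
    n = len(a)
    for i in range(n):
        for j in range(n):
            if a[i] < a[j]:
                a[i], a[j] = a[j], a[i]
        left, right = a[:i + 1], a[i + 1:]
        print(f"i={i + 1}: {left} | {right}")
    return a

sort_with_trace([4, 3, 6, 9, 1])
# Each printed left part is sorted and ends with the overall max; from the
# second pass on, nothing to the right of position i changes during pass i.
```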
Yeah, I see this as a very interesting reply to the TLA+ post. Might spend my evening diving into Ada and Spark.
Edit: Some things I noticed. The package gnat-12 does not have gnatprove. Ada mode for emacs requires a compilation step that fails with gnat community edition. With alire there is no system gnat so it cannot compile it (quite possibly I'm missing something). In the end I gave up on using emacs. Gnatstudio wouldn't run for me until I realized it needed ncurses. It also had some unhandled exceptions for me (raised PROGRAM_ERROR : adjust/finalize raised GNATCOLL.VFS.VFS_INVALID_FILE_ERROR: gnatcoll-vfs.adb:340), but in the end I managed to get it up and running.
Edit2: After playing around with it, I'm extremely impressed with what SPARK can do. I made a function to add 7 to numbers. Tried putting a postcondition that the return value is bigger than the input. "Nuh uh, it could overflow". Ok, so I add a precondition that numbers must be less than 100. "Uhm, buddy, you are passing in 200 right there". This is useful stuff for real, and easy to write too.
I've been watching those mesmerizing YouTube videos visualizing sorting algorithms lately. The header of this article uses a screen cap from one of them.
Them: So what shows do you watch?
Me: ... It's complicated.
There are a lot of different sorting algorithms. Like, a lot, a lot.
As I watch them, I try to figure out what they were optimizing for. Some only scan in one direction. Some only use the swap operation. Some seem to do the minimum number of writes. Some are incremental performance improvements over others.
When I see an algorithm like this, I don't assume the person who wrote it was an idiot. I assume they were optimizing for something that's not obvious to me. Its only modifying operation is swap, so maybe that operation is faster than an arbitrary insert for whatever system or data structure they're using. There are no temporary variables besides loop counters, so maybe they're on a memory-constrained environment. There's barely any code here, so maybe this is for a microcontroller with precious little ROM. Or maybe they're applying this as a binary patch and they have a strict ceiling to the number of ops they can fit in.
Or maybe it's just the first sorting algorithm they could think of in an environment that doesn't ship with one and the performance is adequate for their needs. In that case, it's optimized for developer time and productivity. And honestly, it's a far more elegant algorithm than my "naive" version would be.
These are all valid reasons to use a "naive" sorting algorithm.
This is a fantastic post! It's great to have a concrete example of proving a non-trivial algorithm. And somehow this Ada code feels more approachable to me than HOL4 and other functional proof assistants I've looked at.
While it's been on my list for a while, I'm more curious to try out Ada now.
It'll take you less than a week to get proficient enough in it to make something useful, or at least interesting. It's a very straightforward language, overall.
https://learn.adacore.com/ - Pretty much the best free source until you're actually ready to commit to the language. The "Introduction to Ada" course took me maybe a week of 1-2 hours a day reading and practicing to go through. There's also a SPARK course that takes a bit longer, but is also interactive.
The language reference manuals for 2012 and 202x (which should become Ada 2022):
http://www.ada-auth.org/standards/ada12.html
http://www.ada-auth.org/standards/ada2x.html
Two of the most interesting things that trickled down during the design of Spark2014:
1- contracts are written in the same language as the code to be checked/proved. This made it important to add expressive features to the language (for all / for some: quantifiers! expression functions, if- and case-expressions, and more recently the new delta-aggregate notation): these additions make the language far more expressive without too much loss of readability.
2- executable contracts: most contracts can be checked at runtime or proved. And the escape hatch (code only present for proof) is 'ghost' code which is also a nice addition.
Lots of little nifty additions to the language, just from 'thinking in contracts' or 'designing for provability'.
If you compare it with both insertion sort and bubble sort, you can see it looks more like insertion sort than bubble sort.
This one is kind of special, though, since it's somehow more offensive to intuition than bubble sort itself.
Does anyone here also use SPARK for this sort of thing? Are there other formal methods tools you'd use if you had to prove something like this?
It also explains how to install GNATprove: ``alr with gnatprove``
https://news.ycombinator.com/item?id=28758106 (318 comments)