ghj · 6 years ago
It's really hard to write correct code.

Sorting was broken in java and nobody noticed for a long time: http://envisage-project.eu/wp-content/uploads/2015/02/sortin...

The same was true for java's binary search: https://ai.googleblog.com/2006/06/extra-extra-read-all-about...

So I am not sure I will ever trust an ML algorithm trained on inputs/outputs only (which is what I think "neural program induction" means). The above bugs are only hit because the programmers who wrote it didn't think about overflow. What is this "neural programmer" assuming about their inputs? We'll never know.

OTOH the standard library sort can be improved if you know the distribution of the numbers you're sorting (e.g., small numbers can bucket sort, small lists can use a sorting network, etc). If this thing can efficiently almost-correctly-sort because it's better at these types of pattern matching, we can just run a final pass of insertion sort to make it useful!
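To make that "final pass of insertion sort" concrete (a plain sketch, not from the paper): insertion sort runs in O(n + d) time where d is the number of remaining inversions, so it's cheap precisely when the learned sorter has already done most of the work.

```python
def repair_pass(a):
    """Final insertion-sort pass over the output of an almost-correct sorter.

    Runs in O(n + d), where d is the number of inversions left over,
    so it's nearly free on nearly-sorted input.
    """
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]  # shift larger elements right
            j -= 1
        a[j + 1] = x
    return a

# e.g. the output of a hypothetical almost-correct sorter:
almost = [1, 2, 4, 3, 5, 7, 6, 8]
print(repair_pass(almost))  # [1, 2, 3, 4, 5, 6, 7, 8]
```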

malisper · 6 years ago
> OTOH the standard library sort can be improved if you know the distribution of the numbers you're sorting (e.g., small numbers can bucket sort, small lists can use a sorting network, etc). If this thing can efficiently almost-correctly-sort because it's better at these types of pattern matching, we can just run a final pass of insertion sort to make it useful!

I haven't read this paper, but that strategy was exactly the strategy behind "The Case for Learned Index Structures"[0]. The idea was you could train a model to predict the location of a row based on a primary key. After computing the model, you could then go through every row and compute a bound on how far off the model is from the actual row. To perform a lookup, you start off at the model's guess, then look around based on the bound on the model for a row with the given primary key.

[0] https://arxiv.org/abs/1712.01208
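A toy sketch of that lookup strategy (all names here are made up, and the trivial linear fit stands in for the learned model in the paper):

```python
import bisect

def fit_linear(keys):
    """Fit a trivial linear model predicting position from key.

    Stand-in for the learned model; `keys` is assumed sorted and distinct.
    """
    n = len(keys)
    lo, hi = keys[0], keys[-1]
    scale = (n - 1) / (hi - lo) if hi != lo else 0.0
    predict = lambda k: int((k - lo) * scale)
    # "Go through every row and compute a bound on how far off the model is":
    err = max(abs(predict(k) - i) for i, k in enumerate(keys))
    return predict, err

def lookup(keys, predict, err, key):
    """Start at the model's guess, then search only within +/- err of it."""
    guess = predict(key)
    lo = max(0, guess - err)
    hi = min(len(keys), guess + err + 1)
    i = lo + bisect.bisect_left(keys[lo:hi], key)
    return i if i < len(keys) and keys[i] == key else -1
```

The skewed key 100 below makes the linear model badly wrong in the middle, which the error bound absorbs:

```python
keys = [1, 3, 5, 7, 100]
predict, err = fit_linear(keys)
print(lookup(keys, predict, err, 7))  # 3
print(lookup(keys, predict, err, 4))  # -1 (absent)
```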

eru · 6 years ago
> If this thing can efficiently almost-correctly-sort because it's better at these types of pattern matching, we can just run a final pass of insertion sort to make it useful!

Depends on what kinds of mistakes the almost-correct approach makes. If it just puts elements in the wrong order, your suggestion works. But if it makes mistakes like duplicating elements, or dropping them, or completely making up new entries, no post-processing will help.
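Those unfixable failure modes are at least cheap to detect, even if detection isn't repair — a sketch of an O(n) multiset check:

```python
from collections import Counter

def is_permutation(original, output):
    """True iff `output` contains exactly the same elements as `original`.

    Catches the unrecoverable failures: dropped, duplicated,
    or made-up entries. O(n) time and space.
    """
    return Counter(original) == Counter(output)

assert is_permutation([3, 1, 2], [1, 2, 3])      # mere misordering: fixable
assert not is_permutation([3, 1, 2], [1, 2, 2])  # duplicated + dropped
assert not is_permutation([3, 1, 2], [1, 2])     # dropped element
```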

> OTOH the standard library sort can be improved if you know the distribution of the numbers you're sorting (e.g., small numbers can bucket sort, small lists can use a sorting network, etc).

Even better: the standard library sort usually assumes only comparison. For special cases like sorting on integer keys you can conceivably do better.
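For instance, counting sort on small integer keys does no element-to-element comparisons at all and runs in O(n + k) for keys in range(k) — a textbook sketch:

```python
def counting_sort(a, k):
    """Sort integers drawn from range(k) in O(n + k), with no comparisons
    between elements — beating the O(n log n) comparison-sort lower bound."""
    counts = [0] * k
    for x in a:
        counts[x] += 1
    out = []
    for v, c in enumerate(counts):
        out.extend([v] * c)
    return out

print(counting_sort([3, 0, 2, 3, 1], 4))  # [0, 1, 2, 3, 3]
```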

ogogmad · 6 years ago
> Depends on what kinds of mistakes the almost-correct approach makes. If it just puts elements in the wrong order, your suggestion works. But if it makes mistakes like duplicating elements, or dropping them, or completely making up new entries, no post-processing will help.

The algorithm modifies the original list only by swapping entries. As such, it's guaranteed to result in a permutation.

im3w1l-alt · 6 years ago
If you let the number of objects to sort go to infinity while keeping the size of the objects in bits capped, then they will behave like integers.
mpfundstein · 6 years ago
IMO it's not about replacing deterministic algorithms used in von Neumann machines (even though that would be a nice side effect).

I feel that if we ever want to build an intelligent problem-solving machine, we need some facility for abstract reasoning, and work like this paper is a major contribution towards that goal. (if it's reproducible and valid ofc...)

pansa2 · 6 years ago
> I am not sure I will ever trust an ML algorithm trained on inputs/outputs only

It’s doing TDD!

jtms · 6 years ago
Imagine a world where all programmers do is write the tests and the generalized learning algos write all the implementations.
nardi · 6 years ago
Next step is to prove the ML-produced algorithm is correct. I can see a world where we generate algorithms and then automatically prove they're correct based on a formal description of the problem being solved.
bufferoverflow · 6 years ago
It doesn't need to be 100% correct. There's a place for fast sorting algorithms which are correct most of the time - basically anything non-critical, like sorting comments by upvotes.

According to the article, they couldn't find a case where it wasn't correct.

adrianN · 6 years ago
We're really bad at proving correct even code that was written explicitly with formal verification in mind. The techniques there are still quite immature. I wonder how long it will take until we have a reasonably working ML + formal verification pipeline. Maybe it's just easier to generate correct code from the spec than to learn it from examples and then verify it against the specification?
rusticpenn · 6 years ago
Well-designed ML techniques are correct 99% of the time. It's the 1% that is the problem.
Cthulhu_ · 6 years ago
Call me naive, but wouldn't feeding it random input and expected output sorted by existing and proven (but slower) sorting algorithms do the trick?
Shorel · 6 years ago
It would be funny if a two-phase approach to sorting resulted in increased average performance.

First sort using the ML algorithm. Then sort the almost-sorted result with a traditional algorithm optimized for almost-sorted input.

In fact, I think my own brain uses a two-phase approach for many problems.

_puk · 6 years ago
Could you train the ML algorithm to select the best traditional algorithm based on the distribution of the items being sorted?!
User23 · 6 years ago
> The same was true for java's binary search

I used to ask "implement binary search" as an interview question and gave up on using it because not a single candidate could do it correctly in 45 minutes. This was at a FAANG.

vertbhrtn · 6 years ago
Ha, I can get an offer from any of the FAANGs without even trying, and I can comfortably say that not only did I fail the binary search question back when interviews were 1 hour long, I would still fail it now.
Der_Einzige · 6 years ago
I don't believe you. Surely, after a few hints, the FAANG candidates must have been able to figure out the classic gotcha of this question (why does `(high + low) / 2` sometimes overflow, and how do you fix it)?

If they didn't, then I believe you did a poor job of guiding them. I refuse to believe that this is some esoteric, tough question. If you can solve dynamic programming problems, you can easily solve this one.
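For the record, the gotcha: in a fixed-width language like Java or C, `(low + high) / 2` overflows once `low + high` exceeds the integer maximum, while `low + (high - low) / 2` cannot. Python integers don't overflow, so this sketch simulates 32-bit wrapping to show the failure:

```python
def to_i32(x):
    """Wrap to signed 32-bit, as Java's int arithmetic would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

low, high = 1_500_000_000, 2_000_000_000  # plausible indices in a huge array

buggy_mid = to_i32(low + high) // 2       # (low + high) / 2 wraps negative
safe_mid = low + to_i32(high - low) // 2  # low + (high - low) / 2 is fine

print(buggy_mid)  # negative: the famous Java binarySearch/mergeSort bug
print(safe_mid)   # 1750000000
```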

DaiPlusPlus · 6 years ago
Apparently modern GPUs have sorting networks implemented in hardware (giving you effectively constant-time sorting for fixed-size input). That may be preferable when you're processing hostile input contrived to hit QuickSort's worst case, and the cost of hitting the PCI Express bus is less than the cost of randomizing the input to mitigate QuickSort attacks.

But yes - I agree it's weird that many standard libraries (looking at you, .NET!) have only a single `Sort` method on their vector types, with the details of the implementation buried in the documentation, instead of shipping with a first-class sorting-algorithms library.

david-gpu · 6 years ago
GPUs having hardware explicitly for sorting sounds a bit odd, when you can efficiently implement sorting networks in software on the same GPU.

If you have a source, I would like to learn more.

KajMagnus · 6 years ago
> "just run a final pass of insertion sort to make it useful!"

(And if this insertion sort won't complete in O(n), say <= 3n, then fall back to Quicksort, so it won't turn into O(n^2) just because the ML alg hit a corner case.)
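A sketch of that guard: run insertion sort but count element shifts, and bail out to a guaranteed O(n log n) sort once the shifts exceed the 3n budget (the constant is arbitrary, matching the comment above):

```python
def bounded_repair(a, budget_factor=3):
    """Insertion sort with a shift budget of budget_factor * n.

    If the input isn't nearly sorted, fall back to Python's built-in
    Timsort instead of degrading to O(n^2).
    """
    budget = budget_factor * len(a)
    shifts = 0
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
            if shifts > budget:
                a[j + 1] = x  # restore held-out element: a is a permutation again
                a.sort()      # fallback: guaranteed O(n log n)
                return a
        a[j + 1] = x
    return a
```

Note the restore before the fallback: mid-shift, the current element is held out of the array, so writing it back is required for `a.sort()` to see the right multiset.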

mkl · 6 years ago
Possibly better would be Timsort (used in Python, Java, V8, etc.), as it's "designed to take advantage of runs of consecutive ordered elements that already exist": https://en.wikipedia.org/wiki/Timsort
YeGoblynQueenne · 6 years ago
>> So I am not sure I will ever trust an ML algorithm trained on inputs/outputs only (which is what I think "neural program induction" means).

According to the authors, the proposed approach is notable for training not only on input/output pairs but also on program traces. In addition, reinforcement learning is used to train e.g. sorting agents on the behaviour of "teacher agents", which are hand-crafted implementations of well-known sorting algorithms (bubble, insertion, and quicksort).

eli_gottlieb · 6 years ago
>So I am not sure I will ever trust an ML algorithm trained on inputs/outputs only (which is what I think "neural program induction" means).

Pfffft, just wait until they get it writing code with its own proofs of correctness in dependent type theory. It'll be correct and twice as unreadable as incorrect human-written code!

riedel · 6 years ago
Why not check in O(n) whether the postcondition holds on the sorted output, and raise an exception otherwise? You can even generate a nearly sorted list and run a verified bubble sort on it. I don't see big problems if ML is used carefully.
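A sketch of that check (the untrusted sorter is any callable; this verifies sortedness only — element preservation would need a separate multiset check):

```python
def checked(sort_fn, a):
    """Run an (untrusted) sorter, then verify the postcondition in O(n)."""
    out = sort_fn(list(a))
    if any(out[i] > out[i + 1] for i in range(len(out) - 1)):
        raise ValueError("postcondition violated: output not sorted")
    return out

print(checked(sorted, [3, 1, 2]))  # [1, 2, 3]

try:
    checked(lambda xs: xs, [3, 1, 2])  # identity "sorter" fails the check
except ValueError as e:
    print(e)
```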
xiphias2 · 6 years ago
The code (which is a few percent faster than quicksort) is on the last page:

  1: procedure QUICKSORTAGENT(input state)
  2: Let i = 1, j = 2, l = 3, h = 4
  3: if FunctionID = None then
  4: return vh ← Function1(vl ← vl, vh ← vh)
  5: else if FunctionID = 1 then ▷ QuickSort
  6: if vl < vh then
  7: if prev = None then
  8: return vi ← Function2(vl ← vl, vh ← vh)
  9: else if prev = (vi ← Function2(vl ← vl, vh ← vh)) then
  10: return AssignVar(j, i)
  11: else if prev = AssignVar(j, i) then
  12: return MoveVar(i, -1)
  13: else if prev = MoveVar(i, -1) then
  14: if vi > vl then
  15: return vi ← Function1(vl ← vl, vh ← vi)
  16: else
  17: return MoveVar(j, +1)
  18: end if
  19: else if prev = (vi ← Function1(vl ← vl, vh ← vi)) then
  20: return MoveVar(j, +1)
  21: else if prev = MoveVar(j, +1) and vj < vh then
  22: return vh ← Function1(vl ← vj, vh ← vh)
  23: else
  24: return Return(h)
  25: end if
  26: else
  27: return Return(h)
  28: end if
  29: else ▷ FunctionID = 2, Partition
  30: if prev = None then
  31: return AssignVar(i, l)
  32: else if prev = AssignVar(i, l) then
  33: return AssignVar(j, l)
  34: else if vj < vh then
  35: if prev = Swap(i, j) then
  36: return MoveVar(i, +1)
  37: else if (prev = AssignVar(j, l) or prev = MoveVar(j, +1)) and A[vj] < A[vh] then
  38: if vi ≠ vj then
  39: return Swap(i, j)
  40: else
  41: return MoveVar(i, +1)
  42: end if
  43: else
  44: return MoveVar(j, +1)
  45: end if
  46: else if prev = MoveVar(j, +1) then
  47: return Swap(i, h)
  48: else
  49: return Return(i)
  50: end if
  51: end if
  52: end procedure

collyw · 6 years ago
This reminds me of programming BASIC on an Acorn Electron: line numbers and, if I remember right, no indentation, though it was a few decades ago.
dspillett · 6 years ago
The Electron had a new enough version of BBC Basic that "proper" procedures and UDFs were supported (in a limited fashion, but well enough considering the constraints of the machine), so you could pretty much ignore line numbers. In fact, by using a text editor and TYPEing the result into BASIC you could write code without them (the interpreter needed them; the TYPE trick made it put them in for itself).

But yes, this was uncommon so line numbers were a major thing.

And indentation was optional: you could add extra spaces to the start of lines, and they would be kept and have no effect on execution. But you would be wasting precious bytes of RAM, and if using one of the heavier display modes (0, 1 or 2) you would only have 8.5 Kbyte to play with for all the code and run-time storage (your variables, BASIC's call stack).

misja111 · 6 years ago
The source code in the PDF has indentation, it is a bit easier to read.
quickthrower2 · 6 years ago
Ready for that COPY key!
pX0r · 6 years ago
Was this "discovered" by their system or was it hand-crafted by the authors from their custom instruction set elements ?
YeGoblynQueenne · 6 years ago
No, that is the training agent implemented as an instance of quicksort. I don't know why the OP reports that it's "a few percent faster than quicksort" because I can't find where that is claimed in the paper.

In fact, as far as I understand it, the paper claims that the learned model (i.e. the student agent) has learned to reproduce the behaviour of this teacher agent in a smaller number of steps on average than the teacher needs to sort a list. That is what the entire claim about superior efficiency rests on (there is no example of a learned model and no asymptotic analysis of the program such a model represents).

rightbyte · 6 years ago
As I understand it, they designed quicksort like that to be able to train with it. It is quite clear from the video, where it is called "quick sort agent" compared to the model one, and function1 and function2 are in the stack trace.

"We found adding the previously executed action to the input state st is sufficient to handle dis-ambiguation for this quick sort implementation. Alg. 8 shows the converted quick sort scripted agent"

rightbyte · 6 years ago
Quite dull research wrapped in fancy words.

Essentially they generate functions that map one program state to another, count those function calls, and compare with e.g. the number of function calls in quicksort. They "cheat" by feeding in how all elements compare to their neighbours at each step. Like, if you give the algorithm that input for free, what's the point?

*Notably, none of the popular sorting algorithms decide which elements to swap by looking at the whole input at each execution step. On the contrary, the decisions are typically made based on local evidence only. ... We implement this principle for the sorting task by employing a small set of k (independent of n) index variables, and include in the input the information about how A[i] compares with A[i+1] and A[i−1] for each index variable i.*

YeGoblynQueenne · 6 years ago
>> Like, if you give the algorithm that input for free what's the point.

The point, as also explained in the paper, is that even with such information, and with a handful of basic operations from which to compose a program, the space of all possible programs is huge, and finding a program that does what it should in any reasonable amount of time is extremely difficult.

If you are familiar with the literature of program induction (which includes a lot more than neural program induction and goes back a few decades) you will notice that the hypothesis spaces for program learning are truly vast and it is impossible to make any kind of progress without some kind of "dirty trick" to reduce their size. The common approach is to impose a strong inductive bias on the representation language, so as to guide the search to find a correct program. The approach in the article is no different. And of course "inductive bias" here always means domain knowledge.

But, amicably, if you think there's any better way to do this, many people, myself included, would be very curious to hear what you have to say. In fact, if you could really innovate in this field you'd be guaranteed an illustrious research career. Learning programs from examples is hard.

EchoAce · 6 years ago
This is absolutely not cheating; every hand designed algorithm can access and compare everything in the array!

The real point of what they’re saying in your italicized quote is actually that giving the net full access hinders efficiency, so they actually restrict it. Almost like the opposite of cheating.

rightbyte · 6 years ago
It is not a pointer to an array I'm concerned about, but the "neighbour diff vector", or whatever you should call it, that is provided by the "environment". See A.1.2.

Doing so many comparisons and storing them has a cost. Also, the model can't decide if it is done, so at each step the "environment" has to iterate over the array to see if it is sorted. Are they only counting function calls? I guess so. The paper is really hard to follow and the pseudocode syntax is quite maddening.

If I understand the paper correctly of course I could be wrong.

If so, the claim that "our approach can learn to outperform custom-written solutions for a variety of problems" is bogus.

ogogmad · 6 years ago
> They "cheat" by feeding how all elements compares the their neighbours at each step. Like, if you give the algorithm that input for free what's the point.

How is that "cheating"? The algorithm is still learning how to do a comparison-sort, which has direct applications (assuming it performs well).

lone-commenter · 6 years ago
So do they feed such information only when training the system?
FabHK · 6 years ago
> none of the popular sorting algorithms decide which elements to swap by looking at the whole input at each execution

Yes. In particular, the O(n log n) lower bound holds for comparison sorts, that is algorithms that rely on pairwise comparison only.

MichaelBurge · 6 years ago
"all" of a constant number k of such inputs, not "all" of the entire input. It's not possible to feed a neural network an arbitrarily-large number of inputs in one batch.
rightbyte · 6 years ago
My error, it seemed to be a reference implementation with "all". Is it correct that they feed a vector of 68 such comparisons in the window interface?
me551ah · 6 years ago
According to the limitations stated in the doc.

1. It runs slower on modern CPUs and you need neural logic units to see the speed benefits

2. The model needs to be fine-tuned to the data for best performance.

While it may be better than quicksort in certain use cases, it isn't going to replace it anytime soon.

sinuhe69 · 6 years ago
"Model needs the be fined tuned to the data for best performance." -> typical :D
spyckie2 · 6 years ago
This is pretty cool, pardon my limited understanding, trying to see if I get this correctly:

1) They train a model to do sorting, and it actually sorts correctly.

2) They optimize the model for efficiency, and it becomes better than many custom sort functions?

If I remember correctly from school, you can basically speed up any kind of sorting by using more space (basically with a hash table). Is the neural network just using the normal sort algo until it sees a pattern in the input, and then skipping the sort algo for the pattern output? Or to put it plainly, is it just a smarter (or dumber, depending on the size of the neural network) hash table?

radicalbyte · 6 years ago
The question is how does the algorithm perform on average, what are the pathological cases and how slow are they?

How do you prove that with an algorithm that no human can reason about?

Basically this paper looks like they've found an algorithm which is efficient for given sets or given classes of sets. Whether that generalizes is a different problem.

It's basically a cool automatic heuristic generator.


seek3r00 · 6 years ago
Exactly, that’s what I thought. Although, they should be able to look at the set of instructions that generated the output, which is basically an algorithm in itself. Then they could try to prove whether that algorithm would really generalise.
collyw · 6 years ago
> How do you prove that with an algorithm that no human can reason about?

This is something I see becoming a problem in the future for machine learning.

seek3r00 · 6 years ago
The model is given the input and a set of instructions (e.g. swap) for producing the output. Essentially, it’s taking a permutation of the instructions that minimise the running time, based on some underlying patterns in the data.
woeirua · 6 years ago
Eh... Look, I get that everyone wants to publish exciting results, but calling this unequivocally "better" than quicksort is a big stretch. The profiling results are unconvincing at best. A small (<10%) improvement could easily (and much more likely) be explained by any number of differences in optimization between the quicksort implementation they're using and the neural network library.

Table 3 is very suspicious too. If they're doing what they claim, then why is Binary Search all of a sudden significantly better than the learned agent for some problem sizes? It really feels like they're running up against some kind of inherent optimization boundary inside their neural network library...

curiousgal · 6 years ago
Better as in lower number of instructions not lower wall clock time.
seek3r00 · 6 years ago
Some interesting extracts:

“The instruction set together with the input representations jointly determine the class of algorithms that are learnable by the neural controller, and we see this as a fruitful avenue for future research akin to how current instruction sets shaped microprocessors.”

“The generalization or correctness we aim for is mostly about generalization to instances of arbitrary sizes.”

“[...] computing a = f(s) can be more expensive on current CPUs than executing typical computation employed in the algorithms studied here. We thus hope that this research will motivate future CPUs to have “Neural Logic Units” to implement such functions f fast and efficiently, effectively extending their instruction set, and making such approaches feasible.”