ghj · 6 years ago
It's really hard to write correct code.

Sorting was broken in java and nobody noticed for a long time: http://envisage-project.eu/wp-content/uploads/2015/02/sortin...

The same was true for java's binary search: https://ai.googleblog.com/2006/06/extra-extra-read-all-about...

So I am not sure I will ever trust an ML algorithm trained on inputs/outputs only (which is what I think "neural program induction" means). The above bugs are only hit because the programmers who wrote it didn't think about overflow. What is this "neural programmer" assuming about their inputs? We'll never know.

OTOH the standard library sort can be improved if you know the distribution of the numbers you're sorting (e.g., small numbers can bucket sort, small lists can use a sorting network, etc). If this thing can efficiently almost-correctly-sort because it's better at these types of pattern matching, we can just run a final pass of insertion sort to make it useful!
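To make that "final pass of insertion sort" concrete (a plain sketch, not from the paper): insertion sort runs in O(n + d) time where d is the number of remaining inversions, so it's cheap precisely when the learned sorter has already done most of the work.

```python
def repair_pass(a):
    """Final insertion-sort pass over the output of an almost-correct sorter.

    Runs in O(n + d), where d is the number of inversions left over,
    so it's nearly free on nearly-sorted input.
    """
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]  # shift larger elements right
            j -= 1
        a[j + 1] = x
    return a

# e.g. the output of a hypothetical almost-correct sorter:
almost = [1, 2, 4, 3, 5, 7, 6, 8]
print(repair_pass(almost))  # [1, 2, 3, 4, 5, 6, 7, 8]
```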

malisper · 6 years ago
> OTOH the standard library sort can be improved if you know the distribution of the numbers you're sorting (e.g., small numbers can bucket sort, small lists can use a sorting network, etc). If this thing can efficiently almost-correctly-sort because it's better at these types of pattern matching, we can just run a final pass of insertion sort to make it useful!

I haven't read this paper, but that strategy was exactly the strategy behind "The Case for Learned Index Structures"[0]. The idea was you could train a model to predict the location of a row based on a primary key. After computing the model, you could then go through every row and compute a bound on how far off the model is from the actual row. To perform a lookup, you start off at the model's guess, then look around based on the bound on the model for a row with the given primary key.

[0] https://arxiv.org/abs/1712.01208
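A toy sketch of that lookup strategy (all names here are made up, and the trivial linear fit stands in for the learned model in the paper):

```python
import bisect

def fit_linear(keys):
    """Fit a trivial linear model predicting position from key.

    Stand-in for the learned model; `keys` is assumed sorted and distinct.
    """
    n = len(keys)
    lo, hi = keys[0], keys[-1]
    scale = (n - 1) / (hi - lo) if hi != lo else 0.0
    predict = lambda k: int((k - lo) * scale)
    # "Go through every row and compute a bound on how far off the model is":
    err = max(abs(predict(k) - i) for i, k in enumerate(keys))
    return predict, err

def lookup(keys, predict, err, key):
    """Start at the model's guess, then search only within +/- err of it."""
    guess = predict(key)
    lo = max(0, guess - err)
    hi = min(len(keys), guess + err + 1)
    i = lo + bisect.bisect_left(keys[lo:hi], key)
    return i if i < len(keys) and keys[i] == key else -1
```

The skewed key 100 below makes the linear model badly wrong in the middle, which the error bound absorbs:

```python
keys = [1, 3, 5, 7, 100]
predict, err = fit_linear(keys)
print(lookup(keys, predict, err, 7))  # 3
print(lookup(keys, predict, err, 4))  # -1 (absent)
```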

eru · 6 years ago
> If this thing can efficiently almost-correctly-sort because it's better at these types of pattern matching, we can just run a final pass of insertion sort to make it useful!

Depends on what kinds of mistakes the almost-correct approach makes. If it just puts elements in the wrong order, your suggestion works. But if it makes mistakes like duplicating elements, or dropping them, or completely making up new entries, no post-processing will help.
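Those unfixable failure modes are at least cheap to detect, even if detection isn't repair — a sketch of an O(n) multiset check:

```python
from collections import Counter

def is_permutation(original, output):
    """True iff `output` contains exactly the same elements as `original`.

    Catches the unrecoverable failures: dropped, duplicated,
    or made-up entries. O(n) time and space.
    """
    return Counter(original) == Counter(output)

assert is_permutation([3, 1, 2], [1, 2, 3])      # mere misordering: fixable
assert not is_permutation([3, 1, 2], [1, 2, 2])  # duplicated + dropped
assert not is_permutation([3, 1, 2], [1, 2])     # dropped element
```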

> OTOH the standard library sort can be improved if you know the distribution of the numbers you're sorting (e.g., small numbers can bucket sort, small lists can use a sorting network, etc).

Even better: the standard library sort usually assumes only comparison. For special cases like sorting on integer keys you can conceivably do better.
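For instance, counting sort on small integer keys does no element-to-element comparisons at all and runs in O(n + k) for keys in range(k) — a textbook sketch:

```python
def counting_sort(a, k):
    """Sort integers drawn from range(k) in O(n + k), with no comparisons
    between elements — beating the O(n log n) comparison-sort lower bound."""
    counts = [0] * k
    for x in a:
        counts[x] += 1
    out = []
    for v, c in enumerate(counts):
        out.extend([v] * c)
    return out

print(counting_sort([3, 0, 2, 3, 1], 4))  # [0, 1, 2, 3, 3]
```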

ogogmad · 6 years ago
> Depends on what kinds of mistakes the almost-correct approach makes. If it just puts elements in the wrong order, your suggestion works. But if it makes mistakes like duplicating elements, or dropping them, or completely making up new entries, no post-processing will help.

The algorithm modifies the original list only by swapping entries. As such, it's guaranteed to result in a permutation.

im3w1l-alt · 6 years ago
If you let the number of objects to sort go to infinity while keeping the size of the objects in bits capped, then they will behave like integers.
mpfundstein · 6 years ago
IMO it's not about replacing deterministic algorithms used in von Neumann machines (even though that would be a nice side effect).

I feel that if we ever want to build an intelligent problem-solving machine, we need some facility for abstract reasoning, and work like this paper is a major contribution towards that goal. (if it's reproducible and valid ofc...)

pansa2 · 6 years ago
> I am not sure I will ever trust an ML algorithm trained on inputs/outputs only

It’s doing TDD!

jtms · 6 years ago
Imagine a world where all programmers do is write the tests and the generalized learning algos write all the implementations.
nardi · 6 years ago
Next step is to prove the ML-produced algorithm is correct. I can see a world where we generate algorithms and then automatically prove they're correct based on a formal description of the problem being solved.
bufferoverflow · 6 years ago
It doesn't need to be 100% correct. There's a place for fast sorting algorithms which are correct most of the time - basically anything non-critical, like sorting comments by upvotes.

According to the article, they couldn't find a case where it wasn't correct.

adrianN · 6 years ago
We're really bad at proving correct even code that was written explicitly with formal verification in mind. The techniques there are still quite immature. I wonder how long it will take until we have a reasonably working ML + formal verification pipeline. Maybe it's just easier to generate correct code from the spec than to learn it from examples and then verify it against the specification?
rusticpenn · 6 years ago
Well-designed ML techniques are correct 99% of the time. It's the 1% that is the problem.
Cthulhu_ · 6 years ago
Call me naive, but wouldn't feeding it random input and expected output sorted by existing and proven (but slower) sorting algorithms do the trick?
Shorel · 6 years ago
It would be funny if a two-phase approach to sorting resulted in increased average performance.

First sort using the ML algorithm. Then sort the almost-sorted result with a traditional algorithm optimized for almost-sorted input.

In fact, I think my own brain uses a two-phase approach for many problems.

_puk · 6 years ago
Could you train the ML algorithm to select the best traditional algorithm based on the distribution of the items being sorted?!
User23 · 6 years ago
> The same was true for java's binary search

I used to ask "implement binary search" as an interview question and gave up on using it because not a single candidate could do it correctly in 45 minutes. This was at a FAANG.

vertbhrtn · 6 years ago
Ha, I can get an offer from any of the FAANGs without even trying, and I can comfortably say that not only did I fail the binary search question back when interviews were 1 hour long, I would still fail it now.
Der_Einzige · 6 years ago
I don't believe you. Surely, after a few hints, the FAANG candidates must have been able to figure out the classic gotcha of this question (why does `(high + low) / 2` sometimes overflow, and how do you fix it)?

If they didn't, then I believe you did a poor job of guiding them. I refuse to believe that this is some esoteric, tough question. If you can solve dynamic programming problems, you can easily solve this one.
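For the record, the gotcha: in a fixed-width language like Java or C, `(low + high) / 2` overflows once `low + high` exceeds the integer maximum, while `low + (high - low) / 2` cannot. Python integers don't overflow, so this sketch simulates 32-bit wrapping to show the failure:

```python
def to_i32(x):
    """Wrap to signed 32-bit, as Java's int arithmetic would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

low, high = 1_500_000_000, 2_000_000_000  # plausible indices in a huge array

buggy_mid = to_i32(low + high) // 2       # (low + high) / 2 wraps negative
safe_mid = low + to_i32(high - low) // 2  # low + (high - low) / 2 is fine

print(buggy_mid)  # negative: the famous Java binarySearch/mergeSort bug
print(safe_mid)   # 1750000000
```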

DaiPlusPlus · 6 years ago
Apparently modern GPUs have sorting networks implemented in hardware (giving you effectively constant-time sorting for fixed-size input). That may be preferable when you're processing hostile input contrived to hit QuickSort's worst case, and the cost of hitting the PCI Express bus is less than the cost of randomizing the input to mitigate QuickSort attacks.

But yes - I agree it's weird that many standard libraries (looking at you, .NET!) have only a single `Sort` method on their vector types, with the details of the implementation buried in the documentation, instead of shipping with a first-class sorting-algorithms library.

david-gpu · 6 years ago
GPUs having hardware explicitly for sorting sounds a bit odd, when you can efficiently implement sorting networks in software on the same GPU.

If you have a source, I would like to learn more.

KajMagnus · 6 years ago
> "just run a final pass of insertion sort to make it useful!"

(And if this insertion sort won't complete in O(n), say <= 3n, then fall back to Quicksort, so it won't turn into O(n^2) just because the ML alg hit a corner case.)
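A sketch of that guard: run insertion sort but count element shifts, and bail out to a guaranteed O(n log n) sort once the shifts exceed the 3n budget (the constant is arbitrary, matching the comment above):

```python
def bounded_repair(a, budget_factor=3):
    """Insertion sort with a shift budget of budget_factor * n.

    If the input isn't nearly sorted, fall back to Python's built-in
    Timsort instead of degrading to O(n^2).
    """
    budget = budget_factor * len(a)
    shifts = 0
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
            shifts += 1
            if shifts > budget:
                a[j + 1] = x  # restore held-out element: a is a permutation again
                a.sort()      # fallback: guaranteed O(n log n)
                return a
        a[j + 1] = x
    return a
```

Note the restore before the fallback: mid-shift, the current element is held out of the array, so writing it back is required for `a.sort()` to see the right multiset.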

mkl · 6 years ago
Possibly better would be Timsort (used in Python, Java, V8, etc.), as it's "designed to take advantage of runs of consecutive ordered elements that already exist": https://en.wikipedia.org/wiki/Timsort
YeGoblynQueenne · 6 years ago
>> So I am not sure I will ever trust an ML algorithm trained on inputs/outputs only (which is what I think "neural program induction" means).

According to the authors, the proposed approach is notable for training not only on input/output pairs but also on program traces. In addition, reinforcement learning is used to train e.g. sorting agents on the behaviour of "teacher agents", which are hand-crafted implementations of well-known sorting algorithms (bubble, insertion, and quicksort).

eli_gottlieb · 6 years ago
>So I am not sure I will ever trust an ML algorithm trained on inputs/outputs only (which is what I think "neural program induction" means).

Pfffft, just wait until they get it writing code with its own proofs of correctness in dependent type theory. It'll be correct and twice as unreadable as incorrect human-written code!

riedel · 6 years ago
Why not check in O(n) whether the postcondition holds on the sorted output, and raise an exception otherwise? You can even generate a nearly sorted list and run a verified bubble sort on it. I don't see big problems if ML is used carefully.
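A sketch of that check (the untrusted sorter is any callable; this verifies sortedness only — element preservation would need a separate multiset check):

```python
def checked(sort_fn, a):
    """Run an (untrusted) sorter, then verify the postcondition in O(n)."""
    out = sort_fn(list(a))
    if any(out[i] > out[i + 1] for i in range(len(out) - 1)):
        raise ValueError("postcondition violated: output not sorted")
    return out

print(checked(sorted, [3, 1, 2]))  # [1, 2, 3]

try:
    checked(lambda xs: xs, [3, 1, 2])  # identity "sorter" fails the check
except ValueError as e:
    print(e)
```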
xiphias2 · 6 years ago
The code (which is a few percent faster than quicksort) is on the last page:

  1: procedure QUICKSORTAGENT(input state)
  2: Let i = 1, j = 2, l = 3, h = 4
  3: if FunctionID = None then
  4: return vh ← Function1(vl ← vl, vh ← vh)
  5: else if FunctionID = 1 then ▷ QuickSort
  6: if vl < vh then
  7: if prev = None then
  8: return vi ← Function2(vl ← vl, vh ← vh)
  9: else if prev = (vi ← Function2(vl ← vl, vh ← vh)) then
  10: return AssignVar(j, i)
  11: else if prev = AssignVar(j, i) then
  12: return MoveVar(i, -1)
  13: else if prev = MoveVar(i, -1) then
  14: if vi > vl then
  15: return vi ← Function1(vl ← vl, vh ← vi)
  16: else
  17: return MoveVar(j, +1)
  18: end if
  19: else if prev = (vi ← Function1(vl ← vl, vh ← vi)) then
  20: return MoveVar(j, +1)
  21: else if prev = MoveVar(j, +1) and vj < vh then
  22: return vh ← Function1(vl ← vj, vh ← vh)
  23: else
  24: return Return(h)
  25: end if
  26: else
  27: return Return(h)
  28: end if
  29: else ▷ FunctionID = 2, Partition
  30: if prev = None then
  31: return AssignVar(i, l)
  32: else if prev = AssignVar(i, l) then
  33: return AssignVar(j, l)
  34: else if vj < vh then
  35: if prev = Swap(i, j) then
  36: return MoveVar(i, +1)
  37: else if (prev = AssignVar(j, l) or prev = MoveVar(j, +1)) and A[vj] < A[vh] then
  38: if vi ≠ vj then
  39: return Swap(i, j)
  40: else
  41: return MoveVar(i, +1)
  42: end if
  43: else
  44: return MoveVar(j, +1)
  45: end if
  46: else if prev = MoveVar(j, +1) then
  47: return Swap(i, h)
  48: else
  49: return Return(i)
  50: end if
  51: end if
  52: end procedure

collyw · 6 years ago
This reminds me of programming BASIC on an Acorn Electron: line numbers and, if I remember right, no indentation, though it was a few decades ago.
dspillett · 6 years ago
The Electron had a new enough version of BBC Basic that "proper" procedures and UDFs were supported (in a limited fashion, but well enough considering the constraints of the machine), so you could pretty much ignore line numbers. In fact, by using a text editor and TYPEing the result into BASIC you could write code without them (the interpreter needed them; the TYPE trick made it put them in for itself).

But yes, this was uncommon so line numbers were a major thing.

And indentation was optional: you could add extra spaces to the start of lines, and they would be kept and have no effect on execution. But you would be wasting precious bytes of RAM, and if using one of the heavier display modes (0, 1 or 2) you would only have 8.5 Kbyte to play with for all the code and run-time storage (your variables, BASIC's call stack).

misja111 · 6 years ago
The source code in the PDF has indentation, it is a bit easier to read.
quickthrower2 · 6 years ago
Ready for that COPY key!
pX0r · 6 years ago
Was this "discovered" by their system or was it hand-crafted by the authors from their custom instruction set elements ?
YeGoblynQueenne · 6 years ago
No, that is the training agent implemented as an instance of quicksort. I don't know why the OP reports that it's "a few percent faster than quicksort" because I can't find where that is claimed in the paper.

In fact, as far as I understand it, the paper claims that the learned model (i.e. the student agent) has learned to reproduce the behaviour of this teacher agent in a smaller number of steps on average than the teacher needs to sort a list. That is what the entire claim about superior efficiency rests on (there is no example of a learned model and no asymptotic analysis of the program such a model represents).

rightbyte · 6 years ago
As I understand it, they designed quicksort like that to be able to train with it. It is quite clear from the video, where it is called "quick sort agent" compared to the model one, and function1 and function2 are in the stack trace.

"We found adding the previously executed action to the input state st is sufficient to handle dis-ambiguation for this quick sort implementation. Alg. 8 shows the converted quick sort scripted agent"

rightbyte · 6 years ago
Quite dull research wrapped in fancy words.

Essentially they generate functions that map one program state to another, count those function calls, and compare with e.g. the number of function calls in quicksort. They "cheat" by feeding in how all elements compare to their neighbours at each step. Like, if you give the algorithm that input for free, what's the point?

*Notably, none of the popular sorting algorithms decide which elements to swap by looking at the whole input at each execution step. On the contrary, the decisions are typically made based on local evidence only. ... We implement this principle for the sorting task by employing a small set of k (independent of n) index variables, and include in the input the information about how A[i] compares with A[i+1] and A[i−1] for each index variable i.*

YeGoblynQueenne · 6 years ago
>> Like, if you give the algorithm that input for free what's the point.

The point, as also explained in the paper, is that even with such information, and with a handful of basic operations from which to compose a program, the space of all possible programs is huge, and finding a program that does what it should in any reasonable amount of time is extremely difficult.

If you are familiar with the literature of program induction (which includes a lot more than neural program induction and goes back a few decades) you will notice that the hypothesis spaces for program learning are truly vast and it is impossible to make any kind of progress without some kind of "dirty trick" to reduce their size. The common approach is to impose a strong inductive bias on the representation language, so as to guide the search to find a correct program. The approach in the article is no different. And of course "inductive bias" here always means domain knowledge.

But, amicably, if you think there's any better way to do this, many people, myself included, would be very curious to hear what you have to say. In fact, if you could really innovate in this field you'd be guaranteed an illustrious research career. Learning programs from examples is hard.

EchoAce · 6 years ago
This is absolutely not cheating; every hand designed algorithm can access and compare everything in the array!

The real point of what they’re saying in your italicized quote is actually that giving the net full access hinders efficiency, so they actually restrict it. Almost like the opposite of cheating.

rightbyte · 6 years ago
It is not a pointer to an array I'm concerned about, but the "neighbour diff vector", or whatever you should call it, that is provided by the "environment". See A.1.2.

Doing so many comparisons and storing them has a cost. Also, the model can't decide if it is done, so at each step the "environment" has to iterate over the array to see if it is sorted. Are they only counting function calls? I guess so. The paper is really hard to follow and the pseudocode syntax is quite maddening.

If I understand the paper correctly of course I could be wrong.

If so, the claim that "our approach can learn to outperform custom-written solutions for a variety of problems" is bogus.

ogogmad · 6 years ago
> They "cheat" by feeding how all elements compares the their neighbours at each step. Like, if you give the algorithm that input for free what's the point.

How is that "cheating"? The algorithm is still learning how to do a comparison-sort, which has direct applications (assuming it performs well).

lone-commenter · 6 years ago
So do they feed such information only when training the system?
FabHK · 6 years ago
> none of the popular sorting algorithms decide which elements to swap by looking at the whole input at each execution

Yes. In particular, the O(n log n) lower bound holds for comparison sorts, that is algorithms that rely on pairwise comparison only.

MichaelBurge · 6 years ago
"all" of a constant number k of such inputs, not "all" of the entire input. It's not possible to feed a neural network an arbitrarily-large number of inputs in one batch.
rightbyte · 6 years ago
My error, it seemed to be a reference implementation with "all". Is it correct that they feed a vector of 68 such comparisons in the window interface?
me551ah · 6 years ago
According to the limitations stated in the doc.

1. It runs slower on modern CPUs and you need neural logic units to see the speed benefits

2. The model needs to be fine-tuned to the data for best performance.

While it may be better than quicksort in certain use cases, it isn't going to replace it anytime soon.

sinuhe69 · 6 years ago
"Model needs the be fined tuned to the data for best performance." -> typical :D
spyckie2 · 6 years ago
This is pretty cool, pardon my limited understanding, trying to see if I get this correctly:

1) They train a model to do sorting, and it actually sorts correctly.

2) They optimize the model for efficiency, and it becomes better than many custom sort functions?

If I remember correctly from school, you can basically speed up any kind of sorting by using more space (basically with a hash table). Is the neural network just using the normal sort algo until it sees a pattern in the input, and then skipping the sort algo for the pattern output? Or to put it plainly, is it just a smarter (or dumber, depending on the size of the neural network) hash table?

radicalbyte · 6 years ago
The question is how does the algorithm perform on average, what are the pathological cases and how slow are they?

How do you prove that with an algorithm that no human can reason about?

Basically this paper looks like they've found an algorithm which is efficient for given sets or given classes of sets. Whether that generalizes is a different problem.

It's basically a cool automatic heuristic generator.


seek3r00 · 6 years ago
Exactly, that’s what I thought. Although, they should be able to look at the set of instructions that generated the output, which is basically an algorithm in itself. Then they could try to prove whether that algorithm would really generalise.
collyw · 6 years ago
> How do you prove that with an algorithm that no human can reason about?

This is something I see becoming a problem in the future for machine learning.

seek3r00 · 6 years ago
The model is given the input and a set of instructions (e.g. swap) for producing the output. Essentially, it’s taking a permutation of the instructions that minimise the running time, based on some underlying patterns in the data.
woeirua · 6 years ago
Eh... Look, I get that everyone wants to publish exciting results, but calling this unequivocally "better" than quicksort is a big stretch. The profiling results are unconvincing at best. A small (<10%) improvement could easily (and much more likely) be explained by any number of differences in optimization between the quicksort implementation they're using and the neural network library.

Table 3 is very suspicious too. If they're doing what they claim, then why is Binary Search all of a sudden significantly better than the learned agent for some problem sizes? It really feels like they're running up against some kind of inherent optimization boundary inside their neural network library...

curiousgal · 6 years ago
Better as in lower number of instructions not lower wall clock time.
seek3r00 · 6 years ago
Some interesting extracts:

“The instruction set together with the input representations jointly determine the class of algorithms that are learnable by the neural controller, and we see this as a fruitful avenue for future research akin to how current instruction sets shaped microprocessors.”

“The generalization or correctness we aim for is mostly about generalization to instances of arbitrary sizes.”

“[...] computing a = f(s) can be more expensive on current CPUs than executing typical computation employed in the algorithms studied here. We thus hope that this research will motivate future CPUs to have “Neural Logic Units” to implement such functions f fast and efficiently, effectively extending their instruction set, and making such approaches feasible.”