I was one of the members who expertly reviewed what was done in both sorting and hashing. Overall it's more about assembly: finding missed compiler optimizations and balancing correctness against distribution (in hashing in particular).
It was not revolutionary in the sense that it didn't find completely new approaches; it converged to something incomprehensible to humans but quite good for performance, which proves the point that optimal programs are very inhuman.
Note that removing instructions in sorting does not always lead to better performance: for example, instructions can run in parallel and the effect can be less pronounced. Benchmarks can lie, and the compiler could do something different when recompiling the changed sort3 function.
For hashing it was even funnier: very small strings (up to 64 bits) already used 3 instructions, roughly add a constant -> 64x64 multiply -> xor of the upper/lower halves. For bigger ones the question becomes more complicated, which is why 9-16 was a better spot and it simplified from 2 multiplications to just one and a rotation. Distribution on real workloads was good, it almost passed SMHasher, and we decided it was good enough to try out in prod. We did not roll back, as you can see from abseil :)
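Roughly, the shape being described is something like the sketch below (a minimal illustration, not the actual abseil code; the multiplier constant and the MixSmall/Rotr64/Mix9to16 names are made up here, and the rotate-by-53 mirrors the rotr(lo, 53) discussed further down the thread):

    #include <cstdint>

    // "Add a constant -> 64x64 multiply -> xor upper/lower" for very short inputs.
    // unsigned __int128 is a GCC/Clang extension; the multiplier is a placeholder.
    inline uint64_t MixSmall(uint64_t v, uint64_t seed) {
        unsigned __int128 p = (unsigned __int128)(v + seed) * 0x9E3779B97F4A7C15ull;
        return (uint64_t)p ^ (uint64_t)(p >> 64);
    }

    // Right rotation by r bits, for 0 < r < 64.
    inline uint64_t Rotr64(uint64_t x, unsigned r) {
        return (x >> r) | (x << (64 - r));
    }

    // Hypothetical 9-16 byte shape: one multiply-based mix plus one rotation,
    // instead of two multiplications.
    inline uint64_t Mix9to16(uint64_t lo, uint64_t hi, uint64_t seed) {
        return MixSmall(hi ^ Rotr64(lo, 53), seed);
    }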
But even given all that, it was fascinating to watch how this system searched and was able to find that particular programs can be further simplified. Kudos to everyone involved; it's a great incremental change that can bring more results in the future.
This sounds like a more intelligent version of superoptimization. The original Massalin superoptimizer, even for its time, also created surprising results, similar to AlphaDev's output being incomprehensible to humans. You see the same thing in computer chess, in what Agadmator calls disgusting engine lines.
https://courses.cs.washington.edu/courses/cse501/15sp/papers...
I'm disappointed that the hashing is just based on training on microbenchmarks and SMHasher, rather than designing a fast _provably_ universal hash.
Suites like SMHasher are never complete. They are just trying to catch the most common weaknesses. If you train on the test cases you'll only get an algorithm that passes the tests, but people can always find a set of values on which you will do badly.
I think you're confusing proving that the hash function is collision resistant with the other goal, which is hashing speed. If you really need a collision resistant hash you need to use a cryptographic hash function, but outside of cryptographic applications that is rarely the requirement. And (huge caveat, this isn't my domain expertise) I'm not sure what security properties are really "proven" about existing cryptographic hash functions; AFAIK existing cryptographic hash functions are considered secure because we don't know how to break them, not because of some fundamental mathematical property about them.
For the other 99.999% of hashing applications there is a balance between collision resistance and hashing latency. For example, in a hash table (probably the most common use for a non-cryptographic hash function) there is a cost incurred by hash collisions because lookups on keys with collisions may have to do extra probing. On the other hand, every hash table lookup requires doing at least one hash operation, regardless of whether or not it collides. So it may make sense to have a slightly worse hash function (in the sense that it is more likely to have collisions with pathological inputs) if it has slightly lower latency. The only way to really know what is faster for a real world application is to have some kind of benchmark to train against as a loss function.
Indeed, and this has been the case for quite a while now. You can always improve on some general algorithm by taking advantage of knowledge of the data, but that never generalizes and usually leads to worse performance on other data and/or new pathological cases that make the results unusable.
To what extent is this simply working around the weirdness of x86? Do these improvements apply to something like MIPS, ARM64, or RISC-V that have inherently simpler ISAs?
In this particular case they were universal, but the paper says the optimizations were done on x86. One of the ideas was to use LLVM IR, but the intuition was that running an optimizer on top of another optimizer was unlikely to work properly.
> 9-16 was a better spot and it simplified from 2 multiplications to just one and a rotation
I'm very confused as to why rotation was at all useful. Xoring with a random-ish constant makes sense, because the constant has high entropy and is likely to decorrelate bits from the input (also can use a different constant per hash table). But rotating by a constant—and a fixed one at that—seems like it just accounts for expected input distribution. Especially (assuming this is intended for text) if shifting by a value >8 makes a significant difference (vs shifting by the same value mod 8), it smells like serious overfit. Could be useful for something like a perfect hash, but seems problematic and prone to issues as a general hash.
Edit: to make my objection clearer: the hash simply replaces lo with rotr(lo, 53). If rotr(lo, 53) performs significantly better than rotr(lo, 53 mod 8), then that implies the following. I can take a set of strings, and I can apply the same permutation to the characters of all of the strings in the set, and this will significantly affect the quality of the hash function. That seems like an undesirable property, even for a non-cryptographic hash.
Does that end up moving high bits into the low bits? That could possibly be very helpful for all sets of strings, since the equality test will start with the first byte, so it could be better to have the rotr on the hash so that the hash is less affected by the first byte (and more affected by later bytes). Just hypothetically speaking, that is where the implication could break down, since it doesn't consider the non-uniform cost of the equality.
This might be buggy whip talk, but I wonder if you could take the same system and apply it to smaller problems (e.g. computing an 8-bit hash) so the novel techniques could be identified and used by humans.
One of the examples from another comment[1] here was:
"They found that in a sorting network handling 3 inputs, the AI found a way to save an instruction by reducing a "min(A, B, C)" operation to just "min(A, B)" by taking advantage of the fact that previous operations guaranteed that B <= C."
> proves the point that optimal programs are very inhuman
Maybe there should be an AI that produces optimally-readable/understandable programs? That's what I would want if I was adding the output to a codebase.
Optimal routing of delivery vehicles for UPS/FedEx etc. is also incomprehensible to the drivers, so the planners often intentionally generate suboptimal solutions.
A suboptimal solution that gets implemented is better than an optimal one that doesn't.
It's not that it's written to be obfuscated; the routine/algo is just hard to understand even if they commented every line. Likely some recursive trick is involved, and those are always hard to follow.
That's what LLMs can do with RL from human (or AI) readability feedback, instruction tuning, and prompting. We will 100% see this if GPT-4 doesn't already do it.
Thought: optimizing JIT compilers like V8 already observe code behavior and use that to choose how to optimize things as they go. I wonder if one day V8 will incorporate some ML to help study and optimize the running code
This doesn't really sound like SA (simulated annealing), which is a principled approach to avoiding the local-minimum problem in descent algorithms and ending up with a true global optimum. The general result is clean but impractical, and the approximations are hard to analyze.
I haven’t read the paper, but it sounds like this is a sort of similar approach to genetic algorithms? Create lots of agents with random changes and pick the ones that produce the most promising results? Is my impression wrong?
If more compute is spent than before, but in parallel to reduce latency, then wouldn't this increase power? Shouldn't latency and power both be objectives?
Some very cool improvements found in already highly optimized algorithms.
They found that in a sorting network handling 3 inputs, the AI found a way to save an instruction by reducing a "min(A, B, C)" operation to just "min(A, B)" by taking advantage of the fact that previous operations guaranteed that B <= C. They also found that in a sorting network handling 4 inputs, when it is part of a "sort 8" network, there are similar guarantees (D >= min(A, C) in this case) that can be taken advantage of to remove an instruction as well. The compiler may not be able to compete with hand-written assembly, but it seems an AI can hand-write assembly code that's even better in some cases.
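A way to picture the saved instruction (a hypothetical C++-level sketch of a 3-element sorting network, not the actual libc++ assembly):

    #include <algorithm>

    inline void Sort3(int& a, int& b, int& c) {
        // Stage 1: order (b, c) so that b <= c.
        int t = std::min(b, c);
        c = std::max(b, c);
        b = t;
        // Stage 2: push the overall maximum into c.
        t = std::min(a, c);
        c = std::max(a, c);
        a = t;
        // Stage 3: c already holds the maximum, so the smallest element is
        // min(a, b) -- a min(a, b, c) here would be redundant. This is the kind
        // of guarantee-based saving described above.
        t = std::min(a, b);
        b = std::max(a, b);
        a = t;
    }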
Another improvement is also very cool. They discuss VarSort4, which takes a list that may be 2, 3, or 4 elements long, and sorts it. The existing algorithm simply checks the length and dispatches to the matching fixed-size sort (Sort2, Sort3, or Sort4).
The AI found an algorithm that looks totally different:
if (len == 2) {
    sort2
} else {
    sort3
    if (len == 4) {
        sortspecial
    }
}
It's pretty wild! It immediately runs Sort3 on something that may be either 3 elements or 4 elements long, and only afterwards does it check to see how long it really is. If it's 3 elements long, we're done; otherwise, run Sort4 - but because (having already run Sort3) you know the first three elements are sorted, you can use a special and much simpler algorithm to simply put the 4th element in the right spot.
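As a rough sketch of that last step (plain C++ for illustration; the real code is branchless assembly, and the sortspecial name above comes from the pseudocode, not from libc++): with v[0..2] already sorted, finishing the 4-element case is just an insertion of v[3].

    // Assumes v[0] <= v[1] <= v[2]; places v[3] into its correct position.
    inline void InsertFourth(int v[4]) {
        int x = v[3];
        int i = 3;
        while (i > 0 && v[i - 1] > x) {   // shift larger elements one slot right
            v[i] = v[i - 1];
            --i;
        }
        v[i] = x;
    }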
Very cool. Improving on core LLVM sorting algorithms that have already been heavily hand-optimized by the best in the world is definitely a "it's 1997 and Deep Blue defeats the World Champion in chess" kind of feeling.
Interesting that it took AI to pull this off. I think mutation testing would have discovered the first (by virtue of checking which code isn't necessary globally), though likely not the second (which needs a new branch, unless that branch was already there?).
As I understand it, it kind of _did that_, just in an extremely guided way, which is why it produced results in some reasonable amount of time (probably).
I'm surprised they only report results up to sort5.
It seems that at that level you could just iterate through all possible programs.
AI-generated code seems more interesting once you get to 10+ values, where classical methods break down.
I believe that's because above 5 elements, fixed sorting networks are no longer used. Introsort takes over and dispatches to insertion sort, quicksort, or heap sort as appropriate.
Divide and conquer strategies are used for larger sorts, and the smaller arrays could include the fixed lengths 3, 4, 5.
> The compiler may not be able to compete with hand-written assembly, but it seems an AI can hand-write assembly code that's even better in some cases.
This made me think “imagine if AI was the compiler”, that is to say you went from C or whatever to assembly via AI directly, so it was “hand writing” the equivalent assembly for your instructions instead of using generic compilation.
We might find everything runs much faster.
> AlphaDev uncovered new sorting algorithms that led to improvements in the LLVM libc++ sorting library that were up to 70% faster for shorter sequences and about 1.7% faster for sequences exceeding 250,000 elements.
As someone that knows a thing or two about sorting... bullshit. No new algorithms were uncovered, and the work here did not lead to the claimed improvements.
They found a sequence of assembly that saves... one MOV. That's it. And it's not even novel, it's simply an unrolled insertion sort on three elements. That their patch for libc++ is 70% faster for small inputs is only due to the library not having an efficient implementation with a *branchless* sorting network beforehand. Those are not novel either, they already exist, made by humans.
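For readers unfamiliar with the term: a branchless sorting network is built from compare-exchange steps that compilers turn into conditional moves instead of branches. A minimal sketch (illustrative, not the libc++ code):

    // Both ternaries typically compile to cmov, so small sorts have no
    // data-dependent branch to mispredict.
    inline void CompareExchange(int& a, int& b) {
        int lo = a < b ? a : b;
        int hi = a < b ? b : a;
        a = lo;
        b = hi;
    }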
> By open sourcing our new sorting algorithms in the main C++ library, millions of developers and companies around the world now use it on AI applications across industries from cloud computing and online shopping to supply chain management. This is the first change to this part of the sorting library in over a decade and the first time an algorithm designed through reinforcement learning has been added to this library. We see this as an important stepping stone for using AI to optimise the world’s code, one algorithm at a time.
I'm happy for the researchers that the reinforcement learning approach worked, and that it gave good code. But the paper and surrounding press release are self-aggrandizing in both their results and impact. That this is the first change to 'this part' of the sorting routine in a decade is also just completely cherry-picked. For example, I would say that the libc++ sorting routine being QUADRATIC (https://bugs.llvm.org/show_bug.cgi?id=20837), which I reported (with an ignored patch) in 2014, finally being fixed in late 2021 (https://reviews.llvm.org/D113413) is quite the notable change. If anything it shows that there wasn't a particularly active development schedule on the libc++ sorting routine the past decade.
I feel like your take is overly cynical. The fact that humans can do the same thing by hand is not really the point. The contribution lies in the fact that their method derived this improvement *automatically*, which is where the impact lies. No one cares all that much if a human can make a sorting routine 2% faster, but if a program can do it, it suddenly becomes interesting (since it suggests that a similar approach can be applied to many other routines).
I am not cynical about the research itself, I am critical of claims such as "new sorting algorithm uncovered", "up to 70% faster", or "first change in a decade". The research is good. The achieved results are massively inflated.
What they achieved: automatically generated good code.
What they claim: automatically generated code that is revolutionary and an improvement on the state of the art.
And as another commenter noted, superoptimizers are also already a thing: https://en.wikipedia.org/wiki/Superoptimization
There's also automatic searching being done on faster sorting networks that actually recently produced better than state of the art sorting networks: https://github.com/bertdobbelaere/SorterHunter
I don't read it as cynical. It's fair game to call bullshit on bullshit. If an approach exists and is known, then "insert your favorite AI here" does not discover anything.
What's surprising is that anyone would've expected an AI to come up with a brand-new algorithm with better complexity than pre-existing human-made solutions.
How could it possibly come up with something better when it doesn't even understand how the original authors of qsort/mergesort/etc came up with their own..
Sure, it's great PR for the company, but.. the results just aren't there.
This is DeepMind's modus operandi. Every press release is just utterly hyperbolic nonsense that doesn't withstand the slightest scrutiny. AlphaGo, AlphaFold, AlphaDev... they've done literally nothing to improve the human condition, and may have actually made things worse. Go players have DECREASED their Elo after playing AlphaGo (or just quit the game altogether). I would be embarrassed to be associated with DeepMind.
This is unbelievably wrong. DeepMind has probably the best academic group in RL. The difference between DeepMind and OpenAI is that DeepMind favors academic endeavours and novelty much more, largely ignoring commercialization and products, while OpenAI is the stark opposite in that they almost entirely focus on products first, and typically their academic endeavours are just slight modifications of other works scaled up.
Don't get me wrong, Sutskever (et al.) has done incredibly good work previously, but when it comes to the products, they're much more engineering and marketing polish than scientific endeavours.
Botvinick's work on metaRL for example is an interesting direction that Deepmind has shown that few other companies that are only interested in engineering would venture towards.
Around the time of the AlphaGo challenge and afterwards...
1) you could see increased activity in go clubs and online go servers
2) the analysis of the games published by Deepmind has resulted in interesting "discoveries" (or rediscoveries) and changes to what is considered joseki.
3) many people started analyzing their kifus using AI, to find fluctuations in estimated win rate across moves.
So I disagree entirely.
I thought the same thing - it smacks of desperation at the moment; any tiny win is exaggerated.
It’s not hard to see why, with the emergence (ha) of OpenAI, Midjourney and all of this generative modelling, what has DeepMind done? I imagine the execs at Google are asking them some very probing questions on their mediocre performance over the last 5 years.
A decade ago Google had people with breadth and context who could have adjusted the framing of the result of this work (and maybe even re-targeted it to something useful). Today however Google is a mix of hyper-narrow expert ICs and leaders who lack domain expertise. Paper in Nature? Sure, let's take it!
The LLVM implementation of libc++ is very young compared to other implementations (it was started 5 years ago or so), so there are still a lot of things missing and a lot of things that can be improved.
The claim "faster sorting algorithm" is wrong.
They have to show the time-complexity. They have to prove that their algorithm is faster than linear sorting algorithms. Otherwise, they have to accept that their claim is wrong.
I get your sentiment but note that discovering a new algorithm doesn't have to imply a better time complexity. Bubble Sort and Insertion Sort have the same time complexity but are different algorithms.
What you're saying is a certain perspective which seeks to look at it from first principles.
To the brutalist, the algorithm change is faster, and that's all that matters. A human didn't previously come up with the optimisation. You might as well say a computer sorting is bullshit vs a person doing it, because the difference is just that a bloated chip does it instead, and that's it.
The title implies it found an entirely new algorithmic approach to sorting (like quick sort) which would have been a fantastic discovery. But it feels a lot like micro-optimizing the control flow and codegen.
I'm not sure there is a new algorithmic approach to sorting like you're thinking of. From a high level you can divide sorting algorithms into roughly two camps, "those that use divide-and-conquer", and "those that do not". The divide-and-conquer (split into smaller sub-lists, sort the sub-lists, and merge them while preserving sort order) algorithms are better. From this perspective, algorithms that we think of as quite distinct like quicksort and mergesort are not that different - quicksort just does the "divide" step of divide-and-conquer in a smarter way.
In the end, no matter what divide-and-conquer sorting algorithm you pick, you will be doing lots of "small sorts". And it is those small sorts that DeepMind has optimized here.
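A toy illustration of why the small cases matter (not the libc++ introsort; the QuickSortish name and the threshold of 4 are made up for the example): every recursive sort eventually bottoms out in tiny ranges, and that bottom layer is exactly where fixed small sorts get hit constantly.

    #include <algorithm>
    #include <vector>

    void QuickSortish(std::vector<int>& v, int lo, int hi) {   // sorts v[lo, hi)
        if (hi - lo <= 4) {
            // Tiny ranges end up here all the time; this insertion sort is the
            // stand-in for the hand-tuned sort3/sort4 networks discussed above.
            for (int i = lo + 1; i < hi; ++i) {
                int x = v[i], j = i;
                while (j > lo && v[j - 1] > x) { v[j] = v[j - 1]; --j; }
                v[j] = x;
            }
            return;
        }
        int pivot = v[lo + (hi - lo) / 2];
        auto first = v.begin() + lo, last = v.begin() + hi;
        auto m1 = std::partition(first, last, [pivot](int x) { return x < pivot; });
        auto m2 = std::partition(m1, last, [pivot](int x) { return x == pivot; });
        QuickSortish(v, lo, int(m1 - v.begin()));   // elements < pivot
        QuickSortish(v, int(m2 - v.begin()), hi);   // elements > pivot
    }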
> I'm not sure there is a new algorithmic approach to sorting like you're thinking of. From a high level you can divide sorting algorithms into roughly two camps, "those that use divide-and-conquer", and "those that do not". ...
I think the sci-fi-but-possibly-real hope is that for sorting (among other things), we may have the perspective that there isn't any new algorithmic approach available, but an AI finds one for us.
There are lots of accepted sorting algorithms [1]. I'm sure we can come up with novel new algorithms, even if they're not optimal. Like Wikipedia mentions, they all fall within some small number of higher-level categories (e.g. Partitioning, Merging, Selection, Insertion). I'm still not convinced that the optimizations presented in the article amount to the discovery of NEW sorting algorithms; they are merely optimizations of existing ones.
[1] https://en.wikipedia.org/wiki/Sorting_algorithm
Indeed, it's easy to prove that any sorting algorithm that works by comparing elements (unlike e.g. radix sort) requires Ω(n log n) time in the worst case.
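Sketch of that proof, for completeness: a comparison sort is a binary decision tree that has to distinguish all n! input orderings, so its height h (the worst-case number of comparisons) satisfies

    2^h \ge n! \quad\Longrightarrow\quad h \ge \log_2 n! = \Theta(n \log n)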
Still a fantastic discovery, because now there are more powerful automated algorithm improvement discovery pipelines. I wonder what will happen when those improvements are added to the processing that found the improvements. :D
Yes, I'm not discounting the results. I think it's a very interesting approach. I just think the language is important here. If you ask a CS class to turn in their own quicksort implementation, you have n implementations of 1 algorithm, not n new algorithms.
I think this is still super interesting. It's something humans are unable to do, or at least there are very few humans who can. I very much like the pattern of writing basic logic myself and then using a coding model to optimize it. It's effectively what we do with compilers already; this just makes it better.
My first intuition, knowing the limitations of generating first-order logic sequences, was that they must have somehow restricted the sorting to a fixed number of elements.
The most interesting part of this paper to me is that they let the agent guess how efficient its own solutions were and only had the model experimentally verify its guesses in 0.002% of cases. This allowed the model to search much faster than another program that didn't guess and had to run every program.
This is the same sort of "more guesses faster beats smarter guesses slower" that made afl-fuzz by far the best at exploring large search spaces in program fuzzing
Fast search often beats accurate search. Sometimes adding clever heuristics or more complex scoring "works" but slows down the search enough that it's an overall loss. Another kind of a bitter lesson, perhaps
But why isn't the proposed method an instance of smart guessing? It reduces oracle complexity with heuristics. The heuristic is "build a machine learning model of the objective function and use it to fake oracle queries most of the time."
This is actually quite a common way to optimize stuff in several disciplines. You essentially fit a surrogate model (keyword if you want to look up more) to whatever you want to optimize, then use the model to guide the procedure, making sure that the model is correct every once in a while.
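A bare-bones sketch of that control flow (all names and the toy Candidate type are hypothetical, just to show the "cheap estimate almost always, expensive ground truth rarely" pattern):

    #include <random>

    struct Candidate { double true_latency = 100.0; };    // stand-in for a program

    std::mt19937 rng{42};

    Candidate Mutate(const Candidate& c) {                 // placeholder: random tweak
        std::normal_distribution<double> step(0.0, 1.0);
        return Candidate{c.true_latency + step(rng)};
    }
    double RealBenchmark(const Candidate& c) {             // expensive in reality
        return c.true_latency;
    }
    double SurrogateScore(const Candidate& c) {            // cheap, slightly noisy estimate
        std::normal_distribution<double> noise(0.0, 0.5);
        return c.true_latency + noise(rng);
    }

    Candidate Search(Candidate best, int steps, double verify_fraction) {
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        double best_est = SurrogateScore(best);
        for (int i = 0; i < steps; ++i) {
            Candidate next = Mutate(best);
            double est = SurrogateScore(next);
            if (est >= best_est) continue;                 // surrogate says: not better
            if (coin(rng) < verify_fraction &&             // rarely, consult ground truth
                RealBenchmark(next) >= RealBenchmark(best)) continue;
            best = next;
            best_est = est;
        }
        return best;
    }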
I've been wondering about a similar approach for biomolecular simulations, where exact computations are still a hard bottleneck. I wonder if something like this could give us a few orders of magnitude more speed.
Where are the machine learning de-compilers?
Given an executable program, give me human-readable code that generates the executable exactly. With machine-learning-guessed function, type, and variable names...?
I get that it's a very hard problem. But... Does it just not work? Or have people not done it yet? Or have I missed it?
Training datasets should be pretty trivial for this: all open source software that is buildable on the internet could provide training pairs (source code & compiled code).
But I guess it would have to be trained specifically for every architecture.
Personally I think this would be very exciting. Currently there are a ton of projects to decompile older games (see the Mario 64 and Ocarina of Time de-compilations and subsequent native PC ports as an example), but those projects are huge and take whole teams years to finish. If you could simply point an AI at a project like that and have usable source code in a reasonable amount of time, it would be huge for those scenes. You would have a sort of defacto-open source moment for just about all software. It feels like the stuff of far off sci-fi.
This actually doesn't sound that difficult, given we can produce a practically infinite training dataset using a compiler with existing programs.
What's more interesting to me would be a model that can automatically port a program between languages, such as converting a C program to a fairly idiomatic Rust program.
I don't understand WebAssembly much other than superficially, but if wasm is going to be a new, harder level of obfuscation for websites, making them harder to customize locally, then I hope ML/AI can counteract that and preserve the Web as a user-customizable space.
This was educational because I learned about Sorting Networks.
I didn't hear of them in CS undergrad algorithms, perhaps because they can be thought of as a special case or optimization, rather than fundamental and generic variable-length sort algos.
It's a simple concept, and forms the basis of all the sort optimizations described here.
https://en.wikipedia.org/wiki/Sorting_network
It's an instance of overfitting.
sounds revolutionary
It's also worth noting that the paper is blindingly obvious and everyone started doing this a long time ago but didn't want to tip their cards.
And that's the real contribution here - Google is tipping their cards. We now have a rough baseline to compare our results against.
RL doesn’t stop at human levels
Can you explain this for someone unfamiliar with the game?
Go players are using AI to get better at Go.
They eliminated a register-register mov.
Benchmark seems to be in the range of a 1-5% improvement for 80% of sizes.