I was one of the members who expertly reviewed what was done in both sorting and hashing. Overall it's more about assembly: finding missed compiler optimizations and balancing correctness against distribution (in hashing in particular).
It was not revolutionary in the sense that it didn't find completely new approaches; it converged to something incomprehensible to humans but quite good for performance, which proves the point that optimal programs are very inhuman.
Note that removing instructions in sorting does not always lead to better performance: for example, instructions can run in parallel and the effect can be less pronounced. Benchmarks can lie, and the compiler could do something different when recompiling the changed sort3 function.
For hashing it was even funnier: very small strings (up to 64 bits) already used 3 instructions, roughly add a constant -> 64x64 multiply -> xor of the upper/lower halves. For bigger ones the question becomes more complicated, which is why 9-16 was a better spot and it simplified from 2 multiplications to just one and a rotation. Distribution on real workloads was good, it almost passed SMHasher, and we decided it was good enough to try out in prod. We did not roll back, as you can see from abseil :)
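Roughly, the shape being described is something like the sketch below (a minimal illustration, not the actual abseil code; the multiplier constant and the MixSmall/Rotr64/Mix9to16 names are made up here, and the rotate-by-53 mirrors the rotr(lo, 53) discussed further down the thread):

    #include <cstdint>

    // "Add a constant -> 64x64 multiply -> xor upper/lower" for very short inputs.
    // unsigned __int128 is a GCC/Clang extension; the multiplier is a placeholder.
    inline uint64_t MixSmall(uint64_t v, uint64_t seed) {
        unsigned __int128 p = (unsigned __int128)(v + seed) * 0x9E3779B97F4A7C15ull;
        return (uint64_t)p ^ (uint64_t)(p >> 64);
    }

    // Right rotation by r bits, for 0 < r < 64.
    inline uint64_t Rotr64(uint64_t x, unsigned r) {
        return (x >> r) | (x << (64 - r));
    }

    // Hypothetical 9-16 byte shape: one multiply-based mix plus one rotation,
    // instead of two multiplications.
    inline uint64_t Mix9to16(uint64_t lo, uint64_t hi, uint64_t seed) {
        return MixSmall(hi ^ Rotr64(lo, 53), seed);
    }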
But even given all that, it was fascinating to watch how this system searched and was able to find that particular programs can be further simplified. Kudos to everyone involved; it's a great incremental change that can bring more results in the future.
This sounds like a more intelligent version of superoptimization. The original Massalin superoptimizer, even for its time, also created surprising results, similar to AlphaDev's output being incomprehensible to humans. You see the same thing in computer chess, in what Agadmator calls disgusting engine lines.
https://courses.cs.washington.edu/courses/cse501/15sp/papers...
I'm disappointed that the hashing is just based on training on microbenchmarks and SMHasher, rather than designing a fast _provably_ universal hash.
Suites like SMHasher are never complete. They are just trying to catch the most common weaknesses. If you train on the test cases you'll only get an algorithm that passes the tests, but people can always find a set of values on which you will do badly.
I think you're confusing proving that the hash function is collision resistant with the other goal, which is hashing speed. If you really need a collision resistant hash you need to use a cryptographic hash function, but outside of cryptographic applications that is rarely the requirement. And (huge caveat, this isn't my domain expertise) I'm not sure what security properties are really "proven" about existing cryptographic hash functions; AFAIK existing cryptographic hash functions are considered secure because we don't know how to break them, not because of some fundamental mathematical property about them.
For the other 99.999% of hashing applications there is a balance between collision resistance and hashing latency. For example, in a hash table (probably the most common use for a non-cryptographic hash function) there is a cost incurred by hash collisions because lookups on keys with collisions may have to do extra probing. On the other hand, every hash table lookup requires doing at least one hash operation, regardless of whether or not it collides. So it may make sense to have a slightly worse hash function (in the sense that it is more likely to have collisions with pathological inputs) if it has slightly lower latency. The only way to really know what is faster for a real world application is to have some kind of benchmark to train against as a loss function.
Indeed, and this has been the case for quite a while now. You can always improve on some general algorithm by taking advantage of knowledge of the data, but that never generalizes and usually leads to worse performance on other data and/or new pathological cases that make the results unusable.
To what extent is this simply working around the weirdness of x86? Do these improvements apply to something like MIPS, ARM64, or RISC-V that have inherently simpler ISAs?
In this particular case they were universal, but the paper says the optimizations were done on x86. One of the ideas was to use LLVM IR, but the intuition was that running an optimizer on top of another optimizer was unlikely to work properly.
> 9-16 was a better spot and it simplified from 2 multiplications to just one and a rotation
I'm very confused as to why rotation was at all useful. Xoring with a random-ish constant makes sense, because the constant has high entropy and is likely to decorrelate bits from the input (also can use a different constant per hash table). But rotating by a constant—and a fixed one at that—seems like it just accounts for expected input distribution. Especially (assuming this is intended for text) if shifting by a value >8 makes a significant difference (vs shifting by the same value mod 8), it smells like serious overfit. Could be useful for something like a perfect hash, but seems problematic and prone to issues as a general hash.
Edit: to make my objection clearer: the hash simply replaces lo with rotr(lo, 53). If rotr(lo, 53) performs significantly better than rotr(lo, 53 mod 8), then that implies the following. I can take a set of strings, and I can apply the same permutation to the characters of all of the strings in the set, and this will significantly affect the quality of the hash function. That seems like an undesirable property, even for a non-cryptographic hash.
Does that end up moving high bits into the low bits? That could possibly be very helpful for all sets of strings, since the equality test will start with the first byte, so it could be better to have the rotr on the hash so that the hash is less affected by the first byte (and more affected by later bytes). Just hypothetically speaking, that is where the implication could break down, since it doesn't consider the non-uniform cost of the equality.
This might be buggy whip talk, but I wonder if you could take the same system and apply it to smaller problems (e.g. computing an 8-bit hash) so the novel techniques could be identified and used by humans.
One of the examples from another comment[1] here was:
"They found that in a sorting network handling 3 inputs, the AI found a way to save an instruction by reducing a "min(A, B, C)" operation to just "min(A, B)" by taking advantage of the fact that previous operations guaranteed that B <= C."
> proves the point that optimal programs are very inhuman
Maybe there should be an AI that produces optimally-readable/understandable programs? That's what I would want if I was adding the output to a codebase.
Optimal routing of delivery vehicles for UPS/FedEx etc. is also incomprehensible to the drivers, so the planners often intentionally generate suboptimal solutions.
A suboptimal solution that gets implemented is better than an optimal one that doesn't.
It's not that it's written to be obfuscated; the routine/algo is just hard to understand even if they commented every line. Likely some recursive trick is involved, and those are always hard to follow.
That's what LLMs can do with RL from human (or AI) readability feedback, instruction tuning, and prompting. We will 100% see this if GPT-4 doesn't already do it.
Thought: optimizing JIT compilers like V8 already observe code behavior and use that to choose how to optimize things as they go. I wonder if one day V8 will incorporate some ML to help study and optimize the running code
This doesn't really sound like SA (simulated annealing), which is a principled approach to avoiding the local-minimum problem in descent algorithms and ending up with a true global optimum. The general result is clean but impractical, and the approximations are hard to analyze.
I haven’t read the paper, but it sounds like this is a sort of similar approach to genetic algorithms? Create lots of agents with random changes and pick the ones that produce the most promising results? Is my impression wrong?
If more compute is spent than before, but in parallel to reduce latency, then wouldn't this increase power? Shouldn't latency and power both be objectives?
Some very cool improvements found in already highly optimized algorithms.
They found that in a sorting network handling 3 inputs, the AI found a way to save an instruction by reducing a "min(A, B, C)" operation to just "min(A, B)" by taking advantage of the fact that previous operations guaranteed that B <= C. They also found that in a sorting network handling 4 inputs, when it is part of a "sort 8" network, there are similar guarantees (D >= min(A, C) in this case) that can be taken advantage of to remove an instruction as well. The compiler may not be able to compete with hand-written assembly, but it seems an AI can hand-write assembly code that's even better in some cases.
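A way to picture the saved instruction (a hypothetical C++-level sketch of a 3-element sorting network, not the actual libc++ assembly):

    #include <algorithm>

    inline void Sort3(int& a, int& b, int& c) {
        // Stage 1: order (b, c) so that b <= c.
        int t = std::min(b, c);
        c = std::max(b, c);
        b = t;
        // Stage 2: push the overall maximum into c.
        t = std::min(a, c);
        c = std::max(a, c);
        a = t;
        // Stage 3: c already holds the maximum, so the smallest element is
        // min(a, b) -- a min(a, b, c) here would be redundant. This is the kind
        // of guarantee-based saving described above.
        t = std::min(a, b);
        b = std::max(a, b);
        a = t;
    }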
Another improvement is also very cool. They discuss VarSort4, which takes a list that may be 2, 3, or 4 elements long, and sorts it. The existing algorithm simply checks the length and dispatches to the matching fixed-size sort (Sort2, Sort3, or Sort4).
The AI found an algorithm that looks totally different:
if (len == 2) {
    sort2
} else {
    sort3
    if (len == 4) {
        sortspecial
    }
}
It's pretty wild! It immediately runs Sort3 on something that may be either 3 elements or 4 elements long, and only afterwards does it check to see how long it really is. If it's 3 elements long, we're done; otherwise, run Sort4 - but because (having already run Sort3) you know the first three elements are sorted, you can use a special and much simpler algorithm to simply put the 4th element in the right spot.
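As a rough sketch of that last step (plain C++ for illustration; the real code is branchless assembly, and the sortspecial name above comes from the pseudocode, not from libc++): with v[0..2] already sorted, finishing the 4-element case is just an insertion of v[3].

    // Assumes v[0] <= v[1] <= v[2]; places v[3] into its correct position.
    inline void InsertFourth(int v[4]) {
        int x = v[3];
        int i = 3;
        while (i > 0 && v[i - 1] > x) {   // shift larger elements one slot right
            v[i] = v[i - 1];
            --i;
        }
        v[i] = x;
    }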
Very cool. Improving on core LLVM sorting algorithms that have already been heavily hand-optimized by the best in the world is definitely a "it's 1997 and Deep Blue defeats the World Champion in chess" kind of feeling.
Interesting that it took AI to pull this off. I think mutation testing would have discovered the first (by virtue of checking which code isn't necessary globally), though likely not the second (which needs a new branch, unless that branch was already there?).
As I understand it, it kind of _did that_, just in an extremely guided way, which is why it produced results in some reasonable amount of time (probably).
I'm surprised they only report results up to sort5.
It seems that at that level you could just iterate through all possible programs.
AI-generated code seems more interesting once you get to 10+ values, where classical methods break down.
I believe that's because above 5 elements, fixed sorting networks are no longer used. Introsort takes over and dispatches to insertion sort, quicksort, or heap sort as appropriate.
Divide and conquer strategies are used for larger sorts, and the smaller arrays could include the fixed lengths 3, 4, 5.
> The compiler may not be able to compete with hand-written assembly, but it seems an AI can hand-write assembly code that's even better in some cases.
This made me think “imagine if AI was the compiler”, that is to say you went from C or whatever to assembly via AI directly, so it was “hand writing” the equivalent assembly for your instructions instead of using generic compilation.
We might find everything runs much faster.
> AlphaDev uncovered new sorting algorithms that led to improvements in the LLVM libc++ sorting library that were up to 70% faster for shorter sequences and about 1.7% faster for sequences exceeding 250,000 elements.
As someone that knows a thing or two about sorting... bullshit. No new algorithms were uncovered, and the work here did not lead to the claimed improvements.
They found a sequence of assembly that saves... one MOV. That's it. And it's not even novel, it's simply an unrolled insertion sort on three elements. That their patch for libc++ is 70% faster for small inputs is only due to the library not having an efficient implementation with a *branchless* sorting network beforehand. Those are not novel either, they already exist, made by humans.
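For readers unfamiliar with the term: a branchless sorting network is built from compare-exchange steps that compilers turn into conditional moves instead of branches. A minimal sketch (illustrative, not the libc++ code):

    // Both ternaries typically compile to cmov, so small sorts have no
    // data-dependent branch to mispredict.
    inline void CompareExchange(int& a, int& b) {
        int lo = a < b ? a : b;
        int hi = a < b ? b : a;
        a = lo;
        b = hi;
    }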
> By open sourcing our new sorting algorithms in the main C++ library, millions of developers and companies around the world now use it on AI applications across industries from cloud computing and online shopping to supply chain management. This is the first change to this part of the sorting library in over a decade and the first time an algorithm designed through reinforcement learning has been added to this library. We see this as an important stepping stone for using AI to optimise the world’s code, one algorithm at a time.
I'm happy for the researchers that the reinforcement learning approach worked, and that it gave good code. But the paper and surrounding press release are self-aggrandizing in both their results and impact. That this is the first change to 'this part' of the sorting routine in a decade is also just completely cherry-picked. For example, I would say that the libc++ sorting routine being QUADRATIC (https://bugs.llvm.org/show_bug.cgi?id=20837), which I reported (with an ignored patch) in 2014, finally being fixed in late 2021 (https://reviews.llvm.org/D113413) is quite the notable change. If anything it shows that there wasn't a particularly active development schedule on the libc++ sorting routine the past decade.
I feel like your take is overly cynical. The fact that humans can do the same thing by hand is not really the point. The contribution lies in the fact that their method derived this improvement *automatically*, which is where the impact lies. No one cares all that much if a human can make a sorting routine 2% faster, but if a program can do it, it suddenly becomes interesting (since it suggests that a similar approach can be applied to many other routines).
I am not cynical about the research itself, I am critical of claims such as "new sorting algorithm uncovered", "up to 70% faster", or "first change in a decade". The research is good. The achieved results are massively inflated.
What they achieved: automatically generated good code.
What they claim: automatically generated code that is revolutionary and an improvement on the state of the art.
And as another commenter noted, superoptimizers are also already a thing: https://en.wikipedia.org/wiki/Superoptimization
There's also automatic searching being done on faster sorting networks that actually recently produced better than state of the art sorting networks: https://github.com/bertdobbelaere/SorterHunter
I don't read it as cynical. It's fair game to call bullshit on bullshit. If an approach exists and is known, then "insert your favorite AI here" does not discover anything.
What's surprising is that anyone would've expected an AI to come up with a brand-new algorithm with better complexity than pre-existing human-made solutions.
How could it possibly come up with something better when it doesn't even understand how the original authors of qsort/mergesort/etc came up with their own..
Sure, it's great PR for the company, but.. the results just aren't there.
This is DeepMind's modus operandi. Every press release is just utterly hyperbolic nonsense that doesn't withstand the slightest scrutiny. AlphaGo, AlphaFold, AlphaDev... they've done literally nothing to improve the human condition, and may have actually made things worse. Go players have DECREASED their Elo after playing AlphaGo (or just quit the game altogether). I would be embarrassed to be associated with DeepMind.
This is unbelievably wrong. DeepMind has probably the best academic group in RL. The difference between DeepMind and OpenAI is that DeepMind favors academic endeavours and novelty much more, largely ignoring commercialization and products, while OpenAI is the stark opposite in that they almost entirely focus on products first, and typically their academic endeavours are just slight modifications of other works scaled up.
Don't get me wrong, Sutskever (et al.) has done incredibly good work previously, but when it comes to the products, they're much more engineering and marketing polish than scientific endeavours.
Botvinick's work on metaRL for example is an interesting direction that Deepmind has shown that few other companies that are only interested in engineering would venture towards.
Around the time of the AlphaGo challenge and afterwards...
1) you could see increased activity in go clubs and online go servers
2) the analysis of the games published by Deepmind has resulted in interesting "discoveries" (or rediscoveries) and changes to what is considered joseki.
3) many people started analyzing their kifus using AI, to find fluctuations in estimated win rate across moves.
So I disagree entirely.
I thought the same thing - it smacks of desperation at the moment; any tiny win is exaggerated.
It’s not hard to see why, with the emergence (ha) of OpenAI, Midjourney and all of this generative modelling, what has DeepMind done? I imagine the execs at Google are asking them some very probing questions on their mediocre performance over the last 5 years.
A decade ago Google had people with breadth and context who could have adjusted the framing of the result of this work (and maybe even re-targeted it to something useful). Today however Google is a mix of hyper-narrow expert ICs and leaders who lack domain expertise. Paper in Nature? Sure, let's take it!
The LLVM implementation of libc++ is very young compared to other implementations (it was started 5 years ago or so), so there are still a lot of things missing and a lot of things that can be improved.
The claim "faster sorting algorithm" is wrong.
They have to show the time-complexity. They have to prove that their algorithm is faster than linear sorting algorithms. Otherwise, they have to accept that their claim is wrong.
I get your sentiment but note that discovering a new algorithm doesn't have to imply a better time complexity. Bubble Sort and Insertion Sort have the same time complexity but are different algorithms.
What you're saying is a certain perspective which seeks to look at it from first principles.
To the brutalist, the algorithm change is faster, and that's all that matters. A human didn't previously come up with the optimisation. You might as well say a computer sorting is bullshit vs a person doing it, because the difference is just that a bloated chip does it instead, and that's it.
The title implies it found an entirely new algorithmic approach to sorting (like quick sort) which would have been a fantastic discovery. But it feels a lot like micro-optimizing the control flow and codegen.
I'm not sure there is a new algorithmic approach to sorting like you're thinking of. From a high level you can divide sorting algorithms into roughly two camps, "those that use divide-and-conquer", and "those that do not". The divide-and-conquer (split into smaller sub-lists, sort the sub-lists, and merge them while preserving sort order) algorithms are better. From this perspective, algorithms that we think of as quite distinct like quicksort and mergesort are not that different - quicksort just does the "divide" step of divide-and-conquer in a smarter way.
In the end, no matter what divide-and-conquer sorting algorithm you pick, you will be doing lots of "small sorts". And it is those small sorts that DeepMind has optimized here.
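A toy illustration of why the small cases matter (not the libc++ introsort; the QuickSortish name and the threshold of 4 are made up for the example): every recursive sort eventually bottoms out in tiny ranges, and that bottom layer is exactly where fixed small sorts get hit constantly.

    #include <algorithm>
    #include <vector>

    void QuickSortish(std::vector<int>& v, int lo, int hi) {   // sorts v[lo, hi)
        if (hi - lo <= 4) {
            // Tiny ranges end up here all the time; this insertion sort is the
            // stand-in for the hand-tuned sort3/sort4 networks discussed above.
            for (int i = lo + 1; i < hi; ++i) {
                int x = v[i], j = i;
                while (j > lo && v[j - 1] > x) { v[j] = v[j - 1]; --j; }
                v[j] = x;
            }
            return;
        }
        int pivot = v[lo + (hi - lo) / 2];
        auto first = v.begin() + lo, last = v.begin() + hi;
        auto m1 = std::partition(first, last, [pivot](int x) { return x < pivot; });
        auto m2 = std::partition(m1, last, [pivot](int x) { return x == pivot; });
        QuickSortish(v, lo, int(m1 - v.begin()));   // elements < pivot
        QuickSortish(v, int(m2 - v.begin()), hi);   // elements > pivot
    }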
> I'm not sure there is a new algorithmic approach to sorting like you're thinking of. From a high level you can divide sorting algorithms into roughly two camps, "those that use divide-and-conquer", and "those that do not". ...
I think the sci-fi-but-possibly-real hope is that for sorting (among other things), we may have the perspective that there isn't any new algorithmic approach available, but an AI finds one for us.
There are lots of accepted sorting algorithms [1]. I'm sure we can come up with novel new algorithms, even if they're not optimal. Like Wikipedia mentions, they all fall within some small number of higher-level categories (e.g. Partitioning, Merging, Selection, Insertion). I'm still not convinced that the optimizations presented in the article amount to the discovery of NEW sorting algorithms; they are merely optimizations of existing ones.
[1] https://en.wikipedia.org/wiki/Sorting_algorithm
Indeed, it's easy to prove that any sorting algorithm that works by comparing elements (unlike e.g. radix sort) requires Ω(n log n) time in the worst case.
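Sketch of that proof, for completeness: a comparison sort is a binary decision tree that has to distinguish all n! input orderings, so its height h (the worst-case number of comparisons) satisfies

    2^h \ge n! \quad\Longrightarrow\quad h \ge \log_2 n! = \Theta(n \log n)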
Still a fantastic discovery, because now there are more powerful automated algorithm improvement discovery pipelines. I wonder what will happen when those improvements are added to the processing that found the improvements. :D
Yes, I'm not discounting the results. I think it's a very interesting approach. I just think the language is important here. If you ask a CS class to turn in their own quicksort implementation, you have n implementations of 1 algorithm, not n new algorithms.
I think this is still super interesting. It's something humans are unable to do, or at least there are very few humans who can. I very much like the pattern of writing basic logic myself and then using a coding model to optimize it. It's effectively what we do with compilers already; this just makes it better.
My first intuition, knowing the limitations of generating first-order logic sequences, was that they must have somehow restricted the sorting to a fixed number of elements.
The most interesting part of this paper to me is that they let the agent guess how efficient its own solutions were and only had the model experimentally verify its guesses in 0.002% of cases. This allowed the model to search much faster than another program that didn't guess and had to run every program.
This is the same sort of "more guesses faster beats smarter guesses slower" that made afl-fuzz by far the best at exploring large search spaces in program fuzzing
Fast search often beats accurate search. Sometimes adding clever heuristics or more complex scoring "works" but slows down the search enough that it's an overall loss. Another kind of a bitter lesson, perhaps
But why isn't the proposed method an instance of smart guessing? It reduces oracle complexity with heuristics. The heuristic is "build a machine learning model of the objective function and use it to fake oracle queries most of the time."
This is actually quite a common way to optimize stuff in several disciplines. You essentially fit a surrogate model (keyword if you want to look up more) to whatever you want to optimize, then use the model to guide the procedure, making sure that the model is correct every once in a while.
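A bare-bones sketch of that control flow (all names and the toy Candidate type are hypothetical, just to show the "cheap estimate almost always, expensive ground truth rarely" pattern):

    #include <random>

    struct Candidate { double true_latency = 100.0; };    // stand-in for a program

    std::mt19937 rng{42};

    Candidate Mutate(const Candidate& c) {                 // placeholder: random tweak
        std::normal_distribution<double> step(0.0, 1.0);
        return Candidate{c.true_latency + step(rng)};
    }
    double RealBenchmark(const Candidate& c) {             // expensive in reality
        return c.true_latency;
    }
    double SurrogateScore(const Candidate& c) {            // cheap, slightly noisy estimate
        std::normal_distribution<double> noise(0.0, 0.5);
        return c.true_latency + noise(rng);
    }

    Candidate Search(Candidate best, int steps, double verify_fraction) {
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        double best_est = SurrogateScore(best);
        for (int i = 0; i < steps; ++i) {
            Candidate next = Mutate(best);
            double est = SurrogateScore(next);
            if (est >= best_est) continue;                 // surrogate says: not better
            if (coin(rng) < verify_fraction &&             // rarely, consult ground truth
                RealBenchmark(next) >= RealBenchmark(best)) continue;
            best = next;
            best_est = est;
        }
        return best;
    }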
I've been wondering about a similar approach for biomolecular simulations, where exact computations are still a hard bottleneck. I wonder if something like this could give us a few orders of magnitude more speed.
Where are the machine learning de-compilers?
Given an executable program, give me human-readable code that generates the executable exactly. With machine-learning-guessed function, type, and variable names...?
I get that it's a very hard problem. But... Does it just not work? Or have people not done it yet? Or have I missed it?
Training datasets should be pretty trivial for this: all open source software that is buildable on the internet could provide training pairs (source code & compiled code).
But I guess it would have to be trained specifically for every architecture.
Personally I think this would be very exciting. Currently there are a ton of projects to decompile older games (see the Mario 64 and Ocarina of Time de-compilations and subsequent native PC ports as an example), but those projects are huge and take whole teams years to finish. If you could simply point an AI at a project like that and have usable source code in a reasonable amount of time, it would be huge for those scenes. You would have a sort of defacto-open source moment for just about all software. It feels like the stuff of far off sci-fi.
This actually doesn't sound that difficult, given we can produce a practically infinite training dataset using a compiler with existing programs.
What's more interesting to me would be a model that can automatically port a program between languages, such as converting a C program to a fairly idiomatic Rust program.
I don't understand WebAssembly much other than superficially, but if wasm is going to be a new, harder level of obfuscation for websites, making them harder to customize locally, then I hope ML/AI can counteract that and preserve the Web as a user-customizable space.
This was educational because I learned about Sorting Networks.
I didn't hear of them in CS undergrad algorithms, perhaps because they can be thought of as a special case or optimization, rather than fundamental and generic variable-length sort algos.
It's a simple concept, and forms the basis of all the sort optimizations described here.
https://en.wikipedia.org/wiki/Sorting_network
It's an instance of overfitting.
sounds revolutionary
It's also worth noting that the paper is blindingly obvious and everyone started doing this a long time ago but didn't want to tip their cards.
And that's the real contribution here - Google is tipping their cards. We now have a rough baseline to compare our results against.
RL doesn’t stop at human levels
Can you explain this for someone unfamiliar with the game?
Go players are using AI to get better at Go.
They eliminated a register-register mov.
Benchmark seems to be in the range of a 1-5% improvement for 80% of sizes.