Better than brute force is using a model of your instruction set (for x86, start here [1]) and then using an SMT solver like Z3 to find solutions. Here's a paper doing the same [2]. With these approaches you can generate vastly bigger pieces of code than with brute force, and with significantly less hand-tuned heuristic effort (which would likely still lose out to current solvers anyway).
[1] https://github.com/zwegner/x86-sat
[2] https://people.cs.umass.edu/~aabhinav/Publications/Unbounded...
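For concreteness, here is a minimal sketch of the solver-backed side of this, assuming the z3-solver Python bindings (a model like [1] would supply the semantics of real x86 instructions; this only shows the equivalence query). The expressions are illustrative, not taken from either reference.

```python
# Hedged sketch: use an SMT solver to check whether a candidate rewrite can
# ever disagree with the original over 32-bit inputs. This is the verification
# step a model-based superoptimizer leans on instead of exhaustive testing.
from z3 import BitVec, Solver, sat

x = BitVec("x", 32)
original  = x & (x - 1)        # clear the lowest set bit
candidate = x - (x & -x)       # a rewrite the search might propose

s = Solver()
s.add(original != candidate)   # ask for any input where they differ
if s.check() == sat:
    print("not equivalent, counterexample:", s.model())
else:
    print("equivalent for all 32-bit inputs")
```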
If you are familiar with that paper, I'm a little confused by the results section. If I'm reading p01 from figure 7 correctly, the "before" is x&(x-1) and the "after" is a single instruction x&(x<<1). What am I doing wrong?
(The other two wins listed are "recognize the Hacker's Delight trick and replace it with the straightforward implementation a human would write", which I like.)
Happy to see this! Superoptimizers are one of my all time favorite topics.
My favorite trick when cutting down the search space of possible programs is to generate DAGs directly instead of an instruction sequence. This way, it's easier to write rules that avoid nonsensical programs and to some extent avoid generating multiple equivalent programs.
For example, to avoid programs that overwrite calculations, you make sure that each node in the generated DAG has something that depends on it.
In a suitable environment the DAG can then be executed directly without lowering to a sequence of instructions.
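A rough sketch of that idea (my own illustration, not the commenter's code): enumerate DAG nodes bottom-up over a tiny made-up operation set, and prune any candidate whose intermediate nodes are never used.

```python
# Hedged sketch: enumerate expression DAGs instead of instruction sequences.
# Every new node references inputs or earlier nodes, and candidates containing
# dead intermediate nodes are pruned, which removes "compute a result you
# never use" programs from the search space up front.
from itertools import product

MASK = 0xFFFFFFFF
OPS = {
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
    "and": lambda a, b: a & b,
    "xor": lambda a, b: a ^ b,
}

def enumerate_dags(num_inputs, max_nodes):
    """Yield DAGs as lists of (op, left, right), indices into inputs + earlier nodes."""
    def extend(nodes):
        if nodes:
            yield list(nodes)
        if len(nodes) == max_nodes:
            return
        n = num_inputs + len(nodes)            # everything a new node may reference
        for op, l, r in product(OPS, range(n), range(n)):
            yield from extend(nodes + [(op, l, r)])
    for dag in extend([]):
        used = {l for _, l, _ in dag} | {r for _, _, r in dag}
        # every node except the last (the result) must feed some later node
        if all(num_inputs + i in used for i in range(len(dag) - 1)):
            yield dag

def run(dag, inputs):
    vals = list(inputs)
    for op, l, r in dag:
        vals.append(OPS[op](vals[l], vals[r]))
    return vals[-1]

# Toy goal: find a two-node DAG computing 3*x from a single input x.
tests = [1, 6, 0xF0, 0xFFFFFFFF]
for dag in enumerate_dags(num_inputs=1, max_nodes=2):
    if all(run(dag, [x]) == (3 * x) & MASK for x in tests):
        print("found:", dag)    # e.g. [('add', 0, 0), ('add', 0, 1)]
        break
```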
This sounds a little bit like representing boolean functions as binary decision diagrams -- although for a superoptimizer presumably each node in the DAG would correspond to an instruction from the target instruction set, which might do something rather complicated, not just a primitive logical operation between a few bits.
https://en.wikipedia.org/wiki/Binary_decision_diagram
(I first read about BDDs in Knuth's The Art of Computer Programming, Volume 4, Fascicle 1, then spent an enjoyable few weeks going down a rabbit hole of building a boolean function to decide whether an input 2D grid of bits was a maze or not, then uniformly sampling mazes from it...)
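For anyone curious about the machinery behind that kind of exercise, here is a tiny, hedged sketch (the predicate is a simple stand-in, not a maze checker): build a reduced ordered BDD by Shannon expansion and count satisfying assignments, which is the operation that makes uniform sampling possible (weight each branch choice by its count).

```python
# Hedged sketch: BDD via Shannon expansion plus model counting over N variables.
N = 6                                   # number of boolean variables
TRUE, FALSE = "T", "F"

def mk(var, lo, hi):
    # Reduction rule: if both branches agree, the variable is irrelevant.
    return lo if lo == hi else (var, lo, hi)

def build(f, var=0, assign=()):
    """Shannon expansion of f over variables var..N-1 (exponential, fine for tiny N)."""
    if var == N:
        return TRUE if f(assign) else FALSE
    lo = build(f, var + 1, assign + (0,))
    hi = build(f, var + 1, assign + (1,))
    return mk(var, lo, hi)

def count(node, var=0):
    """Number of satisfying assignments over variables var..N-1."""
    if node == FALSE:
        return 0
    if node == TRUE:
        return 2 ** (N - var)
    v, lo, hi = node
    return 2 ** (v - var) * (count(lo, v + 1) + count(hi, v + 1))

# Stand-in predicate: no two adjacent bits are both set.
no_adjacent_ones = lambda bits: all(not (a and b) for a, b in zip(bits, bits[1:]))
root = build(no_adjacent_ones)
print(count(root), "of", 2 ** N, "assignments satisfy the predicate")   # 21 of 64
```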
The 1987 paper about superoptimizers is well worth the read, BTW. It's quite short. Even the first example on its own (writing the sign function without any conditionals) is a mind-bender but becomes straightforward once you see the trick used: https://courses.cs.washington.edu/courses/cse501/15sp/papers...
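For flavor, here is the same kind of trick written out in plain code (emulating 32-bit two's-complement arithmetic by masking). This is the well-known branch-free C idiom (x >> 31) | ((unsigned)(-x) >> 31), not the paper's exact instruction sequence, but it captures the idea of deriving -1/0/+1 with no comparisons or branches.

```python
# Hedged illustration of a branchless sign function over emulated 32-bit ints.
MASK = 0xFFFFFFFF

def sign(x):
    x &= MASK
    sign_bit = x >> 31                     # 1 if the top bit is set, else 0
    arith    = (-sign_bit) & MASK          # "arithmetic shift": all ones or all zeros
    logical  = ((-x) & MASK) >> 31         # 1 when x > 0 (also for x = INT_MIN, harmless)
    return (arith | logical) & MASK        # 0xFFFFFFFF (-1), 0, or 1

for x in (-13, 0, 42):
    print(x, "->", hex(sign(x)))           # -1, 0, +1 as 32-bit values
```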
By 'those optimisations' I assume you mean the specific rewrites, not superoptimisation? There has been some interesting recent work in rewrite inference, e.g. ruler. (I recall one such project did a survey and found that many—possibly a majority—of the rewrites it had generated automatically were not present in the hand-written corpuses of gcc and clang. I'm not sure if there was subsequent work to supplement gcc and clang; this was a few years ago now; I think they were looking at gcc 7.)
JetBrains was hiring mathematicians working on homotopy type theory to do something similar, if I recall correctly. The idea is that HoTT can be used to generate programs equivalent to the one in your buffer and do something like superoptimization. Maybe I am hallucinating.
Is superoptimization related to supercompilation, or are these unrelated compiler optimization techniques? From what I have read, it seems like Turchin [1] et al. are trying to prove equality using some sort of algebraic rewriting system (i.e., the "intensional" approach), whereas superoptimization uses an extensional technique: first try to prove disequality by testing on a small set of inputs, then apply formal verification to the remaining candidates.
Massalin (1987) [2] calls the first phase "probabilistic execution" and claims that nearly all of the functions which pass the PE test also pass the more rigorous Boolean verification test. Can you give any insight into the benefits of TT over more "automated" optimizations? I am curious whether MLTT/HoTT is more suitable for certain compiler optimizations or offers additional expressive power for proving equivalence, or whether the benefit is mostly ergonomics.
[1] https://dl.acm.org/doi/pdf/10.1145/5956.5957
[2] https://dl.acm.org/doi/pdf/10.1145/36177.36194
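To make that two-phase structure concrete, here is a small hedged sketch (mine, not Massalin's code): run every candidate on a handful of test inputs and discard it at the first mismatch; only the survivors go on to the expensive formal equivalence check (e.g. an SMT query like the one sketched earlier in the thread).

```python
# Hedged sketch of the "probabilistic execution" filter: cheap testing first,
# formal verification only for the candidates that survive.
import random

MASK = 0xFFFFFFFF
TESTS = [0, 1, 5, 6, 0x80000000, 0xFFFFFFFF] + [random.getrandbits(32) for _ in range(16)]

def target(x):
    return x & (x - 1) & MASK                   # reference: clear the lowest set bit

def probably_equivalent(candidate):
    return all(candidate(x) == target(x) for x in TESTS)

candidates = [
    lambda x: (x - (x & (-x & MASK))) & MASK,   # genuinely equivalent rewrite
    lambda x: x & ((x << 1) & MASK),            # looks similar, but x = 5 exposes it
]
survivors = [c for c in candidates if probably_equivalent(c)]
print(len(survivors), "of", len(candidates), "candidates survive; verify those formally")
```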
There is a GNU superoptimiser (long abandoned, though) that supports multiple hardware architectures, including HP PA-RISC, Motorola 68k and 88k, and Intel i960, amongst others: https://www.gnu.org/software/superopt/
It is based on exhaustive search with backtracking, the kind of thing Prolog is best at: given a goal function that takes an input vector of integers and produces an output integer, it attempts to find the optimal (shortest) instruction sequence that computes it.
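Roughly the shape of that search, as a hedged sketch over a made-up four-operation machine rather than GNU superopt's real instruction tables: try sequence lengths 1, 2, 3, ... with depth-first search and backtracking, checking each candidate against the goal function on a set of test inputs (survivors would still need real verification).

```python
# Hedged sketch of shortest-first exhaustive search over straight-line
# register programs, with backtracking. The instruction set is a toy one.
MASK = 0xFFFFFFFF
OPS = {
    "neg": lambda a, b: (-a) & MASK,   # unary; second operand ignored
    "and": lambda a, b: a & b,
    "add": lambda a, b: (a + b) & MASK,
    "sub": lambda a, b: (a - b) & MASK,
}

def search(goal, tests, max_len):
    def dfs(prog, states, remaining):
        # states[k] holds every value computed so far for test input tests[k]
        if all(st[-1] == goal(x) for st, x in zip(states, tests)):
            return prog
        if remaining == 0:
            return None                                # dead end: backtrack
        n = len(states[0])
        for op in OPS:
            for i in range(n):
                for j in range(n):
                    nxt = [st + [OPS[op](st[i], st[j])] for st in states]
                    found = dfs(prog + [(op, i, j)], nxt, remaining - 1)
                    if found is not None:
                        return found
        return None

    for length in range(1, max_len + 1):               # shortest sequences first
        result = dfs([], [[x & MASK] for x in tests], length)
        if result is not None:
            return result
    return None

# Goal: clear the lowest set bit, i.e. x & (x - 1); expect a three-op sequence
# along the lines of neg, and, sub.
tests = [1, 6, 0xF0, 0x80000000, 0xFFFFFFFF]
print(search(lambda x: x & (x - 1) & MASK, tests, max_len=3))
```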
Related: I work on Souper (https://github.com/google/souper).
Feel free to reach out if anyone has questions!
It's a shame it's a bit hard to actually build these days. Its target is people working on compilers, not end users. But there are some odd cases where I'd really like to use it myself, like trying to get speedup in llama.cpp.
Are llama.cpp (and similar targets) made into standalone C++ source files without dependencies doing the heavy lifting?
One can now compile against cuBLAS and other GPU libraries to help take some of the load, but the core of the project is indeed to run on the CPU and do all of the work without dependencies.
First try: get every permutation, feed each to some stable code-oriented LLM, ask it to explain the code, and measure the number of words.