I had to use a very crude form of superoptimization for an assembly course this summer. The goal was to create a fast toupper function for ASCII.
I was already familiar with the method to check for zero (with chars being packed in a word):
(x - 0x0101010101010101UL) & ~(x) & 0x8080808080808080UL
so my idea was to find an equivalent formula that puts 0x80 in every byte where there is a value between 0 and 25, shift it right by 2, and XOR it with the original word.
I simply tried every combination of simple operators and repeating constants and found (0x9999999999999999UL - x) & (~x) & 0x8080808080808080UL
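For anyone who hasn't seen this word-at-a-time (SWAR) style before, here is a minimal C sketch of the overall shape: build a 0x80 mask over the lowercase bytes, shift it right by 2 to get the 0x20 case bit, and XOR it in. The range test below is the textbook add-and-check-the-high-bit trick, not the exact subtraction formula I found, it assumes every input byte is 7-bit ASCII, and swar_toupper is just an illustrative name:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Flag lowercase bytes with 0x80, shift the mask right by 2 to get the
     * 0x20 case bit, and XOR it into the word.
     * Assumes every byte of x is 7-bit ASCII (high bit clear). */
    static uint64_t swar_toupper(uint64_t x)
    {
        const uint64_t ones  = 0x0101010101010101ULL;
        const uint64_t highs = 0x8080808080808080ULL;

        uint64_t ge_a  = (x + (0x80 - 'a') * ones) & highs; /* 0x80 where byte >= 'a' */
        uint64_t gt_z  = (x + (0x7F - 'z') * ones) & highs; /* 0x80 where byte >  'z' */
        uint64_t lower = ge_a & ~gt_z;                      /* 0x80 where 'a' <= byte <= 'z' */

        return x ^ (lower >> 2);                            /* 0x80 >> 2 == 0x20, the case bit */
    }

    int main(void)
    {
        char buf[9] = "hello, W";
        uint64_t w;
        memcpy(&w, buf, 8);       /* works per byte, so endianness doesn't matter */
        w = swar_toupper(w);
        memcpy(buf, &w, 8);
        printf("%s\n", buf);      /* prints "HELLO, W" */
        return 0;
    }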
Back in the day, "stochastic superoptimization" would be a reasonable description for an application of genetic programming. Guess I will have to read the paper to understand what's new.
Yes, genetic programming arguably goes back to Turing, which is way back in the day, but the term superoptimizer goes back to Henry Massalin's 1987 paper, "Superoptimizer: A Look at the Smallest Program".
Superoptimization is a methodology for building an optimizer. When you run a superoptimizer it's "just" doing peephole optimization, though the difference is a bit like comparing bogosort (regular optimization) to sorting networks (superoptimization).
I haven't looked at it in detail, but from what I gathered from the README, it just seems like a very simple form of linear genetic programming (with only mutation and no crossover).
Yes. "STOKE can be used in many different scenarios, such as optimizing code for performance or size". Note that this is the 4th sentence in the linked page. However the README hints that STOKE is unlikely to improve much over gcc or llvm optimizations such as -Os / -Oz for binary size.
I guess it depends on the optimization function? However, I suspect that to avoid loops it relies on loop unrolling, and to avoid interprocedural analysis (which I believe is beyond the scope of STOKE) it may use inlining, both of which can increase code size.
This is an extremely interesting domain. I feel an even more interesting topic than optimizing code would be generating code to replicate a process based on input/output samples.
This is, in essence, a monkey trying to lower a cost function by randomly manipulating an input AST (empty for synthesis). So theoretically it will be able to produce any program, just not in a reasonable time frame.
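Roughly, the loop looks like the toy C sketch below: mutate a candidate program at random, keep the mutation if the cost on a set of test inputs doesn't get worse. This is only an illustration of the mutate-and-accept idea over a made-up four-instruction ISA, not STOKE's actual x86-64 transforms or its MCMC acceptance rule:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    enum op { OP_ADD, OP_XOR, OP_SHL, OP_NOT, N_OPS };
    struct insn { enum op op; uint64_t imm; };

    #define PROG_LEN 4
    #define N_TESTS  16

    /* Execute a straight-line program over a single 64-bit register. */
    static uint64_t run(const struct insn *p, uint64_t x)
    {
        for (int i = 0; i < PROG_LEN; i++) {
            switch (p[i].op) {
            case OP_ADD: x += p[i].imm; break;
            case OP_XOR: x ^= p[i].imm; break;
            case OP_SHL: x <<= (p[i].imm & 63); break;
            case OP_NOT: x = ~x; break;
            default: break;
            }
        }
        return x;
    }

    /* The behaviour we want to reproduce. */
    static uint64_t target(uint64_t x) { return (x << 3) + 7; }

    /* Cost: bits that differ from the target across all tests (GCC/Clang builtin). */
    static int cost(const struct insn *p, const uint64_t *tests)
    {
        int c = 0;
        for (int i = 0; i < N_TESTS; i++)
            c += __builtin_popcountll(run(p, tests[i]) ^ target(tests[i]));
        return c;
    }

    static struct insn random_insn(void)
    {
        struct insn in;
        in.op  = (enum op)(rand() % N_OPS);
        in.imm = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
        if (in.op == OP_SHL) in.imm %= 8;                    /* keep shifts small   */
        if (in.op == OP_ADD || in.op == OP_XOR) in.imm %= 16; /* keep constants small */
        return in;
    }

    int main(void)
    {
        uint64_t tests[N_TESTS];
        struct insn prog[PROG_LEN];
        for (int i = 0; i < N_TESTS; i++)  tests[i] = (uint64_t)rand() * 2654435761u;
        for (int i = 0; i < PROG_LEN; i++) prog[i]  = random_insn();

        int best = cost(prog, tests);
        for (long iter = 0; iter < 2000000 && best > 0; iter++) {
            int slot = rand() % PROG_LEN;
            struct insn saved = prog[slot];
            prog[slot] = random_insn();      /* random mutation          */
            int c = cost(prog, tests);
            if (c <= best) best = c;         /* accept sideways/downhill */
            else prog[slot] = saved;         /* otherwise undo           */
        }
        printf("final cost on tests: %d\n", best);
        return 0;
    }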
How do you reason about correctness when stoke changes your code? Does it rely on invariance of test cases or does it try to 'understand' the intent of the program?
That's not a terrible goal if you want to make key libraries faster and more efficient.
But I'm unconvinced that it would be easy to scale up these techniques to auto-clone and improve a big application like Photoshop or Word - or even a generic CRUD app.
I guess this is what the second paragraph is about.
"In addition to searching over programs, STOKE contains verification infrastructure to show the equivalence between x86-64 programs. STOKE can consider test-cases, perform bounded verification all the way to fully formal verification that shows the equivalence for all possible inputs."
Program equivalence is in general undecidable if I'm not mistaken, so a big part of the possible performance enhancement depends on the strength of their algorithm for proving equivalence.
Definitely an interesting though extremely challenging problem.
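To make that ladder from the quote concrete, the cheapest rung looks roughly like the C sketch below: compare the rewrite against a reference on random test inputs, and only escalate to bounded or fully formal verification (e.g. an SMT query) if no counterexample turns up. The names ref_toupper and candidate_toupper are made up for the sketch; the candidate here is just a stand-in for whatever rewrite is being checked:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>
    #include <string.h>

    /* Byte-at-a-time reference implementation. */
    static uint64_t ref_toupper(uint64_t x)
    {
        unsigned char b[8];
        memcpy(b, &x, 8);
        for (int i = 0; i < 8; i++) b[i] = (unsigned char)toupper(b[i]);
        memcpy(&x, b, 8);
        return x;
    }

    /* Stand-in for the rewrite under test. */
    static uint64_t candidate_toupper(uint64_t x)
    {
        uint64_t out = 0;
        for (int i = 0; i < 8; i++) {
            unsigned char b = (x >> (8 * i)) & 0xFF;
            if (b >= 'a' && b <= 'z') b -= 32;
            out |= (uint64_t)b << (8 * i);
        }
        return out;
    }

    /* Test-case "equivalence": random 7-bit ASCII inputs, report any mismatch. */
    static int equivalent_on_tests(int n)
    {
        for (int i = 0; i < n; i++) {
            uint64_t x = 0;
            for (int j = 0; j < 8; j++)
                x = (x << 8) | (uint64_t)(rand() & 0x7F);
            if (candidate_toupper(x) != ref_toupper(x)) {
                fprintf(stderr, "counterexample: %016llx\n", (unsigned long long)x);
                return 0;
            }
        }
        return 1;   /* no counterexample found -- not a proof of equivalence */
    }

    int main(void)
    {
        puts(equivalent_on_tests(1000000)
             ? "no counterexample in 1M random tests"
             : "candidate differs from reference");
        return 0;
    }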
https://web.stanford.edu/class/cs343/resources/cs343-annot-s...
Re: "smallest program" I will always think of Tom Ray's Tierra. Went to a demo of his in 1991, mind-blowing.
https://www.cc.gatech.edu/~turk/bio_sim/articles/tierra_thom...
GP is certainly on-topic here! People are using GP to do this kind of thing too. But I think this system is not using GP.
1, 2, 4, ?
My thinking is to provide hand-crafted primitives to transform bytecode, and maybe use an existing optimizer to do the search.
https://github.com/google/souper