My intuition for why BOLT works is that:
- If you try to profile an unoptimized (or even insufficiently optimized) binary, you don't get an accurate profile, because the hot spots and timings differ from those of the optimized build you actually care about.
- If you try to profile an optimized binary and then rerun the compiler from source using that profiling data, then you'll have a bad time mapping the profiler's observations back to what the source looked like. This is because the compiler pipeline will have done many transforms - totally changing control flow layout in some cases - that make some of the profiling meaningless when you try to inject it before those optimizations happened.
But BOLT injects the profiling data into the code exactly as it was at the time of profiling, i.e. the binary itself.
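
To make that concrete, here's a toy sketch in Python. It is nothing like BOLT's real implementation; the addresses, sample counts, and the `reorder_blocks` helper are all made up for illustration. The point is only that when the profile is keyed by addresses in the very binary being rewritten, every sample lands on a block that actually exists, so there's no lossy mapping back to source.

```python
# Toy model only: hypothetical addresses and counts, not BOLT's API.

def reorder_blocks(order, samples):
    """Keep the entry block first, then place the remaining blocks hottest-first.

    `order` is the original layout as a list of block addresses.
    `samples` maps block address -> sample count from profiling this same binary.
    """
    entry, rest = order[0], order[1:]
    # Samples are looked up by address directly; blocks never sampled count as 0.
    hot_first = sorted(rest, key=lambda addr: samples.get(addr, 0), reverse=True)
    return [entry] + hot_first

# address -> label, in the binary's original layout order
blocks = {0x1000: "entry", 0x1010: "error_path", 0x1020: "hot_loop", 0x1030: "exit"}
# address -> sample count from a hypothetical profiling run of that binary
samples = {0x1000: 5, 0x1020: 900, 0x1030: 5}

layout = [blocks[addr] for addr in reorder_blocks(list(blocks), samples)]
print(layout)  # ['entry', 'hot_loop', 'exit', 'error_path']
```

The cold `error_path` block ends up at the end, out of the way of the hot loop. Doing the same from source would first require figuring out which source constructs those addresses came from, which is exactly the part that gets murky after heavy optimization.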
It's totally insane, wacky, and super fucking cool - these folks should be hella proud of themselves.
But what you just described sounds awesome! (And crazy.)