After learning about it, I went on to replace the UCT formula in MCTS with it and the results were... not much better, actually. But it made me understand both a little better.
Thompson Sampling, a.k.a. Bayesian Bandits, is a powerful method for runtime performance optimization. We use it in ClickHouse to optimize compression and to choose between different instruction sets: https://clickhouse.com/blog/lz4-compression-in-clickhouse
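For anyone who hasn't seen the mechanics, here is a minimal sketch of the idea (not the ClickHouse implementation; the variant names and the success criterion are made up): keep a Beta posterior per variant, draw one sample from each, run the argmax, and update the posterior of the variant you actually ran.

    import random

    # Hypothetical variants to pick between at runtime (illustrative names only).
    variants = ["kernel_scalar", "kernel_sse", "kernel_avx2"]

    # Beta(wins + 1, losses + 1) posterior per variant; "success" could mean
    # the call finished under some latency budget -- the criterion is up to you.
    stats = {v: {"wins": 0, "losses": 0} for v in variants}

    def choose_variant():
        # Thompson sampling: one draw from each posterior, act on the argmax.
        draws = {v: random.betavariate(s["wins"] + 1, s["losses"] + 1)
                 for v, s in stats.items()}
        return max(draws, key=draws.get)

    def record_outcome(variant, success):
        # Update only the variant that was actually run.
        stats[variant]["wins" if success else "losses"] += 1

Variants that keep losing get chosen less and less often, but never with exactly zero probability, so exploration never fully shuts off.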
This is great. I remember finding another really good resource on the Bernoulli bandit that was interactive. Putting feelers out there to see if anyone knows what I’m talking about off the top of their heads.
You take the action that is optimal under the hypothesis represented by your posterior sample; this yields a new observation. You add it to the dataset and train a new NN.
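A rough sketch of that loop, with a bootstrapped ensemble of cheap models standing in for posterior samples over the network (one common approximation; the environment, reward, and model here are all placeholders, not anyone's production code):

    import random

    class BootstrapModel:
        # Toy stand-in for the NN: a running mean reward per action,
        # refit on a bootstrap resample of the dataset each round.
        def __init__(self, n_actions):
            self.n_actions = n_actions
            self.means = [0.0] * n_actions

        def fit(self, dataset):
            resample = [random.choice(dataset) for _ in dataset]
            totals = [0.0] * self.n_actions
            counts = [0] * self.n_actions
            for action, reward in resample:
                totals[action] += reward
                counts[action] += 1
            self.means = [totals[a] / counts[a] if counts[a] else 0.0
                          for a in range(self.n_actions)]

        def best_action(self):
            return max(range(self.n_actions), key=lambda a: self.means[a])

    def run(env_step, n_actions=3, n_models=10, n_rounds=200):
        ensemble = [BootstrapModel(n_actions) for _ in range(n_models)]
        dataset = []
        for _ in range(n_rounds):
            model = random.choice(ensemble)   # draw an approximate posterior sample
            action = model.best_action()      # act as if that sample were the truth
            reward = env_step(action)         # new observation from the world
            dataset.append((action, reward))  # add it to the dataset
            for m in ensemble:
                m.fit(dataset)                # the "train a new NN" step, done cheaply
        return dataset

    # Example: three arms with true mean rewards 0.1, 0.5, 0.3.
    run(lambda a: random.gauss([0.1, 0.5, 0.3][a], 1.0))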
https://iosband.github.io/2015/07/19/Efficient-experimentati...
It can also learn how different variants perform in different contexts.
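One simple way to get that contextual behaviour (a hypothetical per-context version of the Beta-posterior sketch above, with made-up context keys) is to keep a separate posterior per (context, variant) pair:

    from collections import defaultdict
    import random

    # One Beta posterior per (context, variant) pair; the context key could be
    # a device type, a payload-size bucket, a locale -- whatever you condition on.
    posteriors = defaultdict(lambda: [0, 0])  # [wins, losses]

    def choose(context, variants):
        draws = {v: random.betavariate(posteriors[(context, v)][0] + 1,
                                       posteriors[(context, v)][1] + 1)
                 for v in variants}
        return max(draws, key=draws.get)

    def update(context, variant, success):
        posteriors[(context, variant)][0 if success else 1] += 1

Past a handful of contexts you'd fold the context into a model (e.g. a Bayesian regression per variant) rather than keeping independent counters for every combination.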