Your other points are a good start. The main thing I would add is a floor on salary. H1B for a >$200k job makes some sense: it shows the role is essential, and that the employer really wants to fill it and is having a hard time finding a US citizen. H1B for average or below-average salaries is where the real abuse is. It's basically a form of indentured servitude.
> By disassembly of ptxas, it is indeed hard-coded that they have logic like: strstr(kernel_name, "cutlass").
> it is likely that, this is an unstable, experimental, aggressive optimization by NVIDIA, and blindly always enabling it may produce some elusive bugs.
An optimization with a universal >=0 speedup across your entire suite of tests is a really hard thing to come by. Something is always going to have a negative speedup.
My experience is with non-Nvidia GPU systems, but this feels like a familiar situation. They probably found something that has great outcomes for one set of kernels, terrible outcomes for another, and no known reliable heuristic or modeling they could use to automatically choose.
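For what it's worth, the kind of name-based gating described in the quote would look roughly like the sketch below. This is not what ptxas actually does internally: `use_aggressive_opt` is a name I made up for illustration, and only the `strstr(kernel_name, "cutlass")` check comes from the quoted disassembly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether a risky optimization should be
 * applied to a kernel, based only on its name. Only the substring test
 * is taken from the quoted disassembly; everything else is assumed.
 */
static bool use_aggressive_opt(const char *kernel_name)
{
    /* No name means no match: fall back to the default path. */
    if (kernel_name == NULL) {
        return false;
    }
    /* strstr returns NULL when the substring does not occur. */
    return strstr(kernel_name, "cutlass") != NULL;
}
```

With something like this, a kernel called `cutlass_gemm_f16` would get the aggressive path and `my_reduce_kernel` would not, which is about what you'd do when the pass helps a known family of kernels and you have no better predictor than "it's one of those".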