In `libnvidia-nvvm.so` the string `cutlass` appears right after `Memory Dependence Analysis` and `memdep`. Perhaps it acts as an optimization attribute of some sort, where the compiler is allowed to make assumptions about the kernel's behavior that are not valid in general?
The link is long dead and the Wayback machine doesn’t have a copy.
But in 2001 ATI was caught applying optimizations to Quake 3 when someone realized if you renamed the executable from “quake” to “quack” the score dropped a ton. It was a big scandal.
I know that’s common now, but it wasn’t something that was done at the time.
This seems likely due to ongoing work on FP8 support in nvidia/cutlass. From my reading, the alternative code path was probably added recently so it could be tested by external contributors to the cutlass project and other involved parties (rather than attempting to distribute custom-packaged internal builds of CUDA).
That's strange, because the cutlass docs explicitly do NOT mention FP8 support. So it looks like it can nevertheless be used with FP8 via the name hack.
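For anyone who wants to poke at this themselves, here's a minimal A/B sketch of the name hack (my own toy code with made-up names, not from the article or from CUTLASS): two kernels with byte-for-byte identical bodies, where only one mangled symbol contains "cutlass". The reported ~7% difference was on FP8 matmul kernels, so a trivial kernel like this may well show no difference at all, but the methodology is the same, and you can also diff the generated code with cuobjdump -sass.

    // abtest.cu -- toy A/B test: identical kernel bodies, different symbol names.
    // All names here are made up for illustration.
    #include <cstdio>
    #include <cuda_runtime.h>

    constexpr int N = 1 << 24;

    __global__ void fma_plain(const float* a, const float* b, float* c) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < N) c[i] = fmaf(a[i], b[i], c[i]);
    }

    // Same body; only the (mangled) name now contains the magic substring.
    __global__ void fma_cutlass(const float* a, const float* b, float* c) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < N) c[i] = fmaf(a[i], b[i], c[i]);
    }

    // Time 100 back-to-back launches of a kernel with CUDA events.
    template <typename Kernel>
    float time_ms(Kernel kernel, const float* a, const float* b, float* c) {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        for (int r = 0; r < 100; ++r)
            kernel<<<(N + 255) / 256, 256>>>(a, b, c);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return ms;
    }

    int main() {
        float *a, *b, *c;
        cudaMalloc(&a, N * sizeof(float));
        cudaMalloc(&b, N * sizeof(float));
        cudaMalloc(&c, N * sizeof(float));
        cudaMemset(a, 0, N * sizeof(float));
        cudaMemset(b, 0, N * sizeof(float));
        cudaMemset(c, 0, N * sizeof(float));
        time_ms(fma_plain, a, b, c);  // warm-up
        printf("plain  : %.2f ms\n", time_ms(fma_plain, a, b, c));
        printf("cutlass: %.2f ms\n", time_ms(fma_cutlass, a, b, c));
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

Build with nvcc abtest.cu -o abtest.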
I have some experience with compilers and LLVM, but you'd be shocked how many things rely on names and name parsing.
If you have hundreds of complex passes that rely on various "contracts" like type names or some shit, then really crazy things like this can happen unintentionally rather than maliciously.
Some names are standardized items, like memcpy. Matching those is OK; nothing sneaky going on there. Matching something vendor-specific in a general-purpose API is a different story.
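To make the distinction concrete, here's a tiny toy sketch (illustration only; this is not NVVM's actual code, and the function name is made up) of what a name-based "contract" inside an optimizer tends to look like:

    #include <string_view>

    // Toy illustration of a name-keyed check in an optimizer.
    bool assume_library_semantics(std::string_view fn_name) {
        // Recognizing standardized names is normal; every compiler
        // special-cases things like memcpy/memmove by name.
        if (fn_name == "memcpy" || fn_name == "memmove")
            return true;
        // Substring-matching a vendor library's name in arbitrary user
        // kernels is the part that raises eyebrows.
        return fn_name.find("cutlass") != std::string_view::npos;
    }

The first branch is the legitimate, standardized case; the second is the vendor-specific kind of matching being discussed here.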
Ooh, I remember this, but the practice is actually older than that.
First nVidia and ATI used executable names to detect games; then they started adding heuristics.
If you think they stopped the practice, you're very mistaken. Every AMD and nVidia driver has game and app specific fixes and optimizations.
nVidia cheated in 3DMark that way, so the benchmark developers patched/changed it to prevent this. nVidia also patched their drivers so that some of the more expensive but visually invisible calls in a particular game, such as scene flushes, get batched (e.g. doing all 50 flushes at the 50th call) to prevent the game from becoming a slide show on expensive hardware.
This is also why AMD's and Intel's open source drivers under Linux are a success: they are vanilla drivers written from scratch per spec, and if your code calls OpenGL/Vulkan to spec, then you're golden.
Some companies even cross-compile AMD's Linux drivers for Windows on embedded systems, since they're free of those game-specific optimizations.
Interestingly, most of what counted as a benchmark controversy back in the day is now expected behaviour, i.e. game-specific optimizations with no visible image degradation (well, in this age of upscalers and other lossy optimization techniques, probably even with some). A gaming-focused driver with no game-specific improvements in its changelog would be considered strange, and it very much works via executable detection.
Back in the day, there was still the argument that drivers should not optimize for benchmarks even when visually identical, because it wouldn't show the hardware's real world potential. Kinda cute from today's perspective. :)
But of course there were the obvious cases...
The Quack3 case that lowered filtering quality, as shown above, of course (at least that one was later exposed in the driver as a togglable setting).
But the cheekiest one has to be nVidia's 3DMark03 "optimizations", where they blatantly put static clip planes into the scenes so that everything outside the benchmark sequence's predefined camera path would simply be culled from the scene early (which, for example, completely broke the freelook patched into 3DMark and would generally break any interactive application).
I think that was the first case (to go public), but I remember reading about this in game magazines a couple of times afterwards, for both ATI and nVidia.
It is also part of the benchmarks game they play against each other.
This ticket is a good starting place to see the chain of issues around the ongoing work: https://github.com/NVIDIA/cutlass/pull/2037
https://docs.nvidia.com/cutlass/index.html
I wonder if we can find something referencing this by searching the comments.
Names can be both informative and misdirecting at the same time.
If the headline were "FP8 is ~7% faster when the kernel name has 'cutlass' in it...", it wouldn't seem sensational.