Indeed, aside from a party trick, why build an executable trampoline at runtime when you can store and retrieve the context, or a pointer to the context, with SetWindowLong() / GetWindowLong() [1]?
Slightly related: in my view Win32 windows are a faithful implementation of the Actor Model. The window proc of a window is mutable, it represents the current behavior, and can be changed in response to any received message. While I haven't personally seen this used in Win32 programs it is a powerful feature as it allows for implementing interaction state machines in a very natural way (the same way that Miro Samek promotes in his book.)
[1] https://learn.microsoft.com/en-us/windows/win32/api/winuser/...
The code as written, though, is missing a call to FlushInstructionCache() and might not work in processes that prohibit dynamic code generation. An alternative is to just pregenerate an array of trampolines in a code segment, each referencing a mutable pointer in a parallel array in the data segment. These can be generated straightforwardly with a little template magic. This adds size to the executable unlike an empty RWX segment, but doesn't run afoul of any dynamic codegen restrictions or require I-cache flushing. The number of trampolines must be predetermined, but the RWX segment has the same limitation.
Windows x64 and ARM64 do use register passing, with 4 registers for x64 (rcx/rdx/r8/r9) and 8 registers for ARM64 (x0-x7). Passing an additional parameter on the stack would be cheap compared to the workarounds that everyone has to do now.