EDIT: also, wondering why the above article says SetMenu requires a syscall, which isn't mentioned in the OP, I found this: http://www.fengyuan.com/article/win32ksyscall.html - it looks like many GUI operations on Windows NT/2000 were implemented in the kernel. Constantly context-switching just to draw a window can't have been very good for performance, no?
Moving bits of GDI and USER into the kernel was done to improve performance in Windows NT 4.0. I think that previously, in NT 3.x, they often needed context switches to a user-space server process, whilst syscalls were cheaper (or something like that), especially for calls that ended up making driver/kernel calls anyway.
In hindsight a silly idea, trading off security for a speed boost.
In Windows NT 3, the graphics calls were stubs that sent LPC messages to the CSRSS, where the graphics drivers lived. At the time it was a very microkernel-like design, with message passing to server processes.
And the criticisms were the contemporary 1990s criticisms of microkernel designs, namely that all of this message passing was slow. (In reality, this is a red herring in most critiques of microkernels. In systems ranging from AT&T STREAMS to Windows NT today, I/O subsystems are built around the idea of passing request packets around anyway, be they mbufs or IRPs.) So it all got moved into the kernel; in later versions some of it got moved back out again, and it was rearchitected when windowing became more about compositing multiple whole bitmaps that could be constructed locally, rather than drawing into a shared area with lots of clip rectangles.
> In reality, this is a red herring in most critiques of microkernels.
That’s true. I remember reading the L4 microkernel papers, showing how they achieved really decent message-passing performance on Intel 486 processors IIRC. I figure context switches are even more optimised on modern architectures.
In the early days, GDI was drawing directly into the framebuffer; making callers go through the kernel provides a limited amount of security against just drawing all over or reading the entire desktop, I suppose. Back in those days you had enough RAM to store one (1) copy of the framebuffer, the active one, and non-foreground windows or parts thereof simply got overpainted.
And WPF and subsequent toolkits follow a more modern model of "application draws into its own RAM region using DirectX which is then composited into the desktop".
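To make the contrast with overpainting concrete, here is a toy Python sketch (not how DWM or any real compositor works; `Window` and `composite` are hypothetical names). Each window keeps its own back buffer and the desktop is rebuilt from those buffers back to front, so an occluded window's pixels survive and reappear when it is raised, without the app being asked to repaint:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Window:
    x: int
    y: int
    pixels: List[str]  # the window's own back buffer, one string per row


def composite(width: int, height: int, windows: List[Window], background: str = ".") -> List[str]:
    """Rebuild the desktop from per-window buffers, back to front.

    The last window in the list is topmost. Nothing is drawn into a shared
    surface ahead of time, so hidden parts of a window are never lost.
    """
    desktop = [[background] * width for _ in range(height)]
    for win in windows:
        for row, line in enumerate(win.pixels):
            for col, px in enumerate(line):
                dx, dy = win.x + col, win.y + row
                if 0 <= dx < width and 0 <= dy < height:  # clip to the screen
                    desktop[dy][dx] = px
    return ["".join(r) for r in desktop]


a = Window(0, 0, ["AAA", "AAA"])
b = Window(2, 1, ["BB", "BB"])
print(composite(5, 3, [a, b]))  # ['AAA..', 'AABB.', '..BB.']  (B on top)
print(composite(5, 3, [b, a]))  # ['AAA..', 'AAAB.', '..BB.']  (A raised, hidden corner back)
```

In the single-framebuffer model described above, raising A would instead require A to redraw the corner that B had overpainted.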
I wonder how the DRM used in some video games, which calls Windows system calls directly, handles these changes. We know that some of it invokes Windows system calls directly because both Linux and Wine had to be patched to support it:
Some operating systems have massive fan-in and fan-out on what is internally one single system call crossing the shell/kernel divide. Witness how much is done by sysctl() on the BSDs, for example. Whereas others will be more 1-to-1.
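As a rough illustration of that fan-out, here is a toy Python sketch of one sysctl-style entry point that dispatches on a name to many handlers. The names mimic BSD MIB strings, but the table, the values and `toy_sysctl` itself are hypothetical; the real sysctl(3) interface takes an integer MIB array (or a string via sysctlbyname):

```python
from typing import Callable, Dict, Tuple, Union

Value = Union[str, int]

# One kernel entry point, many operations behind it (values are made up).
_HANDLERS: Dict[Tuple[str, ...], Callable[[], Value]] = {
    ("kern", "ostype"): lambda: "ToyBSD",
    ("kern", "maxproc"): lambda: 1024,
    ("hw", "ncpu"): lambda: 8,
}


def toy_sysctl(name: str) -> Value:
    """Single 'system call' that fans out to a handler chosen by a dotted name."""
    handler = _HANDLERS.get(tuple(name.split(".")))
    if handler is None:
        raise OSError(f"unknown sysctl {name!r}")
    return handler()


print(toy_sysctl("hw.ncpu"))  # 8
```

Counted as system calls this is one entry, even though `len(_HANDLERS)` operations sit behind it, which is why raw syscall counts compare poorly across OSes.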
Then there's the fact that the Native API for Windows NT is structurally very different to the Linux API (as they are both different from, say, XNU). It's basically entirely coincidental that the numbers are even close, and there are no general conclusions that one can draw from such statistics.
Except, perhaps, a general but rather facile conclusion that these operating systems aren't running on 8-bit processor architectures. (-:
I recall a discussion (can't find it now) that Linux performance would suffer if a syscall table lookup had to spill onto a second page. With 4 KiB pages and 8-byte entries, that gives a limit of 4096 / 8 = 512 64-bit pointers for the syscalls we want to be high-performance, which may drive both OSes to start limiting new syscalls as they get close.
I don't know if there's more to this claim than just concern about an extra TLB entry though.
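The 512 figure is just page size divided by entry size. A small Python sketch, assuming 4 KiB pages and 8-byte table entries (the `page_of_entry` helper is hypothetical, and this ignores everything else a syscall touches), also shows that the figure only holds if the table starts on a page boundary:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the x86-64 default
ENTRY_SIZE = 8    # one 64-bit function pointer per syscall


def entries_per_page(page_size: int = PAGE_SIZE, entry_size: int = ENTRY_SIZE) -> int:
    return page_size // entry_size


def page_of_entry(table_base: int, nr: int,
                  page_size: int = PAGE_SIZE, entry_size: int = ENTRY_SIZE) -> int:
    """Index of the page holding entry `nr`, relative to the table's first page."""
    first_page = table_base // page_size
    return (table_base + nr * entry_size) // page_size - first_page


print(entries_per_page())           # 512
print(page_of_entry(0x10000, 511))  # 0: still on the first page
print(page_of_entry(0x10000, 512))  # 1: spills onto a second page
print(page_of_entry(0x10800, 256))  # 1: a table starting mid-page spills after 256
```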
The interesting thing about this is that some syscalls are versioned, even though the syscall interface is internal and private. There's NtLoadKey, NtLoadKey2, NtLoadKey3 and even NtLoadKeyEx.
This kind of versioning on public APIs, I understand, but syscalls are only meant to be invoked by ntdll. Why do they need more than one?
The consumers of the Native API are things like the original POSIX subsystem, the Interix POSIX subsystems, the OS/2 subsystem, the fourth POSIX subsystem used in WSLv1, NTVDMs, and of course the Win32 subsystem. Some of these were frozen long ago. The still live ones do not necessarily change in lockstep.
That said, for those particular API functions there is an interesting history that is, rather, about mitigating security holes:
Yes, but the Native API is NTDLL, a userspace wrapper around the system calls. On Windows nothing except NTDLL is meant to invoke system calls directly, and in my experience this is basically true. Some apps bypass the Win32 subsystem and link against NTDLL directly (which they aren't meant to do), but outside of a handful of heavily obfuscated video game DRM systems and malware, not much invokes system calls directly.
Changes to the system calls to close exploits are clear, but I'm really curious what software invokes NtLoadKey directly that Microsoft themselves can't change, and kept doing so even as the system call evolved over time. These aren't even documented in headers, so it takes some reverse engineering to be able to do that.
> This kind of versioning on public APIs, I understand, but syscalls are only meant to be invoked by ntdll. Why do they need more than one?
You've got three common suffixes for function names in Windows. A and W relate to the encoding of the string parameters: A means the strings are in the ANSI encoding (the system's active code page, not plain ASCII), W means UTF-16.
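A minimal Python sketch of the usual A/W split, where the A function is a thin shim that converts from the ANSI code page and forwards to the W function. `set_title_a` and `set_title_w` are hypothetical, and cp1252 is just an assumed code page; real A functions use whatever the system's active code page is:

```python
def set_title_w(title_utf16le: bytes) -> str:
    """Hypothetical 'W' entry point: the real work happens on UTF-16 text."""
    return title_utf16le.decode("utf-16-le")


def set_title_a(title_ansi: bytes, codepage: str = "cp1252") -> str:
    """Hypothetical 'A' shim: decode from the ANSI code page, re-encode, call W."""
    return set_title_w(title_ansi.decode(codepage).encode("utf-16-le"))


print(set_title_a(b"caf\xe9"))  # café: byte 0xE9 is 'é' in cp1252, not an ASCII character
```

The same bytes decode differently under another code page, which is exactly why the A variants are "ANSI" rather than ASCII.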
And Ex, 2, 3, whatever: these denote extensions with different parameters. For example, the "original" function may have had only two parameters or a tiny struct, but for more modern use cases they might have added a few parameters, expanded the struct or, even worse, rearranged the fields in the struct.
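The "expanded the struct" case is often handled with a size field: the caller stores the size of its struct in the first member (the `cbSize` convention many Win32 structs use), and the callee only reads the fields that fit. A Python/ctypes sketch with hypothetical `OptionsV1`/`OptionsV2` layouts and a made-up default:

```python
import ctypes


class OptionsV1(ctypes.Structure):
    # Hypothetical original layout: the first member holds the struct's size.
    _fields_ = [("cb_size", ctypes.c_uint32), ("flags", ctypes.c_uint32)]


class OptionsV2(ctypes.Structure):
    # Later version: new fields are only appended, so v1 offsets stay valid.
    _fields_ = [("cb_size", ctypes.c_uint32), ("flags", ctypes.c_uint32),
                ("timeout_ms", ctypes.c_uint32)]


DEFAULT_TIMEOUT_MS = 5000  # made-up default for callers built against v1


def read_options(raw: bytes) -> dict:
    """Callee side: trust cb_size to tell which version the caller passed."""
    size = ctypes.c_uint32.from_buffer_copy(raw).value
    if size >= ctypes.sizeof(OptionsV2) and len(raw) >= size:
        v2 = OptionsV2.from_buffer_copy(raw)
        return {"flags": v2.flags, "timeout_ms": v2.timeout_ms}
    if size >= ctypes.sizeof(OptionsV1) and len(raw) >= size:
        v1 = OptionsV1.from_buffer_copy(raw)
        return {"flags": v1.flags, "timeout_ms": DEFAULT_TIMEOUT_MS}
    raise ValueError("unsupported struct size")


old = OptionsV1(ctypes.sizeof(OptionsV1), 1)
new = OptionsV2(ctypes.sizeof(OptionsV2), 3, 250)
print(read_options(bytes(old)))  # {'flags': 1, 'timeout_ms': 5000}
print(read_options(bytes(new)))  # {'flags': 3, 'timeout_ms': 250}
```

This only works when fields are appended. Rearranging existing fields breaks old callers outright, which is one reason a new entry point gets added instead.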
Now, of course they could have gone the Java/C++ path: overload the function name with different parameters and have the "old" overloads call the most detailed one, setting default values for the newly expected parameters (or converting, if needed). But C has no overloading at all, which is why the functions/syscalls have to have different names. Additionally, the Java/C++ way imposes an extra function call with its associated costs, while dedicated functions/entry points allow for a tiny bit more performance, at the cost of sometimes significant code duplication.
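Without overloading, the forwarding approach still works; each version just needs its own name. A Python sketch that deliberately avoids default arguments to mirror C (`open_thing`, `open_thing2` and `open_thing_ex` are hypothetical, not real Windows functions):

```python
from typing import Optional


def open_thing_ex(path: str, flags: int, timeout_ms: int,
                  trusted_root: Optional[str]) -> dict:
    """Newest, most detailed entry point: the only real implementation."""
    return {"path": path, "flags": flags,
            "timeout_ms": timeout_ms, "trusted_root": trusted_root}


def open_thing2(path: str, flags: int) -> dict:
    """Kept for old callers: forwards to the Ex version, filling in defaults."""
    return open_thing_ex(path, flags, 5000, None)


def open_thing(path: str) -> dict:
    """Original entry point: one extra hop per older version."""
    return open_thing2(path, 0)


print(open_thing("report.dat"))
```

The cost mentioned above is visible here: a call to `open_thing` makes two extra calls before reaching the implementation. Duplicating the body in each entry point avoids the hops at the price of maintaining several copies.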
And on top of all of that, MS has to take into account that there is a lot of third-party code that doesn't use ntdll, user32 and god knows what else is supposed to be the actual API, but instead goes as close to the kernel as possible. Virus scanners, DRM solutions, anti-cheat measures, audit/compliance tools: these all hook themselves in everywhere they can. It's a routine source of issues with Windows updates...
The versioned syscalls exist to maintain binary compatibility with older applications while adding new functionality: when Microsoft needs to extend a syscall with new parameters, they create a new version rather than break existing callers, which might include third-party applications that reverse-engineered ntdll.
I can't tell you if that is actually the case here, but most of the private Win32 API is effectively public API since so many things use it anyway. They never drew the line there, and a lot of people go "well, they never changed this in the past, why would they now".
This is not the Win32 API, and Raymond Chen and others at Microsoft very much did draw a line when it came to people using the Native API of Windows NT.
The table is not empty. You need to pick which Windows versions to show; they are hidden by default because the table would be huge if it showed all of them. Click “show” in the table header.
Aero introduced accelerated compositing: https://learn.microsoft.com/en-us/archive/msdn-magazine/2007...
https://docs.kernel.org/admin-guide/syscall-user-dispatch.ht...
If I recall correctly, Jurassic World Evolution’s DRM is one of those that needed this to work.
- https://www.mdsec.co.uk/2022/04/resolving-system-service-num...
- https://klezvirus.github.io/RedTeaming/AV_Evasion/NoSysWhisp...
- https://whiteknightlabs.com/2024/07/31/layeredsyscall-abusin...
Though I'm not sure which of these techniques, if any, would be most favored by a game DRM as I've never looked into it.
And 467 on the Linux list https://filippo.io/linux-syscall-table.
Ballpark the same number.
* https://www.tiraniddo.dev/2020/05/silent-exploit-mitigations...