On all modern processors, that instruction measures wallclock time with a counter which increments at the base frequency of the CPU unaffected by dynamic frequency scaling.
Apart from the great resolution, that time measuring method has an upside of being very cheap, couple orders of magnitude faster than an OS kernel call.
If you want that on Windows, well, it’s possible, but you’re going to have to do it asynchronously from a different thread and also compute the offsets your own damn self[2].
Alternatively, on AMD processors only, starting with Zen 2, you can get the real cycle count with __aperf() or __rdpru(__RDPRU_APERF) or manual inline assembly depending on your compiler. (The official AMD docs will admonish you not to assign meaning to anything but the fraction APERF / MPERF in one place, but the conjunction of what they tell you in other places implies that MPERF must be the reference cycle count and APERF must be the real cycle count.) This is definitely less of a hassle, but in my experience the cap_user_rdpmc method on Linux is much less noisy.
[1] https://man7.org/linux/man-pages/man2/perf_event_open.2.html
[2] https://www.computerenhance.com/p/halloween-spooktacular-day...
Are you sure about that?
> time spent in syscalls (if you don’t want to count it)
The time spent in syscalls was the main objective the OP was measuring.
> cycle counter
While technically interesting, most of the time I do my micro-benchmark I only care about wallclock time. Contradictory to what you see in search engines and ChatGPT, RDTSC instruction is not a cycle counter, it’s a high resolution wallclock timer. That instruction was counting CPU cycles like 20 years ago, doesn’t do that anymore.