Intel has supported such capability via Intel Processor Trace (PT) since at least 2014 [1]. Here is a full trace recorder built by Jane Street feeding into standard program trace visualizers [2].
ARM has supported such capability via the standard CoreSight Program Trace Macrocell (PTM)[3]/Embedded Trace Macrocell (ETM)[4] since at least 2000.
If you pair it with standard data trace, which is less commonly available, then you have the prerequisites for a hardware trace time travel debugger as originally seen in the early 2000s [5].
You can get similar performance/function tracing entirely in software via software-instrumented instruction trace, and similar debugging information (though with less granular performance information) via record-replay time travel debugger recordings.
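To make the software-only option concrete, here is a rough sketch of hand-placed function tracing in C++. Everything in it (`TraceEvent`, `trace_log`, `ScopedTrace`, `leaf`, `root`) is a made-up name for illustration, not part of any tool mentioned above. Real software tracers typically have the compiler insert the hooks (GCC and Clang have `-finstrument-functions`, for example) rather than placing them by hand, and this version is single-threaded only.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// One entry/exit record. All names here are hypothetical.
struct TraceEvent {
    const char* name;   // function label (string literal, not owned)
    bool enter;         // true on entry, false on exit
    std::int64_t ns;    // steady-clock timestamp in nanoseconds
};

// Global in-memory log; not thread-safe, which is fine for a sketch.
inline std::vector<TraceEvent>& trace_log() {
    static std::vector<TraceEvent> log;
    return log;
}

inline std::int64_t now_ns() {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(
               steady_clock::now().time_since_epoch()).count();
}

// RAII guard: records an entry event on construction and an exit event
// on destruction, so early returns and exceptions are still traced.
class ScopedTrace {
public:
    explicit ScopedTrace(const char* name) : name_(name) {
        trace_log().push_back({name_, true, now_ns()});
    }
    ~ScopedTrace() { trace_log().push_back({name_, false, now_ns()}); }
    ScopedTrace(const ScopedTrace&) = delete;
    ScopedTrace& operator=(const ScopedTrace&) = delete;

private:
    const char* name_;
};

// Two instrumented functions, one calling the other.
int leaf(int x) {
    ScopedTrace t("leaf");
    return x * 2;
}

int root(int x) {
    ScopedTrace t("root");
    return leaf(x) + 1;
}
```

Calling `root(20)` leaves four events in `trace_log()`: enter root, enter leaf, exit leaf, exit root. Pairing each entry with its exit gives per-call durations, which is roughly the input a flame-graph style visualizer wants. The cost is that every traced call pays for two timestamps and two appends, which is where hardware trace has the advantage.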
[1] https://www.intel.com/content/www/us/en/support/articles/000...
[2] https://blog.janestreet.com/magic-trace/
[3] https://developer.arm.com/documentation/ihi0035/b/Program-Fl...
The standard didn't say "you must implement std::unordered_map as a hash table with chained buckets and extra memory allocations", but it specified several guarantees that make it very difficult to implement hash tables with open addressing.
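For anyone who hasn't run into these guarantees, here is a small sketch of two of them: pointers and references to elements stay valid across a rehash, and the container exposes a per-bucket iteration API. The function names are mine, just for illustration. An open-addressing table keeps elements inline in one array and moves them when it grows, so it can't offer the first guarantee without adding an indirection, and the bucket interface is awkward for it to provide.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Guarantee 1: rehashing invalidates iterators, but pointers and
// references to elements must remain valid.
bool reference_survives_rehash() {
    std::unordered_map<int, std::string> m;
    m.emplace(1, "one");
    std::string* p = &m.at(1);  // pointer into the container
    const std::size_t buckets_before = m.bucket_count();

    // Insert enough elements to force at least one rehash.
    for (int i = 2; i < 10000; ++i) {
        m.emplace(i, "x");
    }
    const bool rehashed = m.bucket_count() != buckets_before;

    // The element must still live at the same address with the same value.
    return rehashed && p == &m.at(1) && *p == "one";
}

// Guarantee 2: the bucket interface. Every element is reachable by
// walking the buckets with local iterators.
std::size_t count_via_buckets(const std::unordered_map<int, std::string>& m) {
    std::size_t n = 0;
    for (std::size_t b = 0; b < m.bucket_count(); ++b) {
        for (auto it = m.begin(b); it != m.end(b); ++it) {
            ++n;
        }
    }
    return n;
}
```

A conforming implementation has to make `reference_survives_rehash()` return true and `count_via_buckets(m)` equal `m.size()`. Node-based chaining gets both almost for free, which is why the common standard library implementations ended up there.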
Every constraint that you specify potentially locks out a better implementation.
For recursive rwlocks, there are a lot of ways to implement them. Do you want to lock out high performance implementations that do less error checking, for example?
On paper, unordered_map sounds great. It lists all the admirable properties you would theoretically want in a hashtable. Then in practice when you go to implement it, you realize that you've painted yourself into a garbage fire, as the saying goes.
I suppose this is a failing of the design by committee method, where the committee isn't directly responsible for implementation either before or during standard writing.