Why does "3rd party encryption software" appear in the call stack of a user-mode process in the first place? Is this another case where "security" software injects broken DLL files into all processes on your system?
I'd love to see Chrome and Firefox and IE working together to detect this kind of thing and put up a malware warning explaining how to uninstall it, and see how quickly it can be eliminated.
Seriously, there is zero valid reason to ever inject code into another program, other than as a debugging tool on a system being debugged.
I used to take point for Mozilla's efforts dealing with third-party interference in our binaries. Browsers are ripe targets for this kind of shit. The stories we could tell...
It seems that the Chrome developers should be able to perform the same binary analysis on the suspect McAfee software. I guess it's a bit harder without source code to reference side by side, though.
That would be possible, but we'd have to install the software, then guess which binary was the culprit, and then have some way of finding the function boundaries. My crude analysis technique relied on having symbols for chrome.dll to indicate where functions started, so I'd have had to switch to a different tool that could find those.
If one could reliably detect this DLL injection in the process address space, then the correct "fix" is to crash immediately. Authors of such tools should seek another way of accomplishing their goal, preferably one that does not export their own bugs to innocent bystanders.
Naïve question about ABIs: shouldn't the caller be responsible for that? If I want a function to restore certain registers, wouldn't it be simpler if I were the one that saved them in my memory, called the function, and then overrode the registers with whatever values the function set? Otherwise it seems we're just… asking for trouble, so to speak.
It would just be slow for the caller to push and pop (say) 30 registers that the specific callee (and its transitive callees) may not even use.
Most ABIs specify some registers as caller-saved, some as callee-saved (retained unchanged from the perspective of the caller), and some as scratch (not saved).
In the end, which split is fastest depends on the architecture and on the typical workload--measurements can be made to find out which combination is fastest on average.
You usually wouldn't need to push & pop all registers, just the ones you want to preserve across the call. Regardless, yeah, non-volatile (aka callee-saved) registers are extremely important for good performance of code that calls functions (especially loops--without callee-saved registers, you'd have to store the loop counter and length on the stack!)
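As a concrete sketch of why callee-saved registers matter for loops (the register names mentioned assume the x86-64 System V ABI; `work` and `sum_n` are made-up placeholder names, not anything from the thread):

```c
#include <assert.h>

/* Placeholder callee the compiler won't inline, so a real call happens. */
__attribute__((noinline)) static int work(int x) { return x + 1; }

int sum_n(int n) {
    int total = 0;
    /* Because the ABI guarantees callee-saved registers (e.g. rbx, r12-r15
       on x86-64 SysV), the compiler can keep i, n, and total in registers
       that survive each call to work(). With no callee-saved registers at
       all, every live value would have to be spilled to the stack before
       the call and reloaded afterwards, on every iteration. */
    for (int i = 0; i < n; i++)
        total += work(i);
    return total;
}
```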
> If I want a function to restore certain registers, wouldn't it be simpler if I were the one that saved them in my memory, called the function, and then overrode the registers with whatever values the function set?
ABIs are a convention that everybody follows regarding how such preservation should or should not occur, and for which registers.
It's common to say "caller cleans up registers x, y, z, callee cleans up a, b, c".
So yes, the caller could do it or not do it; both choices are possible, but that wouldn't be the agreed-upon ABI, it'd be something different.
Sounds like there should be an option when you write assembly code to tell the compiler "please save/restore any register that I'm modifying in this asm code, according to the target you are compiling for".
I'm not sure whether that's satire. In case it's not: most languages that allow inline assembly (like C) have an optional "clobber list" argument that tells the compiler's dataflow analysis that your assembly snippet overwrites certain registers [1]. Inline assembly doesn't have target-specific clobber lists because it's assumed that the code only works on one target and the programmer has to take care of making it work.
When you use inline assembly, the compiler does in fact preserve the semantics of the surrounding program with respect to the target's ABI--if you tell it correctly what effects the inline asm has. Lots of rules must be followed in order to get this behavior just right. One of the most subtle is "early clobbers" [1].
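A minimal GCC extended-asm sketch showing both a clobber list and an early clobber (x86-64 only; on other targets it falls back to plain C, and `add_asm` is a made-up name for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Adds two numbers via inline asm on x86-64, plain C elsewhere. */
static uint64_t add_asm(uint64_t a, uint64_t b) {
#if defined(__x86_64__)
    uint64_t out;
    __asm__("mov %1, %0\n\t"
            "add %2, %0"
            : "=&r"(out)        /* "&" = early clobber: %0 is written
                                   before input %2 is consumed, so it
                                   must not share a register with %2 */
            : "r"(a), "r"(b)
            : "cc");            /* clobber list: add trashes the flags */
    return out;
#else
    return a + b;
#endif
}
```

Without the early-clobber "&", the compiler would be free to assign `out` and `b` to the same register, and the first `mov` would silently destroy the second input.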
In many cases, you can get all the benefits of inline assembly from compiler intrinsics while letting the compiler handle all the details of register allocation and scheduling.
Note that in OP's case IIUC this was assembly code and not inline assembly. If you write functions in assembly you are solely responsible for calling conventions and ABI conformance.
That is essentially how inline assembly in GCC works. You have to declare which registers you are changing, which registers you expect input in, and which contain results from your inline assembly block.
This, for obvious reasons, does not work if you have a separate compilation unit written in assembly; then you have to follow the ABI.
In fact the fix to the WebRTC bug was to adjust the clobber list, thus telling the compiler to save the registers.
Why the compiler didn't notice that registers were being used that weren't on the clobber list is unclear to me. Your suggestion seems totally reasonable.
C does have official intrinsics for SIMD on Intel platforms, but the people who are good at writing video codecs don't like to use them because Wintel culture has such bad taste at naming functions (thanks to Hungarian notation) that using them is near-unreadable and it's easier to write everything in asm.
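For reference, this is what the intrinsic style looks like for a trivial operation (names like `_mm_add_epi32` are the style being complained about); a sketch with a scalar fallback so it stays portable, with `add4` as a made-up function name:

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Adds four packed 32-bit ints; register allocation and scheduling
   are left entirely to the compiler, unlike hand-written asm. */
static void add4(const int32_t *a, const int32_t *b, int32_t *out) {
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    for (int i = 0; i < 4; i++) out[i] = a[i] + b[i];
#endif
}
```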
And, a compiler that knows custom SIMD optimizations for every algorithm anyone might ever need, and can recognize when you have coded one of them so it can substitute its SIMD version.
You could totally design a better convention in your own high-level language which compiled down to assembly, as long as you stayed inside it. But once you need to interface with external code you need to use a shared convention.
Your scheme sounds like it would make your code well-behaved as the callee (with perhaps some performance penalty?). But as a caller, you couldn't trust the external code not to clobber registers.
That kind of feature is very… un-assembler-like. You certainly could, but the assembler doesn't keep track of any of the relevant information, so it would be a larger feature than you'd expect. It doesn't know the calling convention. It doesn't have a map of which registers are dirtied by which instructions. It doesn't do reachability analysis, so it doesn't even necessarily know what's "in the function".
Couldn't the compiler just XOR XMM7 with itself before using XMM7 to zero anything? That way the compiler doesn't have to assume that XMM7 is zero, it can know.
Yes, it's an extra instruction, but XORing a register with itself is such a common idiom for zeroing that register that CPU designers try to make it fast.
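The idiom in question, as a tiny sketch (x86-64 only, with a trivial fallback; `zero_reg` is a made-up name, and the XMM equivalent would be `pxor xmm7, xmm7`):

```c
#include <assert.h>
#include <stdint.h>

/* Zeroes a register with the classic xor-with-itself idiom. Modern
   x86 CPUs recognize it in the register renamer, so it is effectively
   free and also breaks any dependency on the old register value. */
static uint64_t zero_reg(void) {
#if defined(__x86_64__)
    uint64_t z;
    __asm__("xor %0, %0" : "=r"(z) : : "cc");
    return z;
#else
    return 0;
#endif
}
```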
Edit: Just noticed that Veliladon essentially made the same comment herein and explained the reason why it's not done this way.
It's not just that. If you can't trust the ABI then everything else is wrong too. Everything. Not just zeroing registers. It just so happens that in this case XMM7 is used for zeroing, but it could be used to hold a variable across a function call. Then there would be no trick to get it set back to the right value.
The ABI is not optional, or best effort, or best practice, or any other BS that passes in the ordinary world. It is just as required as the correct operation of instructions (e.g., add should actually add things, mul should actually multiply them, and so on).
My reading of it was that the bug was caused by XMM7 not being restored to its previous value.
E.g. on Linux, functions should restore the values of ebx, esi, edi, ... once they're done with them. The article says (on Windows) that XMM7 needs to be restored too.
If you can't trust one register being preserved (per the ABI), then you really can't trust the values of any registers.
Fair point, which is why I used the phrase "this particular example." In fact the compiler's lazy assumption about XMM7 was the key to determining that the ABI was being violated. It would have taken longer to figure this out if the compiler had been doing things my way, because then a failure of the callee to preserve XMM7 wouldn't have mattered.
I think the real fix is to set all registers to a canary value at the start of int main(), and then when exiting check that the canary values are still there.
If anything has messed with any registers without permission, you crash and collect as much data as possible about any injected DLLs.
Then you correlate these to find the culprits, and for each you contact the authors of the DLL and figure out a way to block the injection of any unfixed versions that cause crashes.
https://dblohm7.ca/blog/2016/01/11/bugs-from-hell-injected-t...
> Most ABI specify some registers caller-saved, some registers callee-saved (retained unchanged from the perspective of the caller) and some registers scratch (not-saved).
That's a waste if the callee doesn't use them.
[1] https://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO....
[1] https://stackoverflow.com/a/15819941/489590
https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3...