If you decode the instruction, it makes sense to use XOR:
- mov ax, 0 - needs 4 bytes (66 b8 00 00) - xor ax,ax - needs 3 bytes (66 31 c0)
This extra byte in a machine with less than 1 Megabyte of memory did id matter.
In 386 processors it was also - mov eax,0 - needs 5 bytes (b8 00 00 00 00) - xor eax,eax - needs 2 bytes (31 c0)
Here Intel made the decision to use only 2 bytes. I bet this helps both the instruction decoder and (of course) saves more memory than the old 8086 instruction.
https://news.ycombinator.com/item?id=45297066