the classification seems very high (9.3). looks like they've said User Interaction is none, but from reading the writeup looks like you would need the image injected into a response prompted by a user?
if I understand it correctly, user's prompt does not need to be related to the specific malicious email. It's enough that such email was "indexed" by Copilot and any prompt with sensitive info request could trigger the leak.