It seems to me many models - maybe by design - have a recognizable style which would be much easier to detect than evaluating the factual quality of answers.
But, in practice, when asking a model to pick the best answer they see a single question / answers pair and focus on determining what they think is best.
Of all my time at uni, I wish I had a recording of this event.
I understood from students who had attended a writing workshop with her earlier in the day, that she was gifted teacher.