This is actually very wrong. Consider, for instance, that the people who grade your tests in school are typically more talented, more capable, and better trained than the people taking them. This is true even when an answer key exists.
> Also, human labels are good but have problems of their own,
Granted, but...
> it isn’t like by using a “different intelligence architecture” we elide all the possible errors
Nobody is claiming this. We elide the specific, obvious problem that using a system to test itself gives you no reliable information. You need a control.
I don’t think we should assume that answering a test would be easy for a Scantron machine just because it is very good at grading tests, either.
In fact, we know how to live forever: control our telomeres. We know it works because cancer exists. We just can’t control it, but controlled cancer is effectively immortality.