The entire field of math is fractal-like. There is a lot of low-hanging fruit everywhere. Much of it is rote and not life-changing. A big part of doing “interesting” math is picking what to work on.
A more important test is to give an AI access to the entire history of math and have it _decide_ what to work on, and then judge it for both picking an interesting problem and finding a novel solution.
If LLMs were already a breakthrough in proving theorems, even obscure minor ones, there would be a massive increase in published papers, given publish-or-perish academic incentives.
That's why it's always a hypothetical, never backed by actual examples. It's one of those things that sounds plausible until you look at the numbers. Movies close to 100% have pretty high average scores, and movies with a majority of 3/5s are nowhere near 100%.
Yeah, 100% on RT doesn't mean 10/10, but that's it.
These are all movies with (at the time) a >90% "approval" rating but an average score of about 7/10, with most reviews around the 6/10 threshold and tapering off at 7/10 and 8/10 (as opposed to being multi-modal/split-opinion, e.g. many at 6/10 and many also at 10/10).
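To make the arithmetic concrete, here is a rough Python sketch of how an approval percentage and an average score can diverge. The review lists are made up for illustration (not data for any real movie), and treating 6/10 as the cutoff for a positive review is an assumption taken from the threshold mentioned above; `approval_and_average` is a hypothetical helper, not anything RT publishes.

```python
from statistics import mean

# Assumption: a review counts as positive at 6/10 or higher.
FRESH_THRESHOLD = 6


def approval_and_average(scores):
    """Return (percent of reviews at or above the threshold, mean score)."""
    fresh = sum(1 for s in scores if s >= FRESH_THRESHOLD)
    return 100 * fresh / len(scores), mean(scores)


# Made-up distributions, 20 reviews each.
clustered = [5] * 1 + [6] * 8 + [7] * 5 + [8] * 6  # bunched just above the threshold
split = [6] * 10 + [10] * 10                       # multi-modal / split opinion

for name, scores in [("clustered", clustered), ("split", split)]:
    pct, avg = approval_and_average(scores)
    print(f"{name}: {pct:.0f}% approval, average {avg:.1f}/10")
```

With these invented numbers, the clustered list comes out at 95% approval with a 6.8/10 average, while the split list is 100% approval with an 8.0/10 average. So a very high percentage alongside a middling average is possible on paper; whether real movies actually look like that is the empirical question being argued here.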