I'm impressed but only a little surprised an AI reasoning model could help with Aaronson's proof.
The reason I'm only a little surprised is that it's the kind of question I would expect to be in the literature somewhere, either stated exactly or in a similar form, and I suspect this is why GPT5 can do it.
I am impressed because I know how hard it can be to find an existing proof, having spent a very long time on a problem before finding the solution in a 1950 textbook by Feller. I would not expect this to be at all easy to find.
I can see this ability advancing science in many areas. The number of published papers on medical science is insane. I look forward to medical researchers' questions being answered by GPT5 too, although in that case it'd need to provide a citation, since proof can be harder to come by.
Also, it's a difficult proof step, and if I'd come up with it, I'd be /very/ pleased with myself. Although I suspect GPT5 probably didn't come up with it itself, based on my limited experience using it to try to solve unrelated problems.
As someone who has worked in adjacent areas, I guessed that one might find it in random matrix pedagogy, but only after reading Sam (B.) Hopkins's comment was I able to get Google to give a source for something close to that formula:
I appreciated the realistic method he described for working with GPT-5:
> Given a week or two to try out ideas and search the literature, I’m pretty sure that Freek and I could’ve solved this problem ourselves. Instead, though, I simply asked GPT5-Thinking. After five minutes, it gave me something confident, plausible-looking, and (I could tell) wrong. But rather than laughing at the silly AI like a skeptic might do, I told GPT5 how I knew it was wrong. It thought some more, apologized, and tried again, and gave me something better. So it went for a few iterations, much like interacting with a grad student or colleague.
Excerpt:
But here’s a reason why other people might care. This is the first paper I’ve ever put out for which a key technical step in the proof of the main result came from AI—specifically, from GPT5-Thinking.
It can't offer solutions; it can offer cribbed patterns from the training corpus (more specifically, some fuzzy superposition of symbol combinations) that apply in some specific context. It's not clear why Aaronson is constantly hyping this stuff, because he seems much more rigorous in his regular work than when he is making grand proclamations about some impending singularity in which everyone just asks the computer the right questions to get the right answers.
> maybe GPT5 had seen this or a similar construction somewhere in its training data
I'm disappointed that he didn't spend a little time checking if this was the case before publishing the blog post. Without GPT, would it really have taken "a week or two to try out ideas and search the literature", or would it just have taken an hour or so to find a paper that used this function? Just saying "I spent some time searching and couldn't find this exact function published anywhere" would have added a lot to the post.
Sharing the conversation would be cool too, I'm curious if Scott just said "no that won't work" 10 times until it did, or if he was constructively working with the LLM to get to an answer.
The expression f(z) = \sum_i 1/(z - \lambda_i) is called the Stieltjes transform and is heavily used in random matrix theory; similar expressions are used in other work, such as Batson, Spielman and Srivastava. This is all to analyze the behavior of eigenvalues, which is exactly what they were trying to understand. I'd be very surprised if Aaronson doesn't know about this.
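For intuition, here's a minimal sketch of why that sum is useful for locating eigenvalues. The eigenvalues below are made up purely for illustration, not taken from the paper:

```python
# Hypothetical eigenvalues, purely for illustration.
eigvals = [0.1, 0.5, 0.9]

def stieltjes(z, lam):
    """Stieltjes-transform-style sum: f(z) = sum_i 1/(z - lambda_i)."""
    return sum(1.0 / (z - l) for l in lam)

# Far from the spectrum the sum stays moderate...
print(stieltjes(2.0, eigvals))
# ...but it blows up as z approaches any eigenvalue, so a bound on
# |f(z)| keeps z away from the spectrum. That is the basic trick
# behind arguments of this shape.
print(stieltjes(0.9001, eigvals))
```

The poles at the eigenvalues are the whole point: controlling the sum controls where the eigenvalues can sit.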
> or would it just have taken an hour or so to find a paper that used this function?
It is pretty hard to find something like this. Perhaps if you had a math-aware search engine enhanced with AI, and access to all math papers, you could find out whether this was used in the past. I tried approach0 (a math-aware search engine), but it isn't good enough and I didn't find anything.
Yeah, if you don't know the name of the thing you're looking for, you can spend weeks looking for it. If you just search for something generic like "eigenvalue bound estimate", you'll find thousands of papers and hundreds of textbooks, and it will take a substantial amount of time to decide whether each is actually relevant to what you're looking for.
Scott Aaronson worked on watermarking text from GPT to catch plagiarism. That is about the most commercially naïve project imaginable, given that, at the time, most of ChatGPT's paid usage was students using its output to cheat on assignments. If anything, this should serve to rebut accusations of impure motives in his reporting of these results.
I think you are missing the forest for the trees. This is one of the world's leading experts in quantum computing, receiving groundbreaking technical help in his field of expertise from a commercially available AI.
What he worked on is irrelevant. If you are a contractor for an American startup, it is highly likely that you received an options package, especially if you are high profile.
The help is not groundbreaking. There are decades-old theorem prover tactics that are far more impressive, all without AI.
> I don't find this trial and error pattern matching with human arbitration very impressive.
It might not be very impressive, but if it allows experts in mathematics and physics to reduce the amount of time it takes them to produce new proofs from 1-2 weeks to 1-2 hours, that's a very meaningful improvement in their productivity.
If Aaronson had stock or options in OpenAI, I don't think he'd feel much need to make misleading statements to try to juice the stock price. For one thing, it's not a listed stock, and his readers can't buy it however much he hyped it. For another, OpenAI's private market valuation is already doing fine. This blog probably doesn't have any ability to move the perceived value of OpenAI.
Finally he's a very principled academic, not some kind of fly by night stock analyst. If you'd been reading his blog a while you'd know the chances of him saying something like this would be vanishingly small, unless it was true.
It’s always a crowd-pleaser to be skeptical of AI development. Not sure what people feel they are achieving by continually announcing they aren’t buying it when someone claims they’ve made effective use of these tools.
This post about GPT-5 helping with quantum complexity theory highlights how we're still thinking about these systems wrong.
The AI suggested using Tr[(I-E(θ))^-1] to analyze eigenvalue behavior—a clever combination of existing mathematical techniques, not some mystical breakthrough.
This is exactly what you'd expect from a system trained on mathematical literature: sophisticated pattern matching across formal languages, combining known approaches in useful ways.
The real question isn't "how did AI get so smart?" but "why do we keep being surprised when language models excel at manipulating structured formal languages?"
Mathematics is linguistics. Of course these systems are good at it.
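For what it's worth, the identity behind that trace expression is elementary: if E has eigenvalues lambda_i (all below 1), then Tr[(I-E)^-1] = sum_i 1/(1 - lambda_i), so the trace blows up as any eigenvalue approaches 1. A toy sketch for the diagonal case, with made-up eigenvalues (illustrative values only):

```python
# For a diagonal E = diag(lam), (I - E)^{-1} is diagonal with entries
# 1/(1 - lam_i), so Tr[(I - E)^{-1}] = sum_i 1/(1 - lam_i).
def trace_resolvent(lam):
    return sum(1.0 / (1.0 - l) for l in lam)

# An eigenvalue near 1 dominates the trace, so an upper bound on the
# trace translates into a bound keeping eigenvalues away from 1.
print(trace_resolvent([0.2, 0.5, 0.9]))    # all eigenvalues safely below 1
print(trace_resolvent([0.2, 0.5, 0.999]))  # one eigenvalue close to 1
```

The same identity holds for any diagonalizable E by working in its eigenbasis; the diagonal case just makes it visible.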
https://mathoverflow.net/a/300915
(In particular, I had to prompt with "Stieltjes transform". "Resolvent" alone didn't work.)
OpenAI took the answer from here or elsewhere, stripped attribution and credit, and a tenured professor celebrates the singularity.
If there is no pushback from ethics commissions (in general), academia is doomed.
Anyway, it took multiple tries and, as the article itself states, GPT might have seen a similar function in the training data.
I don't find this trial and error pattern matching with human arbitration very impressive.