Readit News
aluminum96 commented on OpenAI claims gold-medal performance at IMO 2025   twitter.com/alexwei_/stat... · Posted by u/Davidzheng
bwfan123 · 2 months ago
Did humans formalize the inputs, or was the exact natural-language input provided to the LLM? A lot of detail is missing on the methodology used, not to mention any independent validation.

My skepticism stems from the past FrontierMath announcement, which turned out to be a bluff.

aluminum96 · 2 months ago
People are reading a lot into the FrontierMath articles from a couple months ago, but tbh I don’t really understand what the controversy is supposed to be there. Failing to clearly disclose sponsoring Epoch to make the benchmark doesn’t affect a model’s performance on it.
aluminum96 commented on OpenAI claims gold-medal performance at IMO 2025   twitter.com/alexwei_/stat... · Posted by u/Davidzheng
skdixhxbsb · 2 months ago
> We can only go off their word

We’re talking about Sam Altman’s company here. The same company that started out as a nonprofit claiming they wanted to better the world.

Suggesting they should be given the benefit of the doubt is dishonest at this point.

aluminum96 · 2 months ago
“they must be lying because I personally dislike them”

This is why HN threads about AI have become exhausting to read.

aluminum96 commented on OpenAI claims gold-medal performance at IMO 2025   twitter.com/alexwei_/stat... · Posted by u/Davidzheng
AIPedant · 2 months ago
It almost certainly is specialized to IMO problems; look at the way it is answering the questions: https://xcancel.com/alexwei_/status/1946477742855532918

E.g., here: https://pbs.twimg.com/media/GwLtrPeWIAUMDYI.png?name=orig

Frankly it looks to me like it's using an AlphaProof style system, going between natural language and Lean/etc. Of course OpenAI will not tell us any of this.

aluminum96 · 2 months ago
OpenAI explicitly stated that it is natural language only, with no tools such as Lean.

https://x.com/alexwei_/status/1946477745627934979?s=46&t=Hov...

aluminum96 commented on OpenAI claims gold-medal performance at IMO 2025   twitter.com/alexwei_/stat... · Posted by u/Davidzheng
blibble · 2 months ago
OpenAI have been caught doing exactly this before.
aluminum96 · 2 months ago
Why do people keep making up controversial claims like this? There is no evidence at all to this effect.
aluminum96 commented on OpenAI claims gold-medal performance at IMO 2025   twitter.com/alexwei_/stat... · Posted by u/Davidzheng
johnecheck · 2 months ago
The key bit here is whether the LLM doing the cherry picking had knowledge of the solution. If it didn't, this is a meaningful result. That's why I'd like more info, but I fear OpenAI is going to try to keep things under wraps.
aluminum96 · 2 months ago
Mark Chen posted that the system was locked before the contest. [1] It would obviously be crazy cheating to give verifiers a solution to the problem!

[1] https://x.com/markchen90/status/1946573740986257614?s=46&t=H...

aluminum96 commented on OpenAI claims gold-medal performance at IMO 2025   twitter.com/alexwei_/stat... · Posted by u/Davidzheng
emp17344 · 2 months ago
Why don’t they release some info beyond a vague Twitter hype post? I’m beginning to hate OpenAI for releasing statements like this that invariably end up being less impressive than they make it sound initially.
aluminum96 · 2 months ago
The proofs were published on GitHub for inspection, along with some details (generated within the time limit, by a system locked before the problems were released, with no external tools).

https://github.com/aw31/openai-imo-2025-proofs/tree/main

aluminum96 commented on OpenAI claims gold-medal performance at IMO 2025   twitter.com/alexwei_/stat... · Posted by u/Davidzheng
bwfan123 · 2 months ago
Based on the past history with FrontierMath & AIME 2025 [1],[2], I would not trust announcements which can't be independently verified. I am excited to try it out, though.

Also, the performance of LLMs on IMO 2025 was not even bronze [3].

Finally, this article [4] shows that LLMs were mostly just bluffing on USAMO 2025.

[1] https://www.reddit.com/r/slatestarcodex/comments/1i53ih7/fro...

[2] https://x.com/DimitrisPapail/status/1888325914603516214

[3] https://matharena.ai/imo/

[4] https://arxiv.org/pdf/2503.21934

aluminum96 · 2 months ago
The solutions were publicly posted to GitHub: https://github.com/aw31/openai-imo-2025-proofs/tree/main
aluminum96 commented on GPT-4.5 or GPT-5 being tested on LMSYS?   rentry.co/GPT2... · Posted by u/atemerev
93po · a year ago
what's the right answer? my assumption is "not enough information"
aluminum96 · a year ago
What, you mean your fruit preferences don't form a total order?
aluminum96 commented on Google lays off its Python team   social.coop/@Yhg1s/112332... · Posted by u/compiler-guy
dmoy · a year ago
Same thing with the Kythe (aka Grok) team that does the cross references for codesearch.
aluminum96 · a year ago
Just vaporized a whole team so the roles can be moved overseas :(
aluminum96 commented on SFMTA's train system running on floppy disks; city fears 'catastrophic failure'   abc7news.com/san-francisc... · Posted by u/sidlls
aluminum96 · a year ago
SF voters rejected Proposition A in 2022 [1], which would have included funding to upgrade Muni's control systems (among many other projects). We'll eventually have to find the money somewhere else when the system fails.

[1] https://www.sfchronicle.com/sf/article/S-F-voters-narrowly-r...

u/aluminum96

Karma: 986
Cake day: October 30, 2018