Readit News
gallerdude commented on Yet Another LLM Rant   overengineer.dev/txt/2025... · Posted by u/sohkamyung
efilife · 19 days ago
> it cannot "logically reason" like a human does

Reason? Maybe. But there's one limitation that we currently have no idea how to overcome: LLMs don't know how much they know. If they tell you they don't know something, it may be a lie. If they tell you they do, that may be a lie too. I, a human, certainly know what I know and what I don't, and can recall where I learned the information.

gallerdude · 19 days ago
> OpenAI researcher Noam Brown on hallucination with the new IMO reasoning model:

> Mathematicians used to comb through model solutions because earlier systems would quietly flip an inequality or tuck in a wrong step, creating hallucinated answers.

> Brown says the updated IMO reasoning model now tends to say “I’m not sure” whenever it lacks a valid proof, which sharply cuts down on those hidden errors.

> TLDR, the model shows a clear shift away from hallucinations and toward reliable, self‑aware reasoning.

Source: https://x.com/chatgpt21/status/1950606890758476264

gallerdude commented on OpenAI claims gold-medal performance at IMO 2025   twitter.com/alexwei_/stat... · Posted by u/Davidzheng
beering · a month ago
What do you mean by “pure language model”? The reasoning step is still just the LLM spitting out tokens, and this was confirmed by Deepseek replicating the o models. There’s also no proof verifier or anything similar running alongside it, according to the OpenAI researchers.

If you mean pure as in there’s no additional training beyond the pretraining, I don’t think any model has been pure since gpt-3.5.

gallerdude · a month ago
For local models you can get just the pretrained versions, with no RLHF. IIRC both Llama and Gemma make them available.
gallerdude commented on Rolling the ladder up behind us   xeiaso.net/blog/2025/roll... · Posted by u/techknowlogick
burlesona · 2 months ago
> The issue with an industry awash with cheap dross, is that it becomes prohibitively expensive to produce high Quality stuff.

This seems to be one of the brutal truths of the modern world, and as far as I can tell it applies to everything. There's always a race to the bottom to make everything as cheaply as possible, and the further the industry goes down that "cheapness" scale, the more "quality" loses market share, the more expensive "quality" must be in order to operate at all, and finally things that used to be just "normal" and not too expensive are now luxury goods.

Consider textiles, carpentry, masonry, machine tooling, appliances, etc. etc.

This doesn't feel like a good outcome, but I'm not sure there's anything that can be done about it.

gallerdude · 2 months ago
I can see both sides of it. There’s a fancy bread bakery near where I live. I go infrequently; the bread is great. But it’s expensive, and most of the time I just want a cheap loaf from Target, as do most people.

Instead of broad employment of artisan breadsmiths, we have people doing email work, because it’s more economically valuable. If the government mandated a higher quality of bread, we’d be slightly richer in bread and slightly poorer in everything else.

gallerdude commented on VVVVVV Source Code   github.com/TerryCavanagh/... · Posted by u/radeeyate
unwind · 4 months ago
Wow, that is cool! Did it help/affect your later choices with your career, did you end up a game developer, or at least try it or so? Always fun with closure! :)
gallerdude · 4 months ago
I made a very mediocre platformer in my senior year of high school, published on itch.io. I ended up becoming a software developer, which I enjoy 80% as much, but without any burnout or worrying about the superstar economics of being a game dev. Once the singularity hits, maybe I'll make more games.

https://gallerdude.itch.io/the-journey-east-full

gallerdude commented on VVVVVV Source Code   github.com/TerryCavanagh/... · Posted by u/radeeyate
gallerdude · 4 months ago
When I was near the end of high school, my family visited London, and I was thinking about becoming a game dev. So I sent Terry Cavanagh an email, and to my surprise he agreed to get lunch.

He was extremely kind and gave me a lot of interesting life advice. I remember him saying that he got most of his ideas just from playing around with mechanics and experimenting a lot; he was never really one to have grand visions.

Anyway, great fellow. Glad he open sourced V (as he called it).


gallerdude commented on DeepSeek-Prover-V2   github.com/deepseek-ai/De... · Posted by u/meetpateltech
smusamashah · 4 months ago
Sorry, forgot multiply by 100
gallerdude · 4 months ago
classic human hallucination
gallerdude commented on AGI Is Still 30 Years Away – Ege Erdil and Tamay Besiroglu   dwarkesh.com/p/ege-tamay... · Posted by u/Philpax
fusionadvocate · 4 months ago
Can someone throw some light on this Dwarkesh character? He landed a Zucc podcast pretty early on... how connected is he? Is he an industry plant?
gallerdude · 4 months ago
He's awesome.

I listened to Lex Fridman for a long time, and there were a lot of critiques of him (Lex) as an interviewer, but since the guests were amazing, I never really cared.

But after listening to Dwarkesh, my eyes are opened (or maybe my soul). It doesn't matter that I haven't heard of many of his guests, because he knows exactly the right questions to ask. He seems to have genuine curiosity about what the guest is saying, and will push back if something doesn't make sense to him. Very much recommend.

gallerdude commented on OpenAI o3 and o4-mini   openai.com/index/introduc... · Posted by u/maheshrijal
brap · 4 months ago
Where's the comparison with Gemini 2.5 Pro?
gallerdude · 4 months ago
For coding, I like the Aider polyglot benchmark, since it covers multiple programming languages.

Gemini 2.5 Pro scored 72.9%.

o3 (high) scored 81.3%; o4-mini (high) scored 68.9%.

gallerdude commented on The most underreported story in AI is that scaling has failed to produce AGI   fortune.com/2025/02/19/ge... · Posted by u/unclebucknasty
garymarcus · 6 months ago
For those wanting some background, rather than just wanting to vent:

1. Here is evaluation of my recent predictions: https://garymarcus.substack.com/p/25-ai-predictions-for-2025...

2. Here is annotated evaluation, slightly dated, considering almost line by line, of the original Deep Learning is Hitting a Wall paper: https://garymarcus.substack.com/p/two-years-later-deep-learn...

Ask yourself how much has really changed in the intervening year?

gallerdude · 6 months ago
It's funny, I see myself as basically just a pretty unabashed AI believer, but when I look at your predictions, I don't really have any core disagreements.

I know you as like the #1 AI skeptic (no offense), but when I see points like "16. Less than 10% of the work force will be replaced by AI. Probably less than 5%.", that seems OPTIMISTIC about AI capabilities to me. 5% of all jobs being automated would be HUGE, and it's something we're still up in the air about.

Same with "AI “Agents” will be endlessly hyped throughout 2025 but far from reliable, except possibly in very narrow use cases." - even the existence of agents that are reliable in very narrow use cases is crazy impressive! When I was in college five years ago for computer science, that would have sounded like a decade of work for one giant tech conglomerate on ONE agentic task. Now it's like a year of work for one less-giant tech conglomerate, across many possible agentic tasks.

So I guess it's just a matter of perspective of how impressive you see or don't see these advances.

I will say, I do disagree with the sentiment of your comment right here, where you say "Ask yourself how much has really changed in the intervening year?".

I think the o1 paradigm has been crazy impressive. There was much debate over whether scaling up models would be enough, but now we have an entirely new system that has unlocked crazy reasoning capabilities.

u/gallerdude

Karma: 1546 · Cake day: August 5, 2016