Prompt: "Let's try a reasoning test. Estimate how many pianos there are at the bottom of the sea."
I tried this on three advanced AIs* and they all choked on it without further hints from me. Claude then said:
Roughly 3 million shipwrecks on ocean floors globally
Maybe 1 in 1000 ships historically carried a piano (passenger ships, luxury vessels)
So ~3,000 ships with pianos sunk
Average maybe 0.5 pianos per ship (not all passenger areas had them)
Estimate: ~1,500 pianos
*Claude Sonnet 4, Google Gemini 2.5 and GPT-4o

Combining our estimates:
From Shipwrecks: 12,500
From Dumping: 1,000
From Catastrophes: 500
Total Estimated Pianos at the Bottom of the Sea ≈ 14,000
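For reference, both estimates boil down to a couple of multiplications and a sum; here is a minimal sketch of the arithmetic, where every input is just an assumption the models made, not a measured figure:

```typescript
// Claude's estimate: shipwrecks that went down carrying a piano.
const wrecks = 3_000_000;          // assumed shipwrecks on the ocean floor
const shareWithPiano = 1 / 1000;   // assumed fraction of ships that carried a piano
const pianosPerShip = 0.5;         // assumed average pianos per such ship
const claudeEstimate = wrecks * shareWithPiano * pianosPerShip; // 1,500

// The second estimate: sum over independent sources.
const fromShipwrecks = 12_500;
const fromDumping = 1_000;
const fromCatastrophes = 500;
const combinedEstimate = fromShipwrecks + fromDumping + fromCatastrophes; // 14,000

console.log({ claudeEstimate, combinedEstimate });
```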
Also I have to point out that 4o isn't a reasoning model and neither is Sonnet 4, unless thinking mode was enabled.
I don't get this argument. The paper is about "whether RLLMs can think". If we grant "humans make these mistakes too", but also "we still require this ability in our definition of thinking", aren't we saying "thinking in humans is an illusion" too?
I.e. to what extent are LLMs able to reliably make use of writing code or using logic systems, and to what extent does hallucinating or providing faulty answers in the absence of such tool access demonstrate an inability to truly reason (I’d expect a smart human to just say “that’s too much” or “that’s beyond my abilities” rather than give a best-effort but faulty answer)?
It's an especially weird argument considering that LLMs are already ahead of humans on Tower of Hanoi. I bet the average person would not be able to "one-shot" you the moves for an 8-disk Tower of Hanoi without writing anything down or tracking the state with actual disks. LLMs have far bigger obstacles to reaching AGI, though.
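For a sense of scale, the standard recursive solution (a generic sketch in TypeScript, not anything from the paper or the models' transcripts) emits 2^n − 1 moves, so an 8-disk tower means 255 moves that all have to be correct in sequence:

```typescript
// Classic recursive Tower of Hanoi: move n disks from `from` to `to` via `spare`.
function hanoi(n: number, from: string, to: string, spare: string, moves: string[] = []): string[] {
  if (n === 0) return moves;
  hanoi(n - 1, from, spare, to, moves); // clear the top n-1 disks out of the way
  moves.push(`${from} -> ${to}`);       // move the largest remaining disk
  hanoi(n - 1, spare, to, from, moves); // stack the n-1 disks back on top
  return moves;
}

const moves = hanoi(8, "A", "C", "B");
console.log(moves.length); // 255 (2^8 - 1)
```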
Point 5 is also a massive strawman with the "not see how well it could use preexisting code retrieved from the web" bit, given that these models will write code to solve these kinds of problems even if you come up with some new problem that wouldn't exist in their training data.
Most of these are just valid issues with the paper. They're not supposed to be arguments that invalidate everything the paper said. The paper didn't really even make any bold claims; it only concluded that LLMs have limitations in their reasoning. It had a catchy title and many people didn't read past that.
Electron comes out looking competitive at runtime! IMO people over-fixate on disk space instead of runtime memory usage.
Memory Usage with a single window open (Release builds)
Windows (x64): 1. Electron: ≈93MB 2. NodeGui: ≈116MB 3. NW.JS: ≈131MB 4. Tauri: ≈154MB 5. Wails: ≈163MB 6. Neutralino: ≈282MB
macOS (arm64): 1. NodeGui: ≈84MB 2. Wails: ≈85MB 3. Tauri: ≈86MB 4. Neutralino: ≈109MB 5. Electron: ≈121MB 6. NW.JS: ≈189MB
Linux (x64): 1. Tauri: ≈16MB 2. Electron: ≈70MB 3. Wails: ≈86MB 4. NodeGui: ≈109MB 5. NW.JS: ≈166MB 6. Neutralino: ≈402MB
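If anyone wants to spot-check figures like these on Linux, reading VmRSS from /proc is a quick approximation. This is my own sketch, not the methodology behind the numbers above, and multi-process apps like Electron need their renderer and GPU processes summed as well:

```typescript
// Rough single-process RSS check on Linux: reads VmRSS from /proc/<pid>/status.
// Multi-process apps (Electron spawns renderer/GPU processes) need the whole
// process tree summed, so treat this as a spot check, not a benchmark.
import { readFileSync } from "node:fs";

function rssMiB(pid: number): number {
  const status = readFileSync(`/proc/${pid}/status`, "utf8");
  const match = status.match(/^VmRSS:\s+(\d+)\s+kB/m);
  if (!match) throw new Error(`VmRSS not found for pid ${pid}`);
  return Number(match[1]) / 1024; // value is reported in kB
}

// Usage: node rss.js <pid>
console.log(`${rssMiB(Number(process.argv[2])).toFixed(1)} MiB`);
```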
The security posture for the code running in the browser is very different from the code running on a trusted backend.
A separation of concerns allows one to have two codebases: a frontend (untrusted but with limited access) and a backend (trusted but with a lot of access).
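A minimal sketch of that split, with a made-up route and fields purely for illustration: the trusted backend re-validates and authorizes everything, no matter what the untrusted frontend already checked.

```typescript
// Hypothetical Express-style backend route: never rely on frontend validation alone.
import express from "express";

const app = express();
app.use(express.json());

app.post("/api/transfer", (req, res) => {
  // Re-validate on the trusted side, even if the frontend form already validated this.
  const { amount, toAccount } = req.body ?? {};
  if (typeof amount !== "number" || amount <= 0 || typeof toAccount !== "string") {
    return res.status(400).json({ error: "invalid request" });
  }
  // Authorization must also happen here, based on server-side session state,
  // not on anything the client claims about itself in the body or headers.
  // ... perform the transfer ...
  res.json({ ok: true });
});

app.listen(3000);
```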
2025-02-27T06:03Z: Disclosure to Next.js team via GitHub private vulnerability reporting
2025-03-14T17:13Z: Next.js team started triaging the report
https://zeropath.com/blog/nextjs-middleware-cve-2025-29927-a...
This looks trivially easy to bypass.
More generally, the entire concept of using middleware which communicates using the same mechanism that is also used for untrusted user input seems pretty wild to me. It divorces the place you need to write code for user request validation (as soon as the user request arrives) from the middleware itself.
Allowing ANY headers from the user, rather than only a whitelisted subset, also seems like an accident waiting to happen (a sketch of filtering internal headers follows below). I think the mindset of ignoring unknown/invalid parts of a request as long as some of it is valid also plays a role.
The framework providing crutches for bad server design is also a consequence of this mindset - are there any concrete use cases where the flow for processing a request should not be a DAG? Allowing recursive requests across authentication boundaries seems like a problem waiting to happen as well.
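To make the header point concrete: CVE-2025-29927 hinges on the internal x-middleware-subrequest header being accepted from outside. Here is a minimal sketch of dropping it where requests enter the trusted zone; note this is a denylist of known-internal markers rather than the stricter whitelist argued for above, and the handler shape is illustrative, not the official Next.js fix.

```typescript
// Hypothetical edge/reverse-proxy handler: strip internal control headers
// before the request reaches the application or its middleware.
import http from "node:http";

const INTERNAL_HEADERS = new Set([
  "x-middleware-subrequest", // the header abused in CVE-2025-29927
]);

const server = http.createServer((req, res) => {
  for (const name of Object.keys(req.headers)) {
    if (INTERNAL_HEADERS.has(name.toLowerCase())) {
      delete req.headers[name]; // never trust internal markers supplied by the client
    }
  }
  // ...forward the sanitized request to the app / next middleware...
  res.writeHead(200, { "content-type": "text/plain" });
  res.end("ok");
});

server.listen(8080);
```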
Nobody can really give you sources for RabbitMQ being able to do it if you don't say what it supposedly cannot do. The way you described it, you simply read data, did something with it, and passed it on somewhere else. RabbitMQ can obviously do that.
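For what it's worth, that read-transform-forward pattern is a few lines with a standard client. A minimal sketch using amqplib, where the queue names and the transform are placeholders:

```typescript
// Minimal consume -> transform -> publish pipeline with RabbitMQ (amqplib).
import amqp from "amqplib";

async function main() {
  const conn = await amqp.connect("amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue("input");
  await ch.assertQueue("output");

  await ch.consume("input", (msg) => {
    if (!msg) return;
    const data = JSON.parse(msg.content.toString());
    const transformed = { ...data, processedAt: Date.now() }; // "do something with the data"
    ch.sendToQueue("output", Buffer.from(JSON.stringify(transformed)));
    ch.ack(msg); // acknowledge only after the result has been handed off
  });
}

main().catch(console.error);
```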
You make a good point though that the question of whether LLMs reason or not should not be conflated with the question of whether they're on the pathway to AGI or not.