avsteele (u/avsteele)

avsteele commented on Seven replies to the viral Apple reasoning paper and why they fall short garymarcus.substack.com/p... · Posted by u/spwestwood

foldr · 3 months ago

This sort of omission would not be considered scientific malpractice even in a journal article, let alone a blog post. A rebuttal of a position that fails to address the strongest arguments for it is a bad rebuttal, but it’s not scientific malpractice to write a bad paper — let alone a bad blog post.

I don’t think I agree with you that GM isn’t addressing the points in the paper you link. But in any case, you’re not doing your argument any favors by throwing in wild accusations of malpractice.

avsteele · 3 months ago

"Malpractice" was slightly hyperbolic.

But anybody relying on Gary's posts to be informed on this subject is being misled. This isn't an isolated incident either.

People need to be made aware that when they read him, it is mere punditry, not substantive engagement with the literature.

avsteele commented on Seven replies to the viral Apple reasoning paper and why they fall short garymarcus.substack.com/p... · Posted by u/spwestwood

foldr · 3 months ago

It does rebut point (1) of the abstract. Perhaps not convincingly, in your view, but it does directly address this kind of response.

avsteele · 3 months ago

Papers make specific conclusions based on specific data. The paper I linked specifically rebuts the conclusions of the paper. Gary makes vague statements that could be interpreted as being related.

It is scientific malpractice to write a post supposedly rebutting responses to a paper and not directly address the most salient one.

avsteele commented on Seven replies to the viral Apple reasoning paper and why they fall short garymarcus.substack.com/p... · Posted by u/spwestwood

avsteele · 3 months ago

This doesn't rebut anything from the best critique of the Apple paper.

https://arxiv.org/abs/2506.09250

avsteele commented on BYD's Five-Minute Charging Puts China in the Lead for EVs spectrum.ieee.org/byd-meg... · Posted by u/pseudolus

amazingamazing · 3 months ago

Still don't get the point of huge batteries. In the USA the average commute is about 20 miles one way. Seems like a 75-mile battery + gas is both more practical and requires less infra.

Edit: it seems some are confused. I'm saying a PHEV is superior to a BEV.

avsteele · 3 months ago

Many people drive to places farther than work multiple times a year. A "75-mile battery" wouldn't even be good enough for a one-way trip of this kind, let alone there and back again.

avsteele commented on The Illusion of Thinking: Strengths and limitations of reasoning models [pdf] ml-site.cdn-apple.com/pap... · Posted by u/amrrs

lukev · 3 months ago

You can absolutely extrapolate the results, because what this shows is that even when "reasoning" these models are still fundamentally repeating in-sample patterns, and that they collapse when faced with novel reasoning tasks above a small complexity threshold.

That is not a model-specific claim, it's a claim on the nature of LLMs.

For your argument to be true, there would need to be a qualitative difference, in which some models possess "true reasoning" capability and some don't, and this test only happened to look at the latter.

avsteele · 3 months ago

The authors don't say anything like this that I can see. Their conclusion specifically identifies these as weaknesses of current frontier models.

Furthermore, we have clearly seen increases in reasoning from previous frontier models to current frontier models.

If the authors did show that both previous-generation and current-generation frontier models hit a wall at similar complexity, that would be something; AFAIK they do not.

avsteele commented on The Illusion of Thinking: Strengths and limitations of reasoning models [pdf] ml-site.cdn-apple.com/pap... · Posted by u/amrrs

avsteele · 3 months ago

People are drawing erroneous conclusions from this.

My read of this is that the paper demonstrates that, for a particular model (and the problems examined with it), giving more thought tokens does not help on problems above a certain complexity. It does not say anything about the capabilities of future, larger models to handle more complex tasks. (NB: humans trend similarly.)

My concern is that people are extrapolating from this to conclusions about LLMs generally, and this is not warranted.

The only part of this I find even surprising is the abstract's conclusion (1): that "thinking" can lead to worse outcomes for certain simple problems. (Again, though, maybe you can say humans are the same here. You can overthink things.)

avsteele commented on The time bomb in the tax code that's fueling mass tech layoffs qz.com/tech-layoffs-tax-c... · Posted by u/booleanbetrayal

avsteele · 3 months ago

This is about way more than software. It's all R&D.

It's effectively 6 years, too: you only get to amortize (deduct) 10% in the first year. This might have killed my company if it had been around during our first years.

See my comments on the previous discussion (Nov 2023) here: https://news.ycombinator.com/item?id=38145630
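The "10% in the first year" figure is what you get from a 5-year straight-line amortization with a mid-year convention, which spreads the deduction over 6 tax years. A minimal Python sketch, assuming that schedule and using hypothetical numbers (a company with $1.2M of revenue and $1M of spending, all treated as R&D), shows how it inflates first-year taxable income:

```python
def amortization_schedule(rnd_spend: float, years: int = 5) -> list[float]:
    """Straight-line amortization with a mid-year convention (assumed).

    The first and last tax years each get half a year's share, so a
    5-year schedule is spread over 6 tax years: 10%, 20%, 20%, 20%, 20%, 10%.
    """
    annual = rnd_spend / years
    half = annual / 2
    return [half] + [annual] * (years - 1) + [half]


# Hypothetical first-year figures, for illustration only.
revenue = 1_200_000
rnd_spend = 1_000_000

schedule = amortization_schedule(rnd_spend)
income_if_expensed = revenue - rnd_spend      # full deduction in year 1
income_if_amortized = revenue - schedule[0]   # only 10% deducted in year 1

for year, amount in enumerate(schedule, start=1):
    print(f"Year {year}: ${amount:,.0f} deductible")
print(f"Taxable income, expensed:  ${income_if_expensed:,.0f}")
print(f"Taxable income, amortized: ${income_if_amortized:,.0f}")
```

In this hypothetical, first-year taxable income rises from $200,000 to $1,100,000 even though the cash has already been spent; the remaining 90% of the deduction only arrives over the following five years.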

avsteele commented on The time bomb in the tax code that's fueling mass tech layoffs qz.com/tech-layoffs-tax-c... · Posted by u/booleanbetrayal

Terr_ · 3 months ago

That "suspends" should be understood as "continues to hold-hostage" / "renews as a time-bomb to screw over some other party".

avsteele · 3 months ago

That isn't the reason. They put a sunset in the bill so it gets a lower CBO score (the CBO calculates costs over a 10-year window). If the provision sunsets after 5 years, the apparent cost goes down, even if you know it will get renewed. Get it?

avsteele commented on Oh fuck! How do people feel about robots that leverage profanity? arxiv.org/abs/2505.05831... · Posted by u/rolph

zfnmxt · 3 months ago

Professionalism is not a virtue; measured irreverence is. An uncensored "Fuck" in this scenario falls into that category.

Silliness has an important and necessary place in research.

avsteele · 3 months ago

NOT in professional communication. If you want to run your lab that way, feel free.

avsteele commented on Oh fuck! How do people feel about robots that leverage profanity? arxiv.org/abs/2505.05831... · Posted by u/rolph

arp242 · 3 months ago

I've joined jobs and the first thing people said to me is "ah, you must be the new cunt!"

Different people have different standards for this type of thing. Be a good cunt and accept that there are over 8 billion people in the world, some of whom have very different norms than you have. Don't declare your own standards as somehow authoritative.

avsteele · 3 months ago

We have certain professional communications standards in the scientific community. This isn't a corner bar.