It is scientific malpractice to write a post supposedly rebutting responses to a paper and not directly address the most salient one.
edit: it seems some are confused. I'm saying a PHEV is superior to a BEV.
That is not a model-specific claim; it's a claim about the nature of LLMs.
For your argument to be true, there would need to be a qualitative difference in which some models possess "true reasoning" capability and some don't, and this test would have to have looked only at the latter.
Furthermore, we have clearly seen increases in reasoning ability from previous frontier models to current ones.
If the authors could or did show that both previous-generation and current-generation frontier models hit a wall at a similar complexity, that would be something; AFAIK they do not.
My read is that the paper demonstrates that, for a particular model (and the problems examined with it), giving more thought tokens does not help on problems above a certain complexity. It says nothing about the capabilities of future, larger models to handle more complex tasks. (NB: humans trend similarly.)
My concern is that people are extrapolating from this to conclusions about LLMs generally, and that is not warranted.
The only part of this I find even surprising is the abstract's conclusion (1): that 'thinking' can lead to worse outcomes for certain simple problems. (Again, though, maybe you can say humans are the same here; you can overthink things.)
It's effectively 6 years too. You only get to depreciate 10% in the first year. This might have killed my company if it had been around during its first years.
See my comments on the previous discussion (Nov 2023) here: https://news.ycombinator.com/item?id=38145630
Silliness has an important and necessary place in research.
Different people have different standards for this type of thing. Be a good cunt and accept that there are over 8 billion people in the world, some of whom have very different norms than you do. Don't declare your own standards as somehow authoritative.
I don’t think I agree with you that GM isn’t addressing the points in the paper you link. But in any case, you’re not doing your argument any favors by throwing in wild accusations of malpractice.
But anybody relying on Gary's posts in order to be informed on this subject is being misled. This isn't an isolated incident either.
People need to be made aware that when they read him it is mere punditry, not substantive engagement with the literature.