gwern commented on AGI is an engineering problem, not a model training problem   vincirufus.com/posts/agi-... · Posted by u/vincirufus
Foreignborn · 18 hours ago
do you have a source?

When I've done toy demos where GPT-5, Sonnet 4, and Gemini 2.5 Pro critique/vote on various docs (e.g. PRDs), they did not choose their own material more often than not.

My setup wasn't intended as a benchmark, though, so this could be wrong over enough iterations.

gwern · 18 hours ago
I don't have any particularly canonical reference I'd cite here, but self-preference bias in LLMs is well-established. (Just search arXiv.)
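For concreteness, a minimal sketch of the kind of cross-model critique/vote harness described above; query_model is a hypothetical stub standing in for real vendor SDK calls, and the model names, prompt format, and dummy reply are all placeholder assumptions rather than the actual setup:

    from collections import Counter

    def query_model(model, prompt):
        # Hypothetical stub: replace with a real vendor API call for `model`.
        # Here it just votes for doc 0 so the sketch runs end to end.
        return "0"

    def vote_on_docs(models, docs):
        # Show every model all candidate docs and ask it to pick the best one.
        ballot = "Reply with only the index of the best document:\n" + "\n".join(
            f"[{i}] {doc}" for i, doc in enumerate(docs))
        votes = Counter()
        for model in models:
            # Assumes the model replies with a bare index.
            votes[int(query_model(model, ballot).strip())] += 1
        return votes

    models = ["model-a", "model-b", "model-c"]   # placeholder names
    docs = ["PRD draft one...", "PRD draft two..."]
    print(vote_on_docs(models, docs))  # Counter({0: 3}) with the dummy stub

    # To measure self-preference, record which model authored each doc and
    # compare how often a model's vote lands on its own output vs. chance.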
gwern commented on AGI is an engineering problem, not a model training problem   vincirufus.com/posts/agi-... · Posted by u/vincirufus
mdp2021 · 19 hours ago
> when I try the intro thesis paragraph on GPT-5-Pro, it dislikes it

I don't know about GPT-5-Pro, but LLMs can dislike their own output (when they work well...).

gwern · 19 hours ago
They can, but they are known to have a self-favoring bias. And in this case, the error is so easily identified that it raises the question of why GPT-5 would both come up with it & preserve it when it can so easily identify it; whereas if it was part of OP's original inputs (whatever those were), it is much less surprising, because it is a common human error, mindlessly parroted in a lot of the 'scaling has hit a wall' human journalism.
gwern commented on AGI is an engineering problem, not a model training problem   vincirufus.com/posts/agi-... · Posted by u/vincirufus
mdp2021 · 19 hours ago
> this was written using

How do you know?

gwern · 19 hours ago
It is lacking in URLs or references. (The systematic error in the self-referencing blog-post URLs is also suspicious: an outdated system prompt? If nothing else, it shows the human involved is sloppy, when every link is broken.) The assertions are broadly clichés and truisms, and the solutions are trendy buzzwords from a year ago or more (consistent with knowledge cutoffs and an emphasis on mainstream sources/opinions).

The tricolons and unordered bolded-triplet lists are ChatGPT. The em dashes (which you should not need to be told about at this point) and the it's-not-x-but-y formulation are extremely blatant, if not 4o-level, while the emoji and hyperbolic language are absent; hence, it's probably GPT-5. (Sub-GPT-5 ChatGPTs would also generally balk at talking about a 'GPT-5', because they think it doesn't exist yet.)

I don't know if it was 100% GPT-5-written, but I do note that when I try the intro thesis paragraph on GPT-5-Pro, it dislikes it and identifies several stupid assertions (e.g. the claim that power-law scaling has now hit 'diminishing returns', which is meaningless because all log or power laws always have diminishing returns), so probably not completely GPT-5-written (or at least, only by sub-Pro models).
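A toy sketch of scanning a text for the surface tells listed above; these are crude regex heuristics for illustration only, and none of them is a reliable AI-text detector on its own:

    import re

    TELLS = {
        "em dash": "\u2014",
        "it's-not-x-but-y": r"\bnot\b[^.\n]{1,60}\bbut\b",
        "bolded triplet list item": r"(?m)^\s*[-*]\s+\*\*[^*]+\*\*",
    }

    def scan(text):
        # Count each stylistic tell, and separately flag the absence of any URL.
        hits = {name: len(re.findall(pattern, text)) for name, pattern in TELLS.items()}
        hits["no links"] = 0 if re.search(r"https?://", text) else 1
        return hits

    print(scan("It is not a training problem but an engineering problem \u2014 a shift."))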
gwern commented on AGI is an engineering problem, not a model training problem   vincirufus.com/posts/agi-... · Posted by u/vincirufus
dcre · 20 hours ago
The first premise of the argument is that LLMs are plateauing in capability and this is obvious from using them. It is not obvious to me.
gwern · 20 hours ago
It is especially not obvious because this was written using ChatGPT-5. One appreciates the (deliberate?) irony, at least. (Or at least, surely if they had asymptoted, OP should've been able to write this upvoted HN article with an old GPT-4, say...)
gwern commented on Everything is correlated (2014–23)   gwern.net/everything... · Posted by u/gmays
eru · 2 days ago
> If two things e.g. both change over time, they will be correlated.

No?

You can have two independent random walks. E.g. flip a coin, gain a dollar or lose a dollar. Do that two times in parallel. Your two account balances will change over time, but they won't be correlated.
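A minimal sketch of the coin-flip example in plain Python. The walks are independent by construction; note, though, that the sample correlation of any single pair of finite balance paths is itself random and can land far from zero by chance:

    import random

    def walk(steps, rng):
        # Fair coin each step: gain or lose a dollar.
        balance, path = 0, []
        for _ in range(steps):
            balance += 1 if rng.random() < 0.5 else -1
            path.append(balance)
        return path

    def corr(xs, ys):
        # Pearson correlation, no external dependencies.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        return cov / (vx * vy) ** 0.5

    rng = random.Random(0)
    a, b = walk(10_000, rng), walk(10_000, rng)
    print(corr(a, b))  # varies widely from run to run despite independence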

gwern commented on Everything is correlated (2014–23)   gwern.net/everything... · Posted by u/gmays
endymion-light · 2 days ago
The rest of the page has amazing design, but there's just something about the graphs switching from dark to light that flashbangs my eyes really badly. I think it's the sudden light!
gwern · 2 days ago
That is unintentional, and a bug in the dark mode.

For dark mode, we rely on https://invertornot.com/ to dynamically decide whether to fade or negate/invert each image. (Background: https://gwern.net/invertornot ) The service uses a small neural net and is not always correct, as in these cases. Sorry.

I have filed these as errors with InvertOrNot, and will manually set the graphs to invert.
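A sketch of querying the service from Python; the batch URL endpoint and the per-image "invert" flag are my reading of the public API, so treat the exact endpoint and field names as assumptions to verify against the live docs:

    import requests

    def classify_images(urls):
        # Assumed API shape: POST a JSON list of image URLs and get back a
        # list of records with an "invert" flag (1 = safe to negate/invert
        # in dark mode, 0 = fall back to fading).
        resp = requests.post("https://invertornot.com/api/url", json=urls, timeout=10)
        resp.raise_for_status()
        return {rec["url"]: ("invert" if rec.get("invert") == 1 else "fade")
                for rec in resp.json()}

    # Hypothetical image URL, for illustration:
    print(classify_images(["https://gwern.net/doc/statistics/example-graph.png"]))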

gwern commented on The Day Novartis Chose Discovery   alexkesin.com/p/the-day-n... · Posted by u/quadrin
gwern · 16 days ago
The final section, pounding the desk about how terrible ending the program was, seems oddly at variance with all the evidence OP had just laid out about how the program wasn't working well anymore and so wasn't actually a good financial idea. It's weird to quote a bunch of things like studies showing that internal R&D spending yields worse ROI than external R&D, and then write a big moralizing, sermonizing conclusion about how ending internal R&D is bad for profits and how terrible it is that there's no 'patient capital' (capital which was plenty available before; what's the theory, that investors stopped liking making money? that insurance companies with century-long investment horizons ceased to exist? etc.).
gwern commented on The Inkhaven Blogging Residency   inkhaven.blog/... · Posted by u/venkii
velcrovan · 16 days ago
I’m still baffled that you thought (and still think) it was reasonable or desirable to swoop in and nitpick like this, in this exchange. Why would I have responded to this unsolicited criticism, in a context where I was simply sharing about a fun spare time project with someone else?

If you had butted in to a street conversation to tell me “by the way no one likes those silly glasses” I would have non-responded in the same way. But I would still be well within my rights to think you were kind of a jerk, and even to mention it to others.

What do you believe your role will be as an “advisor” at this residency?

gwern · 16 days ago
The mountain gave birth to a mouse.
gwern commented on The Inkhaven Blogging Residency   inkhaven.blog/... · Posted by u/venkii
miohtama · 17 days ago
Writing advice from AI: $20/mo.
gwern · 17 days ago
Value of LLM writing advice: −$3,480; total value: −$3,500.
gwern commented on The Inkhaven Blogging Residency   inkhaven.blog/... · Posted by u/venkii
velcrovan · 17 days ago
Since you offered: https://x.com/gwern/status/1235977354918404096 — though it seems like the intermediate commenter I mentioned has also either locked or deleted their tweets.

…But responding as though this is an attack to be countered and defeated further illustrates why I had doubts about your suitability as a creative advisor. It may be the only way you _know how_ to interpret any reference to yourself or anything anyone might compare to your work. It doesn't mean you're a bad person. You just may not have the tools to draw out the best of other people’s creative skill. And then again, maybe you do, and I just caught you on two separate really bad days five years apart.

gwern · 17 days ago
The initial tweet, 2020-03-05: https://x.com/joeld/status/1235652084264886272

> "I have been working on a reimagining of the blog idea for a few years, and it includes an idea (“series”) that is quite similar to blogchains. See this section of the design docs https://thelocalyarn.com/code/uv/scribbled/Basic_Notions.htm... and partial screen shot. It’s almost ready!"

My original response, in full:

> "One thought on the docs: if there's always a well-defined 'next', why not overload Space/PgDwn to proceed to the next node, GNU Info-like? At the very least, there should be a 'next' link at the bottom, not solely hidden away at the top.

> (Also, no one likes those silly 'st' ligatures.)

> As far as the current theyarn design goes: I like the use of typographic ornaments as a theme, but the colors are confusing. (What does orange vs red vs green denote?) And the pilcrow? Sometimes it's at the beginning of articles (redundant with the hr), sometimes not?

> Hrs seem overused in general, like the (busy) footer. Appending notes chronologically is interesting but confusing, both date/where they begin/end. Are the caps deliberately not small caps? Full-width images would be useful for photos. Be interested to see the new one finished."

I consider these criticisms reasonable, accurate, and constructive, milquetoast even, and stand by them. I see no difference from the many other site critiques I have made over the years (e.g. https://www.lesswrong.com/posts/Nq2BtFidsnhfLuNAx/announcing... ), which are usually received positively, and I think this is a 'you' problem, especially after reading your other comments. And I will point out that you made no reply to my many concrete points until you decided to write this HN comment 5 years later.

(I have taken the liberty of adding a link to your top-level comment to the end of the existing thread, for context/updating.)

> But responding as though this is an attack to be countered and defeated further illustrates why I had doubts about your suitability as a creative advisor.

This is a remarkable way to characterize this conversation.

u/gwern

Karma: 34,656 · Cake day: April 8, 2009
About
Site: https://gwern.net/

About me: https://gwern.net/me

What I've written recently: https://gwern.net/changelog
