helloplanets (u/helloplanets)

helloplanets commented on LLMs aren't world models yosefk.com/blog/llms-aren... · Posted by u/ingve

skeledrew · 15 days ago

Agree in general with most of the points, except

> but because I know you and I get by with less.

Actually we get far more data and training than any LLM. We've been gathering and processing sensory data every second at least since birth (more processing than gathering when asleep), and are only really considered fully intelligent in our late teens to mid-20s.

helloplanets · 13 days ago

Don't forget the millions of years of pre-training! ;)

helloplanets commented on Perplexity is using stealth, undeclared crawlers to evade no-crawl directives blog.cloudflare.com/perpl... · Posted by u/rrampage

vineyardmike · 21 days ago

> You can't micro-transact the whole internet.

I agree that end-users cannot handle micro transactions across the whole internet. That said, I would like to point out that most of the internet is blanketed in ads and ads involve tons of tiny quick auctions and micro transactions that occur on each page load.

It is totally possible for a system to evolve involving tons of tiny transactions across page loads.

helloplanets · 21 days ago

You could argue that the suggested system is actually much simpler than the one we currently have for the sites that are "free", aka funded with ads.

The lengths Meta and the like go to in order to maximize clickthroughs...

helloplanets commented on I know when you're vibe coding alexkondov.com/i-know-whe... · Posted by u/thunderbong

arduanika · 25 days ago

I believe you that these tools help a lot, but they would not prevent ~any of the examples listed in the article (under "The smell of vibe coding").

helloplanets · 25 days ago

Most of those look like context issues to me. A repo map (built with Tree-sitter, etc.) and documentation would already do wonders. Feeding 32-64k tokens of context directly into a model like Gemini 2.5 Pro is something more people should try in situations like this. Or even 128k+ tokens.
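
For anyone wondering what a repo map looks like in practice, here is a rough sketch. It is an assumption-heavy stand-in: it uses Python's standard `ast` module instead of Tree-sitter, so it only handles Python files, and the `repo_map` function name and output layout are made up for illustration. The idea is the same, though: give the model a compact outline of the files, classes, and functions before asking it to change anything.

```python
import ast
from pathlib import Path


def repo_map(root: str) -> str:
    """Return a compact outline of top-level classes and functions per file.

    Hypothetical helper: uses the stdlib `ast` module as a stand-in for
    Tree-sitter, so only Python sources are covered.
    """
    lines = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be read or parsed
        lines.append(path.relative_to(root).as_posix())
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # Positional argument names only; defaults and types are omitted.
                args = ", ".join(a.arg for a in node.args.args)
                lines.append(f"  def {node.name}({args})")
            elif isinstance(node, ast.ClassDef):
                lines.append(f"  class {node.name}")
                for item in node.body:
                    if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                        lines.append(f"    def {item.name}(...)")
    return "\n".join(lines)
```

Calling `print(repo_map("."))` at the repository root and pasting the result at the top of the prompt, together with the relevant docs, is one way to supply that context.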

helloplanets commented on Gemini with Deep Think achieves gold-medal standard at the IMO deepmind.google/discover/... · Posted by u/meetpateltech

erichocean · a month ago

I regularly have the opposite experience: o3 is almost unusable, and Gemini 2.5 Pro is reliably great. Claude Opus 4 is a close second.

o3 is so bad it makes me wonder if I'm being served a different model? My o3 responses are so truncated and simplified as to be useless. Maybe my problems aren't a good fit, but whatever it is: o3 output isn't useful.

helloplanets · a month ago

Are you using a tool other than ChatGPT? If so, check the full prompt that's being sent. It can sometimes kneecap the model.

Tools with slightly unsuitable built-in prompts/context sometimes lead the models to say weird stuff out of the blue, rather than it being a 'baked in' behavior of the model itself. I've seen this happen with both Gemini 2.5 Pro and o3.

helloplanets commented on OpenAI claims gold-medal performance at IMO 2025 twitter.com/alexwei_/stat... · Posted by u/Davidzheng

blibble · a month ago

it was widely covered in the press earlier in the year

helloplanets · a month ago

Source?

helloplanets commented on How I keep up with AI progress blog.nilenso.com/blog/202... · Posted by u/itzlambda

lsy · a month ago

If you have a decent understanding of how LLMs work (you put in basically every piece of text you can find, get a statistical machine that models text really well, then use contractors to train it to model text in conversational form), then you probably don't need to consume a big diet of ongoing output from PR people, bloggers, thought leaders, and internet rationalists. That seems likely to get you going down some millenarian path that's not helpful.

Despite the feeling that it's a fast-moving field, most of the differences in actual models over the last years are in degree and not kind, and the majority of ongoing work is in tooling and integrations, which you can probably keep up with as it seems useful for your work. Remembering that it's a model of text and is ungrounded goes a long way to discerning what kinds of work it's useful for (where verification of output is either straightforward or unnecessary), and what kinds of work it's not useful for.

helloplanets · a month ago

It's not a model of text, though. It's a model of multiple types of data. Pretty much all modern models are multimodal.