Readit News
tripzilch commented on Why LLMs can't really build software   zed.dev/blog/why-llms-can... · Posted by u/srid
NitpickLawyer · 17 days ago
> but LLMs have really started to plateau off on their capabilities haven’t they?

Uhhh, no?

In the past month we've had:

- LLMs (3 different models) getting gold at the IMO

- gold at the IOI

- beat 9/10 human developers at AtCoder heuristics (optimisation problems), with the single human who actually beat the machine saying he was exhausted and that next year it'll probably be over.

- agentic workflows that actually work, staying coherent through 30-90 minute sessions and actually finishing tasks.

- 4-6x reduction in price for top-tier (SotA?) models. OpenAI's "best" model now costs $10/MTok while retaining 90+% of the capability of their previous SotA models, which ran $40-60/MTok.

- several "harnesses" being released by every model provider. Claude Code seems to remain the best, but alternatives are popping up everywhere - geminicli, opencoder, qwencli (forked, but still), etc.

- open-source models that are getting close to SotA again: 6-12 months behind (depending on who you ask), open, and cheap to run (~$2/MTok on some providers).

I don't see the plateauing in capabilities. LLMs are plateauing only on benchmarks, where the number can only go up so far before it becomes useless. IMO regular benchmarks have become useless: MMLU & co are cute, but agentic whatever is what matters. And those capabilities have only improved, and will continue to improve, with better data, better signals, better training recipes.

Why do you think every model provider is heavily subsidising coding right now? They all want that sweet, sweet data & signals so they can improve their models.

tripzilch · 14 days ago
> I don't see the plateauing in capabilities. LLMs are plateauing only in benchmarks

Don't you mean the opposite? Like, it beat the IMO, which is a benchmark, but it's nowhere remotely close to having even the basic mathematical capabilities someone who beat the IMO can be expected to have.

Like being unable to deal with negations ... or getting confused by a question stated in something other than its native alphabet ...

tripzilch commented on Why LLMs can't really build software   zed.dev/blog/why-llms-can... · Posted by u/srid
andrewmutz · 17 days ago
I agree that it is probably easier for an LLM to write good code in any framework (like Rails) that has a lot of well-documented opinions about how things should be done. If there is a "right" place to put things, or a "right" way to model problems in a framework, it's more likely that the model's opinions are going to line up with the human engineer's opinions.
tripzilch · 14 days ago
But isn't it crazy that, while it's been impressively great at translating between human languages from the start, it's incapable of translating these well-documented best-ways-to-do-it things across domains or even programming languages?
tripzilch commented on Why LLMs can't really build software   zed.dev/blog/why-llms-can... · Posted by u/srid
IshKebab · 16 days ago
Nobody said you did. I'm talking about the confidently incorrect assertions that humans would never display any of these unreliable behaviours.
tripzilch · 14 days ago
They don't. At least not for the duration that LLMs keep it up. They really don't.

If you want to pretend that being a 3-year-old is not a transient state, and that controlling an AI is just like parenting an eternal 3-year-old, there's probably a manga about that.

tripzilch commented on Why LLMs can't really build software   zed.dev/blog/why-llms-can... · Posted by u/srid
IshKebab · 17 days ago
Yeah it's also kind of funny people discovering all the LLM failure modes and saying "see! humans would never do that! it's not really intelligent!". None of those people have children...
tripzilch · 14 days ago
Maybe because none of those people are imagining children to be eternally stuck at that level of intelligence. At that age (whether you're a parent or not) you can literally see them getting smarter over the course of weeks or months.
tripzilch commented on VC-backed company just killed my EU trademark for a small OSS project    · Posted by u/marcjschmidt
hypercube33 · 17 days ago
Deep Space Nine:

Julian Bashir: "It's not your fault things are the way they are."

Lee: "Everybody tells themselves that. And nothing ever changes."

tripzilch · 15 days ago
I mean, DS9 was literally a TV serial; of course nothing ever changes.

Bashir's statement is a true one, made out of compassion. Lee's statement is almost comically/logically/obviously false in a real Universe, unless you're in a fictional TV series designed to never change. Except of course for the plot arc, which, again true to Bashir's statement, is not anyone's fault.

tripzilch commented on GPT-5: Key characteristics, pricing and system card   simonwillison.net/2025/Au... · Posted by u/Philpax
awestroke · 23 days ago
So people that can't read or write have no language? If you don't know an alphabet and its rules, you won't know how many letters are in words. Does that make you unable to model language accurately?
tripzilch · 20 days ago
So first off: people who _can't_ read or write have a certain disability (blindness or developmental, etc.). That's not a reasonable comparison for LLMs/AI (especially since text is the main modality of an LLM).

I'm assuming you meant to ask about people who haven't _learned_ to read or write, but would otherwise be capable.

Is your argument, then, that a person who hasn't learned to read or write is able to model language as accurately as one who has?

Wouldn't you say that someone who has read a whole ton of books would maybe be a bit better at language modelling?

Also, perhaps most importantly: GPT (and pretty much any LLM I've talked to) does know the alphabet and its rules. It knows. Ask it to recite the alphabet. Ask it about any kind of grammatical or lexical rules. It knows all of it. It can also chop up a word from tokens into letters to spell it correctly; it knows those rules too. Now ask it about Chinese and Japanese characters, ask it any of the rules related to those alphabets and languages. It knows all the rules.

This to me shows that the problem is mainly that it's incapable of reasoning and putting things together logically, not so much that it's trained on something that doesn't _quite_ look like letters as we know them. Sure, it might be slightly harder to do, but it's not actually hard, especially not compared to the other things we expect LLMs to be good at. But especially, especially not compared to the other things we expect people to be good at if they are considered "language experts".

If (smart/dedicated) humans can easily learn the Chinese, Japanese, Latin and Russian alphabets, then why can't LLMs learn how tokens relate to the Latin alphabet?

Remember that tokens were specifically designed to be easier and more regular to parse (encode/decode) than the encodings used in human languages ...
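
To show just how regular that mapping is, here's a minimal sketch in Python, assuming OpenAI's tiktoken library (the vocabulary name is only an example; any BPE vocabulary behaves the same way):

    # Minimal sketch, assuming the tiktoken package (pip install tiktoken).
    # Decoding each token id individually recovers the exact substring it
    # covers, so the token <-> letter mapping is fully mechanical.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # example vocabulary
    ids = enc.encode("strawberry")
    pieces = [enc.decode([i]) for i in ids]

    print(pieces)                             # a few chunks (vocab-dependent)
    print(sum(p.count("r") for p in pieces))  # 3 -- letters fully recoverable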

tripzilch commented on GPT-5: Key characteristics, pricing and system card   simonwillison.net/2025/Au... · Posted by u/Philpax
awestroke · 23 days ago
I'll show you a few misspelled words and you tell me (without using any tools or thinking it through) which bits in the utf8-encoded bytes are incorrect. If you're wrong, I'll conclude you are not intelligent.

LLMs don't see letters, they see tokens. This is a foundational attribute of LLMs. When you point out that the LLM does not know the number of R's in the word "Strawberry", you are not exposing the LLM as some kind of sham; you're just admitting to being a fool.

tripzilch · 20 days ago
If I had learned to read utf8 bytes instead of the Latin alphabet, this would be trivial. In fact, give me a (paid) week to study utf8 for reading and I am sure I could do it. (Yes, I already know how utf8 works.)
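
For what it's worth, "reading utf8 bytes" is entirely mechanical; a minimal Python sketch (the misspelled word is purely an illustration):

    # Minimal sketch: the bit-level view of a deliberately misspelled word.
    # For ASCII text each character is exactly one utf8 byte, so spotting
    # the wrong byte is a table lookup, not a feat of intelligence.
    word = "strawbery"  # misspelled on purpose
    for ch in word:
        byte = ch.encode("utf-8")[0]
        print(f"{ch} -> {byte:08b}")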

And the token/strawberry thing is a non-excuse. They just can't count. I can count the number of syllables in a word regardless of how it's spelled; that's also not based on letters. Or if you want a sub-letter equivalent, I could also count the number of serifs, dots or curves in a word.

It's really not that the strawberry thing is a "gotcha", or easily explained by "they see tokens instead": the same reasoning errors happen all the time in LLMs, in places where "it's because of tokens" can't possibly be the explanation. The strawberry thing is just one of the easiest ways to show it can't reason reliably.

tripzilch commented on Genie 3: A new frontier for world models   deepmind.google/discover/... · Posted by u/bradleyg223
jonas21 · a month ago
When they say "the start", I think they mean the start of the current LLM era (circa 2017). The main story of this time has been a rejection of the idea that major conceptual breakthroughs and complex architectures are needed to achieve intelligence. Instead, it's better to focus on simple, general-purpose methods that can scale to massive amounts of data and compute (i.e. the Bitter Lesson [1]).

[1] http://www.incompleteideas.net/IncIdeas/BitterLesson.html

tripzilch · a month ago
Oof ... to call other people's decades of research into directed machine learning "a colossal waste of researcher's time" is indeed a rather toxic point of view, one that unsurprisingly causes a bitter reaction in scientists/researchers.

Even if his broader point might be valid (about the most fruitful directions in ML), calling something a "bitter lesson" while insulting a whole field of science is ... something.

Also, as someone involved in early RL, he should know better.

tripzilch commented on Staying cool without refrigerants: Next-generation Peltier cooling   news.samsung.com/global/i... · Posted by u/simonebrunozzi
Fokamul · a month ago
AI hi guys, pretty great AI article about this AI Peltier AI cooling AI tech. Really looking AI forward for another AI Samsung new AI devices.

Best regards,

AI

Ps.: AI

tripzilch · a month ago
I heard it was trained on stolen refrigerators that were for research purposes only!
tripzilch commented on So you think you've awoken ChatGPT   lesswrong.com/posts/2pkNC... · Posted by u/firloop
Workaccount2 · a month ago
Because survival is a universal trait and humans aren't necessary to keep an AI "alive". There is no reason an AI wouldn't treat humans the same way we treat everything living below us. Our needs are priority and it would be wise to assume the AI will also give itself priority.
tripzilch · a month ago
> treat humans the same way we treat everything living below us.

"below us"? speak for yourself, because that's supremacist's reasoning.

u/tripzilch

Karma: 4627 · Cake day: May 3, 2010
About

  Location: Netherlands
  Email: <username> @gmail.com

  Fnord.
