If you want to pretend that being a 3-year-old is not a transient state, and that controlling an AI is just like parenting an eternal 3-year-old, there's probably a manga about that.
Julian Bashir: "It's not your fault things are the way they are."
Lee: "Everybody tells themselves that. And nothing ever changes."
Bashir's statement is a true one, made out of compassion. Lee's statement is almost comically/logically/obviously false in a real universe, unless you're in a fictional TV series designed to never change. Except of course for the plot arc, which, again true to Bashir's statement, is not anyone's fault.
I'm assuming you meant to ask about people who haven't _learned_ to read or write, but would otherwise be capable.
Is your argument, then, that a person who hasn't learned to read or write is able to model language as accurately as one who has?
Wouldn't you say that someone who has read a whole ton of books would maybe be a bit better at language modelling?
Also, perhaps most importantly: GPT (and pretty much any LLM I've talked to) does know the alphabet and its rules. It knows. Ask it to recite the alphabet. Ask it about any kind of grammatical or lexical rules. It knows all of it. It can also chop up a word from tokens into letters to spell it correctly; it knows those rules too. Now ask it about Chinese and Japanese characters, ask it any of the rules related to those writing systems and languages. It knows all the rules.
This to me shows the problem is mainly that it's incapable of reasoning and putting things together logically, not so much that it's trained on something that doesn't _quite_ look like letters as we know them. Sure, it might be slightly harder to do, but it's not actually hard, especially not compared to the other things we expect LLMs to be good at. And especially not compared to the other things we expect people to be good at if they are considered "language experts".
If (smart/dedicated) humans can easily learn the Chinese, Japanese, Latin and Russian writing systems, then why can't LLMs learn how tokens relate to the Latin alphabet?
Remember that tokens were specifically designed to be easier and more regular to parse (encode/decode) than the encodings used in human languages ...
LLMs don't see letters, they see tokens. This is a foundational attribute of LLMs. When you point out that the LLM does not know the number of R's in the word "strawberry", you are not exposing the LLM as some kind of sham; you're just admitting to being a fool.
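To make the letters-vs-tokens point concrete, here is a minimal sketch (my own illustration, not something from this thread) of what a model actually receives for the word "strawberry". It assumes the tiktoken library and the cl100k_base BPE encoding; any subword tokenizer would show the same thing.

    # Minimal sketch, assuming tiktoken and the cl100k_base BPE encoding.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    word = "strawberry"
    token_ids = enc.encode(word)                   # what the model actually receives: integer IDs
    pieces = [enc.decode([t]) for t in token_ids]  # the subword chunks those IDs stand for

    print(token_ids)  # a few integers, not ten letters
    print(pieces)     # e.g. ['str', 'aw', 'berry'] -- the exact split depends on the encoding

    # The token-to-character mapping is deterministic, so the letter count is
    # recoverable in principle -- it just isn't what the model is shown directly.
    print(sum(piece.count("r") for piece in pieces))  # 3

The last line is the point both sides are arguing over: the information is there, but the model has to reconstruct it rather than read it off.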
And the token/strawberry thing is a non-excuse. They just can't count. I can count the number of syllables in a word regardless of how it's spelled; that's also not based on letters. Or if you want a sub-letter equivalent, I could also count the number of serifs, dots or curves in a word.
It's really not so much that the strawberry thing is a "gotcha", or easily explained by "they see tokens instead", because the same reasoning errors happen all the time in LLMs, even in places where "it's because of tokens" can't possibly be the explanation. It's just that the strawberry thing is one of the easiest ways to show it just can't reason reliably.
[1] http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Even if his broader point might be valid (about the most fruitful directions in ML), calling something a "bitter lesson" while insulting a whole field of science is ... something.
Also, as someone involved in early RL, he should know better.
"below us"? speak for yourself, because that's supremacist's reasoning.
Uhhh, no?
In the past month we've had:
- LLMs (3 different models) getting gold at the IMO
- gold at the IOI
- beating 9 out of 10 human developers at the AtCoder heuristics contest (optimisation problems), with the single human who actually beat the machine saying he was exhausted and that next year it'll probably be over.
- agentic coding that actually works, and works for 30-90 minute sessions while staying coherent and actually finishing tasks.
- a 4-6x reduction in price for top-tier (SotA?) models. oAI's "best" model now costs $10/MTok while retaining 90+% of the capability of their previous SotA models that were $40-60/MTok.
- several "harnesses" being released by every model provider. Claude Code seems to remain the best, but alternatives are popping up everywhere: geminicli, opencoder, qwencli (forked, but still), etc.
- open-source models that are getting close to SotA again, sitting 6-12 months behind (depending on who you ask), open and cheap to run (~$2/MTok on some providers).
I don't see the plateauing in capabilities. LLMs are plateauing only in benchmarks, where "number goes up" can only go up so far before it becomes useless. In my opinion regular benchmarks have become useless: MMLU & co are cute, but agentic whatever is what matters. And those capabilities have only improved, and will continue to improve, with better data, better signals, better training recipes.
Why do you think every model provider is heavily subsidising coding right now? They all want that sweet sweet data & signals, so they can improve their models.
Don't you mean the opposite? Like, it got gold at the IMO, which is a benchmark, but it's nowhere remotely close to having even the basic mathematical capabilities someone who got IMO gold can be expected to have.
Like being unable to deal with negations ... or getting confused by a question being stated in something other than its native alphabet ...