ex. how this reads if it is free-associating: "shower thought: RL on LLMs is kinda just 'did it work or not?' and the answer is just 'yes or no', yes or no is a boolean, a boolean is 1 bit, then bring in information theory interpretation of that, therefore RL doesn't give nearly as much info as, like, a bunch of words in pretraining"
or
ex. how this reads if it is relaying information gathered: "A common problem across people at companies who speak honestly with me about the engineering side off the air is figuring out how to get more out of RL. The biggest wall currently is the cross product of RL training being slowww and a lack of GPUs. More than one of them has shared with me that if you can crack the part where the model gets very little info out of one run, then the GPU problem goes away. You can't GPU your way out of how little info they get."
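For what it's worth, the bits comparison in the first reading can be made concrete. A minimal sketch below, with made-up numbers: a 30% success rate for the binary reward, ~2,000 tokens per pretraining sequence, and ~1.5 bits of signal per token are all assumptions, not anything from the episode.

```python
import math

# Reading A's information-theory framing, with assumed numbers.

# RL with a binary "did it work?" reward: at most 1 bit per rollout,
# and less when outcomes are imbalanced (entropy of a Bernoulli reward).
p_success = 0.3  # assumed success rate
bits_per_rollout = -(p_success * math.log2(p_success)
                     + (1 - p_success) * math.log2(1 - p_success))

# Pretraining: next-token prediction supervises every token in the sequence.
tokens_per_sequence = 2000   # assumed sequence length
bits_per_token = 1.5         # assumed per-token signal, roughly cross-entropy in bits
bits_per_sequence = tokens_per_sequence * bits_per_token

print(f"RL, one binary-reward rollout: ~{bits_per_rollout:.2f} bits")
print(f"Pretraining, one sequence:     ~{bits_per_sequence:.0f} bits")
```

Under those assumptions it's roughly 0.9 bits per rollout versus a few thousand bits per sequence, which is the gap the shower-thought version is gesturing at.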
I am continuing to assume it is much more A than B, given your thorough-sounding explanation and my prior that he's not shooting the shit about specific technical problems off-air with multiple grunts.
Where I come from, to criticise a non-native speaker's accent or small grammatical errors (that do not impact the meaning) is a not-so-subtle form of discrimination. As a result, I never do it. (To criticise myself, it took many, many years to see this about my home culture and stop doing it myself.) Still, many people ask me: "Hey, can you correct my <language X> when I speak it?" "Sure!" (but I never do.)
It was the first (and so far only) interview of his I'd seen, and between that and the AI boosterism I was left thinking he was just some overblown hack. Is this a blind spot for him, so that he's still worth listening to on other topics? Or is he in fact an overblown hack?