- They are not making progress, currently. The elephant-in-the-room problem of hallucinations is exactly the same as it was 3 years ago or, as I said above, worse
- It's clearly possible to solve this, since we humans exist and our brains don't have this problem
There are then two possible paths: either the hallucinations are fundamental to the current architecture of LLMs, and there's some other aspect of the human brain's configuration that they've yet to replicate, or the hallucinations will go away with better and more training.
The latter seems to be the bet everyone is making; that's why all these data centers are being built, right? So the bet is both that larger-scale training will solve the problem, and that there's enough training data, silica molecules and electricity on earth to perform training at that "scale".
There are 86B neurons in the human brain. Each one is a stand-alone living organism, like a biological microcontroller. It has constantly mutating state and memory: short-term through the presence or absence of RNA and proteins, long-term through chromatin formation that enables and disables its own DNA over time, and in theory even permanent memory through DNA rewriting via transposable elements (TEs). Each one has a vast array of input modes: direct electrical stimulation, chemical signaling through a wide array of signaling molecules, and electrical field effects from adjacent cells.
Meanwhile, GPT-4 has 1.1T floats. Not billions of interacting microcontrollers, just static floating-point values describing a network topology.
The complexity of the neural networks that run our minds is spectacularly higher than the simulated neural networks we're training on silicon.
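As a rough sanity check on the scale gap, here's a back-of-envelope sketch in Python; the 86B neuron figure is the common estimate, and the 1.1T parameter count is the unconfirmed estimate quoted above:

```python
# Back-of-envelope comparison of the two figures quoted above.
# Both are rough estimates; OpenAI has not confirmed GPT-4's parameter count.
neurons = 86e9        # estimated neurons in a human brain
gpt4_params = 1.1e12  # widely repeated (unconfirmed) GPT-4 parameter estimate

# Naively spreading the model's parameters across the brain's neuron count:
params_per_neuron = gpt4_params / neurons
print(f"~{params_per_neuron:.0f} static floats per neuron")  # prints ~13

# And each of those 86B neurons is itself a stateful, adaptive cell,
# so even this crude comparison understates the biological side.
```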
That's my personal bet. I think the 86B interconnected stateful microcontrollers are so much more capable than the 1T static floating points, and the 1T static floating points are already nearly impossibly expensive to run. So I'm bearish, but of course, I don't actually know. We will see. For now all I can conclude is that the frontier model developers lie incessantly in every press release, just like their LLMs.
What exactly does engine efficiency specifically have to do with horse usage? Cars like the Ford Model T entered mass production somewhere around 1908. Oh, and would you look at the horse usage graph around that date! sigh
The chess rating graph seems to show just a linear relationship?
> This pink line, back in 2024, was a large part of my job. Answer technical questions for new hires.
>
> Claude, meanwhile, was now answering 30,000 questions a month; eight times as many questions as me & mine ever did.
So more == better. sigh. Did you run any, you know, studies to see the quality of those answers? I too can consult /dev/random for answers at a rate of gigabytes per second!
> I was one of the first researchers hired at Anthropic.
Yeah. I can tell. Somebody's high on their own supply here.
If anything the quality has gotten worse, because the models are now so good at lying when they don’t know that it’s really hard to review their output. Is this a safe way to make that syscall? Is the lock structuring here really deadlock-safe? The model will tell you with complete confidence that its code is perfect, and it’ll either be right or lying; it never says “I don’t know”.
Every time OpenAI or Anthropic or Google announce a “stratospheric leap forward” and I go back and try it, only to find it’s the same, I become more convinced that the lying is structural somehow, that the architecture they have is not fundamentally able to capture “I need to solve the problem I’m being asked to solve” instead of “I need to produce tokens that are likely to come after these other tokens”.
The tool is incredible, and I use it constantly, but only for things where truth is irrelevant or where I can easily verify the answer. So far I have found programming, other than trivial tasks and greenfield “write some code that does x” requests, much faster without LLMs.
I think "proprietary" is a better descriptor for Google Search's inner machinations, than "secret". The general concept of engineering a search crawler is well-trodden. Many companies have done it, there are open-source examples, and Google themselves have written blogs about their own.
It would probably be more apt to say that we know where the books came from and how they were acquired; we just don't necessarily know how the archive shelves in the basement are arranged or which employee is responsible for organizing them, and we don't have the source code to the library's LMS. (All of which is true, by the way, for the LOC.) Proprietary, not secret.