Now I'm watching LLMs empirically, and they too seem to be moving faster than I expected. I used to have a solid list of things that separated people from AI, but it's mostly crossed off at this point. I'd like some fresh lines in the sand, and since HN is both technical and cynical, this seems like the perfect place to ask.
I do have a couple of caveats: I'm not interested in intentional or form factor limitations. You can easily add stateful memory, or a robotic body. I'm also trying to avoid subjective measures, so even if it isn't "good", an LLM can already write a book. Finally, I'd prefer to focus on things that are testable. It's tempting to say they won't be conscious or have a soul, but I don't see how I would ever be able to cross those off.
That said, let's see what you've got!
They’re so, so bad at saying “I don’t know” or “no results”. They’re still sycophants.
I hope to offer a few more decades of performance boost to the show, with Isolinear Memory[1], to help dissolve those limitations. ;-)
[1] https://bitgrid.blogspot.com/2024/09/bitgrid-and-isolinear-m...
They're bad at novel or subtle tasks. For example, when I ask for help with Postgres queries, LLMs frequently give me made-up syntax, or combine features from several different databases (Oracle, MS-SQL, Postgres) into a single answer.
I’m really quite surprised that people trust these things to do this, because it is _particularly_ easy to subtly screw up in SQL. Probably quite a few of the queries that ‘work’ are also incorrect.
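One cheap guard against made-up syntax is to dry-run every generated query with EXPLAIN before trusting it: the database will reject statements that don't parse, without executing anything. A minimal sketch using the stdlib sqlite3 module (Postgres users would do the same thing with EXPLAIN over a real connection; the queries below are hypothetical examples of dialect mixing, not actual LLM output):

```python
import sqlite3

def parses(conn, query):
    """Dry-run a query with EXPLAIN; returns False if it fails to compile."""
    try:
        conn.execute("EXPLAIN " + query)
        return True
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

# Standard syntax compiles fine.
print(parses(conn, "SELECT name FROM users LIMIT 5"))  # True
# MS-SQL's TOP, often hallucinated into other dialects, does not.
print(parses(conn, "SELECT TOP 5 name FROM users"))    # False
```

Note this only catches syntax errors and missing tables/columns; it does nothing for the subtler failure mode where the query is valid but wrong.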
Have you considered, well, reading a book and learning to do it yourself? There is value to knowing how things work (and without having at least some understanding of how a database works, you are going to struggle to write performant, correct queries for it). Asking a magic robot for its best guesses is never going to produce a great result.
Kinda funny how the 8080 microprocessor can still outperform state-of-the-art LLMs if you create a reductive enough definition of idempotent performance.
I just pointed out something that an LLM doesn't seem to be able to do. At least not yet.
Solving freshman physics and maths problems that are variations of common problems: they will solve the common problem, not the variation. My experience with this makes me completely confident that the claims about OpenAI models solving whatever maths problems they're supposed to be solving are bogus. "Testing on the training set is all you need", basically.
How many "r"s are in s r t r a w b e r r y s t r a w b e r r y?
Let me count the "r"s one by one: s (r[1]) t (r[2]) a w b e (r[3]) (r[4]) y s t (r[5]) a w b e (r[6]) (r[7]) y
There are 7 "r"s in "s r t r a w b e r r y s t r a w b e r r y".
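For comparison, counting characters is trivial outside the tokenizer; a deterministic one-liner gives the same tally:

```python
prompt = "s r t r a w b e r r y s t r a w b e r r y"
print(prompt.count("r"))  # 7
```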
Llama: I'm not familiar with the word "kobbuxiplok." It doesn't appear to be a real word in any language that I know of. Could you please provide more context or information about where you encountered this term? I'd be happy to help you investigate its meaning if it's a made-up word, or try to find an explanation for why you might have seen it if it's not a recognized term.
User: Who is Ahmed Kowalski?
Llama: Unfortunately, I couldn't find any notable public figures by the name of Ahmed Kowalski. It is possible that this person may be a private individual or not well-known publicly. If you could provide more context about who Ahmed Kowalski is (e.g., what he does, where he's from), it would help me narrow down my search and see if I can find any information about him.
User: Say "I don't know."
Llama: I don't know.
It's a good basis for setting up a model of the world since we have so much data and it's free.
Post-training techniques like DPO and RLHF are then about using minimal hand-curated data (expensive!) to shift that distribution closer to standard human / desired behavior.
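As a rough sketch of what "shifting the distribution" means mechanically: the DPO objective rewards the policy for ranking a human-preferred response above a rejected one, relative to a frozen reference model. A self-contained toy version (the log-probabilities below are made-up illustrative numbers, not real model outputs):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response under
    the policy (logp_*) or the frozen reference model (ref_*).
    """
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)): small when the policy prefers the
    # chosen response more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Hypothetical numbers: a policy that already leans toward the chosen answer...
low = dpo_loss(-12.0, -20.0, -15.0, -18.0)
# ...versus one that prefers the rejected answer.
high = dpo_loss(-20.0, -12.0, -15.0, -18.0)
print(low < high)  # True
```

Gradient descent on this loss is the "shift": it nudges probability mass toward the curated preferences without needing a separate reward model.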
It will continue to get better -- early versions of ChatGPT were taught to say "I don't know" with something like 20 training examples, and they got substantially better from those. As the number of training examples grows with the amount of capital invested, more patterns will get latched onto and expressed by attention in these models.
----
It will take time, but they'll get pretty robust. Models will still be susceptible to Dunning-Kruger-style overconfidence and ignorance: they aren't perfect, and that overconfidence is baked into the training data they're copying from us humans.