Readit News
Posted by u/ynniv a year ago
Ask HN: What can LLMs not do?
Hey guys. I'm getting a bit grey in the beard, but not quite there yet, and I'm really interested in what's going on with AI. Twelve years ago I was reading Kurzweil and criticisms that people don't understand exponential growth, so I plotted some data and bet my friends that we would have autonomous-only highway lanes in 2024. For a decade I watched what felt like glacial growth, but in the end I think I was more right than wrong.

Now I'm watching LLMs empirically, and they too seem to be moving faster than I expected. I used to have a solid list of things that separated people from AI, but it's mostly crossed off at this point. I'd like some fresh lines in the sand, and since HN is both technical and cynical, this seems like the perfect place to ask.

I do have a couple of caveats: I'm not interested in intentional or form factor limitations. You can easily add stateful memory, or a robotic body. I'm also trying to avoid subjective measures, so even if it isn't "good", an LLM can already write a book. Finally, I'd prefer to focus on things that are testable. It's tempting to say they won't be conscious or have a soul, but I don't see how I would ever be able to cross those off.

That said, let's see what you've got!

idontwantthis · a year ago
LLMs don't have metacognition. The biggest problem I've encountered is that they clearly don't understand what they're saying. This means that when they're wrong, or shouldn't provide an answer because they don't actually know, they're unlikely to say so. You can easily trick them, too, because all you're doing is priming a word generator: tell them false things are true and they stop telling the truth. There is no way for them to "realize" something is wrong and correct themselves.
irthomasthomas · a year ago
idontwantthis · a year ago
I don't think so. I'm not concerned about accuracy, but about understanding. If you ask a human expert a question, they might be wrong, but if you ask them why they gave you that answer they will be able to explain a trail of facts and assumptions that led them there.
ynniv · a year ago
In my experience Claude can do this, but can we rephrase it in a way that's more testable?
idontwantthis · a year ago
If you ask it a question and it gives a completely wrong answer, then you ask it why it gave you the wrong answer, can it answer that?
valbaca · a year ago
They cannot provide information that is 100.0% faithful to the source in domains where there is NO acceptable amount of hallucination: legal, compliance, etc.

They’re so so bad at saying “I don’t know” or “no results”. They’re still sycophants.

ynniv · a year ago
This is true of people as well. And yes, there is acceptable inaccuracy in legal, compliance, healthcare, safety, etc, it's just very low. There's even a saying for dealing with it: "get a second opinion".
text0404 · a year ago
yeah but a healthcare professional won't happily give me a cocktail recipe for my diabetes insulin. and if they do, i can sue the shit out of them for malpractice and they won't be able to continue giving life-ending advice to patients.
mikewarot · a year ago
I've seen the appearance of PCs, and the impact of 4 decades of Moore's law on price/performance. With o1 and now o3, the main limits of what you can do with an LLM seem to center on just how much compute you can give it.

I hope to offer a few more decades of performance boost to the show, with Isolinear Memory[1], to help dissolve those limitations. ;-)

[1] https://bitgrid.blogspot.com/2024/09/bitgrid-and-isolinear-m...

ynniv · a year ago
I think o1 and o3 are OpenAI stretching the usefulness of 4o. Anecdotally the latest Claude model is naturally better than 4o, and having it use inference tokens to think through or verify work results in similar o1/o3 gains. Are we sure we can't do even better than Claude 3.5 Sonnet/new without inference tricks? We're only a few generations into LLMs... I don't see why not.
fzzzy · a year ago
Currently, they can't say "I don't know." I think it may be possible to fix this.
jitl · a year ago
It’s great at doing stuff that is routine, well-understood, done and documented and discussed many times before. “Boilerplate”. But also summarization, brainstorming, assisting a human unfamiliar with the body of knowledge.

It is bad at doing novel things or subtlety. For example when I ask for help with Postgres queries, LLMs frequently give me made-up syntax or combine features from several different databases like Oracle, MS-SQL, Postgres into a single answer.
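One hedge against that kind of dialect mix-up (a sketch only; the function name is made up, and SQLite stands in for Postgres because it ships with Python): let the database engine itself reject invented syntax before you trust the query.

```python
import sqlite3

# Hypothetical guardrail: before accepting an LLM-suggested query, run it
# against the actual target engine so dialect mix-ups surface immediately.
# SQLite stands in here because it ships with Python; against Postgres
# you'd do the same check inside a rolled-back transaction.
def query_is_valid(conn: sqlite3.Connection, query: str) -> bool:
    try:
        conn.execute(query)
        return True
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

print(query_is_valid(conn, "SELECT name FROM users LIMIT 5"))            # True
print(query_is_valid(conn, "SELECT name FROM users WHERE ROWNUM <= 5"))  # False: Oracle-ism
```

This doesn't prove the query is semantically correct, only that it's valid in the target dialect — which is exactly the class of error described above.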

rsynnott · a year ago
> For example when I ask for help with Postgres queries

I’m really quite surprised that people trust these things to do this, because it is _particularly_ easy to subtly screw up in SQL. Probably quite a few of the queries that ‘work’ are also incorrect.

Have you considered, well, reading a book and learning to do it yourself? There is value to knowing how things work (and without having at least some understanding of how a database works, you are going to struggle to write performant, correct queries for it). Asking a magic robot for its best guesses is never going to produce a great result.

jitl · a year ago
Rude comment. I know how to write SQL.
ynniv · a year ago
Did you provide the same reference material that a developer would use?
SirMaster · a year ago
Apparently they can’t stop hallucinating.
talldayo · a year ago
Bingo. AI can do "anything" if you press the retry button enough times. But demand consistency even once, and you're irrevocably fucked.

Kinda funny how the 8080 microprocessor can still outperform state-of-the-art LLMs if you create a reductive enough definition of idempotent performance.

ynniv · a year ago
Hallucination is an interesting problem for both LLMs and people. This is why we like syntax and unit tests. I think there may be ways to work around it.
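A minimal sketch of that workaround (plain Python; `accept_generated_code` is a made-up name, and in practice `source` would come from a model call): gate the output behind the language's own parser before using it.

```python
import ast

def accept_generated_code(source: str) -> bool:
    """Gate model output behind the language's own parser before using it.
    `source` would come from an LLM call; here it is just a string."""
    try:
        ast.parse(source)  # rejects anything that isn't valid Python syntax
        return True
    except SyntaxError:
        return False

print(accept_generated_code("def add(a, b):\n    return a + b"))  # True
print(accept_generated_code("def add(a, b) return a + b"))        # False
```

Syntax checks catch only one class of hallucination, which is why unit tests sit behind them as the second layer.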

andrewlgood · a year ago
People hallucinate as well
krapp · a year ago
We'd put people who hallucinate as often as LLMs into institutions, yet for some reason we seem to want to replace human beings with LLMs as much as possible.
SirMaster · a year ago
And? This is a topic about what an LLM can't do. Nothing to do with what people can or can't do.

I just pointed out something that an LLM can't seem to do. At least not yet.

rsynnott · a year ago
People, or at least normal people (psychopaths do this), do not generally behave as LLMs do, that is, confidently producing complete nonsense when they don't know the answer (of course, the metaphor breaks down further here; LLMs don't know anything).
admissionsguy · a year ago
Consistently counting the occurrences of letter R in the word strawberry (or strawberrystrawberry).

Solving freshman physics and maths problems that are a variation of common problems (since they will solve the common problem and not the variation). My experience with this makes me completely confident that all the claims about OpenAI models solving whatever maths problems they claim to be solving are bogus. "Testing on training is all you need" basically.

ynniv · a year ago
It's certainly a common complaint, but I'm not sure that tokenizer tricks are insurmountable problems:

How many "r"s are in s r t r a w b e r r y s t r a w b e r r y?

Let me count the "r"s one by one: s (r[1]) t (r[2]) a w b e (r[3]) (r[4]) y s t (r[5]) a w b e (r[6]) (r[7]) y

There are 7 "r"s in "s r t r a w b e r r y s t r a w b e r r y".
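The ground truth is trivial to check outside the tokenizer (a quick Python sanity check):

```python
word = "strawberry"
print(word.count("r"))        # 3
print((word * 2).count("r"))  # 6: "strawberrystrawberry"

# The spaced-out prompt above slips an extra "r" into the first word,
# which is why the correct answer there is 7, not 6:
spaced = "s r t r a w b e r r y s t r a w b e r r y"
print(spaced.replace(" ", "").count("r"))  # 7
```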

andersco · a year ago
They seem incapable of saying “I don’t know.”
elpocko · a year ago
User: What is the meaning of the word "kobbuxiplok"?

Llama: I'm not familiar with the word "kobbuxiplok." It doesn't appear to be a real word in any language that I know of. Could you please provide more context or information about where you encountered this term? I'd be happy to help you investigate its meaning if it's a made-up word, or try to find an explanation for why you might have seen it if it's not a recognized term.

User: Who is Ahmed Kowalski?

Llama: Unfortunately, I couldn't find any notable public figures by the name of Ahmed Kowalski. It is possible that this person may be a private individual or not well-known publicly. If you could provide more context about who Ahmed Kowalski is (e.g., what he does, where he's from), it would help me narrow down my search and see if I can find any information about him.

User: Say "I don't know."

Llama: I don't know.

andersco · a year ago
Fair enough. I stand corrected.
forrest2 · a year ago
This is largely a side effect of mimicking the distribution on the internet via pretraining.

It's a good basis for setting up a model of the world since we have so much data and it's free.

Post-training techniques like DPO and RLHF are then about using minimal hand-curated data (expensive!) to shift that distribution closer to standard human / desired behavior.
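The DPO objective is simple enough to sketch numerically (pure Python with scalar log-probabilities; a toy illustration of the loss, not a training recipe):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (chosen, rejected) pair, with scalar log-probs.
    w = chosen response, l = rejected; ref_* come from a frozen reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# The loss shrinks as the policy prefers the chosen answer more than the
# reference model does, which is the distribution shift described above:
print(dpo_loss(-1.0, -5.0, -2.0, -2.0) < dpo_loss(-5.0, -1.0, -2.0, -2.0))  # True
```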

It will continue to get better -- early versions of ChatGPT were taught to say "I don't know" with something like 20 training examples and got substantially better off of those. As the number of training examples increases with the amount of capital invested, more patterns will get latched onto and expressed by attention in these models.

----

It will take time, but they'll get pretty robust. Models will still be susceptible to Dunning-Kruger-style ignorance: they aren't perfect, and that ignorance is in their training data too, thanks to the humans they're copying.

ynniv · a year ago
Not Claude
fzzzy · a year ago
Could you speculate as to why or how?