mrweasel · a month ago
An absolutely enjoyable read. It also raises a good point regarding the Turing test. I have a family member who teaches adults, and as she pointed out: you won't believe how stupid some people are.

As critical as I might be of LLMs, I fear that they have already outpaced a good portion of the population "intellectually". There's a floor of general knowledge, or outright stupidity, that modern LLMs won't drop below.

We may have reached a point where we can tell that we're talking to a human, because there's no way a computer would lack such basic knowledge or display similar levels of helplessness.

voxleone · a month ago
I sometimes feel a peculiar resonance with these models: they catch the faintest hints of irony and return astoundingly witty remarks, almost as if they were another version of myself. Yet all of the problems, inconsistencies, and surprises that arise in human thought stem from something profoundly different, which is our embodied experience of the world. Humans integrate sensory feedback, form goals, navigate uncertainty, and make countless micro-decisions in real time, all while reasoning causally and contextually. Cognition is active, multimodal, and adaptive; it is not merely a reflection of prior experience but a continual construction of understanding.

And then there are some brilliant friends of mine, people with whom a conversation can unfold for days, rewarding me with the same rapid, incisive exchange we now associate with language models. There is, clearly, an intellectual and environmental element to it.

skybrian · a month ago
Whenever we're testing LLMs against people we need to ask "which people?" Testing a chess bot against random undergrads versus chess grandmasters tells us different things.

From an economics perspective, maybe a relevant comparison is to people who do that task professionally.

TeodorDyakov · a month ago
GDPval

Deleted Comment

bs7280 · a month ago
I've noticed that a lot of the people most skeptical of AI coding tools are biased by their experience working exclusively at some of the top software engineering organizations in the world. As someone who has never worked at a company anywhere close to FAANG, I have worked with both people and organizations that are horrifyingly incompetent. A lot of software organization paradigms are designed to play defense against poorly written software.

I feel similar about self driving cars - they don't have to be perfect when half the people on the road are either high, watching reels while driving, or both.

chelmzy · a month ago
This has been my experience as well. I see very bright people lampooning LLMs because they don't perform up to their expectations, when those people are easily in the top 1% of talent in their field. I don't think they understand that the cognitive load in your average F500 role is NOT very high. Most people are doing jack shit.
demorro · a month ago
Everyone is still holding out hope for a better future. LLM advocates making this argument are saying that the field can never improve, so might as well just let the mediocre machine run rampant.

Perhaps idealistic, perhaps unrealistic. I'd still rather believe.

eru · a month ago
> A lot of software organization paradigms are designed to play defense against poorly written software.

Maybe. But the organisations that would need the defenses most are some of the least likely to apply them.

E.g. it was the better-run organisations that adopted version control early, while the worse ones persisted with shared folders for longer.

And strong type systems like what Haskell or to a lesser extent Rust have to offer are useful as safeguards for anyone, but even more useful when your organisation and its members aren't all that great. Yet again, we see more capable organisations adopting these earlier.

theshrike79 · a month ago
Exactly, we are focusing on the absolute number of crashes by "self-driving" cars.

What we should focus on is whether they are more or less prone to accidents than actual human drivers, per km driven.

Again, there are those Expert Drivers who love their manual transmission BMW because automatics shift in the wrong RPM range and abhor any kind of lane assist because it doesn't drive EXACTLY like they do.

But the vast majority of average people on the road will definitely get gains from lane assist and lane keeping functions in cars.

psunavy03 · a month ago
Few things enrage me like the smell of cannabis on the highway after it was legalized in my state. Sure, hypothetically, that's the passenger. But more likely than not, it's DUI.
macintux · a month ago
Sitting in a Jeep with no doors, no top, no windows has revealed to me just how common cannabis is in my state, even though it's not yet legalized here. Hate the smell.
codyb · a month ago
What, as opposed to the people on painkillers, xanax, caffeine, nicotine, and of course the actual worst... too little sleep, too much alcohol, and their phones.
bs7280 · a month ago
Off topic from my original comment, but I live in Chicago and have seen some of the most batshit insane drivers / behavior on the road you could imagine. People smoking are often the least of my worries (not to say it's OK).
chankstein38 · a month ago
While I haven't experienced LLMs correcting most (or any) of the problems listed fully and consistently, I do agree that consistent use of LLMs and dealing with their frustrations has worn down my patience for conversations with people who exhibit the same issues when talking.

It's kind of depressing. I just want the LLM to be a bot that responds to what I say with a useful response. However, for some reason, both Gemini and ChatGPT tend to argue with me heavily and inject their own weird, stupid ideas into things, making them even more grating to interact with. That chews away at my normal interpersonal patience, which, as someone on the spectrum, was already limited.

rguzman · a month ago
> However, for some reason, both Gemini and ChatGPT tend to argue with me so heavily and inject their own weird stupid ideas on things

do you have examples of this?

asking because this is not what happens to me. one of the main things i worry about when interacting with the llm is that they agree with me too easily.

acedTrex · a month ago
This is why I simply do not bother with them unless the task I need is so specific that there's no room for argument. Yesterday, for example, I asked it to generate me a bash script that ran aws ssm commands for all the following instance IDs, and it did that as a two-shot (roughly the shape sketched below).

But long conversations are never worth it.
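For concreteness, a minimal sketch of the kind of script being described, assuming a placeholder list of instance IDs and a placeholder remote command ("uptime"); neither comes from the comment above:

    #!/usr/bin/env bash
    # Minimal sketch: send one SSM Run Command to each instance in a list.
    # The instance IDs and the remote command are placeholders.
    set -euo pipefail

    INSTANCE_IDS=("i-0123456789abcdef0" "i-0fedcba9876543210")

    for id in "${INSTANCE_IDS[@]}"; do
      echo "Sending command to ${id}..."
      aws ssm send-command \
        --instance-ids "${id}" \
        --document-name "AWS-RunShellScript" \
        --parameters 'commands=["uptime"]' \
        --output text
    done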

bicepjai · a month ago
>>> … However, for some reason, both Gemini and ChatGPT tend to argue with me so heavily and inject their own weird stupid ideas on things …

This is something I have not experienced. Can you provide examples?

agloe_dreams · a month ago
Yeah this is exactly opposite my issue with LLMs. They often take what you say as the truth when it absolutely could not be.
mikasisiki · a month ago
There was a period when coding agents would always agree with you, even if you gave them a really bad idea. They’d always start with something like, “You’re right — I should…”.

Back then, what we actually wanted was for them to push back and argue with us.

GuB-42 · a month ago
I have taken the stance of not arguing with LLMs, not giving them any clues, and not asking them to roleplay. Tell them no more than what they need to know.

And if they get the answer wrong, don't try to correct them or guide them; there is a high chance they don't have the answer and what follows will be hallucinations. You can ask for details, but don't try to go against it; it will just assume you are right (even if you are not) and hallucinate around that. Keep what you already know to yourself.

As for the "you are an expert" prompts, it will mostly just make the LLM speak more authoritatively, but it doesn't mean it will be more correct. My strategy is now to give the LLM as much freedom as it can get, it may not be the best way to extract all the knowledge it has, but it helps spot hallucinations.

You can argue with actual people; if both of you are open enough, something greater may come out of it, but if not, it is useless. With LLMs it is always useless: they are pretrained, and they won't get better in the future because that little conversation sparked their interest. And on your side, you will just have your own points rephrased and sent back to you, which will only put you deeper in your own bubble.

eru · a month ago
What's the purpose of your stance? What are you trying to achieve?
okwhateverdude · a month ago
> However, for some reason, both Gemini and ChatGPT tend to argue with me

The trick here is: "Be succinct. No commentary."

And sometimes a healthy dose of expressing frustration or anger (cursing, berating, threatening) also gets them to STFU and do the thing. As in literally: "I don't give a fuck about your stupid fucking opinions on the matter. Do it exactly as I specified"

Also generally the very first time it expresses any of that weird shit, your context is toast. So even correcting it is reinforcing. Just regenerate the response.

CamperBob2 · a month ago
> And sometimes a healthy dose of expressing frustration or anger (cursing, berating, threatening) also gets them to STFU and do the thing. As in literally: "I don't give a fuck about your stupid fucking opinions on the matter. Do it exactly as I specified"

Last time I bawled out an LLM and forced it to change its mind, I later realized that the LLM was right the first time.

One of those "Who am I and how did I end up in this hole in the ground, and where did all these carrots and brightly-colored eggs come from?" moments, of the sort that seem to be coming more and more frequently lately.

empath75 · a month ago
> ChatGPT tend to argue with me so heavily

I have found that quite often when ChatGPT digs in on something, that it is in fact right, and I was the one that was wrong. Not always, maybe not even most of the time, but enough that it does give me pause and make me double check.

Also, when you have an LLM that is too agreeable, that is how it gets into a folie à deux situation and starts participating in a user's delusions, with disastrous outcomes.

dns_snek · a month ago
> Also, when you have an LLM that is too agreeable...

It's not a question of whether an LLM should be agreeable or argumentative. It should aim to be correct - it should be agreeable about subjective details and matters of taste, it should be argumentative when the user is wrong about a matter of fact or made an error, and it should be inquisitive and capable of actually re-evaluating a stance in a coherent and logically sound manner when challenged by the user instead of either "digging in" or just blindly agreeing.

crazygringo · a month ago
This is my experience too. About 2/3 of the time my question/prompt contained ambiguity and it interpreted it differently (but validly), so it's just about misunderstanding, but maybe 1/3 of the time I'm surprised to discover something I didn't know. I double-check it on Wikipedia and a couple of other places and learn something new.

Deleted Comment

t8sr · a month ago
Something about the way the author expresses himself (big words, “I am so smart”, flowery filler) makes me unsurprised he finds it hard to have satisfying conversations with people. If he talked to me like this IRL I wouldn’t be trying to have a deep conversation either, I’d just be looking for the exit.

Lacking a theory of mind for other people is not a sign of superiority.

josecodea · 25 days ago
You are being too generous by saying that there are big words in the text. I find it blunt and uncouth. Actually, that's the problem that I see in the text, an attitude of pessimism and lack of self-reflection. An LLM would certainly give me something more interesting to read!
hug · a month ago
Jumping from "the author uses language I dislike" straight to "also, he has no theory of mind" is a bit of a leap. Like world record winning long jump kinda stuff.

Also, what big words? 'Proliferation'? 'Incoherent'? The whole article is written at a high school reading level. There are some embedded clauses in longer sentences, but we're not exactly slogging our way through Proust here.

sph · a month ago
Sadly that’s a personality trait that’s far too common in the field, and it can get pretty annoying.
DenisM · a month ago
> When a model exhibits hallucination, often providing more context and evidence will dispel it,

I usually have the opposite experience. Once a model goes off the rails it becomes harder and harder to steer, and after a few corrective prompts they stop working and it's time for a new context.

foobiekr · a month ago
Once it's in the context window, the model invariably steers back toward it. LLMs cannot handle the "don't think of an elephant" requirement.
ACCount37 · a month ago
It depends.

It's a natural inclination for all LLMs, rooted in pre-training. But you can train them out of it some. Or not.

Google doesn't know how to do it to save their lives. Other frontier labs are better at it, but none are perfect as of yet.

jdauriemma · a month ago
The narrative structure of the article would be brilliant satire but I'm 90% certain that the author is serious about the conclusions they drew at the end, which I find sad.
jdthedisciple · a month ago
Yea I find it a bit condescending. Humans ain't robots, duh!

And the world wouldn't function if everyone operated at the exact same abstraction level of ideas.

jdauriemma · a month ago
The big difference is accountability. An LLM has no mortality; it has no use for fear, no embodied concept of reputation, no persistent values. Everything is ephemera. But they are useful! More useful than humans in some scenarios! So there's that. But when I consider the purpose of conversation, utility is only one consideration among many.
SubiculumCode · a month ago
Is it too late to call it confabulation rather than hallucination? It's a much more appropriate term for LLM "hallucinations", and there's an entire scientific literature on confabulation in humans.
bookofjoe · a month ago
Concur. Alas, that ship has sailed.
baq · a month ago
Never say never. It'd take a few tweets from Karpathy et al.
officehero · a month ago
Please define "confabulation" (for us stupid non-AI, non-native speakers)
SubiculumCode · a month ago
egypturnash · a month ago
> The best thing about a good deep conversation is when the other person gets you: you explain a complicated situation you find yourself in, and find some resonance in their replies. That, at least, is what happens when chatting with the recent large models. But when subjecting the limited human mind to the same prompt—a rather long one—again and again the information in the prompt somehow gets lost, their focus drifts away, and you have to repeat crucial facts. In such a case, my gut reaction is to see if there’s a way to pay to upgrade to a bigger model, only to remember that there’s no upgrading of the human brain.

Paying for someone to put some effort into giving a damn about what you have to say has a long history. Hire a therapist. Pay a teacher. Hire a hooker. Buy a round of drinks. Grow the really good weed and bring it to the party.

And maybe remember that other humans have their own needs and desires, and if you want them to put time and energy into giving a damn about your needs, then you need to reciprocate and spend time doing the same for them instead of treating them like a machine that exists only to serve you. This whole post is coming from a place of reducing every relationship to that and it's kind of disgusting.

psunavy03 · a month ago
It's sadly also an attitude I'm not surprised to see coming out of tech, given how many people don't seem to get that "I got into this field so I could interact with computers, not people" is supposed to be a joke.
vinceguidry · a month ago
Yeah, shared context over time is the answer to all these problems and has been for both history and prehistory. Patience appears to be the scarcest resource of all these days.
chambored · a month ago
This is exactly why I stopped reading after that section.