Readit News
Posted by u/arduinomancer a year ago
Ask HN: Why does no one seem to care that AI gives wrong answers?
If you had a piece of code or software that sometimes produced totally wrong output, we would consider that a bug.

Yet it seems like with AI all the investors/founders/PMs don’t really care and just ship a broken product anyway

I feel like I'm going crazy seeing all the AI stuff shipping in products that give straight-up wrong outputs

It’s like a big collective delusion where we just ignore it or hand wave that it’ll get fixed eventually magically

0x00_NULL · a year ago
My graduate research was in this area. My lab group developed swarm robots for various terrestrial and space exploration tasks. I spent a lot of time probing why our swarm robots developed pathological behavioral breakdowns - running away from construction projects, burying each other, and so on. The issue was so fundamental to our machine learning methods that we never found a way to reliably address it—not by the time I left, anyway. No matter how we reconfigured the neural networks, retrained, punished, deprived, implemented forced forgetting, or fine-tuned, nothing seemed to eliminate the catastrophic behavioral edge cases—nothing except dramatically simplifying the networks.

Once I started seeing these behaviors in our robots, they became much more apparent every time I dug deeply into a proposed ML system: autonomous vehicles, robotic assistants, chatbots, and LLMs.

As I've had time to reflect on our challenges, I think that neural networks tend to overfit very quickly, and deep neural networks are overfitted to an incomparable degree. That condition makes them sensitive to hidden attractors that cause the system to break down catastrophically whenever it gets near one of them.

How do we define "near"? That would have to be determined using some topological method. But these systems are so complicated that we can't analyze their networks' topology or even brute-force probe their activations. Further, the larger, deeper, and more highly connected the network, the more challenging these hidden attractors are to find.

I was bothered by this topic a decade ago, and nothing I have seen today has alleviated my concern. We are building larger, deeper, and more connected networks on the premise that we'll eventually get to a state so unimaginably overfitted that it becomes stable again. I am unnerved by this idea and by the amount of money flowing in that direction with reckless abandon.

1vuio0pswjnm7 · a year ago
I believe I saw this research featured in a documentary or some other film. Am I remembering incorrectly?
0x00_NULL · a year ago
I’m not aware of any specific films we were in. We filmed a lot of our robotics trials for various collaborations, but no documentaries while I was there. Shortly after I left, my team got some acclaim for their Lunar Ark project (which is really cool). But, I had been out for a couple of years by that point. If they filmed a documentary, it likely would have been for that project.
chankstein38 · a year ago
Personally, I and people I've spoken with use LLMs less and less because of how often they're wrong. The other day I asked ChatGPT about a specific built-in method in Java and it told me that it couldn't do one specific thing. I was already using it in that context so I pushed back and it said "Oh yeah you're right, sorry"

I feel like I can't trust anything it says. Mostly I use it to parse things I don't understand and then do my own verification that it's correct.

All that to say, from my perspective, they're losing some small amount of ground. The other side is that the big corps that run them don't want their golden geese to be cooked. So they keep pushing them and shoving them into everything unnecessarily, and we just have to eat it.

So I think it's a perception thing. The corps want us to think it's super useful so it continues to give them record profits, while the rest of us are slowly waking up to how useless these tools are when they confidently give us incorrect answers, and are moving away from them.

So you may just be seeing sleazy marketing at work here.

a3n · a year ago
> I was already using it in that context so I pushed back and it said "Oh yeah you're right, sorry"

Same thing happened to me. I asked for all the Ukrainian noun cases, it listed and described six.

I responded that there are seven. "Oh, right." It then named and described the seventh.

That's no better than me taking an exam, so why should I rely on it, or use it at all?

paulmd · a year ago
If you find it absolutely necessary to only work with coworkers who are incapable of making mistakes, I assume that you probably work alone?
stoperaticless · a year ago
Hype is fading, so usage decreases.

But you must admit that it is still useful, and usage will not drop to zero.

yaj54 · a year ago
LLMs would be better nomenclature than AI in this context.

LLMs are not factual databases. They are not trained to retrieve or produce factual statements.

LLMs give you the most likely word after some prior words. They are incredibly accurate at estimating the probabilities of the next word.

It is a weird accident that you can use auto-regressive next word prediction to make a chat bot. It's even weirder that you can ask the chatbot questions and give it requests and it appears to produce coherent answers and responses.
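A toy sketch of that loop, with a made-up lookup table standing in for the network (none of this is a real model): the only objective at each step is to pick a plausible next word, and factual accuracy never enters the process.

```python
import random

# Made-up next-word probabilities standing in for a trained network's output.
toy_model = {
    "the cat":        [("sat", 0.6), ("ran", 0.3), ("flew", 0.1)],
    "the cat sat":    [("on", 0.7), ("down", 0.2), ("quietly", 0.1)],
    "the cat sat on": [("the", 0.8), ("a", 0.2)],
}

def next_word(context):
    """Sample one word according to the estimated probabilities."""
    candidates = toy_model.get(context, [("<end>", 1.0)])
    words, probs = zip(*candidates)
    return random.choices(words, weights=probs, k=1)[0]

text = "the cat"
while True:
    word = next_word(text)
    if word == "<end>":
        break
    text += " " + word

print(text)  # e.g. "the cat sat on the" -- a fluent continuation with no notion of truth
```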

LLMs are best thought of as language generators (or "writers") not as repositories of knowledge and facts.

LLM chatbots were a happy and fascinating (and for some, very helpful) accident. But they were not designed to be "factually correct"; they were designed to predict words.

People don't care about (or are willing to accept) the "wrong answers" because there are enough use cases for "writing" that don't require factual accuracy. (see for instance, the entire genre of fiction writing)

I would argue that it is precisely LLMs' ability to escape the strict accuracy requirements of the rest of CS and just write/hallucinate some fiction that makes this tech fascinating and uniquely novel.

chrisjj · a year ago
> LLM chatbots ... were not designed to be "factually correct"; they were designed to predict words.

For this question, what LLMs were designed for is, I think, less relevant than what they are advertised for, e.g.

"Get answers. Find inspiration. Be more productive. Free to use. Easy to try. Just ask and ChatGPT can help with writing, learning, brainstorming, and more." https://openai.com/chatgpt/

No mention of predicting words.

mmusson · a year ago
The thing I find fascinating is that apparently there is a chunk of behavior that we might define as “intelligent” on some level that seems directly encoded in language itself.
yaj54 · a year ago
I completely agree. As language is the preferred encoding method for intelligent thought (at least in our species) it could very well be that a sufficiently accurate language model is also a generally intelligent model.
tivert · a year ago
> LLMs are best thought of as language generators (or "writers") not as repositories of knowledge and facts.

And the utility of a "language generator" without reliable knowledge or facts is extremely limited. The technical term for that kind of language is bullshit.

> People don't care about (or are willing to accept) the "wrong answers" because there are enough use cases for "writing" that don't require factual accuracy. (see for instance, the entire genre of fiction writing)

Fiction, or at least good fiction, requires factual accuracy, just not the kind you get from recalling stuff from an encyclopedia. For instance: factual accuracy about what it was like to live in the world at a certain time or place, so you can create a believable setting; or about human psychology, so you can create believable characters.

yaj54 · a year ago
I'd argue that what you're talking about in fiction is coherence (internal consistency) not factual accuracy (consistency with an externally verifiably ground truth).

I'd also argue that the economic value of coherent bullshit is ... quite high. Many people have made careers out of producing coherent bullshit (some even with incoherent bullshit :-).

Of course, in the long run, factual accuracy has more economic value than bullshit.

rossdavidh · a year ago
So, if you dealt with a person who knew all the vocabulary related to a field, and could make well-constructed sentences about that field, and sounded confident, it would almost always mean they had spent a lot of time studying that field. That tends to mean that, although they may occasionally make a mistake, they will usually be correct. People apply the same intuition to LLMs, and because it's not a person (and it's not intelligent), this intuition is way off.

There is, additionally, the fact that there is no easy (or even moderately difficult) way to fix this aspect of LLMs, and it means that the choices are either: 1) ship it now anyway and hope people pay for it regardless, or 2) admit that this is a niche product, useful in certain situations but not for most.

Option 1 means you get a lot of money (at least for a little while). Option 2 doesn't.

0x00_NULL · a year ago
Yep. I think this is right on. The anthropomorphization in descriptions of their behavior and their problems is flawed.

It's precisely that analogy we learned early in our study of neural networks: the layers analyze the curves, straight segments, edges, size, shape, etc. But when we look at the activation patterns, we see they are not doing anything remotely like that. They look like stochastic correlations, and the activation patterns are almost entirely random.

The same thing is happening here, but at incomprehensible scales and with fortunes being sunk into hope.

rsynnott · a year ago
Even with a human speaker, that's not a totally safe assumption, and a certain type of fraudster relies heavily on people making this type of assumption (the likes of L Ron Hubbard in particular liked to use the language of expertise for various fields in a nonsensical way, and this is extremely convincing to a certain sort of person). But LLMs might almost have been designed to exploit this particular cognitive bias; there's really significant danger here.
rossdavidh · a year ago
Agreed. The optimistic scenario is that, being exposed to so many "hallucinating" LLMs, people will become better at spotting the same thing in humans. But I admit freely that is just the optimistic scenario.
stavros · a year ago
I find that "intelligence" was a shaky concept to begin with, but LLMs have completely thrown the idea in the trash. When someone says "LLMs are not intelligent", I treat that as a bit of a signal that I shouldn't pay much attention to their other points, because if you haven't realized that you don't have a good definition for intelligence, what else haven't you realized?
Jensson · a year ago
> When someone says "LLMs are not intelligent", I treat that as a bit of a signal that I shouldn't pay much attention to their other points, because if you haven't realized that you don't have a good definition for intelligence, what else haven't you realized?

So you have a good definition for "intelligent", and it applies to LLMs? Please tell us! And explain how that definition is so infallible that you know everyone who says LLMs aren't intelligent is wrong?

rossdavidh · a year ago
I'm sure there's plenty I haven't realized, but the reason it's worth pointing out that LLMs are not intelligent is that their boosters routinely refer to them as "AI", and the "I" in there stands for "intelligence", so pointing out that the label is not accurate is important.
taylodl · a year ago
I haven't found a human that answers every single question correctly, either. You know whom to ask a question based on that person's domain of expertise. Well, AI's domain of expertise is everything (supposedly).

What gets difficult is evaluating the response, but let's not pretend that's any easier to do when interacting with a human. Experts give wrong answers all the time. It's generally other experts who point out wrong answers provided by one of their peers.

My solution? Query multiple LLMs. I'd like to have three so I can establish a quorum on an answer, but I only have two. If they agree then I'm reasonably confident the answer is correct. If they don't agree - well, that's where some digging is required.
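A rough sketch of that quorum idea (the model wrappers here are hypothetical placeholders rather than any particular vendor's API, and real answers would need more normalization than this, since models phrase the same fact differently):

```python
from collections import Counter

def quorum_answer(question, models, normalize=lambda s: s.strip().lower()):
    """Ask several models the same question and look for agreement.

    `models` is a list of callables that take a question string and return
    an answer string -- hypothetical wrappers around whichever LLMs you use.
    """
    answers = [normalize(ask(question)) for ask in models]
    best, votes = Counter(answers).most_common(1)[0]
    if votes >= 2:            # at least two models agree -> reasonable confidence
        return best, answers
    return None, answers      # no agreement -> time to do some digging

# Usage, with hypothetical wrappers ask_model_a / ask_model_b / ask_model_c:
# answer, raw = quorum_answer("In which Java release was java.util.Optional added?",
#                             [ask_model_a, ask_model_b, ask_model_c])
```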

To your point, nobody is expecting these systems to be infallible because I think we intuitively understand that nothing knows everything. Wouldn't be surprised if someone wrote a paper on this very topic.

danrob · a year ago
This is a common argument to support the usage of LLMs: "Well, humans do it too."

We have many rules, regulations, strategies, patterns, and legions of managers and management philosophy for dealing with humans.

With humans, they're incorrect sometimes, yes, and we actively work around their failures.

We expect humans to develop over time. We expect them to join a profession and give bad answers a lot. As time goes on, we expect them to produce better answers, and if they don't we have remediations to limit the negative impact they have on our business processes. We fire them. We recommend they transfer to a different discipline. We recommend they go to college.

Comparing the successes and failures of LLMs to humans is silly. We would have fired them all by now.

The big difference is that computers CAN get every single question correct. They ARE better than humans. LLMs are a huge step back from the benefits we got from computers.

nunez · a year ago
Also, humans can say "I don't know," a skill that seems impossible for LLMs
taylodl · a year ago
> The big difference is that computers CAN get every single question correct.

I emphatically disagree on that point. AFAIK, nobody has been able to demonstrate, even in principle, that omniscience is possible over a domain of sufficient complexity and subtlety. My gut tells me this is related to Gödel's Incompleteness Theorem.

QuantumGood · a year ago
I find this a useful frame of reference: don't assume anyone, anything is correct. Learn to work with what is, not what could be. AI is very helpful to me, as long as I don't have unrealistic expectations.
threeseed · a year ago
a) If you ask me about surgery I will say "I don't know". LLMs won't do that.

b) Experts may give wrong answers but it will happen once. LLMs will do it over and over again.

latentsea · a year ago
>b) Experts may give wrong answers but it will happen once. LLMs will do it over and over again.

Well... Sometimes "experts" will give the wrong answer repeatedly.

threeseed · a year ago
> investors/founders/PMs don’t really care

Garry Tan from YC is a great example of this.

It's not that he doesn't care. It's just that he believes that the next model will be the one that fixes it. And companies that jump on board now can simply update their model and be in prime position. Similar to how Tesla FSD is always 2 weeks away from perfection and when it happens they will dominate the market.

And because companies are experimenting with how to apply AI, these startups are making money. So investors jump in on the optimism.

The problem is that for many use cases, e.g. AI agents, assistants, search, process automation, etc., they very much do care about accuracy. And they are starting to run out of patience with the empty promises. So there is a reckoning coming for AI in the next year or two, and it will be brutal. Especially in this fundraising environment.

shafyy · a year ago
> It's not that he doesn't care. It's just that he believes that the next model will be the one that fixes it.

No, what he does is hope that they can keep the hype alive long enough to cash out and then move on to the next hype. Not only Garry Tan, but most VCs. That's the fundamental business model of VCs. That's also why Tesla FSD is always two weeks away. The gold at the end of the rainbow.

f0e4c2f7 · a year ago
When I was a kid there was this new thing that came out called Wikipedia. I couldn't convince anyone it was useful though because they pointed out it was wrong sometimes. Eventually they came around though.

AI is like that right now. It's only right sometimes. You need to use judgement. Still useful though.

0x00_NULL · a year ago
I feel like this fails on the premise that the models can be improved to the point where they are reliable. I don't know that that holds true. It is extremely uncommon that making a system more complex makes it more reliable.

In the rare cases where more complexity produces a more reliable system, that complexity is always incremental, not sudden.

With our current approach to deep neural networks and LLMs, we missed the incremental step and jumped to rodent brain levels of complexity. Now, we are hoping that we can improve our way to stability.

I don't know of any examples where that has happened - so I am not optimistic about the chances here.

politelemon · a year ago
The difference right now is that many people are paying for it. It feels odd to pay for something that could give wrong answers.
rfjimen · a year ago
Your point is valid if you believe LLM/Generative AI is deterministic; it is not. It is inference-based, and thus it provides different answers even given the same input at times.

The question then becomes, "How wrong can it be and still be useful?" This depends on the use case. The wrongness is much harder to tolerate in applications that require highly deterministic output and matters less in those that do not. So yes, it does provide wrong outputs, but it depends on what the output is and on the tolerance for variation. In the context of question and answer, where there is only one right answer, it may seem wrong, but it could also provide the right answer in three different ways. Therefore, understanding your tolerance for variation is most important, in my humble opinion.
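A small sketch of where that variation comes from, using made-up token scores rather than a real model: at a temperature above zero the sampler draws from a probability distribution, so the same prompt can come back with different tokens on different runs.

```python
import math
import random

def sample(logits, temperature=1.0):
    """Sample a token index from scores using a temperature-scaled softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]  # softmax numerators; choices() normalizes them
    return random.choices(range(len(logits)), weights=weights, k=1)[0]

tokens = ["Paris", "Lyon", "Marseille"]  # candidate next tokens for the same prompt
logits = [2.0, 1.0, 0.5]                 # made-up model scores

# Same input, run five times -- the "answer" can differ from run to run:
print([tokens[sample(logits)] for _ in range(5)])
```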

chrisjj · a year ago
> Your point is valid if you believe LLM/Generative AI is deterministic; it is not. It is inference-based, and thus it provides different answers even given the same input at times.

Inference is no excuse for inconsistency. Inference can be deterministic and so deliver consistency.
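As a sketch of that point: greedy (temperature-zero) decoding always takes the highest-scoring token, so, floating-point and batching quirks aside, the same input yields the same output on every run. Made-up scores again:

```python
def greedy(logits):
    """Greedy decoding: always pick the index of the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])

tokens = ["Paris", "Lyon", "Marseille"]  # candidate next tokens for the same prompt
logits = [2.0, 1.0, 0.5]                 # made-up model scores

# Same input, same output, every time:
print([tokens[greedy(logits)] for _ in range(5)])  # ['Paris', 'Paris', 'Paris', 'Paris', 'Paris']
```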

0x00_NULL · a year ago
Yeah. Almost all of the "killer apps" for LLMs revolve around generating content, images, or videos. My question is always the same: "Is there really such a massive market for mediocre content?"