websitescenes · a year ago
There is a huge backlash coming when the general public learns AI is plagued with errors and hallucinations. Companies are out there straight up selling snake oil to them right now.
kibwen · a year ago
Observing the realm of politics should be enough to disabuse anyone of the notion that people generally assign any value at all to truthfulness.

People will clamor for LLMs that tell them what they want to hear, and companies will happily oblige. The post-truth society is about to shift into overdrive.

Ekaros · a year ago
It depends on the situation. People want their health care provider to be correct. The same goes for a chatbot when they are trying to get support.

On the other hand, they might not want to be moralized to, like being told that they should save more money, spend less, or go on a diet...

AI providing incorrect information when dealing with regulations, law, and so on can have significant real-world impact, and such impact is unacceptable. For example, you cannot have a tax authority or government chatbot be wrong about some regulation or tax law.

Loughla · a year ago
This is shockingly accurate. Other than professional work, AI just has to learn how to respond to the individual's tastes and established beliefs to be successful. Most people want the comfort of believing they're correct, not being challenged in their core beliefs.

It seems like the most successful AI business will be one in which the model learns about you from your online habits and presence before presenting answers.

llamaimperative · a year ago
Of course people generally value truthfulness. That value commonly being trumped by other competing values doesn’t negate its existence.

I don’t think defeatism is helpful (or correct).

skilled · a year ago
Exactly. This is super evident when you start asking more complex questions in CS, and when asking for intermediate-level code examples.

Also the same for asking about apps/tools. Unless it is a super well-known app like Trello, which has been documented and written about to death, the LLM will give you all kinds of features for a product that it doesn't actually have.

It doesn’t take long to realize that half the time all these LLMs just give you text for the sake of giving it.

kylebenzle · a year ago
Agreed, calling an LLM "AI" is just silly and technically makes no sense; they are text generators based on text context.
terminalcommand · a year ago
Asking LLMs for imaginary facts is the wrong thing here, not the hallucination of the LLMs.

LLMs have constraints: computation power and model size. Just like a human would get overwhelmed if you requested too much with vague instructions, LLMs also get overwhelmed.

We need to learn how to write efficient prompts to use LLMs. If you do not understand the matter or cannot provide enough context, the LLM hallucinates.

Currently, criticising LLMs for hallucinations by asking factual questions is akin to saying "I tried to divide by zero on my calculator and it doesn't work." LLMs were not designed to provide factual information without context; they are thinking machines that excel at higher-level intellectual work.
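
To make that concrete, here is a minimal sketch of the difference between a bare factual question and a context-grounded prompt. The call_llm helper and the "AcmeBoard" product are invented for illustration; substitute whichever client library and source documents you actually use.

    # Minimal sketch, not a recommendation of any particular API.
    # call_llm() is a hypothetical stand-in for whichever client you use;
    # "AcmeBoard" and its docs file are invented for the example.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("wire up your own LLM client here")

    # Bare question: the model must answer from its weights alone, so a
    # niche product's feature list tends to get invented.
    bare_prompt = "Does AcmeBoard support Gantt charts?"

    # Grounded question: the relevant documentation travels with the
    # question, and the model is told to refuse rather than guess.
    with open("acmeboard_docs.txt") as f:  # assumed local file
        docs = f.read()

    grounded_prompt = (
        "Answer using ONLY the context below. If the answer is not in the "
        "context, say you don't know.\n\n"
        f"Context:\n{docs}\n\n"
        "Question: Does AcmeBoard support Gantt charts?"
    )

    # answer = call_llm(grounded_prompt)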

add-sub-mul-div · a year ago
Coming? I think the general public has already come to consider "AI" synonymous with hallucination, awkward writing, and cringe art.
bayindirh · a year ago
No. From my experience, many people think that AI is an infallible assistant, and some are even saying that we should replace any and all tools with LLMs and be done with it.
pyrale · a year ago
The art part is actually pretty nice, because everyone can see directly if the generated art fits their taste, and back-and-forth with the bot to get what you want is actually pretty funny.

It gets frustrating sometimes, but overall it's decent as a creative activity, and people don't expect art to be knowledge.

duxup · a year ago
Are they?

Every AI use I have comes with a big warning.

The internet is full of lies and I still use it.

drewcoo · a year ago
> Companies are out there straight up selling snake oil to them right now

Well snake oil sells. And the margins are great!

kylebenzle · a year ago
Yes, calling an LLM "AI" was the first HUGE mistake.

A statistical model that can guess the next word is in no way "intelligent", and Sam Altman himself agrees this is not a path to AGI (what we used to call just AI).

pixl97 · a year ago
>is in no way "intelligent"

Please define the word intelligent in a way accepted by doctors, scientists, and other professionals before engaging in hyperbole, or you're just as bad as the "AGI is already here" people. Intelligence is a gradient in problem solving, and our software is creeping up that gradient in its capabilities.

Deleted Comment

throwawaysleep · a year ago
Humans are also error-plagued. AI just needs to beat them.
llamaimperative · a year ago
No, AI also needs to fail in similar ways to humans. A system that makes 0.001% errors, all totally random and uncorrelated, will behave very differently in production from a system that makes 0.001% errors systematically and consistently (random errors are generally preferable).
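
A toy simulation of that distinction (not modelled on any real system; the request categories are made up, and the error rate is scaled up from 0.001% so a short run shows the pattern): the same overall error rate looks very different depending on whether failures land uniformly at random or pile up on one class of request.

    import random
    from collections import Counter

    random.seed(0)
    N = 1_000_000
    RATE = 0.001  # scaled up from the 0.001% above so a small run is visible
    CATEGORIES = ["invoice", "refund", "address_change", "other"]

    def random_failures():
        # Errors are spread uniformly across request types.
        hits = Counter()
        for _ in range(N):
            if random.random() < RATE:
                hits[random.choice(CATEGORIES)] += 1
        return hits

    def systematic_failures():
        # Same overall rate, but every error hits the same request type,
        # e.g. the system always mishandles refunds phrased a certain way.
        hits = Counter()
        for _ in range(N):
            if random.random() < RATE:
                hits["refund"] += 1
        return hits

    print("random:    ", dict(random_failures()))
    print("systematic:", dict(systematic_failures()))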
Bluecobra · a year ago
> It can’t consistently figure out the simplest tasks, and yet, it’s being foisted upon us with the expectation that we celebrate the incredible mediocrity of the services these AIs provide.

This is exactly how I feel about AI in its current state. Maybe I just don't get it, but it just seems like a novelty to me right now. Like when Wolfram Alpha came out and I played around with it a few times. Copilot did help me write a one-liner with awk a few months ago, so that was cool I guess.

llamaimperative · a year ago
It is definitely oversold, but more importantly IMO it needs to be productized. We're seeing very sloppy, ad hoc productizations, but there are definitely lots of valuable, interesting problems that LLMs will help with. Probably most of them are buried in the innards of archaic domains and institutions, which the public at large will never directly interact with.
vbezhenar · a year ago
AI lacks responsibility. I'd hallucinate to my higher-ups all day if not for the fear of being fired.
duxup · a year ago
I don't expect perfection, far from it, and I find AI very useful.

That's not unlike the internet, my coworkers, lots of things and I still work with all of them.

Side note: People talk about AI's mistakes all the time. That article title is a hallucination.

singpolyma3 · a year ago
lol that headline. AFAICT we talk about nothing else. We're not ignoring it.
omnicognate · a year ago
I see fewer discussions of it. I've mostly given up getting involved in any. There's nothing much new to say, and people's views rarely change.

I'm firmly in the camp that considers calling it "hallucination" or "getting things wrong" category errors that wrongly imply it gets anything right, and have seen nothing that remotely inclines me to revise that opinion. I recognise it as an opinion, though. It can't be proven until we have an understanding of what "understanding" is that is sufficiently concrete to be able to demonstrate that LLMs do not possess it. Likewise the view that human minds are the same kind of things as LLMs cannot be disproven until we have a sufficient understanding of how human minds work (which we most certainly do not), however obviously wrong it may seem to me.

Meanwhile, as that discussion fizzles out, the commercial development of these products continues apace, so we're going to find out empirically what a world filled with them will mean, whether I think that's wise or not.

crabmusket · a year ago
> Likewise the view that human minds are the same kind of things as LLMs cannot be disproven until we have a sufficient understanding of how human minds work

This seems pretty backwards to me. Why should this speculative view need to be disproven rather than proven?

Sure, LLMs do some things kind of like some things human minds can do. But if you put that on a Venn diagram, the overlap would be minuscule.

There's also the plain observation that LLMs are made of silicon and human minds are made of neurons; from this you might reasonably start with the assumption that they are in fact extremely different, and the counterclaim is the one needing evidence!

mrtranscendence · a year ago
What does it mean to get something right? If I ask GPT-4o (or any model, really) whether it's usually hot in Topeka during the summer, it will give me the correct answer. The model may or may not understand what it means for Topeka to be hot during the summer, but it has generated text with a single reasonable interpretation which happens to include a factually correct statement. I'm comfortable saying that GPT-4o "got it right" in that case regardless of what we believe about its understanding.
pixl97 · a year ago
The particular problem with 'proven' is that a whole lot of things we use day to day are not proven; at best they may have statistical likelihood, but never be provable.

Things like understanding, consciousness, and intelligence are all on gradients and are going to sit on that scale.

Loughla · a year ago
Literally no company that is using "AI" in the education space (my field) talks about hallucination or errors.

In fact, many sell their product as hallucination free.

jerkstate · a year ago
Journalists gotta stop ignoring that experts in industry are working day and night to improve deficiencies in new world-changing technology.
oddevan · a year ago
The article doesn't ignore that. The point is that we keep talking about that future as if it's now, and it's simply not.
bluGill · a year ago
Who is this "we"? While I see it here, I'm not seeing it talked about as much in other areas.
duxup · a year ago
That title is a hallucination.
intended · a year ago
There is so much pressure to deliver "GenAI" numbers/success stories. Stuff conceived with evaluations as a fundamental part of its design becomes unrecognizable once it's two steps outside of your ability to influence.

Bring up evaluation metrics and people go from enthusiastic discussions on RAG implementations to that uncomfortable discussion where no one seems to have a common language or priors.

That said, the true injury is when someone sees the prototype and eventually asks “what are your usage numbers.”

Edit: Forgot this other pattern, the awkward shuffling as people reframe product features. Highly correlated with actual evaluation of their output quality.
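
For what it's worth, the evaluation being asked for doesn't have to be elaborate. Here is a minimal sketch of a harness; the two-item test set is invented, and generate_answer() is a hypothetical stand-in for whatever RAG pipeline or model is under test.

    # Minimal evaluation-harness sketch. generate_answer() is a hypothetical
    # stand-in for the pipeline under test; the test set is invented.
    def generate_answer(question: str) -> str:
        raise NotImplementedError("plug in the pipeline under test")

    TEST_SET = [
        {"question": "What year was the refund policy last updated?",
         "expected": "2023"},
        {"question": "Which plan includes SSO?",
         "expected": "Enterprise"},
    ]

    def evaluate() -> float:
        correct = 0
        for case in TEST_SET:
            answer = generate_answer(case["question"])
            # Crude exact-substring match; real evals usually need
            # normalisation or a grader model on top of this.
            if case["expected"].lower() in answer.lower():
                correct += 1
        return correct / len(TEST_SET)

    # print(f"accuracy: {evaluate():.0%}")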

LtWorf · a year ago
I asked ChatGPT to tell me about the author of weborf (my bachelor thesis). It correctly told me what it was, and then claimed that some French guy had written it. The French guy does exist on GitHub.
JonChesterfield · a year ago
The idea seems to be it'll stop getting things wrong when we feed it enough information / give Nvidia enough money so we don't need to worry about that.