There is a huge backlash coming when the general public learns AI is plagued with errors and hallucinations. Companies are out there straight up selling snake oil to them right now.
Observing the realm of politics should be enough to disabuse anyone of the notion that people generally assign any value at all to truthfulness.
People will clamor for LLMs that tell them what they want to hear, and companies will happily oblige. The post-truth society is about to shift into overdrive.
It depends on the situation. People want their health care provider to be correct. The same goes for a chatbot when they're trying to get support.
On the other hand, at the same time, they might not want to be moralized to, like being told that they should save more money, spend less, or go on a diet...
AI providing incorrect information when dealing with regulations, law, and so on can have significant real-world impact, and that impact is unacceptable. For example, you cannot have a tax authority or government chatbot be wrong about a regulation or tax law.
This is shockingly accurate. Outside of professional work, AI just has to learn how to respond to the individual's tastes and established beliefs to be successful. Most people want the comfort of believing they're correct, not to be challenged on their core beliefs.
It seems like the most successful AI business will be one in which the model learns about you from your online habits and presence before presenting answers.
Exactly. This is super evident when you start asking more complex questions in CS, and when asking for intermediate-level code examples.
The same goes for asking about apps/tools. Unless it's an extremely well-known app like Trello, which has been documented and written about to death, the LLM will describe all kinds of features that the product doesn't actually have.
It doesn’t take long to realize that half the time all these LLMs just give you text for the sake of giving it.
Asking LLMs for facts they have no way of knowing is the mistake here, not the hallucination itself.
LLMs have constraints: compute power and model size. Just as a human gets overwhelmed when you ask for too much with vague instructions, an LLM gets overwhelmed too.
We need to learn how to write effective prompts to use LLMs. If you don't understand the subject well enough to provide sufficient context, the LLM hallucinates.
Criticising LLMs for hallucinating on factual questions is akin to saying "I tried to divide by zero on my calculator and it doesn't work." LLMs were not designed to provide factual information without context; they are thinking machines that excel at higher-level intellectual work.
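A minimal sketch of that point, assuming the OpenAI Python client (v1+) with an API key in the environment; the model name, the tax question, and the excerpt are made-up placeholders. It asks the same factual question bare, then grounded in a snippet the caller actually trusts:

    # Minimal sketch: the same factual question asked without and with context.
    # The model name, question, and excerpt are placeholders; assumes the
    # OpenAI Python client (>=1.0) and OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()
    QUESTION = "Which deductions does form XYZ-2023 allow?"  # hypothetical form

    # 1) Bare factual question: the model has nothing to anchor on, so it may
    #    confidently invent an answer.
    bare = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": QUESTION}],
    )

    # 2) Same question, grounded in an excerpt the caller actually trusts.
    excerpt = "Form XYZ-2023, section 4: only deductions A and B are allowed."  # placeholder
    grounded = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer only from the provided excerpt. If it does not "
                        "contain the answer, say you don't know."},
            {"role": "user", "content": f"Excerpt:\n{excerpt}\n\nQuestion: {QUESTION}"},
        ],
    )

    print("bare:    ", bare.choices[0].message.content)
    print("grounded:", grounded.choices[0].message.content)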
No. In my experience, many people think that AI is an infallible assistant, and some are even saying that we should replace any and all tools with LLMs and be done with it.
The art part is actually pretty nice, because everyone can see directly whether the generated art fits their taste, and going back and forth with the bot to get what you want is actually pretty fun.
It gets frustrating sometimes, but overall it's decent as a creative activity, partly because people don't expect art to be knowledge.
Yes, calling an LLM "AI" was the first HUGE mistake.
A statistical model that can guess the next word is in no way "intelligent", and Sam Altman himself agrees this is not a path to AGI (what we used to call just AI).
Please define the word "intelligent" in a way accepted by doctors, scientists, and other professionals before engaging in hyperbole, or you're just as bad as the "AGI is already here" people. Intelligence is a gradient in problem solving, and our software is creeping up that gradient in its capabilities.
No, AI also needs to fail in ways similar to how humans fail. A system that makes 0.001% errors, all totally random and uncorrelated, will behave very differently in production from a system that makes 0.001% errors systematically and consistently (random errors are generally preferable).
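A rough simulation of that difference (all numbers made up): two systems with the same overall error rate, one failing uniformly at random, the other failing only on a single category of input, so every failure lands on the same class of user.

    # Rough simulation, hypothetical numbers: same overall error rate,
    # random vs. systematic failures.
    import random
    from collections import Counter

    random.seed(0)
    N = 1_000_000
    ERROR_RATE = 0.001
    categories = [random.choice("ABCDEFGHIJ") for _ in range(N)]  # 10 input types

    # System 1: errors are uniform and uncorrelated with the input.
    random_errors = Counter(c for c in categories if random.random() < ERROR_RATE)

    # System 2: same expected error count, but every error lands on category "A"
    # (it fails on ~1% of "A" inputs and never on anything else).
    systematic_errors = Counter(
        c for c in categories if c == "A" and random.random() < ERROR_RATE * 10
    )

    print("random:    ", random_errors)      # ~100 errors per category
    print("systematic:", systematic_errors)  # ~1000 errors, all in "A"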
> It can’t consistently figure out the simplest tasks, and yet, it’s being foisted upon us with the expectation that we celebrate the incredible mediocrity of the services these AIs provide.
This is exactly how I feel about AI in its current state. Maybe I just don't get it, but it seems like a novelty to me right now, like when Wolfram Alpha came out and I played around with it a few times. Copilot did help me write a one-liner with awk a few months ago, so that was cool, I guess.
It is definitely oversold, but more importantly, IMO, it needs to be productized. We're seeing very sloppy, ad hoc productizations, but there are definitely lots of valuable, interesting problems that LLMs will help with. Probably most of them are buried in the innards of archaic domains and institutions that the public at large will never directly interact with.
I see fewer discussions of it. I've mostly given up getting involved in any. There's nothing much new to say, and people's views rarely change.
I'm firmly in the camp that considers calling it "hallucination" or "getting things wrong" category errors that wrongly imply it gets anything right, and have seen nothing that remotely inclines me to revise that opinion. I recognise it as an opinion, though. It can't be proven until we have an understanding of what "understanding" is that is sufficiently concrete to be able to demonstrate that LLMs do not possess it. Likewise the view that human minds are the same kind of things as LLMs cannot be disproven until we have a sufficient understanding of how human minds work (which we most certainly do not), however obviously wrong it may seem to me.
Meanwhile, as that discussion fizzles out, the commercial development of these products continues apace, so we're going to find out empirically what a world filled with them will mean, whether I think that's wise or not.
> Likewise the view that human minds are the same kind of things as LLMs cannot be disproven until we have a sufficient understanding of how human minds work
This seems pretty backwards to me. Why should this speculative view need to be disproven rather than proven?
Sure, LLMs do some things kind of like some things human minds can do. But if you put that on a Venn diagram, the overlap would be minuscule.
There's also the plain observation that LLMs are made of silicon and human minds are made of neurons; from this you might reasonably start with the assumption that they are in fact extremely different, and that the counterclaim is the one needing evidence!
What does it mean to get something right? If I ask GPT-4o (or any model, really) whether it's usually hot in Topeka during the summer, it will give me the correct answer. The model may or may not understand what it means for Topeka to be hot during the summer, but it has generated text with a single reasonable interpretation which happens to include a factually correct statement. I'm comfortable saying that GPT-4o "got it right" in that case regardless of what we believe about its understanding.
The particular problem with "proven" is that a whole lot of things we use day to day are not proven; at best they have statistical likelihood and may never be provable.
Understanding, consciousness, and intelligence are all on gradients, and they're going to fall on that same scale.
There is so much pressure to deliver "GenAI" numbers and success stories. Things conceived with evaluation as a fundamental part of their design become unrecognizable once they're two steps outside of your ability to influence them.
Bring up evaluation metrics and people go from enthusiastic discussions of RAG implementations to that uncomfortable conversation where no one seems to have a common language or priors (see the bare-bones sketch after this comment for the sort of thing I mean).
That said, the true injury is when someone sees the prototype and eventually asks “what are your usage numbers.”
Edit: I forgot this other pattern: the awkward shuffling as people reframe product features. It's highly correlated with actual evaluation of their output quality.
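For reference, the sketch mentioned above: a bare-bones evaluation harness over a fixed golden set. The questions, the expected keywords, and answer_question() are all hypothetical placeholders for whatever pipeline is actually under test.

    # Bare-bones eval sketch: fixed questions with expected key facts, and a
    # crude keyword check over whatever the system returns. Everything here
    # (questions, keywords, answer_question) is a hypothetical placeholder.
    GOLDEN_SET = [
        ("What is the VAT rate for books?", ["5%"]),
        ("Which form reports capital gains?", ["Schedule D"]),
    ]

    def answer_question(question: str) -> str:
        """Stand-in for the real RAG/LLM pipeline under test."""
        return "I'm not sure."

    def evaluate(golden_set) -> float:
        hits = 0
        for question, expected_keywords in golden_set:
            answer = answer_question(question)
            if all(kw.lower() in answer.lower() for kw in expected_keywords):
                hits += 1
        return hits / len(golden_set)

    print(f"accuracy on golden set: {evaluate(GOLDEN_SET):.0%}")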
I asked ChatGPT to tell me about the author of weborf (my bachelor's thesis). It described what it was correctly, and then claimed that some French guy had written it. The French guy does exist on GitHub.
The idea seems to be it'll stop getting things wrong when we feed it enough information / give Nvidia enough money so we don't need to worry about that.
I don’t think defeatism is helpful (or correct).
Every AI use I have comes with a big warning.
The internet is full of lies and I still use it.
Well snake oil sells. And the margins are great!
That's not unlike the internet, my coworkers, and lots of other things, and I still work with all of them.
Side note: People talk about AI's mistakes all the time. That article title is a hallucination.
In fact, many sell their product as hallucination free.