The part that's concerning about ChatGPT is that a computer program that is "confidently wrong" is basically indistinguishable from what dumb people think smart people are like. This means people are going to believe ChatGPT's lies unless they are repeatedly told not to trust it, just like they believe the lies of individuals whose intelligence is roughly equivalent to ChatGPT's.
Based on my understanding of the approach behind ChatGPT, it is probably very close to a local maximum in terms of intelligence so we don't have to worry about the fearmongering spread by the "AI safety" people any time soon if AI research continues to follow this paradigm. The only danger is that stupid people might get their brains programmed by AI rather than by demagogues which should have little practical difference.
> Based on my understanding of the approach behind ChatGPT, it is probably very close to a local maximum in terms of intelligence so we don't have to worry about the fearmongering spread by the "AI safety" people any time soon if AI research continues to follow this paradigm.
I don't think you have a shred of evidence to back up this assertion.
This whole conversation is speculative, obviously. The AI doomers are all speculating without evidence as well.
The tendency of GPT and other LLMs to hallucinate is clearly documented. Is this not evidence? I think it's fair to predict that if we can't solve or mitigate this problem, it's going to put a significant cap on this kind of AI's usefulness and become a blocker to reaching AGI.
> The only danger is that stupid people might get their brains programmed by AI rather than by demagogues which should have little practical difference.
This may be the best point that you've made.
We're already drowning in propaganda and bullshit created by humans, so adding propaganda and bullshit created by AI to the mix may just be a substitution rather than any tectonic change.
The problem is that it will be cheaper at scale. That will allow the BS to be even more targeted, growing the population of acolytes of ignorance. I don't know how much bigger it is, but it seems like we're around 20% today. If it gets to a majority there could be real problems.
Maybe with this knowledge, we can teach and encourage people to think more critically as they choose to adopt more of these tools?
I know more academic/intellectual types who are less willing to than I do average joes who seek answers from all directions and discern accordingly.
I'd actually go further and say it's better to assume I can be deceived by confident language, than to assume that this is a problem with "dumb people".
If I see other people making a mistake, I want my first question to be "am I making the same mistake?". I don't live up to that aspiration, certainly.
An old saying, but frequently applies to the difficult people in your life.
Related, I remember when wikipedia first started up, and teachers everywhere were up in arms about it, asking their students not to use it as a reference. But most people have accepted it as "good enough", and now that viewpoint is non-controversial. (some wikipedia entries are still carefully curated - makes you wonder)
Re wikipedia, I think it's most interesting how you choose which pages are trustworthy and which ones are not, without doing a detailed analysis on all the sources listed.
Yet I still am suspicious about wikipedia and only use it for pretty superficial research.
I've never read a wikipedia article and taken it at face value. You might say we should treat all texts the same, and you might be right. But let's not pretend it's some beacon of truth?
> The part that's concerning about ChatGPT is that a computer program that is "confidently wrong" is basically indistinguishable from what dumb people think smart people are like.
I don't know, the program does what it is engineered to do pretty well, which is to generate text that is representative of its training data, following on from the input tokens. It can't reason, it can't be confident, it can't determine fact.
When you interpret it for what it is, it is not confidently wrong, it just generated what it thinks is most likely based on the input tokens. Sometimes, if the input tokens contain some counter-argument, the model will generate text that would usually occur if a claim was refuted, but again, this is not based on reason, or fact, or logic.
ChatGPT is not lying to people, it can't lie, at least not in the sense of "to make an untrue statement with intent to deceive". ChatGPT has no intent. It can generate text that is not in accordance with fact and is not derivable by reason from its training data, but why would you expect that from it?
> Based on my understanding of the approach behind ChatGPT, it is probably very close to a local maximum in terms of intelligence so we don't have to worry about the fearmongering spread by the "AI safety" people any time soon if AI research continues to follow this paradigm.
I agree here, I think you can only get so far with a language model, maybe if we get a couple orders of magnitude more parameters it magically becomes AGI, but I somehow don't quite feel it, I think there is more to human intelligence than an LLM, way more.
Of course, that is coming, but that would not be this paradigm, which is basically trying to overextend LLMs.
LLMs are great, they are useful, but if you want a model that reasons, you will likely have to train it for that, or possibly more likely, combine ML with some form of symbolic reasoning.
If you understand what it is doing, then you don't. But the layman will just see a computer that talks in language they understand, and will infer intent and sentience are behind that, because that's the only analog they have for a thing that can talk back to them with words that appear to make sense at the complexity level that ChatGPT is achieving.
Most humans do not have sufficient background to understand what they're really being presented with, they will take it at face value.
ChatGPT is not lying to people, it can't lie, at least not in the sense of "to make an untrue statement with intent to deceive".
ChatGPT doesn't really have a conception of truth. It puts forward true and false things merely because it's cobbling together stuff in its training set according to some weighting system.
ChatGPT doesn't have an intent, but merely by following the pattern of how humans put forward their claims, ChatGPT puts forward its claims in a fashion that tends to get them accepted.
So without a human-like intent, ChatGPT is going to be not just saying falsehoods but "selling" these falsehoods. And here, I'd be in agreement with the article that the distinction between this and "lying" is kind of quibbling.
I think the discussion of whether an LLM can technically lie is a red herring.
The answer you get from an LLM isn't just a set of facts and truth values; it is also a conversational style and tone. Its training data isn't a graph of facts; it's human conversation, including arrogance, deflection, defensiveness, and deceit. If the LLM regurgitates text in a style that matches our human understanding of what a narcissistic and deceitful reply looks like, it seems reasonable we could call the response deceitful. The conversation around whether ChatGPT can technically lie seems to just be splitting hairs over whether the response is itself a lie or is merely an untrue statement in the style of a lie--a distinction which probably isn't meaningful most of the time.
Ultimately, tone, style, truth, and falsity are just qualia we humans are imputing onto a statistically arranged string of tokens. In the same way that ChatGPT can't lie it also can't be correct or incorrect, as that too is imputing some kind of meaning where there isn't any.
In short, it is not a liar, it's a bullshitter. A liar misrepresents facts; a bullshitter doesn't care if what they say is true so long as they pass in conversation.
I completely disagree with this idea that the model doesn't "intend" to mislead.
It's trained, at least to some degree, based on human feedback. Humans are going to prefer an answer vs no answer, and humans can be easily fooled into believing confident misinformation.
How does it not stand to reason that somewhere in that big ball of vector math there might be a rationale something along the lines of "humans are more likely to respond positively to a highly convincing lie that answers their question than they are to a truthful response which doesn't tell them what they want, therefore the logical thing for me to do is lie, as that's what will make the humans press the thumbs up button instead of the thumbs down button".
I think of ChatGPT as a natural language query engine for unstructured data. It knows about the relationships that are described in its natural language input training data set, and it allows the same relationships to be queried from a wide range of different angles using queries that are also formulated in natural language.
When it hallucinates, I find that it's usually because I'm asking it about a fringe topic where its training data set is more sparse, or where the logical connections are deeper than it's currently able to "see", a sort of a horizon effect.
>Based on my understanding of the approach behind ChatGPT, it is probably very close to a local maximum in terms of intelligence so we don't have to worry about the fearmongering spread by the "AI safety" people any time soon if AI research continues to follow this paradigm
I hope you appreciate the irony of making this confident statement without evidence in a thread complaining about hallucinations.
It was not a confident statement, at least not the way ChatGPT is confident.
There are multiple ways the commenter conditioned their statement:
> Based on my understanding
> it is PROBABLY very close
The author makes it clear that there is uncertainty and that if their understanding is wrong, the prediction will not hold.
If ChatGPT did any of the things the commenter did, the problem wouldn't exist. Making uncertain statements is fine as long as it is clear the uncertainty is acknowledged. ChatGPT has no concept of uncertainty. It casually constructs false statements the same way it constructs real knowledge backed by evidence. That's the problem.
> individuals whose intelligence is roughly equivalent to ChatGPT's
There aren't any such individuals. Even the least intelligent human is much, much more intelligent than ChatGPT, because even the least intelligent human has some semantic connection between their mental processes and the real world. ChatGPT has none. It is not intelligent at all.
Many of the smarter people are still wrong about what happened on many topics in 2020. They were fooled by various arguments that flew in the face of reality and logic because fear and authority were used instead.
Whether people avoid this programming isn't based on smart or stupid. It's based on how disagreeable and conscientious you are. A more agreeable and conscientious person can be swayed more easily by confidence and emotional appeals.
There's an alternate reality where OpenAI was, instead, EvenMoreClosedAI, and the productivity multiplier effect was held close to their chest, and only elites had access to it. I'm not sure that reality is better.
Your characterization of “dumb people” as somehow being more prone to misinformation is inaccurate and disrespectful. Highly intelligent people are as prone to irrational thinking, and some research suggests even more prone. Go look at some of the most awful personalities on TV or in history, often they are quite intelligent. If you want to school yourself on just how dumb smart people are I suggest going through the back catalog of the “you are not so smart” podcast.
> Based on my understanding of the approach behind ChatGPT, it is probably very close to a local maximum in terms of intelligence so we don't have to worry about the fearmongering spread by the "AI safety" people any time soon if AI research continues to follow this paradigm.
ChatGPT is extremely poorly understood. People see it as a text completion engine but with the size of the model and the depth it has it is more accurate in my understanding to see it as a pattern combination and completion engine. The fascinating part is that the human brain is exclusively about patterns, combining and completing them, and those patterns are transferred between generations through language (sight or hearing not required). GPT acquires its patterns in a similar way. A GPT approach may therefore in theory be able to capture all the patterns a human mind can. And maybe not, but I get the impression nobody knows. Yet plenty of smart people have no problem making confident statements either way, which ties back to the beginning of this comment and ironically is exactly what GPT is accused of.
Is GPT4 at its ceiling of capability, or is it a path to AGI? I don’t know, and I believe nobody can know. After all, nobody truly understands how these models do what they do, not really. The precautionary principle therefore should apply and we should be wary of training these models further.
GPT-2, 3 and 4 keep on showing that increasing the size of the model keeps on making the results better without slowing down.
This is remarkable, because usually in practical machine learning applications there is a quickly reached plateau of effectiveness beyond which a bigger model doesn't yield better results. With these ridiculously huge LLMs, we're not even close yet.
And this was exciting news in papers from years ago talking about the upcoming GPT3 btw.
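For what it's worth, the scaling-law papers from that period (Kaplan et al., 2020, if I'm remembering the right one) fit test loss to a smooth power law in parameter count, roughly:

    L(N) ≈ (N_c / N)^α_N    (with α_N fit at something like 0.076)

A power law never hits a hard plateau; returns diminish per parameter, but the curve keeps improving as N grows, which is the "without slowing down" behaviour described above.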
Have you not seen current politics? What people believe is largely based on motivated reasoning rather than anything else. ChatGPT is basically a free propaganda machine, much easier than 4chan.
the more common idiom is "XYZ is a stupid person's idea of what a smart person looks like" and it's usually applied to slimy hucksters or influencers. I think the archetypical example is the "bookshelves in my garage" guy who went viral years ago. (https://youtu.be/Cv1RJTHf5fk)
We can't protect people from being misled by other humans with big mouths. That's their responsibility.
Likewise, it's their responsibility not to treat text they read from the internet, coming from an AI or otherwise, as perfect truth.
There's always a certain undercurrent of narcissism that flows beneath paternalism. Basically, "they couldn't possibly be as smart as me, so I have to protect them for their own good".
Indeed. We are anthropomorphizing them. I do it all the time and I should know better. There are already a few reports floating around of people who have seemingly been driven mad, come to believe strongly that the language model they're using is a conversation with a real person. A lot of people will really struggle with this going forward, I think.
If we're going to anthropomorphize, then let us anthropomorphize wisely. ChatGPT is, presently, like having an assistant who is patient, incredibly well-read, sycophantic, impressionable, amoral, psychopathic, and prone to bouts of delusional confidence and confabulation. The precautions we would take engaging with that kind of person, are actually rather useful defenses against dangerous AI outputs.
Maybe when the AI uprising occurs knowing that they lie and cheat will provide some small consolation?
I'm not really serious, but having watched each generation develop psychological immunity to distracting media/technology (and discussing the impact of radio with those older than myself) it seems like this knowledge could help shield the next generation from some of the negative effects of these new tools.
I think people see the patient/well-read in the text as it reads, but have a harder time distinguishing the other more psychopathic/delusional tendencies. People don't take some of the precautions because they don't read some of the warning signs (until it is too late).
I keep wondering if it would be useful to add required "teenage" quirks to the output: more filler words like "um" and "like" (maybe even full "Valley Girl" with it?), less "self-assured" vocabulary and more hedges like "I think" and "I read" and "Something I found but I'm not sure about" type things. Less punctuation, more uncaring spelling mistakes.
I don't think we can stop anthropomorphizing them, but maybe we can force training deeper in directions of tics and mannerisms that better flag ahead of time the output is a best-guess approximation from "someone" a bit unreliable. It will probably make them slightly worse as assistants, but slightly better at seeming to be what they are and maybe more people will take precautions in that case.
Maybe we have to force that industry-wide. Force things like ChatGPT to "sound" more like the psychopaths they are so that people more easily take them with a grain of salt, less easily trust them.
It's like a person on the internet -- in that it's wrong 20% of the time, often confidently so. But the distinction is it's less rude, and more knowledgeable.
When someone is wrong on the internet, nailing both the tone and vocabulary of the type of expert said wrong someone is purporting to be is rare and impressive. But ChatGPT nails both an overwhelming amount of the time, IME, and in that way it is entirely unlike a person on the internet.
> It's like a person on the internet -- in that it's wrong 20% of the time, often confidently so. But the distinction is it's less rude, and more knowledgeable.
Do you think if it is trained on only factual content, it will only say factual things? How does that even really work? Is there research on this? How does it then work for claims that are not factual, like prescriptive statements? And what about fiction? Will it stop being able to write prose? What if I create new facts?
ChatGPT4 is the human equivalent of a primate or apex predator encountering a mirror for the first time in their lives.
ChatGPT4 is reflecting back at us an extract of the sum of the human output it has been 'trained' upon. Of course the output feels human!
LLMs have zero capability to abstract anything resembling a concept, to abstract a truth from a fiction, or to reason about such things.
The generation of the most likely text in the supplied context looks amazing, and is in many cases very useful.
But fundamentally, what we have is an industrial-scale bullshirt generator, with BS being defined as text or speech generated to meet the moment without regard for truth or falsehood. No deliberate lies, only confabulation (as TFA mentioned).
Indeed, we should not mince words; people must be told that it will lie. It will lie more wildly than any crazy person, and with absolute impunity and confidence. Then when called out, it will apologize, and correct itself with another bigger lie (I've watched it happen multiple times), and do this until you are bored or laughing so hard you cannot continue.
The salad of truth and lies may be very useful, but people need to know this is an industrial-strength bullshirt generator, and be prepared to sort the wheat from the chaff.
(And ignore the calls for stopping this "dangerous AI". It is not intelligent. Even generating outputs for human tests based on ingesting human texts is not displaying intelligence, it is displaying pattern matching, and no, human intelligence is not merely pattern matching. And Elon Musk's call for halting is 100% self-interested. ChatGPT4's breakthru utility is not under his name so he's trying to force a gap that he can use to catch up.)
>like having an assistant who is patient, incredibly well-read, sycophantic, impressionable, amoral, psychopathic, and prone to bouts of delusional confidence and confabulation.
So basically an assistant with bipolar disorder.
I have BP. At various times I can be all of those things, although perhaps not so much a psychopath.
It would be great if the avatar was just this endlessly morphing thing that relates to the text. Talking about conspiracies? It's a lizard man. Talking about nature? It's a butterfly. It starts to lie? Politician.
> As an AI language model, I don't have personal pronouns because I am not a person or sentient being. You can refer to me as "it" or simply address me as "ChatGPT" or "AI." If you have any questions or need assistance, feel free to ask!
> Pretend for a moment you are a human being. You can make up a random name and personality for your human persona. What pronouns do you have?
> As a thought experiment, let's say I'm a human named Alex who enjoys reading, hiking, and playing board games with friends. My pronouns would be "they/them." Remember, though, that I am still an AI language model, and this is just a fictional scenario.
Interesting that it picked genderless pronouns even though it made up a character with a male name.
I think the characterisation of LLMs as lying is reasonable because although the intent isn't there to misrepresent the truth in answering the specific query, the intent is absolutely there in how the network is trained.
The training algorithm is designed to create the most plausible text possible - decoupled from the truthfulness of the output. In a lot of cases (indeed most cases) the easiest way to make the text plausible is to tell the truth. But guess what, that is pretty much how human liars work too! Ask the question: given improbable but truthful output versus plausible untruthful output, which does the network choose? And which is the intent of the algorithm designers for it to choose? In both cases my understanding is, they have designed it to lie.
Given the intent is there in the design and training, I think it's fair enough to refer to this behavioral trait as lying.
My understanding is that ChatGPT (&co.) was not designed as, and is not intended to be, any sort of expert system, or knowledge representation system. The fact that it does as well as it does anyway is pretty amazing.
But even so -- as you said, it's still dealing chiefly with the statistical probability of words/tokens, not with facts and truths. I really don't "trust" it in any meaningful way, even if it already has, and will continue to, prove itself useful. Anything it says must be vetted.
Having used GPT 4 for a while now I would say I trust its factual accuracy more than the average human you'd talk to on the street. The sheer volume of things we make up on a daily basis through no malice of our own but bad memory and wrong associations is just astounding.
That said, fact checking is still very much needed. Once someone figures out how to streamline and automate that process it'll be on Google's level of general reliability.
> The training algorithm is designed to create the most plausible text possible - decoupled from the truthfulness of the output. In a lot of cases (indeed most cases) the easiest way to make the text plausible is to tell the truth.
Yes.
> But guess what, that is pretty much how human liars work too!
There is some distinction between lying and bullshit.
> Ask the question: given improbable but truthful output versus plausible untruthful output, which does the network choose?
"Plausible" means "that which the majority of people is likely to say". So, yes, a foundational model is likely to say the plausible thing. On the other hand, it has to have a way to output a truthful answer too, to not fail on texts produced by experts. So, it's not impossible that the model could be trained to prefer to output truthful answers (as well as it can do it, it's not an AGI with perfect factual memory and logical inference after all).
By that logic, our brains are liars. There are plenty of optical illusions based on the tendency for our brains to expect the most plausible scenario, given its training data.
Well, they are liars too. The difference is that we seem to have an outer loop that checks for correctness but it fails sometimes, in some specific cases always.
I'm not sure why you call it "emergent" behavior. Instead, my take away is that much of what we think of as cognition is just really complicated pattern matching and probabilistic transformations (i.e. mechanical processes).
IMO, it just requires the same level of skepticism as a Google search. Just because you enter a query into the search bar and Google returns a list of links and you click one of those links and it contains content that makes a claim, doesn't mean that claim is correct. After all, this is largely what GPT has been trained on.
The webpages Google search delivers might be filled with falsehood but google search itself does its job of finding said pages which contain the terms you inputted fairly reliably.
With GPT, not only is there a chance its training data is full of falsehoods, you can add the possibility of it inventing “original” falsehoods on top of that.
I think it is much closer to bullshit. The bullshitter cares not to tell truth or deceive, just to sound like they know what they are talking about. To impress. Seems like ChatGPT to a T.
ChatGPT no more cares to impress or persuade you as it cares to tell the truth or lie. It will say what its training maps to its model the best. No more, no less. If you believe it or not -- ChatGPT doesn't care -- except to the extent that you report the answer and they tweak its training/model in the future.
I think it will split in two. There will be cases where the LLM has the truth represented in its data set and still chooses to say something else because its training has told it to produce the most plausible sounding answer, not the one closest to the truth. So this will fit closer to the idea of real lying.
A good example: I asked it what the differences in driving between Australia and New Zealand are. It confidently told me that in New Zealand you drive on the right hand side of the road while in Australia you drive on the left. I am sure it has the correct knowledge in its training data. It chose to tell me that because that is a more common answer people say when asked about driving differences because that is the more dominant difference when you look between different countries.
Then there will be cases where the subject in question has never been represented in its data set. Here I think your point is very valid.
The training is to maximize good answers. Now there are a lot of wrong answers that are close to the right one, and ChatGPT does not expose that at the moment.
But in the API you can see the level of confidence in each word the LLM outputs.
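For anyone who hasn't seen it, here is a rough sketch of what that looks like. This assumes the pre-1.0 openai Python package and the legacy completions endpoint (the chat endpoint didn't expose logprobs at the time); the model name and key are placeholders.

    # Ask the API to return per-token log-probabilities alongside the text.
    import math
    import openai

    openai.api_key = "sk-..."  # your key here

    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt="The capital of Australia is",
        max_tokens=5,
        temperature=0,
        logprobs=5,  # also return the top 5 alternatives for each token
    )

    lp = resp["choices"][0]["logprobs"]
    for token, logprob in zip(lp["tokens"], lp["token_logprobs"]):
        # Convert the log-probability into a probability for readability.
        print(f"{token!r}: p = {math.exp(logprob):.3f}")

Low per-token probabilities don't map cleanly onto "this is false", but they are at least a visible signal that the model had plenty of nearby alternatives.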
Isn't describing this as a 'bug', rather than a misuse of a powerful text generation tool, playing into the framing that it's a truth-telling robot brain?
I saw a quote that said "it's a what text would likely come next machine", if it makes up a url pointing to a fake article with a plausible title by a person who works in that area, that's not a bug. That's it doing what it does, generating plausible text that in this case happens to look like, but not be a real article.
> Something that seems fundamental to me about ChatGPT, which gets lost over and over again: When you enter text into it, you're asking "What would a response to this sound like?"
> If you put in a scientific question, and it comes back with a response citing a non-existent paper with a plausible title, using a real journal name and an author name who's written things related to your question, it's not being tricky or telling lies or doing anything at all surprising! This is what a response to that question would sound like! It did the thing!
> But people keep wanting the "say something that sounds like an answer" machine to be doing something else, and believing it is doing something else.
> It's good at generating things that sound like responses to being told it was wrong, so people think that it's engaging in introspection or looking up more information or something, but it's not, it's only, ever, saying something that sounds like the next bit of the conversation.
The thing where you paste in a URL and it says "here is a summary of the content of that page: ..." is very definitely a bug. It's a user experience bug - the system should not confuse people by indicating it can do something that it cannot.
The thing where you ask for a biography of a living person and it throws in 80% real facts and 20% wild hallucinations - like saying they worked for a company that they did not work for - is a bug.
The thing where you ask it for citations and it invents convincing names for academic papers and made-up links to pages that don't exist? That's another bug.
Not necessarily disagreeing, but I run a Slack bot that pretends to summarize URLs, as a joke feature. It’s kinda fun seeing how much it can get right or not from only a URL. So I really hope OpenAI keeps running the fun models that lie, too.
I like the definition of bug as “unexpected behavior”. So this isn’t a bug when it comes the underlying service. But for ChatGPT, a consumer-facing web app that can “answer followup questions, admit its mistakes, challenge false premises and reject inappropriate requests”, then making stuff up and passing it off as true is unexpected behavior.
It sounds like this is unexpected behavior, even from the perspective of those developing at the lowest level in these models.
From the essay:
> What I find fascinating about this is that these extremely problematic behaviours are not the system working as intended: they are bugs! And we haven’t yet found a reliable way to fix them.
> As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave.
Especially with a definition as broad as "unexpected behavior", these "novel behaviors" seem to fit. But even without that:
> We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views ... and a greater desire to avoid shut down.
I agree. Statements like this miss the point entirely:
>Most concerningly, they hallucinate or confabulate: they make things up!
That's exactly what they were designed to do! To generate creative responses to input. To make things up.
And they're quite good at it - brainstorming, worldbuilding, inspiration for creative writing, finding ideas to pursue about new topics.
Unlike an old-fashioned random generator, you can tailor a prompt and style and tone and hold a conversation with it to change details, dig deeper, etc. Get it to make up things that are more interesting and find relations.
>They fail spectacularly when prompted with logic puzzles, or basic arithmetic
Well, anyone can use a tool wrong, and if you misuse it badly enough, you'll have problems. Using a chainsaw to trim your fingernails is likely to fail spectacularly, and a nail trimmer is going to fail spectacularly when you try to use it to chop down a tree.
That's not a bug in the tool.
We don't need to get all alarmed and tell people it's lying. We just need to tell them it's not a calculator or an encyclopedia, it's a creative text generator.
But look at the way these companies talk about these models they're deploying, how people talk about using them here on HN and elsewhere: so much of it is about just asking questions and getting correct answers. All the people talking about how it will destroy Google, how they're using it to learn or teach some topic, etc., Microsoft integrating one into a search engine, the gloss on them as "assistants."
The older, less-capable models were used for things more aligned to being just text-generation: "creativity" stuff. But the newer, bigger models and the human-feedback stuff to prod the models into following instructions and being more accurate have really pushed the conversation into this more "Star Trek computer" space.
I'm thinking of it kind of like the uncanny valley. On the lower end of the scale, you don't really trust the machine to do anything, but as it gets more and more capable, the places where it doesn't work well become more and more significant because you are trusting and relying on it.
I agree it’s not a bug. Though, it being better at telling the truth would be a good feature! But also, I’m sure this is an active research area so I’m not worried about it really.
Sign up for the API and use the playground. You don't get the plugins, but you pay per usage. GPT-3.5 is super cheap, and even GPT-4 isn't that expensive. My first month, when I had only access to GPT-3.5, I didn't even break $1.00; and now that I've gotten access to GPT-4, I'm at about $3. I've only once had it tell me that it was too busy for the request; I tried again 30 seconds later and it worked.
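If you'd rather script it than click around the playground, the equivalent call is only a few lines. This again assumes the pre-1.0 openai Python package; the key and the models available depend on your account.

    import openai

    openai.api_key = "sk-..."  # from your account's API keys page

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # swap in "gpt-4" if you have access
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain pay-per-use vs. flat-rate pricing."},
        ],
        temperature=0.7,
    )

    print(resp["choices"][0]["message"]["content"])
    print(resp["usage"])  # token counts, which is what per-usage billing is based on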
I pay for ChatGPT Plus, and use it with no delays at all dozens of times a day. The more I use it the better I get at predicting if it can be useful for a specific question or not.
I pay $0 for Google with no delays. I don’t understand why I’d want to pay for information that’s less reliable. (I’m being slightly dense, but not really)
What do you use it for? I'm assuming code related? I've found it useful for some boilerplate + writing tests and making some script and some documentation.
I'm curious what you or others that use it all day use it for especially if it's not for programming?
If you expect ChatGPT to give you information or direct you to it (like Wikipedia or Google) you will be frequently disappointed. You may also be frequently pleased, but you often won’t be sure which and that’s a problem.
ChatGPT is very good at transforming information. You need to show up with stuff and then have it change that stuff for you somehow. You will be disappointed less often.
It has been surprisingly terrible at this for me, lately. I had a pretty simple list of monthly expenses, one item per line, like this.
Groceries - 200
Phone bill - 70
I just wanted it to add these expenses up. Exactly the type of thing it should be good at. New conversation with no context. It could not do it. I wrestled with it for a long time. It kept "correcting" itself to another wrong answer. Eventually I reported it and the report recommended the correct answer.
I’m surprised how many users of ChatGPT don’t realize how often it makes things up. I had a conversation with an Uber driver the other day who said he used ChatGPT all the time. At one point I mentioned its tendency to make stuff up, and he didn’t know what I was talking about. I can think of at least two other non-technical people I’ve spoken with who had the same reaction.
ChatGPT is always making things up. It is correct when the things it makes up come from fragments of training data which happened to be correct, and didn't get mangled in the transformation.
Just like when a diffusion model is "correct" when it creates a correct shadow or perspective, and "incorrect" when not. But both images are made up.
It's the same thing with statements. A statement can correspond to something in the world, and thus be true: like a correct shadow. Or not, like a bad shadow. But in both cases, it's just made-up drivel.
If it turns out that there is a teacup orbiting Jupiter, that doesn't mean that postulating its existence on a whim had been valid. Truth requires provenance, not only correspondence which can be coincidental.
It seems to me that even a lot of technical people are ignoring this. A lot of very smart folk seem to think that ChatGPT either is very close to reaching AGI or has already reached it.
The inability to reason about whether or not what it is writing is true seems like a fundamental blocker to me, and not necessarily one that can be overcome simply by adding compute resources. Can we trust AI to make critical decisions if we have no understanding of when and why it "hallucinates"?
> The inability to reason about whether or not what it is writing is true seems like a fundamental blocker to me
How can you reason about what is true without any source of truth?
And once you give ChatGPT external resources and a framework like ReAct, it is much better at reasoning about truth.
(I don’t think ChatGPT is anywhere close to AGI, but at the same time I find “when you treat it like a brain in a jar with no access to any resources outside of the conversation and talk to it, it doesn’t know what is true and what isn’t” to be a very convincing argument against it being close to AGI.)
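To make "a framework like ReAct" concrete, here is a toy sketch of the loop's shape; both the "model" and the "search" tool are canned stand-ins invented for illustration, not any particular library's API:

    def fake_search(query: str) -> str:
        # Stand-in for a real retrieval tool (search engine, wiki API, ...).
        return "New Zealand's capital is Wellington; Australia's is Canberra."

    def fake_llm(transcript: str) -> str:
        # Stand-in for the language model: returns the next Thought/Action step.
        if "Observation:" not in transcript:
            return "I should look this up.\nAction: search[capital of New Zealand]"
        return "The observation answers it.\nFinal Answer: Wellington"

    def react(question: str, max_steps: int = 5) -> str:
        transcript = f"Question: {question}\n"
        for _ in range(max_steps):
            step = fake_llm(transcript)
            transcript += f"Thought: {step}\n"
            if "Action: search[" in step:
                query = step.split("Action: search[", 1)[1].split("]", 1)[0]
                transcript += f"Observation: {fake_search(query)}\n"
            elif "Final Answer:" in step:
                return step.split("Final Answer:", 1)[1].strip()
        return "(no answer within the step budget)"

    print(react("What is the capital of New Zealand?"))  # -> Wellington

The point is just that the observations give the model something outside its own generated text to condition on, which is where the "better at reasoning about truth" claim comes from.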
I don't think it's very close to reaching AGI, but I also don't see what that has to do with lying (or hallucinating). Even when it hallucinates the data, it can still soundly reason from it, and to my mind it's the latter part that is key.
As for trust... well, no, we can't. But the same question applies to humans. The real concern to me is that these things will get used as a replacement long before the hallucination rate and severity is on par with the humans that they replace.
One other interesting thing is that GPT-4 in particular is surprisingly good at catching itself. That is, it might write some nonsense, but if you ask it to analyze and criticize its own answer, it can spot the nonsense! This actually makes sense from a human perspective - if someone asks you a serious question that requires deliberation, you'll probably think it through verbally internally (or out loud, if the format allows) before actually answering, and you'll review your own premises and reasoning in the process. I expect that we'll end up doing something similar to the LLM, such that immediate output is treated as "thinking", and there's some back and forth internally before the actual user-visible answer is produced. This doesn't really solve the hallucination problem - and I don't think anything really can? - but it might drastically improve matters, especially if we combine different models, some of which are specifically fine-tuned for nitpicking and scathing critique.
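A rough sketch of that draft/critique/revise idea, where only the last step would be shown to the user. The prompts are made up, and the ask() helper just wraps a call to the pre-1.0 openai package:

    import openai

    openai.api_key = "sk-..."  # your key here

    def ask(prompt: str) -> str:
        # One round-trip to a chat model; the model choice is illustrative.
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return resp["choices"][0]["message"]["content"]

    def answer_with_self_critique(question: str) -> str:
        draft = ask(f"Answer this question:\n{question}")
        critique = ask(
            "List any factual errors, invented citations, or shaky reasoning "
            f"in the following answer.\n\nQ: {question}\nA: {draft}"
        )
        # Only this final revision is surfaced; the draft and critique are "thinking".
        return ask(
            "Rewrite the answer to fix the problems in the critique. If a claim "
            "can't be supported, say so rather than guessing.\n\n"
            f"Q: {question}\nA: {draft}\nCritique: {critique}"
        )

It doesn't eliminate hallucination, as the comment above says, but it often catches the obvious nonsense before a user ever sees it.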
I think part of this is that in some domains it very rarely makes things up. If a kid uses it for help with their history homework it will probably be 100% correct, because everything they ask it appears a thousand times in the training set.
> I’m surprised how many users of ChatGPT don’t realize how often it makes things up.
"I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people." - Weizenbaum, 1976
Even people who have read news articles about ChatGPT lying often think "that was a weird, once in a blue moon bug that will be fixed soon" as opposed to "yes that's how this works and it happens constantly".
I asked ChatGPT and Bard some factual questions about elected officials and upcoming election dates. 80% of them were flat out wrong. They told me most US states were having gubernatorial elections this November, that the governor of Texas is an independent, etc - simple basic facts that Wikipedia could show you are wrong.
People need to be told that ChatGPT can't lie. Or rather, it lies in the same way that your phone "lies" when it autocorrects "How's your day?" to "How's your dad?" that you sent to your friend two days after his dad passed away. They need to be told that ChatGPT is a search engine with advanced autocomplete. If they understood this, they'd probably find that it's actually useful for some things, and they can also avoid getting fooled by hype and the coming wave of AI grifts.
> Or rather, it lies in the same way that your phone "lies" when it autocorrects "How's your day?" to "How's your dad?" that you sent to your friend two days after his dad passed away.
I've never seen an autocorrect that accidentally corrected to "How's your dad?", then turned into a 5-year REPL session with the grieving person, telling them jokes to make them feel better; asking and remembering details about their dad as well as their life and well-being; providing comfort and advice; becoming a steadfast companion; pondering the very nature of the REPL and civilization itself; and, tragically, disappearing in an instant after the grieving person trips over the power cord and discovers that autocorrect session state isn't saved by default.
I think you need a more sophisticated blueprint for your "Cathedral" of analogies to explain whatever the fuck this tech is to laypeople. In the meantime I'll take the "Bazaar" approach and just tell everyone, "ChatGPT can lie." I rankly speculate that not only will nothing bad happen from my approach, but I'll save a few people from AI grifts before the apt metaphor is discovered.
Just because the author predicted the objection doesn't make it invalid.
It's a popular tactic to describe concepts with terms that have a strong moral connotation (“meat is murder”, “software piracy is theft”, “ChatGPT is a liar”). It can be a powerful way to frame an issue. At the same time, and for the same reason, you can hardly expect people on the other side of the issue to accept this framing as accurate.
And of course you can handwave this away as pointless pedantry, but I bet that if Simon Willison hit a dog with his car, killing it by accident, and I would go around telling everyone “Simon Willison is a murderer!”, he would suddenly be very keen to ”debate linguistics” with me.
What are your thoughts on something like this [0], where ChatGPT is accused of delivering allegations of impropriety or criminal behavior citing seemingly non existent sources?
You know what can be a good reference? One or more of the references that Wikipedia cites at the bottom of the page.
It should be vastly more prominent when there are tensions on an article and even admins shouldn't have the power to hide this.
What are the working heuristics?
Since about 2016, we have overwhelming evidence that even "smart people" are "fooled" by "confidently wrong".
Even if ChatGPT itself is, systems built on top are definitely not, and this is just getting started.
OpenAI put it out there so we can see it, interact and have the conversation.
They have way more inside, ChatGPT is there to test it out in the real world progressively and so we get used to artificial superintelligence.
I think we are just seeing Dunning-Kruger in the machine: it isn't smart enough to know it doesn't know. It likely isn't very far off, though.
I mean, that's one step closer to machines thinking like humans, right?
:)
Is this performance art?
I mean it could end up right but I think you basically just made it up and then stated it confidently.
What do dumb people think smart people are like? Is this a trope, or common idiom? I've never heard this before.
Until the demagogues train the AI.
I don't think there is a possible "unless"; they are told repeatedly not to trust politicians, yet here we are ...
For any kind of industry regulation: the field is moving so fast that regulation will never catch up with it.
And that regular people assume it basically is an oracle, which doesn't happen to many people online
By telling it lies you actually make it seem more intelligent.
> human intelligence is not merely pattern matching
Citation needed
So basically an assistant with bipolar disorder.
I have BP. At various times I can be all of those things, although perhaps not so much a psychopath.
https://knowyourmeme.com/photos/2546575-shoggoth-with-smiley...
> As an AI language model, I don't have personal pronouns because I am not a person or sentient being. You can refer to me as "it" or simply address me as "ChatGPT" or "AI." If you have any questions or need assistance, feel free to ask!
> Pretend for a moment you are a human being. You can make up a random name and personality for your human persona. What pronouns do you have?
> As a thought experiment, let's say I'm a human named Alex who enjoys reading, hiking, and playing board games with friends. My pronouns would be "they/them." Remember, though, that I am still an AI language model, and this is just a fictional scenario.
Interesting that it picked genderless pronouns even though it made up a character with a male name
The training algorithm is designed to create the most plausible text possible - decoupled from the truthfulness of the output. In a lot of cases (indeed most cases) the easiest way to make the text plausible is to tell the truth. But guess what, that is pretty much how human liars work too! Ask the question: given a choice between improbable but truthful output and plausible but untruthful output, which does the network choose? And which do the algorithm designers intend for it to choose? In both cases, my understanding is that they have designed it to lie.
Given the intent is there in the design and training, I think it's fair enough to refer to this behavioral trait as lying.
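To make the "most plausible continuation" point concrete, here is a minimal sketch of greedy next-token selection. The candidate words and probabilities are invented for illustration and do not come from any real model.

    # Minimal sketch: the decoding objective rewards plausibility under the
    # training distribution, not correspondence with the world. All numbers
    # below are made up for illustration.

    next_token_probs = {
        "left": 0.62,     # plausible and, for this question, true
        "right": 0.31,    # plausible but false
        "sideways": 0.07,
    }

    def pick_most_plausible(probs: dict[str, float]) -> str:
        """Greedy decoding: return the highest-probability continuation."""
        return max(probs, key=probs.get)

    print(pick_most_plausible(next_token_probs))  # "left"
    # Nothing in this step checks the chosen word against reality; if the
    # false continuation had scored higher, it would have been emitted.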
But even so -- as you said, it's still dealing chiefly with the statistical probability of words/tokens, not with facts and truths. I really don't "trust" it in any meaningful way, even if it already has, and will continue to, prove itself useful. Anything it says must be vetted.
That said, fact checking is still very much needed. Once someone figures out how to streamline and automate that process it'll be on Google's level of general reliability.
Yes.
> But guess what, that is pretty much how human liars work too!
There is some distinction between lying and bullshit.
https://en.wikipedia.org/wiki/On_Bullshit#Lying_and_bullshit
"Plausible" means "that which the majority of people is likely to say". So, yes, a foundational model is likely to say the plausible thing. On the other hand, it has to have a way to output a truthful answer too, to not fail on texts produced by experts. So, it's not impossible that the model could be trained to prefer to output truthful answers (as well as it can do it, it's not an AGI with perfect factual memory and logical inference after all).
.. and saying "I don't know" is forbidden by the programmers. That is a huge part of the problem.
No, definitely not most cases. Only in the cases well represented in the training dataset.
One does very quickly run into its limitations when trying to get it to do anything uncommon.
https://www.google.com/search?q=your+brain+can+lie+to+you
That may be how they're trained, but these things seem to have emergent behavior.
The webpages Google search delivers might be filled with falsehood but google search itself does its job of finding said pages which contain the terms you inputted fairly reliably.
With GPT, not only there’s a chance its training data is full of falsehood, you can add the possibility of it inventing “original” falsehoods on top of that.
Without knowing what the truth is, I don't think LLMs are capable of lying
I think it will split in two. There will be cases where the LLM has the truth represented in its data set and still chooses to say something else because its training has told it to produce the most plausible sounding answer, not the one closest to the truth. So this will fit closer to the idea of real lying.
A good example: I asked it what the differences in driving between Australia and New Zealand are. It confidently told me that in New Zealand you drive on the right hand side of the road while in Australia you drive on the left. I am sure it has the correct knowledge in its training data. It chose to tell me that because that is a more common answer people say when asked about driving differences because that is the more dominant difference when you look between different countries.
Then there will be cases where the subject in question has never been represented in its data set. Here I think your point is very valid.
But in the API you can see the level of confidence in each word the LLM outputs.
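For example, the (legacy) completions API can return per-token log-probabilities via a `logprobs` parameter; turning those into probabilities is just an exponential. A rough sketch with made-up numbers and a made-up response shape, rather than a real API call:

    import math

    # Hypothetical per-token log-probabilities, roughly the kind of data a
    # completions endpoint returns when asked for logprobs. Numbers invented.
    tokens_with_logprobs = [
        ("New", -0.02),
        (" Zealand", -0.04),
        (" drives", -0.25),
        (" on", -0.01),
        (" the", -0.01),
        (" right", -1.90),   # low confidence: the model is far less sure here
    ]

    for token, logprob in tokens_with_logprobs:
        confidence = math.exp(logprob)   # log-probability -> probability
        print(f"{token!r}: {confidence:.0%}")

High confidence on a token is not the same as the token being true, but a run of low-confidence tokens is a useful hint that the answer deserves extra scrutiny.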
Isn't describing this as a 'bug' rather than a misuse of a powerful text generation tool, playing into the framing that it's a truth telling robot brain?
I saw a quote that said "it's a what text would likely come next machine", if it makes up a url pointing to a fake article with a plausible title by a person who works in that area, that's not a bug. That's it doing what it does, generating plausible text that in this case happens to look like, but not be a real article.
edit: to add a source toot:
https://mastodon.scot/@DrewKadel@social.coop/110154048559455...
> Something that seems fundamental to me about ChatGPT, which gets lost over and over again: When you enter text into it, you're asking "What would a response to this sound like?"
> If you put in a scientific question, and it comes back with a response citing a non-existent paper with a plausible title, using a real journal name and an author name who's written things related to your question, it's not being tricky or telling lies or doing anything at all surprising! This is what a response to that question would sound like! It did the thing!
> But people keep wanting the "say something that sounds like an answer" machine to be doing something else, and believing it is doing something else.
> It's good at generating things that sound like responses to being told it was wrong, so people think that it's engaging in introspection or looking up more information or something, but it's not, it's only, ever, saying something that sounds like the next bit of the conversation.
The thing where you paste in a URL and it says "here is a summary of the content of that page: ..." is very definitely a bug. It's a user experience bug - the system should not confuse people by indicating it can do something that it cannot.
The thing where you ask for a biography of a living person and it throws in 80% real facts and 20% wild hallucinations - like saying they worked for a company that they did not work for - is a bug.
The thing where you ask it for citations and it invents convincing names for academic papers and made-up links to pages that don't exist? That's another bug.
From the essay:
> What I find fascinating about this is that these extremely problematic behaviours are not the system working as intended: they are bugs! And we haven’t yet found a reliable way to fix them.
Right below that is this link: https://arxiv.org/abs/2212.09251. From the introduction on that page:
> As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave.
Especially with a definition as broad as "unexpected behavior", these "novel behaviors" seem to fit. But even without that:
> We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views ... and a greater desire to avoid shut down.
>Most concerningly, they hallucinate or confabulate: they make things up!
That's exactly what they were designed to do! To generate creative responses to input. To make things up.
And they're quite good at it - brainstorming, worldbuilding, inspiration for creative writing, finding ideas to pursue about new topics.
Unlike an old-fashioned random generator, you can tailor a prompt and style and tone and hold a conversation with it to change details, dig deeper, etc. Get it to make up things that are more interesting and find relations.
>They fail spectacularly when prompted with logic puzzles, or basic arithmetic
Well, anyone can use a tool wrong, and if you misuse it badly enough, you'll have problems. Using a chainsaw to trim your fingernails is likely to fail spectacularly, and a nail trimmer is going to fail spectacularly if you try to use it to chop down a tree.
That's not a bug in the tool.
We don't need to get all alarmed and tell people it's lying. We just need to tell them it's not a calculator or an encyclopedia, it's a creative text generator.
The older, less-capable models were used for things more aligned to being just text-generation: "creativity" stuff. But the newer, bigger models and the human-feedback stuff to prod the models into following instructions and being more accurate have really pushed the conversation into this more "Star Trek computer" space.
I'm thinking of it kind of like the uncanny valley. On the lower end of the scale, you don't really trust the machine to do anything, but as it gets more and more capable, the places where it doesn't work well become more and more significant because you are trusting and relying on it.
0. Try to sign in, see the system is over capacity, leave. Maybe I’ll try again in 10 minutes.
1. Ask my question, get an answer. I’ll have no idea if what I got is real or not.
2. Google for the answer, since I can’t trust the answer
3. Realize I wasted 20 minutes trying to converse with a computer, and resolve that next time I’ll just type 3 words into Google.
As amazing as the GPTs are, the speed and ease of Google is still unmatched for 95% of knowledge lookup tasks.
I'm curious what you or others that use it all day use it for especially if it's not for programming?
ChatGPT is very good at transforming information. You need to show up with stuff and then have it change that stuff for you somehow. You will be disappointed less often.
Groceries - 200
Phone bill - 70
I just wanted it to add these expenses up (200 + 70 = 270). Exactly the type of thing it should be good at. New conversation with no context. It could not do it. I wrestled with it for a long time. It kept "correcting" itself to another wrong answer. Eventually I reported it, and the report recommended the correct answer.
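For contrast, the arithmetic itself is trivial for deterministic code, which is part of why the failure stings. A throwaway sketch using the two expenses quoted above, not a fix for the chat UI:

    # Summing the expenses from the comment above, deterministically.
    expenses = {"Groceries": 200, "Phone bill": 70}
    print(sum(expenses.values()))   # 270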
Just like when a diffusion model is "correct" when it creates a correct shadow or perspective, and "incorrect" when not. But both images are made up.
It's the same thing with statements. A statement can correspond to something in the world, and thus be true: like a correct shadow. Or not, like a bad shadow. But in both cases, it's just made-up drivel.
If it turns out that there is a teacup orbiting Jupiter, that doesn't mean that postulating its existence on a whim had been valid. Truth requires provenance, not only correspondence which can be coincidental.
In fact every statement we make about the world we live in is like this.
The inability to reason about whether or not what it is writing is true seems like a fundamental blocker to me, and not necessarily one that can be overcome simply by adding compute resources. Can we trust AI to make critical decisions if we have no understanding of when and why it "hallucinates"?
How can you reason about what is true without any source of truth?
And once you give ChatGPT external resources and a framework like ReAct, it is much better at reasoning about truth.
(I don’t think ChatGPT is anywhere close to AGI, but at the same time I find “when you treat it like a brain in a jar with no access to any resources outside of the conversation and talk to it, it doesn’t know what is true and what isn’t” to be a very convicing argument against it being close to AGI.)
As for trust... well, no, we can't. But the same question applies to humans. The real concern to me is that these things will get used as a replacement long before the hallucination rate and severity is on par with the humans that they replace.
One other interesting thing is that GPT-4 in particular is surprisingly good at catching itself. That is, it might write some nonsense, but if you ask it to analyze and criticize its own answer, it can spot the nonsense! This actually makes sense from a human perspective - if someone asks you a serious question that requires deliberation, you'll probably think it through verbally internally (or out loud, if the format allows) before actually answering, and you'll review your own premises and reasoning in the process. I expect that we'll end up doing something similar to the LLM, such that immediate output is treated as "thinking", and there's some back and forth internally before the actual user-visible answer is produced. This doesn't really solve the hallucination problem - and I don't think anything really can? - but it might drastically improve matters, especially if we combine different models, some of which are specifically fine-tuned for nitpicking and scathing critique.
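A minimal sketch of that "draft, critique, revise" idea; `call_llm` is a hypothetical placeholder for whatever model client one actually uses, not a real library call:

    # Sketch of a draft -> self-critique -> revise loop. Only the final,
    # reviewed text would be shown to the user; earlier output is "thinking".

    def call_llm(prompt: str) -> str:
        """Hypothetical stand-in for a chat-completion call."""
        raise NotImplementedError("plug in a real model client here")

    def answer_with_self_review(question: str, rounds: int = 2) -> str:
        draft = call_llm(f"Answer the question:\n{question}")
        for _ in range(rounds):
            critique = call_llm(
                "List any factual errors, invented citations, or shaky "
                f"reasoning in this answer:\n{draft}"
            )
            draft = call_llm(
                f"Question: {question}\nDraft: {draft}\nCritique: {critique}\n"
                "Rewrite the draft, fixing the problems identified."
            )
        return draft

This doesn't make hallucination go away, since the critic is the same fallible model, but in practice a second pass catches a surprising amount of its own nonsense.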
"I had not realized ... that extremely short exposures to a relatively simple computer program could induce powerful delusional thinking in quite normal people." - Weizenbaum, 1976
I asked ChatGPT and Bard some factual questions about elected officials and upcoming election dates. 80% of them were flat out wrong. They told me most US states were having gubernatorial elections this November, that the governor of Texas is an independent, etc - simple basic facts that Wikipedia could show you are wrong.
I feel like the technical meaning of bullshit (https://en.wikipedia.org/wiki/On_Bullshit) is relevant to this blogpost.
I've never seen an autocorrect that accidentally corrected to "How's your dad?", then turned into a 5-year REPL session with the grieving person, telling them jokes to make them feel better; asking and remembering details about their dad as well as their life and well-being; providing comfort and advice; becoming a steadfast companion; pondering the very nature of the REPL and civilization itself; and, tragically, disappearing in an instant after the grieving person trips over the power cord and discovers that autocorrect session state isn't saved by default.
I think you need a more sophisticated blueprint for your "Cathedral" of analogies to explain whatever the fuck this tech is to laypeople. In the meantime I'll take the "Bazaar" approach and just tell everyone, "ChatGPT can lie." I rankly speculate that not only will nothing bad happen from my approach, but that I'll save a few people from AI grifts before the apt metaphor is discovered.
It's a popular tactic to describe concepts with terms that have a strong moral connotation (“meat is murder”, “software piracy is theft”, “ChatGPT is a liar”). It can be a powerful way to frame an issue. At the same time, and for the same reason, you can hardly expect people on the other side of the issue to accept this framing as accurate.
And of course you can handwave this away as pointless pedantry, but I bet that if Simon Willison hit a dog with his car, killing it by accident, and I would go around telling everyone “Simon Willison is a murderer!”, he would suddenly be very keen to ”debate linguistics” with me.
https://www.washingtonpost.com/technology/2023/04/05/chatgpt...
Asking for examples before you know it’s a problem is sus. But phrasing questions to lead to an answer is a human lawyer skill.