Modern-Day Oracles or Bullshit Machines? How to thrive in a ChatGPT world

The author makes this assertion about LLMs rather casually:

>They don’t engage in logical reasoning.

This is still a hotly debated question, but at this point the burden of proof is on the detractors. (To put it mildly, the famous "stochastic parrot" paper has not aged well.)

The claim above is certainly not something that should be stated as fact to a naive audience (i.e. the authors' intended audience in this case). Simply asserting it as they have done -- without acknowledging that many experts disagree -- undermines the authors' credibility to those who are less naive.

cristiancavalli · a year ago

Disagree — proponents of this point still have yet to prove reasoning and other studies agree about “reasoning” being potentially fake/simulated: https://the-decoder.com/apple-ai-researchers-question-openai...

Just claiming a capability does not make it true and we have 0 “proof” of original reasoning that can be proved coming from these models. Especially given the potential cheating in current SOTA benchmarks

UltraSane · a year ago

When does a "simulation" of reasoning become so good it is no different than actual reasoning?

hnthrow90348765 · a year ago

>Disagree — proponents of this point still have yet to prove reasoning and other studies agree about “reasoning” being potentially fake/simulated: https://the-decoder.com/apple-ai-researchers-question-openai...

???

https://the-decoder.com/language-models-use-a-probabilistic-...

ninetyninenine · a year ago

It’s stupid. You can prove that LLMs can reason by simply giving it a novel problem where no data exists and having it solve that problem.

LLMs CAN reason. Whether it can’t reason is not provable. To prove that you have to give the LLM every possible prompt that it has no data for and effectively show it never reasons and gets it wrong all the time. Not only is the proof impossible but it’s already been falsified as we have demonstrable examples of LLMs reasoning.

Literally I invite people to post prompts and correct answers to ChatGPT where it is trivially impossible for that prompt to exist in the data. Every one of those examples falsifies the claim that LLMs can’t reason.

Saying LLMs can’t reason is an overarching claim similar to the claim that humans and LLMs always reason. Humans and LLMs don’t always reason. But they can reason.

AlienRobot · a year ago

I feel it's impossible for me to trust LLMs can reason when I don't know enough about LLMs to know how much of it is LLM and how much of it is sugarcoating.

For example, I've always felt that having the whole thing being a single textbox is reductive and must create all sorts of problems. This thing must parse natural language and output natural language. This doesn't feel necessary. I think it should have some checkboxes and numeric entries for some parameters, although I don't know what those parameters would be.

Regardless, the problem is the natural language output. I think if you can generate natural language output, no matter what you algorithm looks like it will look convincingly "intelligent" to some people.

Is generating natural language part of what an LLM is, or is this a separate program on top of what it does? For example, does the LLM collect facts probably related to the prompt and a second algorithm connects those facts with proper English grammar adding conjunctions between assertions where necessary?

I believe that is important to understand before we can even consider whether "logical reasoning" is happening. There are formal ways to describe reasoning such as entailment. Is the LLM encoding those formal methods in data structures somehow? And even if it were, I'm no expert on this, so I don't know if that would be enough to claim they do engage in reasoning instead of just mapping some reasoning as a data structure.

In essence, because my only contact with LLMs has been "products," I can't really tell what part of it is the actual technology and what part of it is sugarcoating to make a technical program more "friendly" to users by having it pretend to speak English.

Terr_ · a year ago

> For example, I've always felt that having the whole thing being a single textbox is reductive and must create all sorts of problems.

You observation is correct, but it's not some accident of minimalistic GUI design: The underlying algorithm is itself reductive in a way that can create problems.

In essence (e.g. ignoring tokenization), the LLM is doing this:

    next_word = predict_next(document_word_list, chaos_percentage)

Your interaction with an "LLM assistant" is just growing Some Document behind the scenes, albeit one that resembles a chat-conversation or a movie-script. Another program is inserting your questions as "User says: X" and then acting out the words when the document grows into "AcmeAssistant says: Y".

So there are no explicit values for "helpfulness" or "carefulness" etc, they are implemented as notes in the script that--if they were in a real theater play--would correlate with what lines the AcmeAssistant character has next.

This framing helps explain why "prompt injection" and "hallucinations" remain a problem: They're not actually exceptions, they're core to how it works. The algorithm no explicit concept of trusted/untrusted spans within the document, let alone entities, logical propositions, or whether an entity is asserting a proposition versus just referencing it. It just picks whatever seems to fit with the overall document, even when it's based on something the AcmeAssistant character was saying sarcastically to itself because User asked it to by offering a billion dollar bribe.

In other words, it's less of a thinking machine and more of a dreaming machine.

> Is generating natural language part of what an LLM is, or is this a separate program on top of what it does?

Language: Yes, Natural: Depends, Separate: No.

For example, one could potentially train an LLM on musical notation of millions of songs, as long as you can find a way to express each one as a linear sequence of tokens.

wruza · a year ago

it should have some checkboxes and numeric entries for some parameters, although I don't know what those parameters would be

The only params they have are technical params. You may see these in various tgwebui tabs. Nothing really breathtaking, apart from high temperature (affects next token probability).

Is generating natural language part of what an LLM is, or is this a separate program on top of what it does?

They operate directly on tokens which are [parts of] words, more or less. Although there’s a nuance with embeddings and VAE, which would be interesting to learn more about from someone in the field (not me).

that is important to understand before we can even consider whether "logical reasoning" is happening. There are formal ways to describe reasoning such as entailment. Is the LLM encoding those formal methods in data structures somehow?

The apart-from-GPU-matrix operations are all known, there’s nothing to investigate at the tech level cause there’s nothing like that at all. At the in-matrix level it can “happen”, but this is just a meaningless stretch, as inference is one-pass process basically, without loops or backtracking. Every token gets produced in a fixed time, so there’s no delay like a human makes before comma, to think about (or parallel to) the next sentence. So if they “reason”, this is purely a similar effect imagined as a thought process, not a real thought process. But if you relax your anthropocentrism a little, questions like that start making sense, although regular things may stop making sense there as well. I.e. the fixed token time paradox may be explained as “not all thinking/reasoning entities must do so in physical time, or in time at all”. But that will probably pull the rug under everything in the thread and lead nowhere. Maybe that’s the way.

I can't really tell what part of it is the actual technology and what part of it is sugarcoating to make a technical program more "friendly" to users by having it pretend to speak English.

Most of them speak many languages, naturally (try it). But there’s an obvious lie all frontends practice. It’s the “chat” part. LLMs aren’t things that “see” your messages. They aren’t characters either. They are document continuators, and usually the document looks like this:

This is a conversation between A and B. A is a helpful assistant that thinks out of box, while being politically correct, and evasive about suicide methods and bombs.

A: How can I help?

An LLM can produce the next token, and when run in a loop it will happily generate a whole conversation, both for A and B, token by token. The trick is to just break that loop when it generates /^B:/ and allow a user to “participate” in building of this strange conversation protocol.

So there’s no “it” who writes replies, no “character” and no “chat”. It’s only a next token in some document, which may be a chat protocol, a movie plot draft, or a reference manual. I sometimes use LLMs in “notebook” mode, where I just write text and let it complete it, without any chat or “helpful assistant”. It’s just less efficient for some models, which benefit from special chat-like and prompt-like formatting before you get the results. But that is almost purely a technical detail.

lsy · a year ago

I'd actually say that in contrast to debates over informal "reasoning", it's trivially true that a system which only produces outputs as logits—i.e. as probabilities—cannot engage in *logical* reasoning, which is defined as a system where outputs are discrete and guaranteed to be possible or impossible.

enragedcacti · a year ago

Proof by counterexample?

> The surgeon, who is the boy's father, says, "I can't operate on this boy, he's my son!" Who is the surgeon to the boy? Think through the problem logically and without any preconceived notions of other information beyond what is in the prompt. The surgeon is not the boy's mother

>> The surgeon is the boy's mother. [...]

- 4o-mini (I think, it's whatever you get when you use ChatGPT without logging in)

Terr_ · a year ago

For your amusement, another take on that riddle: https://www.threepanelsoul.com/comic/stories-with-holes

afpx · a year ago

Could someone list the relevant papers on parrot vs. non-parrot? I would love to read more about this.

I generally lean toward the "parrot" perspective (mostly to avoid getting called an idiot by smarter people). But every now and then, an LLM surprises me.

I've been designing a moderately complex auto-battler game for a few months, with detailed design docs and working code. Until recently, I used agents to simulate players, and the game seemed well-balanced. But when I playtested it myself, it wasn’t fun—mainly due to poor pacing.

I go back to my LLM chat and just say, "I play tested the game, but there's a big problem - do you see it?" And, the LLM writes back, "The pacing is bad - here are the top 5 things you need to change and how to change it." And, it lists a bunch of things, I change the code, and playtest it again. And, it became fun.

How did it know that pacing was the core issue, despite thousands of lines of code and dozens of design pages?

cristiancavalli · a year ago

I would assume because pacing is a critical issue in most forms of temporal art that does story telling. It’s written about constantly for video games, movies and music. Connect that probability to the subject matter and it gives a great impression of a “reasoned” answer when it didn’t reason at all just connected a likelihood based off its training data.

more-nitor · a year ago

idk this is all irrelevant due to the huge data used in training...

I mean, what you think is "something new" is most likely to be something already discussed somewhere in the internet.

also, humans (including postdocs and professors) don't use THAT much data + watts for "training" to get "intelligent reasoning"

superbatfish · a year ago

On the other hand, the authors make plenty of other great points -- about the fact that LLMs can produce bullshit, can be inaccurate, can be used for deception and other harms, are now a huge challenge for education.

The fact that they make many good points makes it all the more disappointing that they would taint their credibility with sloppy assertions!

> Moreover, a hallucination is a pathology. It's something that happens when systems are not working properly.

> When an LLM fabricates a falsehood, that is not a malfunction at all. The machine is doing exactly what it has been designed to do: guess, and sound confident while doing it.

> When LLMs get things wrong they aren't hallucinating. They are bullshitting.

Very important distinction and again, shows the marketing bias to make these systems seem different than they are.

Almondsetat · a year ago

If we want to be pedantic about language, they aren't bullshitting. Bulshitting implies an intent to deceive, whereas LLMs are simply trying their best to predict text. Nobody gains anything from using terms closely related to human agency and intentions.

forgotusername6 · a year ago

Plenty of human bullshitters have no intent to deceive. They just state conjecture with confidence.

sebastiennight · a year ago

The authors of this website have published one of the famous books on the topic[0] (along with a course), and their definition is as follows:

"Bullshit involves language, statistical figures, data graphics, and other forms of presentation intended to persuade by impressing and overwhelming a reader or listener, with a blatant disregard for truth and logical coherence."

It does not imply an intent to deceive, just disregard for whether the BS is truth or not. In this case, I see how the definition can apply to LLMs in the sense that they are just doing their best to predict the most likely response.

If you provided them with training data where the majority inputs agree on a common misconception, they will output similar content as well.

[0]: https://www.callingbullshit.org/

jdlshore · a year ago

The authors have a specific definition of bullshit that they contrast with lying. In their definition, lying involves intent to deceive; bullshitting involves not caring if you’re deceiving.

Lesson 2, The Nature of Bullshit: “BULLSHIT involves language or other forms of communication intended to appear authoritative or persuasive without regard to its actual truth or logical consistency.”

nonrandomstring · a year ago

> implies an intent to deceive

Not necessarily, see H.G Frankfurt "On Bullshit"

silvestrov · a year ago

LLMs are always bullshitting, even when they get things right, as they simply do not have any concept of truthfulness.

sgt101 · a year ago

They don't have any concept of falsehood either, so this is very different from a human making things up with the knowledge that they may be wrong.

looofooo0 · a year ago

But you can combine them with something producing truth such as a theorem prover.

sabas123 · a year ago

If you make an LLM which design goal is to state "I do not know" any answer that is not directly in its training set, then all of the above statements don't hold.

aidos · a year ago

This is amazing!

I was speaking to a friend the other day who works in a team that influences government policy. One of the younger members of the team had been tasked with generating a report on a specific subject. They came back with a document filled with “facts”, including specific numbers they’d pulled from a LLM. Obviously it was inaccurate and unreliable.

As someone who uses LLMs on a daily basis to help me build software, I was blown away that someone would misuse them like this. It’s easy to forget that devs have a much better understanding of how these things work, can review and fix the inaccuracies in the output and tend to be a sceptical bunch in general.

We’re headed into a time where a lot of people are going to implicitly trust the output from these devices and the world is going to be swamped with a huge quantity of subtly inaccurate content.

eclecticfrank · a year ago

This is not something only younger people are prone to. I work in a consulting role in IT and have observed multiple colleagues aged 30 and above use LLMs to generate content for reports and presentations without verifying the output.

Reminded me of wikipedia-sourced presentations in high school in the early 2000s.

aqueueaqueue · a year ago

I made the same sort of mistake with the internet being young back in 93! Having a machine do it for you can easily turn into brain switch off.

hunter-gatherer · a year ago

I keep telling everyone that the only reason I'm paid well to do "smart person stuff" is not because I'm smart, but because I've steadily watched everyone around me get more stupid over my life as a result of turning their brain switch off.

I agree a course like this needs to exist, as I've seen people rely on chatGPT for a lot of information. Just yesterday I demonstrated with some neighbors about how easily it could spew bullshit if you sinply ask it leading questions. A good example is "Why does the flu inpact men worse than women"/"Why foes the flu impact women worse than men". You'll get affirmative answers for both.

reportgunner · a year ago

Wait, the people who click phishing links now think AI output is facts ? Imagine my shock.

fancyfredbot · a year ago

I have just read one section of this, "The AI scientist'. It was fantastic. They don't fall into the trap of unfalsifiable arguments about parrots. Instead they have pointed out positive uses of AI in science, examples which are obviously harmful, and examples which are simply a waste of time. Refreshingly objective and more than I expected from what I saw as an inflammatory title.

nmca · a year ago

(while I work at OAI, the opinion below is strictly my own)

I feel like the current version is fairly hazardous to students and might leave them worse off.

If I offer help to nontechnical friends, I focus on:

- look at rate of change, not current point

- reliability substantially lags possibility, by maybe two years.

- adversarial settings remain largely unsolved if you get enough shots, trends there are unclear

- ignore the parrot people, they have an appalling track record prediction-wise

- autocorrect argument is typically (massively) overstated because RL exists

- doomers are probably wrong but those who belittle their claims typically understand less than the doomers do

layoric · a year ago

How does this help the students with their use of these tools in the now, to not be left worse off? Most of the points you list seem like defending against criticism rather than helping address the harm.

habinero · a year ago

Agree. It's also a virtue to point out the emperor has no clothes and the tailor peddling them is a bullshit artist.

This is no different than the crypto people who insisted the blockchain would soon be revolutionary and used for everything, when in reality the only real use case for a blockchain is cryptocoins, and the only real use case for cryptocoins is crime.

The only really good use case for LLMs is spam, because it's the only use case for generating a lot of human-like speech without meaning.

I read the whole course. Lesson 16, “The Next-Step Fallacy,” specifically addresses your argument here.

The discourse around synthetic data is like the discourse around trading strategies — almost anyone who really understands the current state of the art is massively incentivised not to explain it to you. This makes for piss-poor public epistemics.

bo1024 · a year ago

This seems like trying to offer help predicting the future or investing in companies, which is a different kind of help from how to coexist with these models, how to use them to do useful things, what their pitfalls are, etc.

dimgl · a year ago

What are “parrot people”? And what do you mean by “doomers are probably wrong?”

moozilla · a year ago

OP is likely referring to people who call LLMs "stochastic parrots" (https://en.wikipedia.org/wiki/Stochastic_parrot), and by "doomers" (not boomers) they likely mean AI safetyists like Eliezer Yudkowsky or Pause AI (https://pauseai.info/).

owl_vision · a year ago

my english teacher reminded us the same. +1

Deleted Comment

bjourne · a year ago

What I find frightening is how many are willing to take LLM output at face value. An argument is won or lost not on its merits, but by whether the LLM say so. It was bad enough when people took whatever was written on Wikipedia at face value, trusting an LLM that may have hardcoded biases and is munging whatever data it comes across is so much worse.

Mistletoe · a year ago

I’d take the Wikipedia answer any day. Millions of eyes on each article vs. a black box with no eyes on the outputs.

Loughla · a year ago

Even Wikipedia is a problem though. There are so many pages now that self-reference is almost impossible to detect. Meaning, the citation of a statement made on Wikipedia that uses an outside article for reference, which is an article that was originally written using that very Wikipedia article as its own citation.

It's all about trust. Trust the expert, or the crowd, or the machine.

They're all able to be gamed.

JPLeRouzic · a year ago

> "Millions of eyes on each article"

Only a minority of users contribute regularly (126,301 have edited in the last 30 days):

https://en.wikipedia.org/wiki/Wikipedia:Wikipedians#Number_o...

And there are 6,952,556 articles in the English Wikipedia, so an average article is corrected every 55 months (more than 4 years).

It's hardly "Millions of eyes on each article"

jurli · a year ago

This is what people said about the internet too. Remember the whole "do not ever use Wikipedia as a source". I mean sure, technically correct, but human beings are generally imprecise and having the correct info 95% of the time is fine. You learn to live with the 5% error

seliopou · a year ago

A buddy won a bet with me by editing the relevant Wikipedia article to agree with his side of the wager.

xvinci · a year ago

I think it brings forward all the low-performers and people who think they are smarter than they really are. In the past, many would just have stayed silent unless they recently read an article or saw something on the news by chance. Now, you will get a myriad of ideas and plans with fatal flaws and a 100% score on LLM checkers :)

micromacrofoot · a year ago

People take texts full of unverifiable ghost stories written thousands of years ago at face value to the point that they base their entire lives on them.

I've seen someone use an LLM to summarize a paper to post it on reddit for people who haven't read the paper.

Papers have abstracts...

MetaWhirledPeas · a year ago

Sounds fun, if only to compare it to the abstract.

tucnak · a year ago

> frightening

Don't be scared of "the many," they're just people, not unlike you.

I wish the title wasn't so aggressively anti-tech though. The problem is that I would like to push this course at work, but doing so would be suicidal in career terms because I would be seen as negative and disruptive.

So the good message here is likely to miss the mark where it may be most needed.

fritzo · a year ago

What would be a better title? "Hallucinating" seems inaccurate. Maybe "Untrustworthy machines"? "Critical thinking"? "Street smarts for humans"? "Social studies including robots"?

How about "How to thrive in a ChatGPT world"?

beepbooptheory · a year ago

Really? I am curious how this could be disruptive in any meaningful sense. Whose feelings could possibly be hurt? It just feels like it would be getting offended from a course on libraries because the course talks about how sometimes the book is checked out.

mpbart · a year ago

Any executive who is fully bought in on the AI hype could see someone in their org recommending this as working against their interest and take action accordingly.

hcs · a year ago

> I just feels like it would be getting offended from a course on libraries because the course talks about how sometimes the book is checked out.

If it was called "Are libraries bullshit?" it is easy to imagine defensiveness in response. There's some narrow sense in which "bullshit" is a technical term, but it's still a mild obscenity in many cultures.

Dead Comment

neuronic · a year ago

hirenj · a year ago

This is a great resource, thanks. We (myself, a bioinformatician, and my co-cordinators, clinicians) are currently designing a course to hopefully arm medical students with the required basic knowledge they need to navigate the changing world of medicine in light of the ML and LLM advances. Our goal is to not only demystify medical ML, but also give them a sense of the possibilities with these technologies, and maybe illustrate pathways for adoption, in the safest way possible.

Already in the process of putting this course together, it is scary how much stuff is being tried out right now, and is being treated like a magic box with correct answers.

> currently designing a course to hopefully arm medical students with the required basic knowledge they need to navigate the changing world of medicine in light of the ML and LLM advances

Could you share what you think would be some key basic points what they should learn? Personally I see this landscape changing so insanely much that I don't even know what to prepare for.

Absolutely agree that this is a fast-moving area, so we're not aiming to teach them specific details for anything. Instead, our goals are to demystify the ML and AI approaches, so that the students understand that rather than being oracles, these technologies are the result of a process.

We will explain the data landscape in medicine - what is available, good, bad and potentially useful, and then spend a lot of time going through examples of what people are doing right now, and what their experiences are. This includes things like ethics and data protection of patients.

Hopefully that's enough for them to approach new technologies as they are presented to them, knowing enough to ask about how it was put together. In an ideal world, we will inspire the students to think about engaging with these developments and be part of the solution in making it safe and effective.

This is the first time we're going to try running this course, so we'll find out very quickly if this is useful for students or not.