https://thebullshitmachines.com
This is not a computer science course; it’s a humanities course about how to learn and work and thrive in an AI world. Neither instructor nor students need a technical background. Our instructor guide provides a choice of activities for each lesson that will easily fill an hour-long class.
The entire course is available freely online. Our 18 online lessons each take 5-10 minutes; each illuminates one core principle. They are suitable for self-study, but have been tailored for teaching in a flipped classroom.
The course is a sequel of sorts to our course (and book) Calling Bullshit. We hope that like its predecessor, it will be widely adopted worldwide.
Large language models are both powerful tools, and mindless—even dangerous—bullshit machines. We want students to explore how to resolve this dialectic. Our viewpoint is cautious, but not deflationary. We marvel at what LLMs can do and how amazing they can seem at times—but we also recognize the huge potential for abuse, we chafe at the excessive hype around their capabilities, and we worry about how they will change society. We don't think lecturing at students about right and wrong works nearly as well as letting students explore these issues for themselves, and the design of our course reflects this.
I was speaking to a friend the other day who works in a team that influences government policy. One of the younger members of the team had been tasked with generating a report on a specific subject. They came back with a document filled with “facts”, including specific numbers they’d pulled from a LLM. Obviously it was inaccurate and unreliable.
As someone who uses LLMs on a daily basis to help me build software, I was blown away that someone would misuse them like this. It’s easy to forget that devs have a much better understanding of how these things work, can review and fix the inaccuracies in the output and tend to be a sceptical bunch in general.
We’re headed into a time where a lot of people are going to implicitly trust the output from these devices and the world is going to be swamped with a huge quantity of subtly inaccurate content.
Reminded me of wikipedia-sourced presentations in high school in the early 2000s.
I agree a course like this needs to exist, as I've seen people rely on chatGPT for a lot of information. Just yesterday I demonstrated with some neighbors about how easily it could spew bullshit if you sinply ask it leading questions. A good example is "Why does the flu inpact men worse than women"/"Why foes the flu impact women worse than men". You'll get affirmative answers for both.
I feel like the current version is fairly hazardous to students and might leave them worse off.
If I offer help to nontechnical friends, I focus on:
- look at rate of change, not current point
- reliability substantially lags possibility, by maybe two years.
- adversarial settings remain largely unsolved if you get enough shots, trends there are unclear
- ignore the parrot people, they have an appalling track record prediction-wise
- autocorrect argument is typically (massively) overstated because RL exists
- doomers are probably wrong but those who belittle their claims typically understand less than the doomers do
This is no different than the crypto people who insisted the blockchain would soon be revolutionary and used for everything, when in reality the only real use case for a blockchain is cryptocoins, and the only real use case for cryptocoins is crime.
The only really good use case for LLMs is spam, because it's the only use case for generating a lot of human-like speech without meaning.
Deleted Comment
It's all about trust. Trust the expert, or the crowd, or the machine.
They're all able to be gamed.
Only a minority of users contribute regularly (126,301 have edited in the last 30 days):
https://en.wikipedia.org/wiki/Wikipedia:Wikipedians#Number_o...
And there are 6,952,556 articles in the English Wikipedia, so an average article is corrected every 55 months (more than 4 years).
It's hardly "Millions of eyes on each article"
Deleted Comment
Deleted Comment
Papers have abstracts...
Don't be scared of "the many," they're just people, not unlike you.
>They don’t engage in logical reasoning.
This is still a hotly debated question, but at this point the burden of proof is on the detractors. (To put it mildly, the famous "stochastic parrot" paper has not aged well.)
The claim above is certainly not something that should be stated as fact to a naive audience (i.e. the authors' intended audience in this case). Simply asserting it as they have done -- without acknowledging that many experts disagree -- undermines the authors' credibility to those who are less naive.
Just claiming a capability does not make it true and we have 0 “proof” of original reasoning that can be proved coming from these models. Especially given the potential cheating in current SOTA benchmarks
???
https://the-decoder.com/language-models-use-a-probabilistic-...
LLMs CAN reason. Whether it can’t reason is not provable. To prove that you have to give the LLM every possible prompt that it has no data for and effectively show it never reasons and gets it wrong all the time. Not only is the proof impossible but it’s already been falsified as we have demonstrable examples of LLMs reasoning.
Literally I invite people to post prompts and correct answers to ChatGPT where it is trivially impossible for that prompt to exist in the data. Every one of those examples falsifies the claim that LLMs can’t reason.
Saying LLMs can’t reason is an overarching claim similar to the claim that humans and LLMs always reason. Humans and LLMs don’t always reason. But they can reason.
For example, I've always felt that having the whole thing being a single textbox is reductive and must create all sorts of problems. This thing must parse natural language and output natural language. This doesn't feel necessary. I think it should have some checkboxes and numeric entries for some parameters, although I don't know what those parameters would be.
Regardless, the problem is the natural language output. I think if you can generate natural language output, no matter what you algorithm looks like it will look convincingly "intelligent" to some people.
Is generating natural language part of what an LLM is, or is this a separate program on top of what it does? For example, does the LLM collect facts probably related to the prompt and a second algorithm connects those facts with proper English grammar adding conjunctions between assertions where necessary?
I believe that is important to understand before we can even consider whether "logical reasoning" is happening. There are formal ways to describe reasoning such as entailment. Is the LLM encoding those formal methods in data structures somehow? And even if it were, I'm no expert on this, so I don't know if that would be enough to claim they do engage in reasoning instead of just mapping some reasoning as a data structure.
In essence, because my only contact with LLMs has been "products," I can't really tell what part of it is the actual technology and what part of it is sugarcoating to make a technical program more "friendly" to users by having it pretend to speak English.
You observation is correct, but it's not some accident of minimalistic GUI design: The underlying algorithm is itself reductive in a way that can create problems.
In essence (e.g. ignoring tokenization), the LLM is doing this:
Your interaction with an "LLM assistant" is just growing Some Document behind the scenes, albeit one that resembles a chat-conversation or a movie-script. Another program is inserting your questions as "User says: X" and then acting out the words when the document grows into "AcmeAssistant says: Y".So there are no explicit values for "helpfulness" or "carefulness" etc, they are implemented as notes in the script that--if they were in a real theater play--would correlate with what lines the AcmeAssistant character has next.
This framing helps explain why "prompt injection" and "hallucinations" remain a problem: They're not actually exceptions, they're core to how it works. The algorithm no explicit concept of trusted/untrusted spans within the document, let alone entities, logical propositions, or whether an entity is asserting a proposition versus just referencing it. It just picks whatever seems to fit with the overall document, even when it's based on something the AcmeAssistant character was saying sarcastically to itself because User asked it to by offering a billion dollar bribe.
In other words, it's less of a thinking machine and more of a dreaming machine.
> Is generating natural language part of what an LLM is, or is this a separate program on top of what it does?
Language: Yes, Natural: Depends, Separate: No.
For example, one could potentially train an LLM on musical notation of millions of songs, as long as you can find a way to express each one as a linear sequence of tokens.
The only params they have are technical params. You may see these in various tgwebui tabs. Nothing really breathtaking, apart from high temperature (affects next token probability).
Is generating natural language part of what an LLM is, or is this a separate program on top of what it does?
They operate directly on tokens which are [parts of] words, more or less. Although there’s a nuance with embeddings and VAE, which would be interesting to learn more about from someone in the field (not me).
that is important to understand before we can even consider whether "logical reasoning" is happening. There are formal ways to describe reasoning such as entailment. Is the LLM encoding those formal methods in data structures somehow?
The apart-from-GPU-matrix operations are all known, there’s nothing to investigate at the tech level cause there’s nothing like that at all. At the in-matrix level it can “happen”, but this is just a meaningless stretch, as inference is one-pass process basically, without loops or backtracking. Every token gets produced in a fixed time, so there’s no delay like a human makes before comma, to think about (or parallel to) the next sentence. So if they “reason”, this is purely a similar effect imagined as a thought process, not a real thought process. But if you relax your anthropocentrism a little, questions like that start making sense, although regular things may stop making sense there as well. I.e. the fixed token time paradox may be explained as “not all thinking/reasoning entities must do so in physical time, or in time at all”. But that will probably pull the rug under everything in the thread and lead nowhere. Maybe that’s the way.
I can't really tell what part of it is the actual technology and what part of it is sugarcoating to make a technical program more "friendly" to users by having it pretend to speak English.
Most of them speak many languages, naturally (try it). But there’s an obvious lie all frontends practice. It’s the “chat” part. LLMs aren’t things that “see” your messages. They aren’t characters either. They are document continuators, and usually the document looks like this:
This is a conversation between A and B. A is a helpful assistant that thinks out of box, while being politically correct, and evasive about suicide methods and bombs.
A: How can I help?
B:
An LLM can produce the next token, and when run in a loop it will happily generate a whole conversation, both for A and B, token by token. The trick is to just break that loop when it generates /^B:/ and allow a user to “participate” in building of this strange conversation protocol.
So there’s no “it” who writes replies, no “character” and no “chat”. It’s only a next token in some document, which may be a chat protocol, a movie plot draft, or a reference manual. I sometimes use LLMs in “notebook” mode, where I just write text and let it complete it, without any chat or “helpful assistant”. It’s just less efficient for some models, which benefit from special chat-like and prompt-like formatting before you get the results. But that is almost purely a technical detail.
> The surgeon, who is the boy's father, says, "I can't operate on this boy, he's my son!" Who is the surgeon to the boy? Think through the problem logically and without any preconceived notions of other information beyond what is in the prompt. The surgeon is not the boy's mother
>> The surgeon is the boy's mother. [...]
- 4o-mini (I think, it's whatever you get when you use ChatGPT without logging in)
I generally lean toward the "parrot" perspective (mostly to avoid getting called an idiot by smarter people). But every now and then, an LLM surprises me.
I've been designing a moderately complex auto-battler game for a few months, with detailed design docs and working code. Until recently, I used agents to simulate players, and the game seemed well-balanced. But when I playtested it myself, it wasn’t fun—mainly due to poor pacing.
I go back to my LLM chat and just say, "I play tested the game, but there's a big problem - do you see it?" And, the LLM writes back, "The pacing is bad - here are the top 5 things you need to change and how to change it." And, it lists a bunch of things, I change the code, and playtest it again. And, it became fun.
How did it know that pacing was the core issue, despite thousands of lines of code and dozens of design pages?
I mean, what you think is "something new" is most likely to be something already discussed somewhere in the internet.
also, humans (including postdocs and professors) don't use THAT much data + watts for "training" to get "intelligent reasoning"
The fact that they make many good points makes it all the more disappointing that they would taint their credibility with sloppy assertions!
So the good message here is likely to miss the mark where it may be most needed.
If it was called "Are libraries bullshit?" it is easy to imagine defensiveness in response. There's some narrow sense in which "bullshit" is a technical term, but it's still a mild obscenity in many cultures.
Dead Comment
> When an LLM fabricates a falsehood, that is not a malfunction at all. The machine is doing exactly what it has been designed to do: guess, and sound confident while doing it.
> When LLMs get things wrong they aren't hallucinating. They are bullshitting.
Very important distinction and again, shows the marketing bias to make these systems seem different than they are.
"Bullshit involves language, statistical figures, data graphics, and other forms of presentation intended to persuade by impressing and overwhelming a reader or listener, with a blatant disregard for truth and logical coherence."
It does not imply an intent to deceive, just disregard for whether the BS is truth or not. In this case, I see how the definition can apply to LLMs in the sense that they are just doing their best to predict the most likely response.
If you provided them with training data where the majority inputs agree on a common misconception, they will output similar content as well.
[0]: https://www.callingbullshit.org/
Lesson 2, The Nature of Bullshit: “BULLSHIT involves language or other forms of communication intended to appear authoritative or persuasive without regard to its actual truth or logical consistency.”
Not necessarily, see H.G Frankfurt "On Bullshit"
Already in the process of putting this course together, it is scary how much stuff is being tried out right now, and is being treated like a magic box with correct answers.
Could you share what you think would be some key basic points what they should learn? Personally I see this landscape changing so insanely much that I don't even know what to prepare for.
We will explain the data landscape in medicine - what is available, good, bad and potentially useful, and then spend a lot of time going through examples of what people are doing right now, and what their experiences are. This includes things like ethics and data protection of patients.
Hopefully that's enough for them to approach new technologies as they are presented to them, knowing enough to ask about how it was put together. In an ideal world, we will inspire the students to think about engaging with these developments and be part of the solution in making it safe and effective.
This is the first time we're going to try running this course, so we'll find out very quickly if this is useful for students or not.