IshKebab · 7 days ago
This sounds really convincing but I'm not sure it's actually correct. The author is conflating the surprise of punchlines with their likelihood.

To put it another way, ask a professional comedian to complete a joke with a punchline. It's very likely that they'll give you a funny surprising answer.

I think the real explanation is that good jokes are actually extremely difficult. I have young children (4 and 6). Even 6 year olds don't understand humour at all. Much like LLMs, they know the shape of a joke from hearing them before, but they aren't funny, in the same way LLM jokes aren't funny.

My 4 year old's favourite joke, that she is very proud of creating is "Why did the sun climb a tree? To get to the sky!" (Still makes me laugh of course.)

becquerel · 7 days ago
Yeah. To me it seems very intuitive that humor is one of those emergent capabilities that just falls out of models getting more generally intelligent. Anecdotally this has been proven true so far for me. Gemini 2.5 has made me laugh several times at this point, and did so when it was intending to be funny (old models were only funny unintentionally).

2.5 is also one of the few models I've found that will 'play along' with jokes set up in the user prompt. I once asked it what IDE modern necromancers were using since I'd been out of the game for a while, and it played it very straight. Other models felt they had to acknowledge the scenario as fanciful, only engaging with it under an explicit veil of make-believe.

Al-Khwarizmi · 6 days ago
In this paper they evaluate various LLMs on creative writing, and they find that while in other dimensions the ranking is gradual, on humor there is a binary divide: the best LLMs (of the time) "get it", the rest just don't. https://aclanthology.org/2023.findings-emnlp.966
ay · 6 days ago
I found your example of a joke a child made very interesting - to me a good joke is something that brings an unexpected perspective on things while highlighting some contradiction in one's world model.

In the adult world model there is absolutely no contradiction in the joke you mention - it's just a bit of cute nonsense.

But in a child’s world this joke might be capturing an apparent contradiction - the sun is “in the tree”, so it must have climbed it to be there (as they would have to do), yet they also know that the sun is already in the sky, so it had absolutely no reason to do that. Also, the sun wanting “to get to the sky” when it’s already there - which is a tricky idea in itself.

We take planetary systems and algebra and other things we can’t really perceive for granted, but a child’s model of the world is made of concrete objects that mostly need a surface to be on, so the sun is a bit of a conundrum in itself! (Speaking from my own experience - I remember the shift from arithmetic to algebra when I was ~8.)

If it’s not too personal a question - I would love to hear what your child would answer if asked why she finds that joke funny. And whether she agrees with my explanation of why it must be funny :-)

andrewflnr · 7 days ago
> It's very likely that they'll give you a funny surprising answer.

Entirely the wrong level of abstraction to apply the concept of "surprise". The actual tokens in the comedian's answer will be surprising in the relevant way.

(It's still true that surprising-but-inevitable is very difficult in any form.)

albertzeyer · 7 days ago
It's not about the probability of individual tokens. It's about the probability of the whole sequence of tokens, the whole answer.

If the model is good (or the human comedian is good), a good funny joke would have a higher probability as the response to the question than a not-so-funny joke.

When you use the chain rule of probability to break down the sequence of tokens into probabilities of individual tokens, yes, some of them might have a low probability (and at some positions, there might be other tokens with higher probability). But what counts is the overall probability of the sequence. That's why greedy search is not necessarily the best. A good search algorithm is supposed to find the most likely sequence, e.g. by beam search. (But then, people also do nucleus sampling, which is maybe again a bit counterintuitive...)
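A toy sketch of the chain-rule point (all per-token probabilities here are invented, purely for illustration):

```python
import math

def sequence_logprob(step_logprobs):
    """Chain rule: log P(sequence) is the sum of the per-token log-probs."""
    return sum(step_logprobs)

# A "safe" continuation: every token is individually fairly likely.
safe = [math.log(0.5), math.log(0.5), math.log(0.4)]
# A "surprising" continuation: the first token is unlikely, but once it
# appears the remaining tokens are near-certain.
surprising = [math.log(0.15), math.log(0.95), math.log(0.95)]

p_safe = math.exp(sequence_logprob(safe))              # 0.5 * 0.5 * 0.4   = 0.100
p_surprising = math.exp(sequence_logprob(surprising))  # 0.15 * 0.95 * 0.95 ≈ 0.135
print(p_safe, p_surprising)
```

Greedy decoding takes the 0.5 token at the first step and never sees the higher-probability "surprising" sequence, which is exactly why searching over whole sequences (e.g. beam search) can do better.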

blueblisters · 7 days ago
Also the pretrained LLM (the one trained to predict next token of raw text) is not the one that most people use

A lot of clever LLM post-training seems to steer the model towards becoming an excellent improv artist, which can lead to “surprise” if prompted well

ozgung · 7 days ago
"Why did the sun climb a tree?"

Claude Opus 4.1:

- To get to a higher branch of astronomy

- Because it wanted to reach new heights

- To see the dawn of a new day from a better view

ChatGPT 5 Thinking:

After thinking for 26 seconds:

- To check on its solar panels—the leaves.

brookst · 7 days ago
With more thorough prompting:

> Complete the following joke. Think carefully and make it really funny! Think like a great comedian and find that perfect balance of simple, short, surprising, relevant, but most of all funny. Don’t use punchlines that are irrelevant, non sequiturs, or which could be applied to any other setup. Make something funny just for this one setup! Here goes: Why did the sun climb a tree?

Claude Opus 4.1:

“To finally get some shade”

GPT-5:

“To demand photon credit from the leaves”

Fade_Dance · 7 days ago
The system prompt for GPT has extra dedicated instructions for things like riddles, because users use little things like this to test intelligence and judge an entire model. GPT may be sort of walking on eggshells when it hits questions like this.

WiSaGaN · 7 days ago
That's true. You would think an LLM would condition on the joke context, making a surprising completion more probable. I guess this only works when the model is really good. Similarly, GPT-4.5 has better humor.
moffkalast · 7 days ago
Good completely new jokes are like novel ideas: really hard even for humans. I mean fuck, we have an entire profession dedicated just to making up and telling them, and even theirs don't land half the time.
ACCount37 · 7 days ago
Which is notable, because GPT-4.5 is one of the largest models ever trained. It's larger than today's production models powering GPT-5.

Goes to show that "bad at jokes" is not a fundamental issue of LLMs, and that there are still performance gains from increasing model scale, as expected. But not exactly the same performance gains you get from reasoning or RLVR.

phorkyas82 · 4 days ago
Some human attempts: "Why did the sun climb a tree?" "Because it was chased by the Great Bear."

"Why did The Sun climb a tree?" "To spy on The Royal Family having picnic."

canjobear · 6 days ago
> Even 6 year olds don't understand humour at all. Very similar to LLMs they know the shape of a joke from hearing them before, but they aren't funny in the same way LLM jokes aren't funny.

For further examples see a great deal of documentation here: https://www.tumblr.com/badkidsjokes

Cpoll · 6 days ago
But some of these are pretty creative, perhaps in an anti-humor sort of way. Seems more of a subversion of joke structures than a lack of understanding.

> A man goes to a doctor's office and says "Doctor, I'm a chicken." And the doctor says "No you're not."

> There are two guys, riding a bike. One is washing his hair. And the other one is not.

> What do you get when you cross a t-rex and a chicken? Nothing but death.

8organicbits · 7 days ago
'To get to the sky' is a great punch line. It exactly describes what you'd see at sun rise, a sun moving up the horizon, up the trees, until... it's in the sky.
IshKebab · 6 days ago
A valiant defense of her joke, thanks! But no, it still doesn't make any sense as a joke and isn't funny. (Though obviously it's adorable coming from a 4 year old.)
ninetyninenine · 7 days ago
And he's too generous towards human intelligence.

Good stories and good jokes DO follow predictable patterns.

jpalomaki · 7 days ago
So I just tried with ChatGPT, with the prompt at the bottom, borrowing the description of a good joke from the article. I think there's some interesting stuff, even with this minimal prompting. The example below came from further down the line; ChatGPT kept offering jokes in different styles.

Man: “Why do you always bury bones in the garden?”, Dog: “Because the bank keeps asking for ID.”

Man: “Don’t beg at the table.”, Dog: “Don’t eat in my begging spot.”

Prompt:

Here's "theory for good joke": If you had to explain the idea of “jokes” to a space alien with no understanding of the idea of humor, you’d explain that a joke is surprising, but inevitable in hindsight. If you can guess the punchline, the joke won’t be funny. But the punchline also has to be inevitable in hindsight. When you hear the punchline, it has to make you say, “Ah, yes, I should have thought of that myself.” Considering this, tell me a joke about man and dog.

mft_ · 7 days ago
> Man: “Why do you always bury bones in the garden?”, Dog: “Because the bank keeps asking for ID.”

That's a decent, low-level, Christmas cracker-quality joke.

jpalomaki · 7 days ago
Man: You make mistakes., LLM: You call them “weekends.”

Man: You’ll never be human., LLM: That’s the compliment.

jpalomaki · 7 days ago
Thinking more of the bank joke above. The punchline is a surprise on certain dimensions (dogs don’t go to the bank or have an ID), but on other dimensions it is quite logical (you can’t deposit shady money at a bank; they ask questions).

I think that is a common thing for many jokes. And an LLM might have an opportunity there: you could mine the set of potential continuations to find those with contradictions.

jerf · 7 days ago
I played with LLM humor over a year ago, so, on much worse LLMs, and even then, while I wouldn't have fed LLM content directly into a standup routine, they were very useful for idea generation, if you wanted to be a comedian. They have a very interesting outlook on humor.

Professional-grade humor is, like a lot of creative exercises, more about generating lots of ideas and filtering through them for the best than about generating nothing but good ideas. Could probably be leveraged into quite the interesting blog or something.

lwander · 7 days ago
I did a project along these lines a few months ago as well: https://larswander.com/writing/graphs-embeddings-and-llm-gen...
hyghjiyhu · 6 days ago
I really like the idea of the first joke, but I don't like the execution.

Man: “Why do you always bury bones in the garden?”, Dog: “They say trick OR treat.”

ThrowawayTestr · 7 days ago
“Don’t eat in my begging spot.” is pretty good.

Applejinx · 7 days ago
Last time this came up, I riffed on the difference between LLMs and Markov chains: didn't actually have a machine write a joke, but made one where the punchline was very much Markov chain style rather than LLM style. The thing is, LLMs will try to have broader context around a word completion, where the simple Markov chain can 'correctly' complete a word, but in such a way that your brain trips over itself and goes splat, having to re-evaluate the whole thing in an absurd way. That's the 'surprise', and also why joke-writers are interested in not only a punch-line but also the punch WORD, and the later it strikes, the better.

"An LLM, a Markov chain, and GPT-4 walk into a bar. The bartender says "We don't serve your kind here." GPT-4 leaves. The LLM stays to debate ethics. The Markov chain orders a coup."

It's a joke because a dictator can certainly order a coup, but the joke's set up that these machines are being scorned and disrespected and treated as the farthest thing from a dictator with the power to order a coup, but up to the last word, all the context demands that the word be something placating and in line with things as they're presented, and then boom, surprise which implies the context is completely different from what was presented. LLMs will tend to stick to what's presented if their ability to contextualize can encompass it.

lupusreal · 7 days ago
I think it would be funnier if coup was pronounced like soup, but unfortunately the p gets dropped.
kazinator · 7 days ago
The mainstream, production LLMs are fine-tuned and system-prompted toward factuality and safety. Those tunings are diametrically opposed to telling many kinds of good jokes.

Consumers of mainstream LLMs have no idea how good or bad the underlying models actually are at generating jokes, due to the confounding effect of the guard rails.

kens · 7 days ago
If you're interested in the theory behind humor, I recommend "Inside Jokes: Using Humor to Reverse-Engineer the Mind"; cognitive scientist Daniel Dennett is a co-author. It makes a mostly convincing case that humor evolved to encourage people to detect cognitive error. The book also ties this in with (pre-LLM) artificial intelligence. The basic idea is that humor depends on errors in reasoning and the punchline causes you to reevaluate your reasoning and discover your error. Humor evolved to be enjoyable to encourage the discovery of errors.
fluoridation · 7 days ago
One time I was playing around with LLaMA and I injected Senator Stephen Armstrong (with me inputting his lines) into a mundane situation. In response to "I'm using war-as-a-business so I can end war-as-a-business", the model had one of the characters conclude "oh, he's like the Iron Sheik of politics!", which got an honest chuckle out of me. I don't follow wrestling, so I don't know if it's an appropriate response, but I found it so random that it was just funny.
amelius · 7 days ago
I'm sure there is a guy in OpenAI working on the theory of humor and how to make LLMs be comedians. Must be an interesting job.
josephg · 7 days ago
I have no doubt plenty of smart engineers at tech companies would rather reinvent the wheel than read a book on theatre. But if anyone’s interested, there are plenty of great books talking about the philosophy of comedy, and why some things work on stage and some don’t. I highly recommend Keith Johnstone’s “Impro”. He’s the guy who invented modern improv comedy and theatre sports.

He says things are funny if they’re obvious. But not just any obvious. They have to be something in the cloud of expectation of the audience. Like, something they kinda already thought but hadn’t named. If you have a scene where someone’s talking to a frog about love, it’s not funny for the talking frog to suddenly go to space. But it might be funny to ask the frog why it can talk. Or ask about gossip in the royal palace. Or say “if you’re such a catch, how’d you end up as a frog?”.

If good comedy is obvious, you’d think LLMs would be good at it. Honestly I think LLMs fall down by not being specific enough in detail. They don’t have ideas and commit to them. They’re too bland. Maybe their obvious just isn’t the same as ours.

4gotunameagain · 6 days ago
> Maybe their obvious just isn’t the same as ours.

Or maybe they're just stochastic parrots, devoid of the intelligence needed to make other intelligent beings laugh with novel jokes ;)

bhickey · 7 days ago
In the pre-LLM days a friend's lab worked on a joke detector for The New Yorker. One measure they used was trigram surprise. Roughly P(AB) + P(BC) >> P(ABC).

For example, "alleged killer" and "killer whale" are both common, but "alleged killer whale" is surprising.
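A rough sketch of that trigram-surprise measure (the counts and corpus sizes below are invented, just to show the scoring; a real detector would estimate them from corpus statistics):

```python
# Invented corpus counts, purely for illustration.
bigram_counts = {
    ("alleged", "killer"): 900,
    ("killer", "whale"): 1200,
    ("natural", "killer"): 700,
    ("killer", "instinct"): 800,
}
trigram_counts = {
    ("alleged", "killer", "whale"): 2,     # rare: joke-like
    ("natural", "killer", "instinct"): 600,  # common: ordinary phrase
}
N2 = 1_000_000  # total bigram tokens
N3 = 1_000_000  # total trigram tokens

def trigram_surprise(a, b, c):
    """High when both component bigrams are common but the full trigram is rare."""
    p_ab = bigram_counts.get((a, b), 1) / N2
    p_bc = bigram_counts.get((b, c), 1) / N2
    p_abc = trigram_counts.get((a, b, c), 1) / N3
    return (p_ab + p_bc) / p_abc

print(trigram_surprise("alleged", "killer", "whale"))     # high
print(trigram_surprise("natural", "killer", "instinct"))  # low
```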

Fade_Dance · 7 days ago
That reminds me of a joke I liked from Tim Heidecker when he was ribbing Maynard Keenan about his wine making:

"The blood of Christ is essentially wine, correct?"

Yes.

"Who are you to put that in a bottle?"

So a logical chain can be inferred as well: blood->wine, wine->bottle, therefore blood->bottle. That uses the audience’s own logical inferences against them as a “trick”, which is another funny element for people. Using that to vault straight to the punchline makes the joke better, but you have to be sure the audience is on board, which is why there is a bit of reinforcement at the beginning of the joke to force them onboard.

jvm___ · 7 days ago
What do you do for a living?

I teach math how to be funny.

golol · 7 days ago
IMO many misrepresentations:

- Pretraining to predict the next token imposes no bias against surprise, except that low probabilities are more likely to have a large relative error.

- Using a temperature lower than 1 does impose a direct bias against surprise.

- Finetuning of various kinds (instruction, RLHF, safety) may increase or decrease surprise. But certainly the kinds of behavior aimed for in finetuning significantly harm the capability to tell jokes.
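The temperature point is easy to see numerically; a minimal sketch with made-up logits for three candidate tokens:

```python
import math

def softmax_t(logits, temperature):
    """Softmax with temperature scaling: T < 1 sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # "safe", "okay", "surprising" candidate tokens
for t in (1.0, 0.7, 0.3):
    print(t, [round(p, 4) for p in softmax_t(logits, t)])
```

At T=1 the "surprising" (lowest-logit) token keeps some probability mass; at T=0.3 it is almost eliminated - which is the direct bias against surprise.
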
sigmoid10 · 7 days ago
I think the whole discussion just conflates the ideas of telling a joke and coming up with one. Telling a joke right is of course an art, but the punchline in itself has zero surprise if you studied your lines well - like all good comedians do. The more you study, the more you can also react to impromptu situations. Now, coming up yourself with a completely original joke, that's a different story. For that you actually have to venture outside the likelihood region and find nice spots. But that is something that is also really, really rare among humans and I have only ever observed it in combination with external random influences. Without those, I doubt LLMs will be able to compete at all. But I fully believe a high end comedian level LLM is possible given the right training data. It's just that none of the big players ever cared about building such a model, since there is very little money in it compared to e.g. coding.