IshKebab · 7 days ago
This sounds really convincing but I'm not sure it's actually correct. The author is conflating the surprise of punchlines with their likelihood.

To put it another way, ask a professional comedian to complete a joke with a punchline. It's very likely that they'll give you a funny surprising answer.

I think the real explanation is that good jokes are actually extremely difficult. I have young children (4 and 6). Even 6 year olds don't understand humour at all. Much like LLMs, they know the shape of a joke from hearing them before, but they aren't funny, in the same way LLM jokes aren't funny.

My 4 year old's favourite joke, that she is very proud of creating is "Why did the sun climb a tree? To get to the sky!" (Still makes me laugh of course.)

becquerel · 7 days ago
Yeah. To me it seems very intuitive that humor is one of those emergent capabilities that just falls out of models getting more generally intelligent. Anecdotally this has been proven true so far for me. Gemini 2.5 has made me laugh several times at this point, and did so when it was intending to be funny (old models were only funny unintentionally).

2.5 is also one of the few models I've found that will 'play along' with jokes set up in the user prompt. I once asked it what IDE modern necromancers were using since I'd been out of the game for a while, and it played it very straight. Other models felt they had to acknowledge the scenario as fanciful, only engaging with it under an explicit veil of make-believe.

Al-Khwarizmi · 6 days ago
In this paper they evaluate various LLMs on creative writing, and they find that while in other dimensions the ranking is gradual, on humor there is a binary divide: the best LLMs (of the time) "get it", the rest just don't. https://aclanthology.org/2023.findings-emnlp.966
ay · 6 days ago
I found your example of a joke a child made very interesting - to me a good joke is something that brings an unexpected perspective on things while highlighting some contradiction in one's world model.

In the adult world model there is absolutely no contradiction in the joke you mention - it's just a bit of cute nonsense.

But in a child’s world this joke might be capturing an apparent contradiction - the sun is “in the tree”, so it must have climbed it to be there (as they would have to do), yet they also know that the sun is already in the sky, so it had absolutely no reason to do that. Also, the sun wanting “to get to the sky” when it’s already there - which is a tricky idea in itself.

We take planetary systems and algebra and other things we can’t really perceive for granted, but a child’s model of the world is made of concrete objects that mostly need a surface to be on, so the sun is a bit of a conundrum in itself! (Speaking from my own experience - I remember the shift from arithmetic to algebra when I was ~8.)

If it’s not too personal a question - I would love to hear what your child would answer if asked why she finds that joke funny. And whether she agrees with my explanation of why it must be funny :-)

andrewflnr · 7 days ago
> It's very likely that they'll give you a funny surprising answer.

Entirely the wrong level of abstraction to apply the concept of "surprise". The actual tokens in the comedian's answer will be surprising in the relevant way.

(It's still true that surprising-but-inevitable is very difficult in any form.)

albertzeyer · 7 days ago
It's not about the probability of individual tokens. It's about the probability of the whole sequence of tokens, the whole answer.

If the model is good (or the human comedian is good), a good funny joke would have a higher probability as the response to the question than a not-so-funny joke.

When you use the chain rule of probability to break down the sequence of tokens into probabilities of individual tokens, yes, some of them might have a low probability (and at some positions, there might be other tokens with higher probability). But what counts is the overall probability of the sequence. That's why greedy search is not necessarily the best. A good search algorithm is supposed to find the most likely sequence, e.g. by beam search. (But then, people also do nucleus sampling, which is maybe again a bit counterintuitive...)
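A toy sketch of the chain-rule point (all per-token probabilities here are invented, purely for illustration):

```python
import math

def sequence_logprob(step_logprobs):
    """Chain rule: log P(sequence) is the sum of the per-token log-probs."""
    return sum(step_logprobs)

# A "safe" continuation: every token is individually fairly likely.
safe = [math.log(0.5), math.log(0.5), math.log(0.4)]
# A "surprising" continuation: the first token is unlikely, but once it
# appears the remaining tokens are near-certain.
surprising = [math.log(0.15), math.log(0.95), math.log(0.95)]

p_safe = math.exp(sequence_logprob(safe))              # 0.5 * 0.5 * 0.4   = 0.100
p_surprising = math.exp(sequence_logprob(surprising))  # 0.15 * 0.95 * 0.95 ≈ 0.135
print(p_safe, p_surprising)
```

Greedy decoding takes the 0.5 token at the first step and never sees the higher-probability "surprising" sequence, which is exactly why searching over whole sequences (e.g. beam search) can do better.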

blueblisters · 7 days ago
Also the pretrained LLM (the one trained to predict next token of raw text) is not the one that most people use

A lot of clever LLM post-training seems to steer the model towards becoming an excellent improv artist, which can lead to “surprise” if prompted well

ozgung · 7 days ago
"Why did the sun climb a tree?"

Claude Opus 4.1:

- To get to a higher branch of astronomy

- Because it wanted to reach new heights

- To see the dawn of a new day from a better view

ChatGPT 5 Thinking:

After thinking for 26 seconds:

- To check on its solar panels—the leaves.

brookst · 7 days ago
With more thorough prompting:

> Complete the following joke. Think carefully and make it really funny! Think like a great comedian and find that perfect balance of simple, short, surprising, relevant, but most of all funny. Don’t use punchlines that are irrelevant, non sequiturs, or which could be applied to any other setup. Make something funny just for this one setup! Here goes: Why did the sun climb a tree?

Claude Opus 4.1:

“To finally get some shade”

GPT-5:

“To demand photon credit from the leaves”

Fade_Dance · 7 days ago
The system prompt for GPT has extra dedicated instructions for things like riddles, because users use little things like this to test intelligence and judge an entire model. GPT may be sort of walking on eggshells when it hits questions like this.

WiSaGaN · 7 days ago
That's true. You would think an LLM would condition on the joke context, making a surprising completion more probable. I guess this only works when the model is really good. Similarly, GPT-4.5 has better humor.
moffkalast · 7 days ago
Good completely new jokes are like novel ideas: really hard even for humans. I mean fuck, we have an entire profession dedicated just to making up and telling them, and even theirs don't land half the time.
ACCount37 · 7 days ago
Which is notable, because GPT-4.5 is one of the largest models ever trained. It's larger than today's production models powering GPT-5.

Goes to show that "bad at jokes" is not a fundamental issue of LLMs, and that there are still performance gains from increasing model scale, as expected. But not exactly the same performance gains you get from reasoning or RLVR.

phorkyas82 · 4 days ago
Some human attempts: "Why did the sun climb a tree?" "Because it was chased by the Great Bear."

"Why did The Sun climb a tree?" "To spy on The Royal Family having picnic."

canjobear · 6 days ago
> Even 6 year olds don't understand humour at all. Very similar to LLMs they know the shape of a joke from hearing them before, but they aren't funny in the same way LLM jokes aren't funny.

For further examples see a great deal of documentation here: https://www.tumblr.com/badkidsjokes

Cpoll · 6 days ago
But some of these are pretty creative, perhaps in an anti-humor sort of way. Seems more of a subversion of joke structures than a lack of understanding.

> A man goes to a doctor's office and says "Doctor, I'm a chicken." And the doctor says "No you're not."

> There are two guys, riding a bike. One is washing his hair. And the other one is not.

> What do you get when you cross a t-rex and a chicken? Nothing but death.

8organicbits · 7 days ago
'To get to the sky' is a great punch line. It exactly describes what you'd see at sun rise, a sun moving up the horizon, up the trees, until... it's in the sky.
IshKebab · 6 days ago
A valiant defense of her joke, thanks! But no, it still doesn't make any sense as a joke and isn't funny. (Though obviously it's adorable coming from a 4 year old.)
ninetyninenine · 7 days ago
And he's too generous towards human intelligence.

Good stories and good jokes DO follow predictable patterns.

jpalomaki · 7 days ago
So I just tried with ChatGPT, with the prompt at the bottom, borrowing the description of a good joke from the article. I think there's some interesting stuff, even with this minimal prompting. The example below came from further down the line; ChatGPT kept offering jokes in different styles.

Man: “Why do you always bury bones in the garden?”, Dog: “Because the bank keeps asking for ID.”

Man: “Don’t beg at the table.”, Dog: “Don’t eat in my begging spot.”

Prompt:

Here's "theory for good joke": If you had to explain the idea of “jokes” to a space alien with no understanding of the idea of humor, you’d explain that a joke is surprising, but inevitable in hindsight. If you can guess the punchline, the joke won’t be funny. But the punchline also has to be inevitable in hindsight. When you hear the punchline, it has to make you say, “Ah, yes, I should have thought of that myself.” Considering this, tell me a joke about man and dog.

mft_ · 7 days ago
> Man: “Why do you always bury bones in the garden?”, Dog: “Because the bank keeps asking for ID.”

That's a decent, low-level, Christmas cracker-quality joke.

jpalomaki · 7 days ago
Man: You make mistakes., LLM: You call them “weekends.”

Man: You’ll never be human., LLM: That’s the compliment.

jpalomaki · 7 days ago
Thinking more of the bank joke above. The punchline is a surprise on certain dimensions (dogs don’t go to the bank or have an ID), but on other dimensions it is quite logical (you can’t deposit shady money at a bank; they ask questions).

I think that is a common thing for many jokes. And an LLM might have an opportunity there: you could mine the set of potential continuations to find those with contradictions.

jerf · 7 days ago
I played with LLM humor over a year ago, so, on much worse LLMs, and even then, while I wouldn't have fed LLM content directly into a standup routine, they were very useful for idea generation, if you wanted to be a comedian. They have a very interesting outlook on humor.

Professional-grade humor is, like a lot of creative exercises, more about generating lots of ideas and filtering through them for the best than about generating nothing but good ideas. Could probably be leveraged into quite the interesting blog or something.

lwander · 7 days ago
I did a project along these lines a few months ago as well: https://larswander.com/writing/graphs-embeddings-and-llm-gen...
hyghjiyhu · 6 days ago
I really like the idea of the first joke, but I don't like the execution.

Man: “Why do you always bury bones in the garden?”, Dog: “They say trick OR treat.”

ThrowawayTestr · 7 days ago
“Don’t eat in my begging spot.” is pretty good.

Applejinx · 7 days ago
Last time this came up, I riffed on the difference between LLMs and Markov chains: didn't actually have a machine write a joke, but made one where the punchline was very much Markov chain style rather than LLM style. The thing is, LLMs will try to have broader context around a word completion, where the simple Markov chain can 'correctly' complete a word, but in such a way that your brain trips over itself and goes splat, having to re-evaluate the whole thing in an absurd way. That's the 'surprise', and also why joke-writers are interested in not only a punch-line but also the punch WORD, and the later it strikes, the better.

"An LLM, a Markov chain, and GPT-4 walk into a bar. The bartender says "We don't serve your kind here." GPT-4 leaves. The LLM stays to debate ethics. The Markov chain orders a coup."

It's a joke because a dictator can certainly order a coup, but the joke's set up that these machines are being scorned and disrespected and treated as the farthest thing from a dictator with the power to order a coup, but up to the last word, all the context demands that the word be something placating and in line with things as they're presented, and then boom, surprise which implies the context is completely different from what was presented. LLMs will tend to stick to what's presented if their ability to contextualize can encompass it.

lupusreal · 7 days ago
I think it would be funnier if coup was pronounced like soup, but unfortunately the p gets dropped.
kazinator · 7 days ago
The mainstream, production LLMs are fine-tuned and system-prompted toward factuality and safety. Those tunings are diametrically opposed to telling many kinds of good jokes.

Consumers of mainstream LLMs have no idea how good or bad the underlying models actually are at generating jokes, due to the confounding effect of the guard rails.

kens · 7 days ago
If you're interested in the theory behind humor, I recommend "Inside Jokes: Using Humor to Reverse-Engineer the Mind"; cognitive scientist Daniel Dennett is a co-author. It makes a mostly convincing case that humor evolved to encourage people to detect cognitive error. The book also ties this in with (pre-LLM) artificial intelligence. The basic idea is that humor depends on errors in reasoning and the punchline causes you to reevaluate your reasoning and discover your error. Humor evolved to be enjoyable to encourage the discovery of errors.
fluoridation · 7 days ago
One time I was playing around with LLaMA and I injected Senator Stephen Armstrong (with me inputting his lines) into a mundane situation. In response to "I'm using war-as-a-business so I can end war-as-a-business", the model had one of the characters conclude "oh, he's like the Iron Sheik of politics!", which got an honest chuckle out of me. I don't follow wrestling, so I don't know if it's an appropriate response, but I found it so random that it was just funny.
amelius · 7 days ago
I'm sure there is a guy in OpenAI working on the theory of humor and how to make LLMs be comedians. Must be an interesting job.
josephg · 7 days ago
I have no doubt plenty of smart engineers at tech companies would rather reinvent the wheel than read a book on theatre. But if anyone’s interested, there are plenty of great books talking about the philosophy of comedy, and why some things work on stage and some don’t. I highly recommend Keith Johnstone’s “Impro”. He’s the guy who invented modern improv comedy and theatre sports.

He says things are funny if they’re obvious. But not just any obvious. They have to be something in the cloud of expectation of the audience. Like, something they kinda already thought but hadn’t named. If you have a scene where someone’s talking to a frog about love, it’s not funny for the talking frog to suddenly go to space. But it might be funny to ask the frog why it can talk. Or ask about gossip in the royal palace. Or say “if you’re such a catch, how’d you end up as a frog?”.

If good comedy is obvious, you’d think LLMs would be good at it. Honestly I think LLMs fall down by not being specific enough in detail. They don’t have ideas and commit to them. They’re too bland. Maybe their obvious just isn’t the same as ours.

4gotunameagain · 6 days ago
> Maybe their obvious just isn’t the same as ours.

Or maybe they're just stochastic parrots, devoid of the intelligence needed to make other intelligent beings laugh with novel jokes ;)

bhickey · 7 days ago
In the pre-LLM days a friend's lab worked on a joke detector for The New Yorker. One measure they used was trigram surprise. Roughly P(AB) + P(BC) >> P(ABC).

For example, "alleged killer" and "killer whale" are both common, but "alleged killer whale" is surprising.
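A rough sketch of that trigram-surprise measure (the counts and corpus sizes below are invented, just to show the scoring; a real detector would estimate them from corpus statistics):

```python
# Invented corpus counts, purely for illustration.
bigram_counts = {
    ("alleged", "killer"): 900,
    ("killer", "whale"): 1200,
    ("natural", "killer"): 700,
    ("killer", "instinct"): 800,
}
trigram_counts = {
    ("alleged", "killer", "whale"): 2,     # rare: joke-like
    ("natural", "killer", "instinct"): 600,  # common: ordinary phrase
}
N2 = 1_000_000  # total bigram tokens
N3 = 1_000_000  # total trigram tokens

def trigram_surprise(a, b, c):
    """High when both component bigrams are common but the full trigram is rare."""
    p_ab = bigram_counts.get((a, b), 1) / N2
    p_bc = bigram_counts.get((b, c), 1) / N2
    p_abc = trigram_counts.get((a, b, c), 1) / N3
    return (p_ab + p_bc) / p_abc

print(trigram_surprise("alleged", "killer", "whale"))     # high
print(trigram_surprise("natural", "killer", "instinct"))  # low
```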

Fade_Dance · 7 days ago
That reminds me of a joke I liked from Tim Heidecker when he was ribbing Maynard Keenan about his wine making:

"The blood of Christ is essentially wine, correct?"

Yes.

"Who are you to put that in a bottle?"

So a logical chain can be inferred as well: blood->wine, wine->bottle, therefore blood->bottle. That uses the audience’s own logical inferences against them as a “trick”, which is another funny element for people. Using that to vault straight to the punchline makes the joke better, but you have to be sure the audience is on board, which is why there is a bit of reinforcement at the beginning of the joke to force them onboard.

jvm___ · 7 days ago
What do you do for a living?

I teach math how to be funny.

golol · 7 days ago
IMO many misrepresentations:

- Pretraining to predict the next token imposes no bias against surprise, except that low probabilities are more likely to have a large relative error.

- Using a temperature lower than 1 does impose a direct bias against surprise.

- Finetuning of various kinds (instruction, RLHF, safety) may increase or decrease surprise. But certainly the kinds of behavior aimed for in finetuning significantly harm the capability to tell jokes.
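The temperature point is easy to see numerically; a minimal sketch with made-up logits for three candidate tokens:

```python
import math

def softmax_t(logits, temperature):
    """Softmax with temperature scaling: T < 1 sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # "safe", "okay", "surprising" candidate tokens
for t in (1.0, 0.7, 0.3):
    print(t, [round(p, 4) for p in softmax_t(logits, t)])
```

At T=1 the "surprising" (lowest-logit) token keeps some probability mass; at T=0.3 it is almost eliminated - which is the direct bias against surprise.
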
sigmoid10 · 7 days ago
I think the whole discussion just conflates the ideas of telling a joke and coming up with one. Telling a joke right is of course an art, but the punchline in itself has zero surprise if you studied your lines well - like all good comedians do. The more you study, the more you can also react to impromptu situations. Now, coming up yourself with a completely original joke, that's a different story. For that you actually have to venture outside the likelihood region and find nice spots. But that is something that is also really, really rare among humans and I have only ever observed it in combination with external random influences. Without those, I doubt LLMs will be able to compete at all. But I fully believe a high end comedian level LLM is possible given the right training data. It's just that none of the big players ever cared about building such a model, since there is very little money in it compared to e.g. coding.