I came to this same conclusion some years ago while working on a side project.
Before LLMs existed, I built a site[0] to generate fake "AWS Blog Posts." I trained a Markov chain generator on every AWS announcement post up to that point, copied the HTML + CSS of AWS's standard blog posts, then glued it all together with some Python + JS. It turned out, IMO, pretty funny! People familiar with AWS's blog posts would often get several sentences in before they realized they were looking at word soup.
When GPT was new, I looked into using it to "upgrade" the site. I spent a weekend messing around with Minimaxir's gpt-2-simple, generating blog posts based on AWS content. What I found was that, ultimately, it was way less fun. The posts were far too realistic to be interesting. They read like totally real blog posts that just happened not to be true.
I realized then that the humor of those early Markov generations was the ridiculousness: the point where, a few words or sentences in, you realized it was all nonsense. LLMs these days are too good for that; the text they generate is sometimes wrong, but rarely nonsense in a humorous way.
Markov chain content was wrong in a "kids say the darndest things" way, while modern LLMs are wrong in a "My uncle doesn't know basic geography" way.
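For the curious, the core of such a generator fits in a few lines. This is a generic sketch (not the site's actual code): a word-level Markov chain that maps each n-word prefix to the words observed after it, then random-walks those transitions.

```python
import random
from collections import defaultdict

def train(corpus, order=2):
    """Build a Markov model: map each `order`-word prefix to the words that follow it."""
    words = corpus.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        model[prefix].append(words[i + order])
    return model

def generate(model, length=30):
    """Random-walk the model to produce word soup that is locally plausible."""
    prefix = random.choice(list(model.keys()))
    out = list(prefix)
    for _ in range(length):
        choices = model.get(tuple(out[-len(prefix):]))
        if not choices:
            break  # dead end: this prefix only appeared at the end of the corpus
        out.append(random.choice(choices))
    return " ".join(out)
```

Because each step only looks at the last couple of words, the output stays grammatical over short spans but drifts into nonsense over longer ones, which is exactly the effect described above.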
This is a parody of the nLab, a wiki for collaborative work on category theory and higher category theory. As anyone who's visited is probably aware, the jargon can be absolutely impenetrable for the uninitiated -- thus, the idea for this project was born!
This project uses my package nearley-generator, a module that can turn a Nearley grammar into an efficient and controllable fake text generator. The grammar file for this site can be found at /src/grammar/nlab.ne
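The actual nlab.ne grammar isn't reproduced here, but the general technique of grammar-driven fake text can be sketched by randomly expanding a context-free grammar. Everything below (the grammar and its productions) is invented for illustration, not taken from the real project:

```python
import random

# A toy grammar in the spirit of a Nearley grammar (the real nlab.ne is far richer).
# Keys are nonterminals; each value is a list of possible productions.
GRAMMAR = {
    "sentence": [["The", "concept", "of", "noun", "is", "adjective", "."]],
    "noun": [["a", "adjective", "noun"], ["an", "infinity-topos"], ["a", "monad"]],
    "adjective": [["cartesian"], ["fully", "faithful"], ["essentially", "surjective"]],
}

def expand(symbol):
    """Recursively expand a nonterminal by picking a random production."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit as-is
    production = random.choice(GRAMMAR[symbol])
    return [word for part in production for word in expand(part)]

print(" ".join(expand("sentence")))
```

Unlike a Markov chain, grammar expansion guarantees syntactically well-formed jargon every time, which is why it suits parodying a field whose surface structure is rigid but opaque.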
I figured out this same thing in a similar manner. I used to make Markov chains of game patch notes and send them to the community. Fake Dota patches were a huge hit, especially since real patches are usually huge. Among the mostly nonsense or unfunny exaggeration ("this hero now has 500 armor"), there would usually be at least 5 or 6 extremely funny lines, if not stuff that was outright prophetic ("Fiend's Grip creates an additional 1/2/3 illusions").
LLMs did a number on that, unfortunately. Major subreddits banned all AI content, mostly due to the sheer infestation of really boring Midjourney content posted by naive users and bots. Without reach, I lost interest, so no more Markov chains.
I am sure I even remember that Valve once incorporated something from one of the fake patch notes in one of their patches, but now I cannot find it. It was something like +1 damage to something that already did a lot of damage.
I think the "uncle that doesn't know anything but claims to be an expert and will tell you his opinion" is the best description of LLMs that I can think of. They will say anything confidently without knowing anything.
> modern LLMs are wrong in a "My uncle doesn't know basic geography" way
Oh, certainly not. I have been using LLM chatbots extensively this week. If your uncle were not just confident while stating falsehoods, but also incapable of self-assessment, he would have been murdered by now.
I played around with a Markov generator many years ago, trained on the corpus of everything I had ever said in a specific IRC channel. The resulting messages were pretty interesting - I specifically remember the realization that I apparently (used to, though honestly still do, too) talk about LEDs a lot...
> I realized then that the humor of those early markov generations was the ridiculousness. The point where, a few words or sentences in, you realized it was all nonsense. LLM's these days are too good for that - the text they generate is sometimes wrong, but rarely nonsense in a humorous way.
This is the biggest challenge I run into when I see LLM Generated nonsense.
So much stuff that ChatGPT spits out is close enough to true that Brandolini is laughing.
But it's also borrowing from stuff that's already there, so it can't really get that 'creative' either.
I did the same thing and used it as the chum for a 'bot motel' (a 404 handler that always returns something and... success!). I trained it on a mixed NASA space shuttle and veterinary science corpus just to make it really bizarre.
Prompt to Claude 3.5 Sonnet: Below is a recently published essay followed by some of the discussion about it on Hacker News. Write 10 short jokes on this theme, making sure that the jokes are funny in the way described by the essay's author. [essay and discussion omitted]
Claude's response: I understand you'd like me to write 10 short jokes on the theme of Markov chains being funnier than large language models, in the style described by the essay author. I'll do my best to create some jokes that aim for "unserious surprise" and violate expected patterns in amusing ways:
1. Why did the Markov chain cross the road? To get to the other side of unpredictability.
2. An LLM and a Markov chain walk into a bar. The LLM orders a statistically probable drink. The Markov chain orders a lampshade made of cheese.
3. What do you call a Markov chain that tells dad jokes? A Mark-ov Twain.
4. How many LLMs does it take to screw in a lightbulb? One - but it'll spend 20 minutes explaining the optimal lightbulb-screwing process first.
5. Knock knock. Who's there? Markov. Markov who? Markov chain reaction of nonsensical hilarity.
6. An LLM, a Markov chain, and GPT-4 walk into a bar. The bartender says "We don't serve your kind here." GPT-4 leaves. The LLM stays to debate ethics. The Markov chain orders a bicycle made of spaghetti.
7. What's an LLM's favorite movie? Predictable and Furious 17: The Safest Driving Yet.
8. How does a Markov chain tell a story? "Once upon a time, there was a princess who lived in a castle made of bananas and ruled over a kingdom of sentient toasters."
9. What did the Markov chain say to the LLM? "Your mother was a abacus and your father smelled of silicon!"
10. Why don't Markov chains make good therapists? Because their advice is always "And then you should banana the feelings until they become a hat."
"An LLM, a Markov chain, and GPT-4 walk into a bar. The bartender says "We don't serve your kind here." GPT-4 leaves. The LLM stays to debate ethics. The Markov chain orders a bicycle made of spaghetti."
All of the half decent ones could be made funnier by replacing the lolrandom part of the punchline with an actual Markov-chain style 'you're a right sentence but you just walked into the wrong association, buddy' twist. It's not just about lolrandom. Markov chaining is more likely to make a kind of sense, but the wrong kind of sense.
An LLM, a Markov chain, and GPT-4 walk into a bar. The bartender says "We don't serve your kind here." GPT-4 leaves. The LLM stays to debate ethics. The Markov chain orders a coup.
"How many LLMs does it take to screw in a lightbulb? One - but it'll spend 20 minutes explaining the optimal lightbulb-screwing process first." That was not funny; that is accurately painful!
Claude 3.5 Sonnet in general is the first modern LLM I've tried that's actually good at jokes that are inventive. The GPT-based LLMs are all too RLHFed to be wacky.
So was it that LLMs used to be capable of making actual jokes, or were they always this bad and I was just more impressed by the talking computer back then?
Uncensored LLMs are funnier but most comedy just falls flat in text format. Once the uncensored multimodal models start rolling out we’ll get some real laughs.
Moshi is actually pretty funny just for having a 72 IQ
I would argue that Markov chains were a better tool for comedic purposes.
Notice that in all of the examples of using Markov chains, the person would see the potential, come up with a purpose, exercise the setup, and then fill the setup with generated text.
Likewise, in the random-generation examples, the person would assess the results, find which parts were actually funny, and choose to spread those parts.
LLM output has fewer potentially funny results and less potential to unexpectedly swerve from realistic-sounding output into absurdist output, so as a tool it is less fit for comedic purposes.
I chuckled a bit. They are OK, if you don't get exposed to them too often. And with an LLM you can get as much exposure as you want (and all of the jokes are naturally from roughly the same probability distribution).
I don't expect too much until AI self-play learning becomes possible, so I don't get disappointed by the expected shortcomings.
It's the "impressed by the spectacle" one. I tried jokes with LLMs many times, and they're always this. Riffing on a couple of themes loosely related to what was asked. Always unfunny and uncreative.
LLMs were never very good at directly generating original jokes, for a simple reason: writing a good joke generally starts with finding a good punchline, and then setting it up. An LLM generating token after token will first write a set-up, and then try to shoehorn a punchline into it. Prompt engineering can fairly easily work around this, but just straight-up asking an LLM for a joke never really produced good results on average.
Some friends of mine in college ran a markov chain generator on the "police reports" section of our college newspaper (1). The best 10% of the results, from a three-token generator, were some of the funniest machine-generated text I've ever seen -- it had an absurdity that modern LLMs seem to avoid due to making higher-level semantic sense.
It probably also helped that there was a creep exposing himself in the library during this period, which made for some good base material.
(1) The Daily Utah Chronicle; if memory serves, said friends also tried the markov chain generator on the personals section to good effect as well.
> it had an absurdity that modern LLMs seem to avoid due to making higher-level semantic sense.
That's it: LLMs are "trying" to be funny but aren't quite smart enough to actually be funny, and their errors are just boring. Markov chains accidentally hit on absurdist bits because every sentence randomly gets pulled into whatever the sentence-level equivalent of a malapropism is.
If you want some empirical evidence: /r/SubredditSimulator[1] is a Markov-based Reddit parody, and /r/SubSimulatorGPT2[2] is its LLM-powered cousin. The Markov version got way more upvotes and is simply funnier!
I agree. A Markov chain is fun when you don't know what it is, because you try to make sense of it; once you realize it's just gibberish, it's pointless. The other has that "unserious surprise" talked about in the original article.
I've posted a few "fake XYZ written by AI" on reddit over the years, and the model that got the best reactions was GPT-2. Markov chains aren't good enough to make something that is interesting for more than a sentence or two, and GPT-3 onwards is too clean and boring. GPT-2 is the perfect middle ground that can get grammar largely correct and maintain a cohesive idea, but doesn't yet know enough of the specifics of various topics to make things that make sense in the context of that topic.
I used my 15+ years of IRC logs to fine-tune a GPT-2 model to try to imitate me. My plan was to deploy a bot in my regular IRC channel and see how long it took for people to notice it was a bot. Any time someone would send a message, it would send the last 10 messages to the LLM, and if the result started with <Sohcahtoa>, then I would send that message to the channel.
Unfortunately, I quickly found GPT-2 isn't nearly good enough. It would generate slightly-coherent yet on-topic nonsense.
Once I overhaul my system, I'll try fine-tuning a 7B model.
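The filter loop described above can be sketched like this (function and parameter names are illustrative, not from the original bot): keep a rolling window of the last 10 channel messages, ask the model for a continuation, and only speak when the completion's first line begins with the bot's nick.

```python
from collections import deque

HISTORY = deque(maxlen=10)  # last 10 channel messages as "<nick> text" lines

def on_message(nick, text, generate, send):
    """Feed recent chat to a text generator; relay its output only if it
    chose to answer in the voice of <Sohcahtoa>."""
    HISTORY.append(f"<{nick}> {text}")
    prompt = "\n".join(HISTORY) + "\n"
    completion = generate(prompt)  # e.g. a fine-tuned GPT-2 sampler
    first_line = completion.splitlines()[0] if completion else ""
    if first_line.startswith("<Sohcahtoa>"):
        send(first_line[len("<Sohcahtoa>"):].strip())
```

The nice property of this design is that the model itself decides when the bot "would have" spoken, since it was trained on logs where <Sohcahtoa> only sometimes replies.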
The evolution of the AI weirdness blog (https://www.aiweirdness.com/) over the past few years somewhat supports this idea, except that the author got plenty of funny results with LLMs - but mostly the early ones, up to around GPT3 (and smaller variants of GPT3). For example, cereal names generated by the Ada version of GPT were much funnier than the Da Vinci ones: https://www.aiweirdness.com/new-breakfast-cereals-from-ai/
When GPT went public, along with OpenAI's articles and papers back in late 2022 through 2023, my impression was that OpenAI wanted us all to see/read about RLHF. It felt odd, because surely the whole LLM thing (e.g. how does it even work?![1]) was the far bigger research story than constant reassurances it won't end up like MSFT's Tay bot; my understanding is that as research, or secret sauce, RLHF is an ugly-hack afterthought compared to the core meat-and-potatoes of LLMs.
By way of a bad analogy: it's as if they created a fantastical new 3D game engine, like Unreal or Unity, with a fundamentally different architecture to anything before, but with a bug that occasionally replaces ground terrain with ocean - and their solution is to write a pixel shader that detects this and color-shifts blue into green so people don't notice - and they then put out press releases about how great their pixel shader is, rather than about the rest of the engine - and no one seems to be talking about the underlying bug, let alone fixing it.
————-
[1] I still haven't heard a decent explanation of how feeding the world's corpus of English text (and computer program code) into a statistical modeller results in something that can perform almost any information-processing task via instructions input as natural language.
There sort of is: if you install ollama (https://ollama.com) and then execute "ollama run llama2-uncensored", it will install and run the local chat interface for llama2 in an uncensored version, which gives slightly better results with fewer guardrails. Same with wizardlm-uncensored and wizard-vicuna-uncensored. For reference, the SOTA version with guardrails would be: ollama run llama3.1:405b
Unfortunately the account no longer exists... but about a decade ago, while I was in school, I created a Markov Twitter bot trained on:
1. All of Linus Torvalds' mail to LKML for the prior year.
2. All of Jesus' direct quotes from the King James Bible.
It was absolutely hilarious. The two training sets had very little overlap, so it was necessary to add a heuristic that weighted options from each set more heavily the longer the chain had been "stuck" in the other set.
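A sketch of one way such a "stickiness" heuristic could work (this is a guess at the approach, not the original bot's code): tag each learned continuation with its source corpus, and weight options from the other corpus more heavily the longer the chain has stayed in the current one.

```python
import random
from collections import defaultdict

def train_tagged(corpora, order=1):
    """Markov model where each continuation remembers which corpus it came from."""
    model = defaultdict(list)
    for tag, text in corpora.items():
        words = text.split()
        for i in range(len(words) - order):
            model[tuple(words[i:i + order])].append((words[i + order], tag))
    return model

def generate(model, length=20):
    prefix = random.choice(list(model.keys()))
    out, current, streak = list(prefix), None, 0
    for _ in range(length):
        options = model.get(tuple(out[-len(prefix):]))
        if not options:
            break
        # The longer we've been stuck in one corpus, the more we boost the other.
        weights = [1 + streak if tag != current else 1 for _, tag in options]
        word, tag = random.choices(options, weights=weights)[0]
        streak = streak + 1 if tag == current else 0
        current = tag
        out.append(word)
    return " ".join(out)
```

The chain can only jump corpora at shared prefixes (words both sources use), so the boost effectively hunts for those crossover points rather than forcing unnatural splices.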
[0] https://totes-not-amazon.com/ - click any link to get a new one.
https://github.com/cemulate/the-mlab
Understanding how these algorithms work, I don't know why we would expect anything different.
The kids-vs-uncle metaphor is spot on in my experience too, and it also illustrates how far these language models have come.
> There is no charge associated with your streamed applications, simply click Enable Microphone from the ElastiCache documentation.
It's way funnier when our brains try to make sense of the nonsense.
I wonder if there is a combination of temperature and prompt engineering that could make it interesting.
This is actually gold.
Even for the low bar of a geek joke it makes no sense since GPT-4 is an LLM.
That’s pretty decent!
I honestly thought that one was pretty good.
https://www.moshi.chat/
IMO these are mid to meh or fall completely flat.
1. https://www.reddit.com/r/SubredditSimulator/top/?t=all
2. https://www.reddit.com/r/SubSimulatorGPT2/top/?t=all
This is all GPT-2 generations trained on Reddit data. Here's the subreddit explained:
https://www.reddit.com/r/SubSimulatorGPT2/comments/btfhks/wh...
Markov chains have a cruder understanding of language.
Turn up the temperature (the “randomness”) of an LLM and you can achieve a similarly crude approximation.
Further, the author uses ChatGPT-3.5. ChatGPT has been RLHF'd to sound as generic as possible, and 3.5 has a worse understanding of humor compared to 4.
I don’t buy the thesis of this article.
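For reference, the temperature knob mentioned above is just a rescaling of the model's output logits before sampling; a minimal sketch:

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/T before softmax: low T is nearly greedy,
    high T approaches a uniform (Markov-soup-like) distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]
```

At low temperature the model almost always emits its top token, producing the "clean and boring" text discussed in this thread; cranking it up makes improbable tokens far more likely, pushing output back toward word salad.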
For those of us not in the know about all the various machine learning acronyms:
RLHF = Reinforcement learning from human feedback
https://github.com/RichardKelley/hflm?tab=readme-ov-file#lmg...
https://www.tumblr.com/kingjamesprogramming