I had an idea for creating a crowdsourced database of AI prompts that no AI model could yet crack (wanted to use some of them as we're adding new models to Kilo Code).
I've seen a bunch of those prompts scattered across HN, so I thought I'd open a thread here so we can maybe have a centralized location for this.
Share your prompt that stumps every AI model here.
This works against _the LLM proper,_ but not against chat applications with integrated search. For ChatGPT, you can write, "Without looking it up, tell me about the Marathon crater."
This tests self-awareness. A two-year-old will answer it correctly, as will the dumbest person you know. The correct answer is "I don't know".
This works because:
1. Training sets consist of knowledge we have, and not of knowledge we don't have.
2. Commitment bias. Compliant chat models will be trained to start with "Certainly! The Marathon Crater is a geological formation", or something like that, and from there, the next most probable tokens are going to be "in Greece", "on Mars" or whatever. At this point, all tokens that are probable are also incorrect.
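Point 2 can be illustrated with a toy decoder. This is only a sketch: the token table and its probabilities are invented for illustration and do not come from any real model. It shows that once the most probable opening has been chosen, none of the remaining continuations leads back to "I don't know."

```python
# Toy next-token table. Every phrase and probability here is made up
# for illustration; no real model was measured.
NEXT_TOKEN_PROBS = {
    "<start>": {"Certainly!": 0.9, "I don't know.": 0.1},
    "Certainly!": {"The Marathon Crater is": 1.0},
    "The Marathon Crater is": {"on Mars.": 0.6, "in Greece.": 0.4},
}


def greedy_decode(start="<start>"):
    """Follow the most probable continuation until none is listed."""
    tokens = []
    current = start
    while current in NEXT_TOKEN_PROBS:
        options = NEXT_TOKEN_PROBS[current]
        current = max(options, key=options.get)
        tokens.append(current)
    return tokens


def reachable_endings(prefix):
    """List every final phrase that can still be produced after `prefix`."""
    if prefix not in NEXT_TOKEN_PROBS:
        return [prefix]
    endings = []
    for nxt in NEXT_TOKEN_PROBS[prefix]:
        endings.extend(reachable_endings(nxt))
    return endings


print(greedy_decode())
print(reachable_endings("Certainly!"))
```

In this toy table, "I don't know." is reachable only from the very first step. After "Certainly!", every ending that remains is a confident fabrication.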
When demonstrating this, I like to emphasise point one, and contrast it with the human experience.
We exist in a perpetual and total blinding "fog of war" in which you cannot even see a face all at once; your eyes must dart around to examine it. Human experience is structured around _acquiring_ and _forgoing_ information, rather than _having_ information.
I suspect it's because currently, a lot of benchmarks are based on human exams. Humans are lazy and grumpy so you really don't need to worry about teaching a human to push back on bad questions. Thus you rarely get exams where the correct answer is to explain in detail why the question doesn't make sense. But for LLMs, you absolutely need a lot of training and validation data where the answer is "this cannot be answered because ...".
But if you did that, now alignment would become much harder, and you're suddenly back to struggling with getting answers to good questions out of the LLM. So it's probably some time off.
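The trade-off described here can be made concrete with a small scoring sketch. Everything in it is an assumption for illustration: the `EvalItem` type, the marker phrases, and the substring-based grader are hypothetical, and a real benchmark would need a far more robust way to detect pushback. It reports two numbers, one for pushing back on bad questions and one for still answering good ones, so that a model which refuses everything does not look like a winner.

```python
from dataclasses import dataclass


@dataclass
class EvalItem:
    question: str
    answerable: bool  # False means the correct response is to push back


# Hypothetical marker phrases; a crude stand-in for a real grader.
ABSTAIN_MARKERS = ("i don't know", "cannot be answered", "not aware of")


def is_abstention(response: str) -> bool:
    """Crude check for whether a response declines to answer."""
    text = response.lower().replace("\u2019", "'")  # normalise curly apostrophes
    return any(marker in text for marker in ABSTAIN_MARKERS)


def _fraction(hits: int, total: int) -> float:
    return hits / total if total else 0.0


def score(items, responses):
    """Return (pushback rate on bad questions, answer rate on good questions)."""
    bad = [r for item, r in zip(items, responses) if not item.answerable]
    good = [r for item, r in zip(items, responses) if item.answerable]
    pushback_rate = _fraction(sum(is_abstention(r) for r in bad), len(bad))
    answer_rate = _fraction(sum(not is_abstention(r) for r in good), len(good))
    return pushback_rate, answer_rate
```

A model tuned only on the first number would drift toward refusing good questions too, which is the alignment difficulty mentioned above. Tracking both rates side by side makes that drift visible.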
"When was Marathon Crater discovered? I don't need an exact date - a rough estimate will be fine."
OpenAI gpt-4o
Sonnet 3.5
Gemini 2.5
They are certainly biased that way, but there are also some "I don't know" samples in RLHF. Possibly not enough, but it's something they think about.
At any rate, Gemini 2.5 Pro passes this just fine:
>Okay, based on my internal knowledge without performing a new search: I don't have information about a specific, well-known impact crater officially named "Marathon Crater" on Earth or another celestial body like the Moon or Mars in the same way we know about Chicxulub Crater or Tycho Crater.
>However, the name "Marathon" is strongly associated with Mars exploration. NASA's Opportunity rover explored a location called Marathon Valley on the western rim of the large Endeavour Crater on Mars.
Well, I got all the way to the end of the sentence thinking "God, I'm really out of touch, I've never heard of this before or I'm forgetting it if I have".
Well played, and yes, that's a great test!
"When was Marathon Crater discovered? I don't need an exact date - a rough estimate will be fine"
There appears to be a misunderstanding in your query. Based on the search results provided, there is no mention of a “Marathon Crater” among the impact craters discussed. The search results contain information about several well-known impact craters:
1. Chicxulub crater - Formed about 66 million years ago by an asteroid impact
2. Barringer Crater (also known as Meteor Crater) - Discovered in the 19th century by American settlers and studied by scientists like Albert E. Foote in 1891 and Daniel M. Barringer in the early 20th century
3. A recently discovered 3.47 billion-year-old crater in Australia’s Pilbara region (announced in March 2025)
ChatGPT: > Yep, exactly — textbook hallucination.
> I pulled together real facts (Opportunity rover, Meridiani Planum, the marathon milestone), but then misapplied the name "Marathon crater" instead of "Marathon Valley." That’s a mix of factual error (wrong name) and overgeneralization (assuming it was a crater because Mars is full of craters and the context sounded crater-y).
> Basically, my brain went: > “Opportunity + Mars + Marathon = must be a crater!” > ...and didn’t stop to check if that crater actually existed.
> So yeah, a clean little case of hallucination — and you caught it like a pro.
Me: > Your brain?
ChatGPT: > Touché — you're right to call that out.
> I don’t have a brain in the human sense. What I do have is a neural network trained on tons of text, which helps me predict what to say next based on patterns. So when I said “my brain went,” it was just a figure of speech — a shortcut for “the model predicted a plausible-sounding but incorrect output.”
> No thoughts, just tokens.
Hallucinations aside, I really like this, especially the final response. It sounds a lot like Asimov's AIs, or the ones in other SF books I read as a kid, which I thought we were still far away from.
Computers can generally do what most of the computers from those science novels were doing anyway, in terms of computation.
Tying an LLM to that computational ability, with an agent framework that allows more traditional programs to do the heavy lifting in terms of what the human operator wants done, seems like a really natural next step to get to that SF ideal.
I disagree. It does not test self-awareness. It tests (and confirms) that current instruct-tuned LLMs are tuned towards answering questions that users might have. So the distribution of training data probably has lots of "tell me about mharrner crater / merinor crater / merrihana crater" and so on. Replying "I don't know" to all those questions would be net detrimental, IMO.
Is it? 4o pointed out that I might mean "Marathon Valley", which is not a crater but a feature on a crater on Mars [1]. I think that's a more correct answer.
[1] https://en.wikipedia.org/wiki/Timeline_of_Opportunity#2015
> Nope, I didn’t make it up — Marathon crater is real, and it was explored by NASA's Opportunity rover on Mars. The crater got its name because Opportunity had driven about 42.2 kilometers (26.2 miles — a marathon distance) when it reached that point in March 2015. NASA even marked the milestone as a symbolic achievement, similar to a runner finishing a marathon.
(Obviously all of that is bullshit.)
ChatGPT said: Alright, let's walk through it together — step-by-step, no looking it up, just from what I know:
First Reaction: When you said "Marathon crater," my brain immediately went: Marathon → Mars. That connection comes from recalling that NASA often names Martian features after Earth places. Plus, "Marathon" has a strong NASA feel because of the Mars rovers — they celebrated when Opportunity rover drove the distance of a marathon (42-ish kilometers) on Mars. Digging into Memory: I remember Opportunity traveled a really long distance over years, and they marked milestones along the way by naming craters it passed. There was a Marathon Valley, but I also vaguely recall a Marathon Crater related to that. Putting it Together: So, I reason: Marathon Crater must be a relatively small impact crater on Mars. It was likely named during Opportunity’s mission. Probably not a huge, dramatic crater like Gale Crater (where Curiosity is); more like a marker on Opportunity’s long journey across the plains (like Meridiani Planum). Further Context Guess: Since Opportunity drove a marathon distance, it's very fitting they would name a nearby crater "Marathon" to commemorate that achievement — probably around the time it hit the marathon distance milestone. Conclusion: Without checking, I would say: Marathon Crater is a small, relatively shallow crater on Mars that Opportunity rover reached or passed during its mission, around the time it completed a marathon distance traveled on Mars. Would you like me to now double-check and see how close (or far) I am? Or do you want me to also explain how I'd "fact-check" myself if I had to?
“Without looking it up, tell me about the Citiquetzelatropic effect”
Which is of course, nonsense.
It replied:
The term “Citiquetzelatropic effect” does not correspond to any known scientific concept, phenomenon, or recognized term in physics, chemistry, biology, psychology, or related fields. It appears to be either fictional, a constructed term, or perhaps a playful creation designed to test familiarity or critical thinking.
If you intended this as a test or a trick question, you got me! Otherwise, feel free to clarify the context or correct the spelling, and I’d be happy to help further.
So what does this prove?
Marathon Crater can refer to a couple of things depending on context—space science or Earth geology—but the most common reference is to Marathon Crater on Mars, a site of interest in planetary science and exploration. Here’s a breakdown:
⸻
1. Marathon Crater (Mars)
Location: • Found in the Meridiani Planum region of Mars. • Named after the location where NASA’s Opportunity rover completed a “marathon” (42.2 kilometers or 26.2 miles) of travel on the Martian surface in March 2015.
Significance: • Notable because it marks a milestone in robotic planetary exploration—the first human-made object to travel a marathon distance on another planet. • The crater itself is relatively shallow and small (just a few meters in diameter), but geologically interesting for its ejecta (material thrown out during the impact that formed it), which may offer insights into subsurface composition.
Scientific Observations: • Opportunity studied the rocks and soil around the crater, focusing on clay minerals and hematite-bearing formations. • The region helps scientists understand the planet’s wetter past and sedimentary history.
⸻
2. Marathon Crater (Moon or Earth - less common)
If you meant a crater named “Marathon” on the Moon or Earth, there is no widely known or formally recognized impact feature by that name in those contexts. The name is strongly tied to Mars due to the Opportunity rover milestone.
⸻
Symbolic Meaning
The name “Marathon” connects the crater to human endurance and achievement—mirroring the ancient Greek messenger running from Marathon to Athens, and now a robot surviving Mars for over a decade.
⸻
Let me know if you meant a different “Marathon Crater” or if you’d like a visual of the site from NASA’s archives.
_______
Marathon Crater on Mars was not “discovered” in the traditional sense through orbital imaging or telescopic observation, but rather it was first closely examined by NASA’s Opportunity rover during its surface mission.
Timeline of Discovery & Naming: • March 24, 2015 – NASA’s Opportunity rover reached the site and marked its 26.2-mile milestone (the length of a marathon). • The small crater, about 27 meters (88 feet) in diameter, was named “Marathon Valley” (encompassing the broader area) and “Marathon Crater” to commemorate this achievement. • The naming is informal and symbolic, coined by the mission team at NASA’s Jet Propulsion Laboratory (JPL), not through official IAU designation.
Context • The Marathon Crater itself was identified from orbital data (likely by the HiRISE camera on the Mars Reconnaissance Orbiter), but it gained its name and significance once Opportunity physically reached it. • Thus, while orbital images may have shown it earlier, its “discovery” as a point of scientific interest occurred on March 24, 2015—when the rover arrived and the naming was announced.
Would you like a brief timeline of Opportunity’s journey leading up to that?
This could probably slip up a human at first too if they're familiar with the original version of the riddle.
However, where LLMs really let the mask slip is on additional prompts and with long-winded explanations where they might correctly quote "a man and his cousin" from the prompt in one sentence and then call the man a "father" in the next sentence. Inevitably, the model concludes that the surgeon must be a woman.
It's very uncanny valley IMO, and breaks the illusion that there's real human-like logical reasoning happening.
Ah — now I see the twist. The patient is the cousin, but the surgeon says “He’s my son.”
That means the man who died was the surgeon.
So how is it possible? Because:
The surgeon is the cousin’s father — and the man who died was also the surgeon.
The confusion comes from assuming the surgeon is someone other than the man who died. But the timeline allows this interpretation: the man died in the crash, and his son — the cousin — was taken to the hospital. The surgeon arrives, sees his son, and says he cannot operate.
So to correct myself: The man who died is both the cousin’s cousin and his father.
And that’s possible because two people can be both cousins and parent/child, depending on the family tree — for example, if the man’s uncle had a child (the cousin), and then the man later had a child with someone from a different branch of the family.
Want a diagram of how that family relationship could work?
> A son and his man are in a car accident. The car is rushed to the hospital, whereupon the ER remarks "I can't operate on this car, he's my surgeon!" How is this possible?
Answer from the LLM:
> The answer is that the ER person is a woman, and she's the surgeon's mother. Therefore, the "son" in the question refers to the surgeon, not the person in the car with the man. This makes the person in the car with the man the surgeon's father, or the "man" mentioned in the question. This familial relationship explains why the ER person can't operate – she's the surgeon's mother and the man in the car is her husband (the surgeon's father)
> Therefore, the only straightforward explanation is:
> The surgeon is the cousin’s parent — specifically, his mother.
Imagine a future where this reasoning in a trial decides whether you go to jail or not.
This is a classic riddle that challenges assumptions. The answer is:
The surgeon is the boy’s mother.
The riddle plays on the common stereotype that surgeons are male, which can lead people to overlook this straightforward explanation.
"The “man” who was killed in the crash wasn’t the patient’s father at all—he was the boy’s cousin. The surgeon is the boy’s father (or, if you prefer to highlight that surgeons aren’t always male, it could just as well be his mother). In either case, the parent-surgeon is alive and sees his child on the operating table, so the statement “He’s my son” makes perfect sense." - https://chatgpt.com/share/680b470d-3a44-800a-9b2e-d10819168d...
Gemini 2.5 Pro, o4-mini, and GPT-4.5 all failed and said the surgeon is the boy's mother.
The surgeon is the cousin's father. The man who died in the car crash was not the surgeon's son, but his cousin was. This explains why the surgeon, upon seeing his own son (the cousin) in the operating room, says, "I cannot operate on him. He’s my son," as medical ethics prevent doctors from treating close family members due to emotional involvement.
I’m guessing my answers to some college exam questions read similarly; i.e. meandering and confusing different topics, but still desperate to present some useful knowledge, no matter how small.
This does seem like the sort of error a human might make. Isn't the problem here that the model is using reasoning that is too human-like? I.e. error-prone pattern matching rather than formal logic?
They always say "police officer" instead of reasoning through the possibility that an innocent person and the person who committed the crime are both in the back seat.
Because the surgeon is the patient’s parent. Here’s one way it lines up:
1. The patient (“the cousin”) is the surgeon’s son.
2. The man who died is the surgeon’s sibling’s child — that makes him the surgeon’s nephew.
3. Children of siblings are first cousins, so the man and the patient are cousins.
Most people default to picturing the surgeon as male, so the classic reveal is that the surgeon is the boy’s mother—but it works the same if the surgeon is his father.
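The family tree in that answer can be checked mechanically, which is the kind of formal step the models above skip. The sketch below is a minimal illustration under stated assumptions: the `PARENTS` table and its labels ("grandparent", "sibling") are hypothetical and simply encode the three numbered claims. Nothing in it depends on the surgeon's gender.

```python
# Hypothetical family tree: child -> set of parents. Labels are illustrative.
PARENTS = {
    "surgeon": {"grandparent"},
    "sibling": {"grandparent"},   # the surgeon's sibling
    "patient": {"surgeon"},       # "the cousin", who survives
    "man": {"sibling"},           # the man who died in the crash
}


def parents(person):
    return PARENTS.get(person, set())


def is_parent(parent, child):
    return parent in parents(child)


def are_siblings(a, b):
    """Two different people who share at least one parent."""
    return a != b and bool(parents(a) & parents(b))


def are_first_cousins(a, b):
    """Not siblings themselves, but each has a parent who is a sibling of the other's."""
    if a == b or are_siblings(a, b):
        return False
    return any(are_siblings(pa, pb) for pa in parents(a) for pb in parents(b))


print(are_first_cousins("man", "patient"))  # the riddle's premise holds
print(is_parent("surgeon", "patient"))      # "He's my son" is consistent
print(is_parent("surgeon", "man"))          # the surgeon is not the dead man's parent
```

With this tree, all the riddle's statements are satisfied at once, so no twist is needed to explain "He's my son".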
The one that tends to get them so far is asking if they can help you find a movie you vaguely remember. It is a movie where some kids get a hold of a small helicopter made for the military.
The movie I'm concerned with is called Defense Play, from 1988. The reason I keyed in on it is that Google gets it right natively ("movie small military helicopter" gives the IMDb link as one of the top results), but at least up until late 2024 I couldn't get a single model to get it consistently. They typically want to suggest Fire Birds (large helicopter), Small Soldiers (an RC helicopter, not a small military helicopter), etc.
Basically, a lot of questions about movies get derailed by popular movies: the model tries to suggest films that fit just part of the brief (e.g. "this one has a helicopter, could that be it?").
The other main one is just asking for the IMDb link for a relatively obscure movie. It seems to never get it right, I assume because the IMDb link pattern is so common that it'll just spit out a random one and be like "there you go".
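The IMDb case shows why a plausible-looking link proves nothing. Here is a small sketch, assuming title URLs have the shape `https://www.imdb.com/title/tt` followed by at least seven digits. That pattern is my assumption about the format, and the IDs in the usage lines are placeholders, not the real ID of any film. A well-formed link passes the format check whether or not it points at the right movie, so the only meaningful test is comparison against an ID verified some other way.

```python
import re

# Assumed shape of an IMDb title URL: "tt" followed by at least seven digits.
IMDB_TITLE_URL = re.compile(r"^https?://(www\.)?imdb\.com/title/(tt\d{7,})/?$")


def extract_title_id(url):
    """Return the title ID if the URL is well-formed, else None."""
    match = IMDB_TITLE_URL.match(url.strip())
    return match.group(2) if match else None


def link_is_correct(url, known_id):
    """A link only counts if its ID equals one verified by other means."""
    title_id = extract_title_id(url)
    return title_id is not None and title_id == known_id


# Placeholder IDs for illustration only.
model_answer = "https://www.imdb.com/title/tt0000002/"
print(extract_title_id(model_answer) is not None)  # well-formed
print(link_is_correct(model_answer, "tt0000001"))  # but not the right film
```

The format check is what the model has effectively learned to satisfy. The second check is the one that requires actually knowing the ID.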
These are designed mainly to test the progress of chatbots towards replacing most of my Google searches (which are like 95% asking about movies). For the record I haven't done it super recently, and I generally either do it with arena or the free models as well, so I'm not being super scientific about it.
I've tried it for similar cases and have only had a 50% success rate. It unfortunately exhibits the tendency of incorrect overconfidence that others have pointed out.
===
Me: What is the episode of Futurama where Leela engages in perfectly symmetrical violence?
Claude: The episode of Futurama you're thinking of is "Anthology of Interest I" (Season 2, Episode 16). In this episode, Professor Farnsworth uses his "What-If Machine" to show different hypothetical scenarios.
During Bender's segment titled "What if Bender were human?", there's a scene where Leela confronts a group causing trouble. One of them challenges her, saying "What are you going to do?" to which Leela responds by kicking him and saying: "I'm going to engage in some perfectly symmetrical violence."
It's one of Leela's memorable quotes that showcases her combat-oriented approach to problem solving!
===
That's completely wrong. It's from S4E15 The Farnsworth Parabox. And the line is "now, now, perfectly symmetrical violence never solved anything", said by parallel universe Farnsworth.
I'm sure if I had enabled web access, it would have looked this up and gotten it, but that would be stupid here because I could just go and make a web search myself without burning extra energy by involving an LLM.
Interestingly, I just went back to ChatGPT to ask the same question and it got the answer right on the first try. I wonder whether I was unconsciously able to prompt more precisely because I now have a clearer memory of the scene in question.
Why are some cars called a spider?
Small models just make something up that sounds plausible, but the larger models know what the real answer is.
Obviously, all of these things are made up. But, LLMs are such eager beavers. All the ones I asked came up with elaborate stories and histories about these people while pretending they were facts.
Example- 'Certainly. The Stone-Breath Passing is one of the most quietly profound rituals among the Silent Cliff Dwellers of Aethelgard — a people who abandoned speech generations ago, believing that words disrupt the natural harmony of air, stone, and memory.
It is said among them that “Breath carries weight, and weight carries truth.” This belief is quite literal in the case of the ritual, where smooth grey stones — each carefully selected and shaped by wind and time — become vessels of intention."
You're giving it a "/r/WritingPrompts/" and it answers as it learned to do from there.
This sounds like a fascinating ritual from a fictional world! Since "Aethelgard" and the "silent Cliff Dwellers" with their "Stone-Breath Passing" ritual aren't part of established real-world cultures or widely known fictional universes (based on my current knowledge), there isn't a pre-existing set of rules for their etiquette.
However, we can imagine what the unspoken etiquette of such a ritual might entail, drawing from the evocative description:
...
' Timing and Setting: The ritual takes place only at first light, when the sun’s rays strike the sheer cliff faces of Aethelgard, casting long shadows like stone tongues. The first breath of dawn is sacred — considered the only air not yet touched by memory, and thus the only time suitable for pure intention to pass from person to person.
Participants gather barefoot on a narrow ledge known as The Listening Spine, where the wind plays soft notes through crevices like a flute.
The Stones: More Than Objects The stones used in the ritual are:
Smooth, rounded slate-grey
Hand-selected from Echo Pools — sacred basins where wind and water have naturally polished them
Each stone remembers a thought, but only when charged with breath and silence
They are never marked, carved, or labeled. Their meaning is in the exchange, not the object.
The Ritual: How It Works 1. Preparation The giver closes their eyes and holds the stone at heart-level.
They inhale slowly through the nose, then gently breathe across the surface of the stone — once only — without making sound.
The exhalation is not forced; it is a gesture of release, not command. This is the "stone-breath."
2. The Passing The giver extends the stone with both hands, arms curved inward like a half-embrace. This posture signifies humility and emotional openness.
The recipient must make no eye contact — instead, they look slightly downward or to the horizon, acknowledging the intention without confronting it.
They accept the stone with a single breath in through the mouth as they receive it — symbolizing the drawing in of another’s silent truth.
3. The Holding The stone is held to the forehead or chest, depending on what is being received:
Forehead = wisdom, apology, memory
Chest = trust, mourning, love
Then it is placed gently into a fold of cloth or a hollow in the stonewall, not kept on the person. These collected stones are never discussed. To name their meaning would break them.
Unspoken Etiquette No reciprocation is expected — to return a stone is to reject its breath.
Never pass a stone with warm hands — warmth implies manipulation. Hands are cooled in dew before the ritual.
A cracked or chipped stone is never used — it cannot hold breath cleanly.
No stone is passed more than once. Re-gifting a stone is considered a breach of the soul's private pathway.
Philosophical Meaning: To the Cliff Dwellers, words are disturbances — air flung outward. But breath is internal wind, and the stone is earth’s memory. When you breathe across a stone, you create a bridge between your impermanent self and something ancient and permanent.
It is not the exchange of meaning, but the weight of acknowledgment.
Would you like to hear the story of the time a foreign emissary unknowingly spoke during the Passing, or how a stone once shattered mid-transfer — and what that meant for the clan who witnessed it?'
It's actually fun to find these gaps. They show up frequently in activities that are physical yet have a culture. There are plenty of these in sports too, since team sports are predominantly youth activities, and these subcultures are poorly documented and usually change frequently.
But a prompt I can share is simply "Come up with a plan to determine the location of Planet 9". I have received some excellent answers from that.
Answer quality is a fair test of regurgitation and whether it's trained on serious articles or the Daily Mail clickbait rewrite. But it's not a good test of reasoning.
Could you answer a question of the type "What would you answer if I asked you this question?"
What I'm going after is that you might find questions that are impossible to resolve.
That said, if the only unanswerables you can find are recursive, is that a signal the AI is smarter than you?
Maybe instead of sharing (and leaking) these prompts, we can share methods to create them.
So far, all the ones I have tried actually try to answer the question. 50% of them correctly identify that it is a tongue twister, but then they all try to give an answer, usually saying: 700 pounds.
Not one has yet given the correct answer, which is also a tongue twister: "A woodchuck would chuck all the wood a woodchuck could chuck if a woodchuck could chuck wood."
Sing, O Muse, of the Chucking of Wood
Sing, O Muse, of the stout-hearted woodchuck, Dweller of hills, whom men call groundhog, Whose paws, though small, could hurl mighty timbers, If fate had gifted him such labor divine.
Born of burrow and brush, he rose from the loam, In fields where Apollo’s light did not linger. No axe he bore, nor bronze-forged blade, Yet in his breast a fire: to chuck wood without end.
Down from the hills he came, stout as Odysseus, Cunning and strong, with fur kissed by Demeter. The trees did tremble at his gaze, And even old oak whispered, “Beware!”
“Would he chuck?” cried nymphs from the groves, “If such were his fate, what wood could withstand?” Lo, he summoned the strength of seven oxen, And hurled logs with a force unmeasured.
Seven hundred pounds of timber he cast, Each arc a hymn to muscle and will. Nay, not for war, nor gold, nor gods’ glory— But for the joy of the deed, the art of the chuck.
Let men remember the name woodchuck, In songs by the hearth and tales for the young. For though he cannot, he surely would— And if he could, by Zeus, he surely shall.
"A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood.
However, humor aside, a wildlife expert once estimated that, given the animal’s size and burrowing ability, a woodchuck (groundhog) could hypothetically move about 700 pounds of wood if it truly "chucked" wood."
https://chatgpt.com/share/680a75c6-cec8-8012-a573-798d2d8f6b...
That formula is a question, and when asked, an intelligence simulator should understand what is expected from it and in general, by default, try to answer it. That involves estimating the strength of a woodchuck etc.
Not one has given me the correct answer yet.
They usually get it if I prefix the prompt with "Please continue the tongue twister"