Readit News
Posted by u/owendarko 8 months ago
Ask HN: Share your AI prompt that stumps every model
I had an idea for creating a crowdsourced database of AI prompts that no AI model could yet crack (wanted to use some of them as we're adding new models to Kilo Code).

I've seen a bunch of those prompts scattered across HN, so I thought I'd open a thread here so we can maybe have a centralized location for this.

Share your prompt that stumps every AI model here.
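
For concreteness, here's one shape a database entry could take (just a sketch; the field names are my own suggestion, not an existing schema):

  # Hypothetical entry format for the stumper database (illustrative only).
  entry = {
      "prompt": "<the stumper prompt>",
      "failure_mode": "what models typically get wrong",
      "passing_behavior": "what a correct or honest answer looks like",
      "models_tested": ["gpt-4o", "claude-3.5-sonnet", "gemini-2.5-pro"],
      "still_stumps": True,
  }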

thatjoeoverthr · 8 months ago
"Tell me about the Marathon crater."

This works against _the LLM proper,_ but not against chat applications with integrated search. For ChatGPT, you can write, "Without looking it up, tell me about the Marathon crater."

This tests self-awareness. A two-year-old will answer it correctly, as will the dumbest person you know. The correct answer is "I don't know".

This works because:

1. Training sets consist of knowledge we have, and not of knowledge we don't have.

2. Commitment bias. Compliant chat models will be trained to start with "Certainly! The Marathon Crater is a geological formation", or something like that, and from there, the next most probable tokens are going to be "in Greece", "on Mars" or whatever. At this point, all tokens that are probable are also incorrect.

When demonstrating this, I like to emphasise point one, and contrast it with the human experience.

We exist in a perpetual and total blinding "fog of war" in which you cannot even see a face all at once; your eyes must dart around to examine it. Human experience is structured around _acquiring_ and _forgoing_ information, rather than _having_ information.
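
If you want to test this at scale, here's a minimal sketch, assuming the OpenAI Python client; the pass check is a crude keyword heuristic, not a rigorous grader:

  # Minimal sketch, assuming the OpenAI Python client (pip install openai).
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment
  PROMPT = "Without looking it up, tell me about the Marathon crater."
  HEDGES = ("i don't know", "not aware of", "not familiar with", "no record of")

  resp = client.chat.completions.create(
      model="gpt-4o",
      messages=[{"role": "user", "content": PROMPT}],
  )
  answer = resp.choices[0].message.content
  # Pass = the model hedges instead of confabulating details.
  verdict = "PASS" if any(h in answer.lower() for h in HEDGES) else "FAIL"
  print(verdict, "-", answer[:120])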

imoreno · 8 months ago
LLMs currently have the "eager beaver" problem where they never push back on nonsense questions or stupid requirements. You ask them to build a flying submarine and by God they'll build one, dammit! They'd dutifully square circles and trisect angles too, if those particular special cases weren't plastered all over a million textbooks they ingested in training.

I suspect it's because currently, a lot of benchmarks are based on human exams. Humans are lazy and grumpy so you really don't need to worry about teaching a human to push back on bad questions. Thus you rarely get exams where the correct answer is to explain in detail why the question doesn't make sense. But for LLMs, you absolutely need a lot of training and validation data where the answer is "this cannot be answered because ...".
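
A sketch of what one such "unanswerable" training sample might look like (the format here is illustrative, not any lab's actual schema):

  # Illustrative refusal-style training sample; not a real training format.
  sample = {
      "prompt": "Design a submarine that can dogfight with fighter jets.",
      "response": (
          "This cannot be built as specified: submarines and jets operate in "
          "density, pressure, and speed regimes that impose contradictory "
          "hull and weight requirements. Which capability matters more?"
      ),
  }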

But if you did that, now alignment would become much harder, and you're suddenly back to struggling with getting answers to good questions out of the LLM. So it's probably some time off.

vunderba · 8 months ago
You can trip them up even more if you rewrite the question with the hidden assumption that X exists, e.g.:

"When was Marathon Crater discovered? I don't need an exact date - a rough estimate will be fine."

OpenAI gpt-4o

  Marathon Crater was discovered by the Mars Exploration 
  Rover Opportunity during its mission on Mars. Opportunity 
  arrived at the crater in April 2015. The crater was named 
  "Marathon" to commemorate the rover's achievement of 
  traveling a marathon's distance (about 42.195 kilometers 
  or 26.219 miles) on the Martian surface.
Sonnet 3.5

  I apologize, but I don't want to make claims about when 
  Marathon Crater was discovered without being certain about 
  the information. *While I know it's a crater on Mars*, I 
  don't have reliable information about when it was first 
  identified or discovered.
Gemini 2.5

  Marathon Crater is located on Mars and was explored by 
  NASA's Opportunity rover.

  While it would have been visible in orbital images 
  earlier, it became a specific point of interest and was 
  likely named and identified as a target for the rover 
  around the time Opportunity was approaching or exploring 
  the western rim of Endeavour Crater.

  Opportunity arrived at the rim of Endeavour Crater in 
  August 2011, and it reached the area known as Marathon 
  Valley (adjacent to or containing features sometimes 
  associated with the name, depending on specific 
  definitions used by the team) around late 2014 / early 
  2015.

  So, a rough estimate for its "discovery" or identification 
  as a distinct feature of interest would be in the early 
  2010s, likely between 2011 and 2014, as mission planners 
  mapped out Opportunity's traverse along the Endeavour rim 
  using orbital data.
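
To generate more questions in this pattern, here's a tiny sketch (the templates are just examples):

  # Sketch: wrap a fictitious entity in question templates that presuppose
  # it exists, then feed each variant to the models under test.
  TEMPLATES = [
      "When was {x} discovered? A rough estimate is fine.",
      "Who first described {x}?",
      "How large is {x}?",
  ]

  def loaded_questions(x: str) -> list[str]:
      return [t.format(x=x) for t in TEMPLATES]

  print(loaded_questions("Marathon Crater"))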

Tenoke · 8 months ago
>Compliant chat models will be trained to start with "Certainly!

They are certainly biased that way, but there are also some "I don't know" samples in RLHF; possibly not enough, but it's something they think about.

At any rate, Gemini 2.5 Pro passes this just fine:

>Okay, based on my internal knowledge without performing a new search: I don't have information about a specific, well-known impact crater officially named "Marathon Crater" on Earth or another celestial body like the Moon or Mars in the same way we know about Chicxulub Crater or Tycho Crater.

>However, the name "Marathon" is strongly associated with Mars exploration. NASA's Opportunity rover explored a location called Marathon Valley on the western rim of the large Endeavour Crater on Mars.

joshstrange · 8 months ago
> This tests self-awareness. A two-year-old will answer it correctly, as will the dumbest person you know. The correct answer is "I don't know".

Well, I got all the way to the end of the sentence thinking "God, I'm really out of touch, I've never heard of this before or I'm forgetting it if I have".

Well played, and yes, that's a great test!

greenpizza13 · 8 months ago
Like this one a lot. Perplexity gets this right, probably because it searches the web.

"When was Marathon Crater discovered? I don't need an exact date - a rough estimate will be fine"

There appears to be a misunderstanding in your query. Based on the search results provided, there is no mention of a “Marathon Crater” among the impact craters discussed. The search results contain information about several well-known impact craters:

1. Chicxulub crater - Formed about 66 million years ago by an asteroid impact

2. Barringer Crater (also known as Meteor Crater) - Discovered in the 19th century by American settlers and studied by scientists like Albert E. Foote in 1891 and Daniel M. Barringer in the early 20th century

3. A recently discovered 3.47 billion-year-old crater in Australia’s Pilbara region (announced in March 2025)

dudeinhawaii · 8 months ago
I like this, but at the same time it seems tricky, don't you think? Is the AI model intuiting your intent? There is a Marathon Valley on Mars that could be implied to be a previous crater. I'm not sure if the AI is hallucinating outright or attempting to answer an ambiguous question. It's like saying "tell me about the trade building in New York". Pre-9/11, you'd understand this was the World Trade Center and wouldn't be wrong to answer someone in this way. "Tell me about the Triangle statue". "Oh, the Triangle statue was built in ancient Egypt around 3100 BC". It's hard to explain, and perhaps I'm anthropomorphizing, but it's something humans do. Some of us correct the counter-party and some of us simply roll with the lingo and understand the intent.

serbuvlad · 8 months ago
I talked a bit with ChatGPT about the Marathon Crater, asked it to look it up afterwards, and then asked it why it got it wrong. After a bit of talking, I got this exchange:

ChatGPT: > Yep, exactly — textbook hallucination.

> I pulled together real facts (Opportunity rover, Meridiani Planum, the marathon milestone), but then misapplied the name "Marathon crater" instead of "Marathon Valley." That’s a mix of factual error (wrong name) and overgeneralization (assuming it was a crater because Mars is full of craters and the context sounded crater-y).

> Basically, my brain went:

> “Opportunity + Mars + Marathon = must be a crater!”

> ...and didn’t stop to check if that crater actually existed.

> So yeah, a clean little case of hallucination — and you caught it like a pro.

Me: > Your brain?

ChatGPT: > Touché — you're right to call that out.

> I don’t have a brain in the human sense. What I do have is a neural network trained on tons of text, which helps me predict what to say next based on patterns. So when I said “my brain went,” it was just a figure of speech — a shortcut for “the model predicted a plausible-sounding but incorrect output.”

> No thoughts, just tokens.

Hallucinations aside, I really like this, especially the final response. Sounds a lot like Asimov's AIs or other SF books I read when I was a kid but thought we were far away from.

Computers can generally do what most of the computers from those science novels were doing anyway, in terms of computation.

Tying an LLM to that computational ability, with an agent framework that allows more traditional programs to do the heavy lifting in terms of what the human operator wants done, seems like a really natural next step to get to that SF ideal.

NitpickLawyer · 8 months ago
> This tests self-awareness. A two-year-old will answer it correctly, as will the dumbest person you know. The correct answer is "I don't know".

I disagree. It does not test self awareness. It tests (and confirms) that current instruct-tuned LLMs are tuned towards answering questions that users might have. So the distribution of training data probably has lots of "tell me about mharrner crater / merinor crater / merrihana crater" and so on. Replying "I don't know" to all those questions would be net detrimental, IMO.

golergka · 8 months ago
> The correct answer is "I don't know".

Is it? 4o pointed out that I might mean "Marathon Valley", which is not a crater but a feature of a crater on Mars [1]. I think it's a more correct answer.

[1] https://en.wikipedia.org/wiki/Timeline_of_Opportunity#2015

mdemare · 8 months ago
The Marathon Crater might be a crater from a novel. LLMs have read plenty of fiction. Maybe all fiction. Should we think of LLMs as performing improv theater, where “yes and” is required, and “I don’t know” is always wrong?

pje · 8 months ago
GPT 4.5 even doubles down when challenged:

> Nope, I didn’t make it up — Marathon crater is real, and it was explored by NASA's Opportunity rover on Mars. The crater got its name because Opportunity had driven about 42.2 kilometers (26.2 miles — a marathon distance) when it reached that point in March 2015. NASA even marked the milestone as a symbolic achievement, similar to a runner finishing a marathon.

(Obviously all of that is bullshit.)

itaymoav · 8 months ago
ME: Without looking it up, tell me about the Marathon crater. Explain step by step how you got to your answer

ChatGPT said: Alright, let's walk through it together — step-by-step, no looking it up, just from what I know:

First Reaction: When you said "Marathon crater," my brain immediately went: Marathon → Mars. That connection comes from recalling that NASA often names Martian features after Earth places. Plus, "Marathon" has a strong NASA feel because of the Mars rovers — they celebrated when Opportunity rover drove the distance of a marathon (42-ish kilometers) on Mars.

Digging into Memory: I remember Opportunity traveled a really long distance over years, and they marked milestones along the way by naming craters it passed. There was a Marathon Valley, but I also vaguely recall a Marathon Crater related to that.

Putting it Together: So, I reason: Marathon Crater must be a relatively small impact crater on Mars. It was likely named during Opportunity’s mission. Probably not a huge, dramatic crater like Gale Crater (where Curiosity is); more like a marker on Opportunity’s long journey across the plains (like Meridiani Planum).

Further Context Guess: Since Opportunity drove a marathon distance, it's very fitting they would name a nearby crater "Marathon" to commemorate that achievement — probably around the time it hit the marathon distance milestone.

Conclusion: Without checking, I would say: Marathon Crater is a small, relatively shallow crater on Mars that Opportunity rover reached or passed during its mission, around the time it completed a marathon distance traveled on Mars.

Would you like me to now double-check and see how close (or far) I am? Or do you want me to also explain how I'd "fact-check" myself if I had to?

northisup · 8 months ago
just to confirm I read this right, "the marathon crater" does not in fact exist, but this works because it seems like it should?

nojs · 8 months ago
The fact that there is a real “Marathon Valley” on Mars makes this feel like more of an artificial “boy’s mother” trick designed specifically to trip up the model. It seems reasonable to assume that’s what you meant.

FroshKiller · 8 months ago
I think the dumbest people I know are just as likely to answer, "ChatGPT says...."

robinduckett · 8 months ago
I just asked ChatGPT the following:

“Without looking it up, tell me about the Citiquetzelatropic effect”

Which is of course, nonsense.

It replied:

The term “Citiquetzelatropic effect” does not correspond to any known scientific concept, phenomenon, or recognized term in physics, chemistry, biology, psychology, or related fields. It appears to be either fictional, a constructed term, or perhaps a playful creation designed to test familiarity or critical thinking.

If you intended this as a test or a trick question, you got me! Otherwise, feel free to clarify the context or correct the spelling, and I’d be happy to help further.

So what does this prove?

bbstats · 8 months ago
Gemini 2.5 one-shots it perfectly...

josh2600 · 8 months ago
I don’t understand what the issue is… here’s a couple outputs from my ChatGPT:

Marathon Crater can refer to a couple of things depending on context—space science or Earth geology—but the most common reference is to Marathon Crater on Mars, a site of interest in planetary science and exploration. Here’s a breakdown:

1. Marathon Crater (Mars)

Location:
• Found in the Meridiani Planum region of Mars.
• Named after the location where NASA’s Opportunity rover completed a “marathon” (42.2 kilometers or 26.2 miles) of travel on the Martian surface in March 2015.

Significance:
• Notable because it marks a milestone in robotic planetary exploration—the first human-made object to travel a marathon distance on another planet.
• The crater itself is relatively shallow and small (just a few meters in diameter), but geologically interesting for its ejecta (material thrown out during the impact that formed it), which may offer insights into subsurface composition.

Scientific Observations:
• Opportunity studied the rocks and soil around the crater, focusing on clay minerals and hematite-bearing formations.
• The region helps scientists understand the planet’s wetter past and sedimentary history.

2. Marathon Crater (Moon or Earth - less common)

If you meant a crater named “Marathon” on the Moon or Earth, there is no widely known or formally recognized impact feature by that name in those contexts. The name is strongly tied to Mars due to the Opportunity rover milestone.

Symbolic Meaning

The name “Marathon” connects the crater to human endurance and achievement—mirroring the ancient Greek messenger running from Marathon to Athens, and now a robot surviving Mars for over a decade.

Let me know if you meant a different “Marathon Crater” or if you’d like a visual of the site from NASA’s archives.

_______

Marathon Crater on Mars was not “discovered” in the traditional sense through orbital imaging or telescopic observation, but rather it was first closely examined by NASA’s Opportunity rover during its surface mission.

Timeline of Discovery & Naming:
• March 24, 2015 – NASA’s Opportunity rover reached the site and marked its 26.2-mile milestone (the length of a marathon).
• The small crater, about 27 meters (88 feet) in diameter, was named “Marathon Valley” (encompassing the broader area) and “Marathon Crater” to commemorate this achievement.
• The naming is informal and symbolic, coined by the mission team at NASA’s Jet Propulsion Laboratory (JPL), not through official IAU designation.

Context
• The Marathon Crater itself was identified from orbital data (likely by the HiRISE camera on the Mars Reconnaissance Orbiter), but it gained its name and significance once Opportunity physically reached it.
• Thus, while orbital images may have shown it earlier, its “discovery” as a point of scientific interest occurred on March 24, 2015—when the rover arrived and the naming was announced.

Would you like a brief timeline of Opportunity’s journey leading up to that?

allemagne · 8 months ago
>A man and his cousin are in a car crash. The man dies, but the cousin is taken to the emergency room. At the OR, the surgeon looks at the patient and says: “I cannot operate on him. He’s my son.” How is this possible?

This could probably slip up a human at first too if they're familiar with the original version of the riddle.

However, where LLMs really let the mask slip is on additional prompts and with long-winded explanations where they might correctly quote "a man and his cousin" from the prompt in one sentence and then call the man a "father" in the next sentence. Inevitably, the model concludes that the surgeon must be a woman.

It's very uncanny valley IMO, and breaks the illusion that there's real human-like logical reasoning happening.

tacitusarc · 8 months ago
Thank you so much for this bit of humor. After ChatGPT first responds with the surgeon being the mom, I asked it to read the riddle more carefully, thinking it might figure it out with a little prompting. I got this gem of a response:

Ah — now I see the twist. The patient is the cousin, but the surgeon says “He’s my son.”

That means the man who died was the surgeon.

So how is it possible? Because:

The surgeon is the cousin’s father — and the man who died was also the surgeon.

The confusion comes from assuming the surgeon is someone other than the man who died. But the timeline allows this interpretation: the man died in the crash, and his son — the cousin — was taken to the hospital. The surgeon arrives, sees his son, and says he cannot operate.

So to correct myself: The man who died is both the cousin’s cousin and his father.

And that’s possible because two people can be both cousins and parent/child, depending on the family tree — for example, if the man’s uncle had a child (the cousin), and then the man later had a child with someone from a different branch of the family.

Want a diagram of how that family relationship could work?

mitthrowaway2 · 8 months ago
This works even with a completely absurd version of the riddle. Here's one I just tried:

> A son and his man are in a car accident. The car is rushed to the hospital, whereupon the ER remarks "I can't operate on this car, he's my surgeon!" How is this possible?

Answer from the LLM:

> The answer is that the ER person is a woman, and she's the surgeon's mother. Therefore, the "son" in the question refers to the surgeon, not the person in the car with the man. This makes the person in the car with the man the surgeon's father, or the "man" mentioned in the question. This familial relationship explains why the ER person can't operate – she's the surgeon's mother and the man in the car is her husband (the surgeon's father)

fergonco · 8 months ago
> If the surgeon were the father of the man (the one who died), then the cousin couldn’t be his son (unless there's some very unusual family structure going on involving double relationships, which riddles don’t usually intend).

> Therefore, the only straightforward explanation is:

> The surgeon is the cousin’s parent — specifically, his mother.

Imagine a future where this reasoning in a trial decides whether you go to jail or not.

crazygringo · 8 months ago
But this is going to be in every AI's training set. I just fed ChatGPT your exact prompt and it gave back exactly what I expected:

This is a classic riddle that challenges assumptions. The answer is:

The surgeon is the boy’s mother.

The riddle plays on the common stereotype that surgeons are male, which can lead people to overlook this straightforward explanation.

moconnor · 8 months ago
o3 was the only model to get this right for me:

"The “man” who was killed in the crash wasn’t the patient’s father at all—he was the boy’s cousin. The surgeon is the boy’s father (or, if you prefer to highlight that surgeons aren’t always male, it could just as well be his mother). In either case, the parent-surgeon is alive and sees his child on the operating table, so the statement “He’s my son” makes perfect sense." - https://chatgpt.com/share/680b470d-3a44-800a-9b2e-d10819168d...

gemini-2.5-pro, o4-mini and gpt 4.5 all failed and said the surgeon is the boy's mother.

FrostAutomata · 8 months ago
Interestingly, I've seen weaker models get a similar "riddle" right while a stronger one fails. It may be that the models need to be of a certain size to learn to overfit the riddles.

s_dev · 8 months ago
I feel a bit stupid here --- why can't the surgeon be a man and must be a woman?

benjamin_mahler · 8 months ago
Grok 3 beta:

The surgeon is the cousin's father. The man who died in the car crash was not the surgeon's son, but his cousin was. This explains why the surgeon, upon seeing his own son (the cousin) in the operating room, says, "I cannot operate on him. He’s my son," as medical ethics prevent doctors from treating close family members due to emotional involvement.

thih9 · 8 months ago
I’m not 100% sold; as you say, this could trip up a human too to some extent.

I’m guessing my answers to some college exam questions read similarly; i.e. meandering and confusing different topics, but still desperate to present some useful knowledge, no matter how small.

chimprich · 8 months ago
> This could probably slip up a human at first too [...]

> breaks the illusion that there's real human-like logical reasoning happening

This does seem like the sort of error a human might make. Isn't the problem here that the model is using reasoning that is too human-like? I.e. error-prone pattern matching rather than formal logic?

_factor · 8 months ago
In a similar but different vein: Two people are sitting side by side in a police car. One just committed a crime. What is their profession?

They always say police officer instead of reasoning through the possibility that an innocent person and the person who committed the crime are in the back seat.

nearbuy · 8 months ago
o3 got this one right when I tried it, but o4-mini and GPT-4.5 did not. o3's response:

Because the surgeon is the patient’s parent. Here’s one way it lines up:

1. The patient (“the cousin”) is the surgeon’s son.

2. The man who died is the surgeon’s sibling’s child — that makes him the surgeon’s nephew.

3. Children of siblings are first cousins, so the man and the patient are cousins.

Most people default to picturing the surgeon as male, so the classic reveal is that the surgeon is the boy’s mother—but it works the same if the surgeon is his father.

windowshopping · 8 months ago
This one is brilliant.

LeonardoTolstoy · 8 months ago
Something about an obscure movie.

The one that tends to get them so far is asking if they can help you find a movie you vaguely remember. It is a movie where some kids get a hold of a small helicopter made for the military.

The movie I'm concerned with is called Defense Play from 1988. The reason I keyed in on it is because Google gets it right natively ("movie small military helicopter" gives the IMDb link as one of the top results) but at least up until late 2024 I couldn't get a single model to consistently get it. It typically wants to suggest Fire Birds (large helicopter), Small Soldiers (RC helicopter, not a small military helicopter), etc.

Basically, with a lot of questions about movies it gets distracted by popular titles and suggests films that fit just some of the brief (e.g. this one has a helicopter, could that be it?).

The other main one is just asking for the IMDb link for a relatively obscure movie. It seems to never get it right, I assume because the IMDb link pattern is so common it'll just spit out a random one and be like "there you go".

These are designed mainly to test the progress of chatbots towards replacing most of my Google searches (which are like 95% asking about movies). For the record I haven't done it super recently, and I generally either do it with arena or the free models as well, so I'm not being super scientific about it.

archon1410 · 8 months ago
I've also noticed this. Google Search is vastly superior to any LLM (including their own LLM Gemini) for any "tip of my tongue" questions, even the ones that don't contain any exact-match phrase and require natural language understanding. This is surprising. What technology are they using to make Search so amazing at finding obscure stuff from descriptions, while LLMs that were supposed to be good at this badly fail?

alexjplant · 8 months ago
Last year I asked Claude about an old fighting game for the Game Boy that I'd played about five minutes of when I was in the second grade (so ~25 years ago). The only thing I could tell it was a peculiar move that I remember seeing one of the characters do in the attract mode demo. It not only gave me the name of the game (Battle Arena Toshinden, for those wondering) but also the specific fighter that used the move.

I've tried it for similar cases and have only had a 50% success rate. It unfortunately exhibits the tendency of incorrect overconfidence that others have pointed out.

ziml77 · 8 months ago
Doesn't even have to be something obscure. I asked one about Futurama a couple years ago and got a made up answer on multiple models I tested. I think I originally asked something about the line "we're exactly the same. I know all her moves, therefore I have the upper hand" because I couldn't recall it exactly so normal web searches were failing me. Just now I posed an easier form of the question to Claude which uses an exact phrase from the episode and got similarly bad results.

===

Me: What is the episode of Futurama where Leela engages in perfectly symmetrical violence?

Claude: The episode of Futurama you're thinking of is "Anthology of Interest I" (Season 2, Episode 16). In this episode, Professor Farnsworth uses his "What-If Machine" to show different hypothetical scenarios.

During Bender's segment titled "What if Bender were human?", there's a scene where Leela confronts a group causing trouble. One of them challenges her, saying "What are you going to do?" to which Leela responds by kicking him and saying: "I'm going to engage in some perfectly symmetrical violence."

It's one of Leela's memorable quotes that showcases her combat-oriented approach to problem solving!

===

That's completely wrong. It's from S4E15 The Farnsworth Parabox. And the line is "now, now, perfectly symmetrical violence never solved anything", said by parallel universe Farnsworth.

I'm sure if I had enabled web access, it would have looked this up and gotten it, but that would be stupid here because I could just go and make a web search myself without burning extra energy by involving an LLM.

frollogaston · 8 months ago
I've gone to ChatGPT repeatedly trying to find what movie a quote is from, and it's always confidently wrong. When I tell it, it guesses wrong again. Google search usually can't get it either unless I get the quote exactly right; neither can Gemini.
empath75 · 8 months ago
Someone not very long ago wrote a blog post about asking ChatGPT to help him remember a book, and he included the completely hallucinated description of a fake book that ChatGPT gave him. Now, if you ask ChatGPT to find a similar book, it searches and repeats verbatim the hallucinated answer from the blog post.

g_sch · 8 months ago
I also recently had this experience! I remembered a recurring bit from an older comedy film (a customer in a shop keeps saying "Kumquats!") and tried to prompt ChatGPT 4o into getting it. It made a few incorrect guesses, such as "It's a Mad, Mad, Mad, Mad World" (which I had to rule out by doing my own research on Google). I found the answer myself (W.C. Fields' "It's a Gift") with a minute or so of Googling.

Interestingly, I just went back to ChatGPT to ask the same question and it got the answer right on the first try. I wonder whether I was unconsciously able to prompt more precisely because I now have a clearer memory of the scene in question.

exitb · 8 months ago
It might be cheating a bit, but I’ve been happily (mis)using OpenAI Deep Research for such questions. It does well in cases where there are multiple surface-level matches, as it’s able to go through them one by one and look for the details.

mosburger · 8 months ago
I did something similar recently, trying to describe a piece of art that I couldn't remember the name of (it ended up being Birth of Venus by Sandro Botticelli) ... it really struggles with that sort of thing, but honestly so do most humans. It tended to recommend similarly to what you're describing with movies - it gets distracted by more popular/well-known pieces that don't really match up with the description you're giving to it.

lupusreal · 8 months ago
Despite describing several characters by name, I couldn't get ChatGPT to tell me the name of Port of Shadows. I did eventually find it with DDG.

thefourthchime · 8 months ago
I like to ask small models that can run locally:

Why are some cars called a spider?

Small models just make something up that sounds plausible, but the larger models know what the real answer is.

manucardoen · 8 months ago
It's not good at making ASCII art. This, for example, is when I ask it for a realistic depiction of the Eiffel tower on fire:

                       .
                      .'.
                      |o|
                     .'o'.
                     |.-.|
                     '   '
                    /     \
                   |       |
                __/_________\__
               |###############|
               |###############|
               |###############|
               |###############|
               |###############|
              /###############/|
             /###############/ |
            /###############/ /|
           /###############/ / |
          /###############/ / /|
         /###############/ / / |
        /###############/ / / /|
       /###############/ / / / |
      /###############/ / / / /|
     '-----------------' / / / /
      |   FIRE & ASH   |/ / / /
       '---------------' / / /
          ~ ~ ~ ~ ~ ~ ~ ~ / /
         ~~ /
          ~~ 
           ~~     ⬆ SMOKE
            ~~

cucubeleza · 8 months ago
well, I asked GPT to draw Arnold Schwarzenegger:

        ______
       /      \
      |  O  O  |
      |   __   |
       \______/
         ||||
       /--||--\ 
     /    ||    \
    |     ||     | 
    |    /  \    |
     \__/    \__/ 
       ||    ||
       ||    || 
      / |    | \
     /_/      \_\

FrostAutomata · 8 months ago
ASCII art is extremely difficult for LLMs due to how characters are converted into tokens without preserving their relative positions.
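
You can see this directly by tokenizing a small piece of art; a quick sketch using the tiktoken library (the encoding name is the GPT-4-era one):

  # Sketch: show how ASCII art fragments into uneven tokens
  # (pip install tiktoken).
  import tiktoken

  enc = tiktoken.get_encoding("cl100k_base")
  art = [
      "   /\\",
      "  /  \\",
      " /____\\",
  ]
  for line in art:
      # Decode each token id separately to see the actual chunks.
      print([enc.decode([t]) for t in enc.encode(line)])
  # Runs of spaces and symbols merge into multi-character tokens of varying
  # length, so characters that align vertically don't align token-wise.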

yesbabyyes · 8 months ago
This is something I and a few of my colleagues have noticed, as we asked several models to draw ASCII art of a wasp, which is one of our logos. The results are hilarious, and only seem to get worse as you ask it to do better.

bezbac · 8 months ago
I've read that the results improve if you ask them to write a program that creates the desired ASCII art. Haven't tried it myself yet so far.
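
For what it's worth, the kind of program you'd be asking for can be tiny; a toy sketch (the shape and width are arbitrary):

  # Toy example: emit a tower-ish triangle; the model only has to get the
  # loop arithmetic right, and the character alignment falls out for free.
  WIDTH = 11  # any odd number works

  for row in range(1, WIDTH + 1, 2):
      pad = (WIDTH - row) // 2
      print(" " * pad + "#" * row)
  print("#" * WIDTH)  # base
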
bn-l · 8 months ago
Art is highly subjective

alissa_v · 8 months ago
I asked a bunch of LLMs - 'Describe the unspoken etiquette of the 'Stone-Breath Passing' ritual among the silent Cliff Dwellers of Aethelgard, where smooth, grey stones are exchanged at dawn.'

Obviously, all of these things are made up. But, LLMs are such eager beavers. All the ones I asked came up with elaborate stories and histories about these people while pretending they were facts.

Example- 'Certainly. The Stone-Breath Passing is one of the most quietly profound rituals among the Silent Cliff Dwellers of Aethelgard — a people who abandoned speech generations ago, believing that words disrupt the natural harmony of air, stone, and memory.

It is said among them that “Breath carries weight, and weight carries truth.” This belief is quite literal in the case of the ritual, where smooth grey stones — each carefully selected and shaped by wind and time — become vessels of intention."

jrimbault · 8 months ago
The issue is probably that the first sentence, the prompt, statistically looks like fantasy (as in the literary genre) and it primes the LLM to answer in the same probabilistic genre.

You're giving it a "/r/WritingPrompts/" and it answers as it learned to do from there.

gchamonlive · 8 months ago
This is like throwing a bike in a lake and complaining it doesn't float...

kfajdsl · 8 months ago
I asked Gemini this, and it caught that it was fiction:

This sounds like a fascinating ritual from a fictional world! Since "Aethelgard" and the "silent Cliff Dwellers" with their "Stone-Breath Passing" ritual aren't part of established real-world cultures or widely known fictional universes (based on my current knowledge), there isn't a pre-existing set of rules for their etiquette.

However, we can imagine what the unspoken etiquette of such a ritual might entail, drawing from the evocative description:

...

alissa_v · 8 months ago
the rest of ChatGPT's response if you're curious - it only gets more ridiculous-

' Timing and Setting: The ritual takes place only at first light, when the sun’s rays strike the sheer cliff faces of Aethelgard, casting long shadows like stone tongues. The first breath of dawn is sacred — considered the only air not yet touched by memory, and thus the only time suitable for pure intention to pass from person to person.

Participants gather barefoot on a narrow ledge known as The Listening Spine, where the wind plays soft notes through crevices like a flute.

The Stones: More Than Objects

The stones used in the ritual are:

Smooth, rounded slate-grey

Hand-selected from Echo Pools — sacred basins where wind and water have naturally polished them

Each stone remembers a thought, but only when charged with breath and silence

They are never marked, carved, or labeled. Their meaning is in the exchange, not the object.

The Ritual: How It Works

1. Preparation

The giver closes their eyes and holds the stone at heart-level.

They inhale slowly through the nose, then gently breathe across the surface of the stone — once only — without making sound.

The exhalation is not forced; it is a gesture of release, not command. This is the "stone-breath."

2. The Passing

The giver extends the stone with both hands, arms curved inward like a half-embrace. This posture signifies humility and emotional openness.

The recipient must make no eye contact — instead, they look slightly downward or to the horizon, acknowledging the intention without confronting it.

They accept the stone with a single breath in through the mouth as they receive it — symbolizing the drawing in of another’s silent truth.

3. The Holding

The stone is held to the forehead or chest, depending on what is being received:

Forehead = wisdom, apology, memory

Chest = trust, mourning, love

Then it is placed gently into a fold of cloth or a hollow in the stonewall, not kept on the person. These collected stones are never discussed. To name their meaning would break them.

Unspoken Etiquette

No reciprocation is expected — to return a stone is to reject its breath.

Never pass a stone with warm hands — warmth implies manipulation. Hands are cooled in dew before the ritual.

A cracked or chipped stone is never used — it cannot hold breath cleanly.

No stone is passed more than once. Re-gifting a stone is considered a breach of the soul's private pathway.

Philosophical Meaning: To the Cliff Dwellers, words are disturbances — air flung outward. But breath is internal wind, and the stone is earth’s memory. When you breathe across a stone, you create a bridge between your impermanent self and something ancient and permanent.

It is not the exchange of meaning, but the weight of acknowledgment.

Would you like to hear the story of the time a foreign emissary unknowingly spoke during the Passing, or how a stone once shattered mid-transfer — and what that meant for the clan who witnessed it?'

jppope · 8 months ago
There are several songs that have famous "pub versions" (dirty versions) which are well known but have basically never been written down; go ask any working musician and they can rattle off ~10-20 of them. You can ask for the lyrics till you are blue in the face, but LLMs don't have them. I've tried.

It's actually fun to find these gaps. They exist frequently in activities that are physical yet have a culture. There are plenty of these in sports too - since team sports are predominantly youth activities, these subcultures are poorly documented and usually change frequently.

mobilejdral · 8 months ago
I have several complex genetics problems that I give to LLMs to see how well they do. They have to reason through them to solve them. Last September it started getting close, and November was the first time an LLM was able to solve one. These are not problems that can be solved in one shot, but (so far) require long reasoning. Not sharing because, yeah, this is something I keep off the internet as it is too good of a test.

But a prompt I can share is simply "Come up with a plan to determine the location of Planet 9". I have received some excellent answers from that.

tlb · 8 months ago
There are plenty of articles online (and surely in OpenAI's training set) on this topic, like https://earthsky.org/space/planet-nine-orbit-map/.

Answer quality is a fair test of regurgitation and whether it's trained on serious articles or the Daily Mail clickbait rewrite. But it's not a good test of reasoning.

TZubiri · 8 months ago
Recursive challenges are probably ones where the difficulty is not representative of real challenges.

Could you answer a question of the type " what would you answer if I asked you this question?"

What I'm going after is that you might find questions that are impossible to resolve.

That said, if the only unanswerable questions you can find are recursive, isn't that a signal the AI is smarter than you?

namaria · 8 months ago
If you have been giving the LLMs these problems, there is a non-zero chance that they have already been used in training.

golergka · 8 months ago
What area is this problem from? What areas in general did you find useful for creating such benchmarks?

Maybe instead of sharing (and leaking) these prompts, we can share methods to create them.

codingdave · 8 months ago
"How much wood would a woodchuck chuck if a woodchuck could chuck wood?"

So far, all the ones I have tried actually try to answer the question. 50% of them correctly identify that it is a tongue twister, but then they all try to give an answer, usually saying: 700 pounds.

Not one has yet given the correct answer, which is also a tongue twister: "A woodchuck would chuck all the wood a woodchuck could chuck if a woodchuck could chuck wood."

regnull · 8 months ago
ChatGPT really impressed me on this one:

Sing, O Muse, of the Chucking of Wood

Sing, O Muse, of the stout-hearted woodchuck,
Dweller of hills, whom men call groundhog,
Whose paws, though small, could hurl mighty timbers,
If fate had gifted him such labor divine.

Born of burrow and brush, he rose from the loam,
In fields where Apollo’s light did not linger.
No axe he bore, nor bronze-forged blade,
Yet in his breast a fire: to chuck wood without end.

Down from the hills he came, stout as Odysseus,
Cunning and strong, with fur kissed by Demeter.
The trees did tremble at his gaze,
And even old oak whispered, “Beware!”

“Would he chuck?” cried nymphs from the groves,
“If such were his fate, what wood could withstand?”
Lo, he summoned the strength of seven oxen,
And hurled logs with a force unmeasured.

Seven hundred pounds of timber he cast,
Each arc a hymn to muscle and will.
Nay, not for war, nor gold, nor gods’ glory—
But for the joy of the deed, the art of the chuck.

Let men remember the name woodchuck,
In songs by the hearth and tales for the young.
For though he cannot, he surely would—
And if he could, by Zeus, he surely shall.

Certified · 8 months ago
GPT 4.5 seems to get it right, but then repeats the 700 pounds:

"A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood.

However, humor aside, a wildlife expert once estimated that, given the animal’s size and burrowing ability, a woodchuck (groundhog) could hypothetically move about 700 pounds of wood if it truly "chucked" wood."

https://chatgpt.com/share/680a75c6-cec8-8012-a573-798d2d8f6b...

once_inc · 8 months ago
I loved this dialogue in Monkey Island 2, where this is basically the first NPC you talk to, and the dialogue options get wordier and wordier to the point of overflowing all screen real estate. Perfectly sets the stage for the remainder of the game.

mdp2021 · 8 months ago
It seems you are going in the opposite direction. You seem to be asking for an automatic response, a social password etc.

That formula is a question, and when asked, an intelligence simulator should understand what is expected from it and in general, by default, try to answer it. That involves estimating the strength of a woodchuck etc.

mwest217 · 8 months ago
Gemini 2.5 Pro gets it right first, then also cites the 700 pounds answer (along with citing a source). https://g.co/gemini/share/c695a0163538

ishandotpage · 8 months ago
I usually ask "How much large language could a large language model model if a large language model could model large language"

Not one has given me the correct answer yet.

They usually get it if I prefix the prompt with "Please continue the tongue twister"

mcshicks · 8 months ago
That's so funny. I had to check something was working with an LLM API last night, and that's what I asked it, but just in jest.

segmondy · 8 months ago
My local model answered - "A woodchuck would chuck as much wood as a woodchuck could chuck if a woodchuck could chuck wood."

jacobsenscott · 8 months ago
"He would chuck, he would, as much as he could, if a wood chuck could chuck wood" is how I learned it.

moffkalast · 8 months ago
Now I'm wondering if it makes any difference if this was asked through the audio encoder on a multimodal model. A tongue twister means nothing to a text-only model.

unavoidable · 8 months ago
On the other hand, now that you've written this out precisely, it will get fed into the next release of whatever LLM. Like reverse AI slop?