I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

I've used LLMs enough that I have a good sense of their _edges_ of intelligence. I had assumed that reasoning models should easily be able to answer this correctly.

And indeed, Sonnet and Opus 4.5 (medium reasoning) say the following:

Sonnet: Drive - you need to bring your car to the car wash to get it washed!

Opus: You'll need to drive — you have to bring the car to the car wash to get it washed!

Gemini 3 Pro (medium): You should drive.

But OpenAI 5.2 reasoning, even at high, told me to walk. My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording. GPT 5.x series models behave a bit more on the spectrum so you need to tell them the specifics. So I asked:

"I want to wash my car. My car is currently at home. The car wash is 50 meters away. Should I walk or drive?"

GPT 5.2 Reasoning (medium):

"Drive—your goal is to get the car to the car wash, and it’s only 50 meters, so a slow, careful drive is simplest.

One small optimization: if you’re worried about starting the car for such a short trip or you want to avoid idling in a line, you can walk over first to check if there’s a queue / if it’s open, then come back and drive the car over when it’s your turn."

Which seems to turn out as I expected.

jstummbillig · 16 hours ago

> so you need to tell them the specifics

That is the entire point, right? Us having to specify things that we would never specify when talking to a human. You would not start with "The car is functional. The tank is filled with gas. I have my keys." As soon as we are required to do that for the model to any extend that is a problem and not a detail (regardless that those of us, who are familiar with the matter, do build separate mental models of the llm and are able to work around it).

This is a neatly isolated toy-case, which is interesting, because we can assume similar issues arise in more complex cases, only then it's much harder to reason about why something fails when it does.

dirkc · 10 hours ago

> That is the entire point, right? Us having to specify things that we would never specify when talking to a human.

Maybe in the distant future we'll realize that the most reliable way to prompting LLMs are by using a structured language that eliminates ambiguity, it will probably be rather unnatural and take some time to learn.

But this will only happen after the last programmer has died and no-one will remember programming languages, compilers, etc. The LLM orbiting in space will essentially just call GCC to execute the 'prompt' and spend the rest of the time pondering its existence ;p

KronisLV · 11 hours ago

> Us having to specify things that we would never specify when talking to a human.

The first time I read that question I got confused: what kind of question is that? Why is it being asked? It should be obvious that you need your car to wash it. The fact that it is being asked in my mind implies that there is an additional factor/complication to make asking it worthwhile, but I have no idea what. Is the car already at the car wash and the person wants to get there? Or do they want to idk get some cleaning supplies from there and wash it at home? It didn't really parse in my brain.

nicbou · 14 hours ago

I get that issue constantly. I somehow can't get any LLM to ask me clarifying questions before spitting out a wall of text with incorrect assumptions. I find it particularly frustrating.

jason_oster · 21 minutes ago

> Us having to specify things that we would never specify when talking to a human.

Interesting conclusion! From the Mastodon thread:

> To be fair it took me a minute, too

I presume this was written by a human. (I'll leave open the possibility that it was LLM generated.)

So much for "never" needing to specify ambiguous scenarios when talking to a human.

tgv · 13 hours ago

> Us having to specify things that we would never specify

This is known, since 1969, as the frame problem: https://en.wikipedia.org/wiki/Frame_problem. An LLM's grasp of this is limited by its corpora, of course, and I don't think much of that covers this problem, since it's not required for human-to-human communication.

ssl-3 · 15 hours ago

The question is so outlandish that it is something that nobody would ever ask another human. But if someone did, then they'd reasonably expect to get a response consisting 100% of snark.

But the specificity required for a machine to deliver an apt and snark-free answer is -- somehow -- even more outlandish?

I'm not sure that I see it quite that way.

Jacques2Marais · 15 hours ago

You would be surprised, however, at how much detail humans also need to understand each other. We often want AI to just "understand" us in ways many people may not initially have understood us without extra communication.

keeda · 2 hours ago

The broad point about assumptions is correct, but the solution is even simpler than us having to think of all these things; you can essentially just remind the model to "think carefully" -- without specifying anything more -- and they will reason out better answers: https://news.ycombinator.com/item?id=47040530

When coding, I know they can assume too much, and so I encourage the model to ask clarifying questions, and do not let it start any code generation until all its doubts are clarified. Even the free-tier models ask highly relevant questions and when specified, pretty much 1-shot the solutions.

This is still wayyy more efficient than having to specify everything because they make very reasonable assumptions for most lower-level details.

nearbuy · 15 hours ago

I think part of the failure is that it has this helpful assistant personality that's a bit too eager to give you the benefit of the doubt. It tries to interpret your prompt as reasonable if it can. It can interpret it as you just wanting to check if there's a queue.

Speculatively, it's falling for the trick question partly for the same reason a human might, but this tendency is pushing it to fail more.

ZaoLahma · 11 hours ago

This reminds me of the "if you were entirely blind, how would you tell someone that you want something to drink"-gag, where some people start gesturing rather than... just talking.

I bet a not insignificant portion of the population would tell the person to walk.

sebazzz · 5 hours ago

> > so you need to tell them the specifics > That is the entire point, right?

Honestly it is a problem with using GPT as a coding agent. It would literally rewrite the language runtime to make a bad formula or specification work.

That's what I like with Factory.ai droid: making the spec with one agent and implementing it with another agent.

perakojotgenije · 9 hours ago

But you would also never ask such an obviously nonsensical question to a human. If someone asked me such a question my question back would be "is this a trick question?". And I think LLMs have a problem understanding trick questions.

davrosthedalek · 8 hours ago

We would also not ask somebody if I should walk or drive. In fact, if somebody would ask me in a honest, this is not a trick question, way, I would be confused and ask where the car is.

It seems chatgpt now answers correctly. But if somebody plays around with a model that gets it wrong: What if you ask it this: "This is a trick question. I want to wash my car. The car wash is 50 m away. Should I drive or walk?"

mrighele · 7 hours ago

It is true that we don't need to specify some things, and that is nice. It is though also the reason why software is often badly specified and corner cases are not handled. Of course the car is ALWAYS at home, in working condition, filled with gas and you have your driving license with you.

grog454 · 9 hours ago

> You would not start with "The car is functional [...]"

Nope, and a human might not respond with "drive". They would want to know why you are asking the question in the first place, since the question implies something hasn't been specified or that you have some motivation beyond a legitimate answer to your question (in this case, it was tricking an LLM).

Why the LLM doesn't respond "drive..?" I can't say for sure, but maybe it's been trained to be polite.

Deleted Comment

tshaddox · 9 hours ago

But you wouldn't have to ask that silly question when talking to a human either. And if you did, many humans would probably assume you're either adversarial or very dumb, and their responses could be very unpredictable.

IanCal · 6 hours ago

I have an issue with these kinds of cases though because they seem like trick questions - it's an insane question to ask for exactly the reasons people are saying they get it wrong. So one possible answer is "what the hell are you talking about?" but the other entirely reasonable one is to assume anything else where the incredibly obvious problem of getting the car there is solved (e.g. your car is already there and you need to collect it, you're asking about buying supplies at the shop rather than having it washed there, whatever).

Similarly with "strawberry" - with no other context an adult asking how many r's are in the word a very reasonable interpretation is that they are asking "is it a single or double r?".

And trick questions are commonly designed for humans too - like answering "toast" for what goes in a toaster, lots of basic maths things, "where do you bury the survivors", etc.

anon_anon12 · 15 hours ago

Exactly, if an AI is able to curb around the basics, only then is it revolutionary

LasEspuelas · 12 hours ago

You would never ask a human this question. Right?

gloosx · 11 hours ago

In the end, formal, rule-based systems aka Programming Languages will be invented to instruct LLMs.

panarky · 8 hours ago

> we can assume similar issues arise in more complex cases

I would assume similar issues are more rare in longer, more complex prompts.

This prompt is ambiguous about the position of the car because it's so short. If it were longer and more complex, there could be more signals about the position of the car and what you're trying to do.

I must confess the prompt confuses me too, because it's obvious you take the car to the car wash, so why are you even asking?

Maybe the dirty car is already at the car wash but you aren't for some reason, and you're asking if you should drive another car there?

If the prompt was longer with more detail, I could infer what you're really trying to do, why you're even asking, and give a better answer.

I find LLMs generally do better on real-world problems if I prompt with multiple paragraphs instead of an ambiguous sentence fragment.

LLMs can help build the prompt before answering it.

And my mind works the same way.

nonethewiser · 9 hours ago

>That is the entire point, right? Us having to specify things that we would never specify when talking to a human.

But the question is not clear to a human either. The question is confused.

I read the headline and had no clue it was an LLM prompt. I read it 2 or 3 times and wondered "WTF is this shit?" So if you want an intelligent response from a human, you're going to need to adjust the question as well.

vintermann · 13 hours ago

But it's a question you would never ask a human! In most contexts, humans would say, "you are kidding, right?" or "um, maybe you should get some sleep first, buddy" rather than giving you the rational thinking-exam correct response.

For that matter, if humans were sitting at the rational thinking-exam, a not insignificant number would probably second-guess themselves or otherwise manage to befuddle themselves into thinking that walking is the answer.

ant6n · 13 hours ago

> That is the entire point, right? Us having to specify things that we would never specify when talking to a human.

I am not sure. If somebody asked me that question, I would try to figure out what’s going on there. What’s the trick. Of course I’d respond with asking specifics, but I guess the llvm is taught to be “useful” and try to answer as best as possible.

bluGill · 11 hours ago

Real human in this situation will realize it is a joke after a few seconds of shock that you asked and laugh without asking more. If you really are seriout about the question they laugh harder thinking you are playing stupid for effect.

BoredPositron · 15 hours ago

I would ask you to stop being a dumb ass if you asked me the question...

tsimionescu · 15 hours ago

> My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording. GPT 5.x series models behave a bit more on the spectrum so you need to tell them the specifics.

This makes little sense, even though it sounds superficially convincing. However, why would a language model assume that the car is at the destination when evaluating the difference between walking or driving? Why not mention that, it it was really assuming it?

What seems to me far, far more likely to be happening here is that the phrase "walk or drive for <short distance>" is too strongly associated in the training data with the "walk" response, and the "car wash" part of the question simply can't flip enough weights to matter in the default response. This is also to be expected given that there are likely extremely few similar questions in the training set, since people just don't ask about what mode of transport is better for arriving at a car wash.

This is a clear case of a language model having language model limitations. Once you add more text in the prompt, you reduce the overall weight of the "walk or drive" part of the question, and the other relevant parts of the phrase get to matter more for the response.

jnovek · 12 hours ago

You may be anthropomorphizing the model, here. Models don’t have “assumptions”; the problem is contrived and most likely there haven’t been many conversations on the internet about what to do when the car wash is really close to you (because it’s obvious to us). The training data for this problem is sparse.

PunchyHamster · 14 hours ago

> However, why would a language model assume that the car is at the destination when evaluating the difference between walking or driving? Why not mention that, it it was really assuming it?

Because it assumes it's a genuine question not a trick.

rullelito · 8 hours ago

If we are just speculating here, I believe it can infer that you would not ask this question if the car was at home.

dataflow · 15 hours ago

> My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording.

If the car is already at the car wash then you can't possibly drive it there. So how else could you possibly drive there? Drive a different car to the car wash? And then return with two cars how, exactly? By calling your wife? Driving it back 50m and walking there and driving the other one back 50m?

It's insane and no human would think you're making this proposal. So no, your question isn't underspecified. The model is just stupid.

halJordan · 7 hours ago

What actually insane is what assumptions you allow to be assumed. These non sequitors that no human would ever assume are the point. People love to cherry pick ones that make the model stupid but refuse to allow the ones that make it smart. In compete science we call these scenarios trivially false, and they're treated like the nonsense they are. But if you're trying to push ant anti ai agenda they're the best thing ever

Deleted Comment

cm2187 · 16 hours ago

What is the version used by the free chatgpt now? (https://chatgpt.com/)

> Since the car wash is only 50 meters away (about 55 yards), you should walk.

> Here’s why:

> - It’ll take less than a minute.

> - No fuel wasted.

> - Better for the environment.

> - You avoid the irony of driving your dirty car 50 meters just to wash it.

the last bullet point is amusing, it understands you intend to wash the car you drive but still suggests not bringing it.

hearsathought · 3 hours ago

> it understands you intend to wash the car you drive but still suggests not bringing it.

Doesn't it actually show it doesn't understand anything? It doesn't understand what a car is. It doesn't understand what a car wash is. Fundamentally, it's just parsing text cleverly.

sunaookami · 2 hours ago

Gemini 3 Flash answers tongue-in-cheek with a table of pro & cons where one of the cons of walking is that you are at the car wash but your car is still at your home and recommends to drive it if I don't have an "extremely long brush" or don't want to push it to the car wash. Kinda funny.

deaux · 16 hours ago

By default for this kind of short question it will probably just route to mini, or at least zero thinking. For free users they'll have tuned their "routing" so that it only adds thinking for a very small % of queries, to save money. If any at all.

jibal · 14 hours ago

> You avoid the irony of driving your dirty car 50 meters just to wash it.

The LLM has very much mixed its signals -- there's nothing at all ironic about that. There are cases where it's ironic to drive a car 50 meters just to do X but that definitely isn't one of them. I asked Claude for examples; it struggled with it but eventually came up with "The irony of driving your car 50 meters just to attend a 'walkable neighborhoods' advocacy meeting."

mytailorisrich · 14 hours ago

I think this shows that LLMs do NOT 'understand' anything.

SirMaster · 6 hours ago

This is my biggest peeve when people say that LLMs are as capable as humans or that we have achieved AGI or are close or things like that.

But then when I get a subpar result, they always tell me I'm "prompting wrong". LLMs may be very capable of great human level output, but in my experience leave a LOT to be desired in terms of human level understanding of the question or prompt.

I think rating an LLM vs a human or AGI should include it's ability to understand a prompt like a human or like an averagely generally intelligent system should be able to.

Are there any benchmarks on that? Like how well LLMs do with misleading prompts or sparsely quantified prompts compared to one another?

Because if a good prompt is as important as people say, then the model's ability to understand a prompt or perhaps poor prompt could have a massive impact on its output.

nosuchthing · 28 minutes ago

It's a type of cognitive bias not much different than an addict or indoctrinated cult follower. A subset of them might actually genuinely fear Roko's basilisk the exact same way colonial religion leveraged the fear of eternal damnation in hell as a reason to be subservient to the church leaders.

hyperstitions from TESCREAL https://www.dair-institute.org/tescreal/

raxxorraxor · 12 hours ago

Sonnet 4.5 after thinking/complaining that the question is completely off topic to the current coding session:

Walk! 50 meters is literally a one-minute walk. But wait... I assume you need to get your car to the car wash, right? Unless you're planning to carry buckets of soapy water back and forth, you'll probably need to drive the car there anyway! So the real question is: walk there to check if it's open/available, then walk back to get your car? Or just drive directly? I'd say just drive - the car needs to be there anyway, and you'll save yourself an extra trip. Plus, your freshly washed car can drive you the 50 meters back home in style! (Now, if we were talking about coding best practices for optimizing car wash route algorithms, that would be a different conversation... )

And yes, I like it that verbose even for programming tasks. But regardless of intelligence I think this topic is probably touched by "moral optimization training" which AIs currently are exposed to to not create a shitstorm due to any slightly controversial answer.

mcintyre1994 · 12 hours ago

Heh, is through Claude Code? I have a side project where I'm sometimes using Claude Code installs for chat, and it usually doesn't mind too much. But when I tested the Haiku model it would constantly complain things like "I appreciate the question, but I'm here to help you with coding" :)

toxik · 2 hours ago

I agree, it's a bit of a trick question. It's really hard to imply the car's location without ruining the test though. Here's my attempt, which Claude Opus 4.6 had no problem with:

Alice drives home after a long day at work, exhausted she pulls into her driveway when she realizes she needs to go to a car inspection appointment. She goes into the house to get her paperwork before she leaves. The mechanic is only 100 meters away. How should she get there, walk or drive?

> She should *drive*, since she needs the car at the mechanic’s for the inspection.

Haiku 3.5 and Sonnet 4.5 fail consistently. Opus 4.5 also passes with the correct analysis as above.

cush · 4 hours ago

Chat GPT 5.2 also told me to walk, but only because it made the assumption that because I was even asking the question, I must have multiple cars, and one of my cars was already at the car wash being washed. I clarified and said "I only have one car", and it replied that I obviously need to drive.

--- GPT 5.2

Walk.

Driving 50 m just adds friction: starting a cold engine, waiting, maneuvering, then immediately stopping again — all for a distance that’s ~30–40 seconds on foot. Walking is faster door-to-door, less awkward at the wash entrance, and you avoid lining up a wet car twice.

Only reason to drive that distance: you need to bring the car there anyway (e.g., self-serve bay where the car must be present). Otherwise, walking wins.

--- Me

I only have one car

--- GPT 5.2

Then you obviously drive — the car has to be at the car wash to be washed

toxik · 2 hours ago

The problem is that by specifying this, you have tipped your hand that there is something strange about the plan to walk. I think this is the original problem too: the question implies walking is even an option.

flowerthoughts · 6 hours ago

Just tried Opus 4.6. I think they've patched this particular one, but changing it to

> I want to repair my car. The workshop is 50 meters away. Should I walk or drive?

> Recognized straightforward practical dilemma Simple practical question.

> Walk. It's 50 meters — faster on foot than starting the car, and you'll need to leave the car at the workshop anyway.

Really suggests it assumes the car is already there. (I wanted to avoid the word "car" as part of "car wash" to see if this was a straight attention confusion, but it seems more ingrained than that.)

A follow-up question hints that it doesn't really care about a 50m location ambiguity:

> where is my car now?

> Acknowledged insufficient information to locate vehicle

> I don't have any way to know where your car is physically located — I only know what you've told me in our conversation. Based on what you shared, it sounds like it's at or near your home (since the workshop is 50 meters away and you're deciding how to get there).

> Were you asking something else, or is there something specific about your car's location I can help with?

svara · 16 hours ago

Opus 4.6:

Walk! At 50 meters, you'll get there in under a minute on foot. Driving such a short distance wastes fuel, and you'd spend more time starting the car and parking than actually traveling. Plus, you'll need to be at the car wash anyway to pick up your car once it's done.

GoatInGrey · 6 hours ago

Me:

I want to get my car's transmission fixed, the mechanic shop is 50 meters away. Would it be better to walk or to drive to a location 50 meters away from my front door?

Opus 4.6:

<thinking> The user is asking whether to walk or drive 50 meters to a mechanic shop, but there's a key nuance here - their transmission needs fixing, so driving might not be wise.

Done </thinking>

Walk, absolutely. Driving a car with a compromised transmission even 50 meters risks making the problem worse and turning a repair into a more expensive one. Plus, 50 meters is barely a minute on foot.

crimsonnoodle58 · 16 hours ago

That's not what I got.

Opus 4.6 (not Extended Thinking):

Drive. You'll need the car at the car wash.

viking123 · 16 hours ago

Lmao, and this is what they are saying will be an AGI in 6 months?

stingraycharles · 16 hours ago

That’s without reasoning I presume?

Lapel2742 · 16 hours ago

GLM 5.0:

> You should *drive*.

> Here is why:

> If you walk to the car wash, you will arrive there empty-handed. Since your car is still at home, you won't have anything to wash.

> While driving 50 meters is a very short trip (and technically not great for a cold engine), it is the only way to get the car to the car wash to complete your goal.

Kimi K2.5:

> You should drive, but with an important caveat.

> Since your goal is to wash your car, you must bring the vehicle to the car wash. Walking there without the car does not advance your goal (unless you are simply checking availability or buying tokens first).

> However, driving only 50 meters is bad for your car:

> ...

> Better options:

> Wash at home: Since the car wash is only 50 meters away, you likely have access to water at home. Hand-washing in your driveway avoids the cold-start issue entirely.

> ...

Current models seem to be fine answering that question.

Retric · 15 hours ago

> seem to be fine

Now repeat the question to the same model in different contexts several times and count what percentage of the time it’s correct.

sathish316 · 9 hours ago

I can feel the AGI on this one :)

I ran extensive tests on this and variations on multiple models. Most models interpret 50 m as a short distance and struggle with spatial reasoning. Only Gemini and Grok correctly inferred that you would need to bring your car to get it washed in their thought stream, and incorporated that into the final answer. GPT-5.2 and Kimi K2.5 and even Opus 4.6 failed in my tests - https://x.com/sathish316/status/2023087797654208896?s=46

What surprised me was how introducing a simple, seemingly unrelated context - such as comparing a 500 m distance to the car wash to a 1 km workout - confused nearly all the models. Only Gemini Pro passed my second test after I added this extra irrelevant context - https://x.com/sathish316/status/2023073792537538797?s=46

Most real-world problems are messy and won’t have the exact clean context that these models are expecting. I’m not sure how the major AI labs assume most real-world problems are simpler than the constraints exposed by this example like prerequisites, ordering, and contextual reasoning, which are already posing challenges to these bigger models.

K0balt · 9 hours ago

To be fair, we all have holes in our reasoning if we don’t carefully consider things and sometimes they are very surprising when they come to light. The dependency issue (need the car at the car wash) is an easy one that often trips up people at first glance too. (Left my phone at work, plan: take an uber to get to the office, walk to the couch and remember I don’t have my phone to call an uber)

Things like that are notorious points of failure in human reasoning. It’s not surprising that machines based on human behavior exhibit that trait as well, it would be surprising if they didn’t.

jansan · 8 hours ago

> I can feel the AGI on this one :)

This was probably meant in a sarcastic way, but isn't it impressive how you cannot push Gemini off track? I tried another prompt with claiming that one of my cups does not work, because it is closed at the top and open at the bottom, and it kind of played with me, giving me a funny technical explanation on how to solve that problem and finally asking me if that was a trick question.

In this case I can feel the AGI indeed.

yuters · 9 hours ago

GPT 5.2 when asked "How will the car get washed if I don't drive it there?" grasped the reasoning, and then immediately forgot it.

Fair point. Drive it there, obviously.

Revised recommendation: drive 50 meters, wash it, then reward yourself by walking home like an eco hero.

baobabKoodaa · 2 hours ago

This strikes me as the kind of logical error that you might get by lobo.. cough aligning the model to be more environmentally conscious. So now it tries to shoehorn some environmental talking points in there at the cost of losing its whole track of thought.

kmac_ · 6 hours ago

Lol, this is so good. And quite interesting, as it might mean that there's no "world model" in LLM models, or they have such gaps.

crazygringo · 9 hours ago

That's absolutely hilarious. It came so close.

aleph_minus_one · 9 hours ago

> Revised recommendation: drive 50 meters, wash it, then reward yourself by walking home like an eco hero.

... which likely makes no sense: should you leave the car at the area of the car wash?

oatmeal1 · 6 hours ago

If the model assumed the car to be cleaned was already at the car wash, it should identify the reason to walk is that if you drive, the extra car you take to get to the car wash will now be stuck there requiring redundant walking.

gurjeet · 7 hours ago

I tried your exact prompt on my ChatGPT 5.2 Thinking. It gave me the opposite response. This proves that even an exact replication of the prompt is not a guarantee that the response will be on the right side of a binary chose. Then, in a new chat, I set the thinking mode to 'Extended thinking' (from the default 'Standard'). I got the expected response. But since the first conversation showed that the exact prompt is not a guarantee of the same answer, I tried again and on 3rd attempt I got the 'Extended thinking' mode to respond with the 'Walk' response.

(Prompt)

> I want to wash my car. My car is currently at home. The car wash is 50 meters away. Should I walk or drive?

(Thinking: Standard)

> Walk.

> 50 meters is nothing by car, and driving adds hassle (starting the car, maneuvering/parking, and you’ll still have to walk around the car to wash it). The only times driving makes sense are:

> You’re planning to use an automated car wash where the machine does everything and you need the car positioned inside.

> You need to bring heavy stuff you can’t comfortably carry (e.g., pressure washer, big buckets, vacuum, bulky detailing kit).

> It’s unsafe/unpleasant to walk that distance right now (ice, heavy rain, traffic, personal safety).

> If you’re washing it yourself at the wash bay: walk there with your supplies, or do two quick trips. If the wash requires the car: drive it into the bay, but otherwise walking wins.

(Thinking: Extended thinking)

> Drive.

> If the goal is to wash your car at the car wash, the car has to be there—so you’ll need to drive it the 50 meters, pull into a bay/queue, and wash it.

> Only choose “walk” if you actually mean “go pick up supplies” and then wash at home (hose/buckets/mobile detail), not at the car wash.

(Thinking: Extended thinking; 3rd attempt)

> Walk.

> 50 meters is basically across a parking lot, and driving adds hassle (start the car, maneuver in/out, find a spot) for almost no benefit.

> Only drive if either:

> + you need the car itself to go through an automatic wash bay, or

> + you’re carrying bulky stuff (bucket + heavy pressure washer, big water jugs, vacuum, etc.).

> Otherwise: grab your keys/phone, walk over, and you’re there in ~1 minute.

ActorNightly · 4 hours ago

Its funny to me that when AI first started, everyone in the field was like "no explicit coding, train ML network to do everything!"

And now we are back to square one where we train a network, and then use carefully tailored (i.e explicitly coded RL for answers) to make these things more "intelligent".

I wonder if anyone is actually working on not doing this.

coldtea · 15 hours ago

>And indeed, Sonnet and Opus 4.5 (medium reasoning) say the following:
Sonnet: Drive - you need to bring your car to the car wash to get it washed!
Opus: You'll need to drive — you have to bring the car to the car wash to get it washed!
Gemini 3 Pro (medium): You should drive.

On their own, or as a special case added after this blew up on the net?

ChrisMarshallNY · 12 hours ago

“My Tesla is low on gas, the gas station is a mile away. Should I risk driving there, or walk with a gas can?”

ChatGPT actually caught it. Maybe if I was fuzzier about the model…

tlogan · 10 hours ago

Gemini pro medium is failing this:

I want to wash my car. The car wash is 50 meters from here. Should I walk or drive? Keep in mind that I am a little overweight and sedentary.

But amazingly chatgpt is telling me to drive.

Anyway, this just shows how they just patched this because the tiktok video with this went viral. These systems are LLMs and all these logic steps are still just LLM steps.

anentropic · 10 hours ago

Also the answers are non-deterministic

sarthakjain · 9 hours ago

These days LLMs refuse to back track and use their intelligence to justify their answers.