Is the ocean to your left or to your right?
I asked this question to multiple LLMs.
ChatGPT: Wrong but reasoned itself back to being correct.
Gemini: Correct.
Grok: Using Expert mode, it got the right answer after 35s.
Claude Sonnet 4.6: Confidently incorrect.
Screenshots: https://imgur.com/a/7pmcoWr
This is one of those questions that could have multiple answers, or require follow up questions, depending on how pedantic the asker wants to be.
Trick question, the island was in a lake, you’re nowhere near the ocean.
Trick question, it’s a small island and the ocean is all around you, not just on the left. How big must an island be before this isn’t true? Is it a line of sight question?
But no one thinks like that.
After testing this, what strikes me is how stubborn the LLMs are about being wrong. Is that a more important takeaway: that LLMs seem to back down less even when clearly wrong?
Correct answer with Sonnet 4.6, but this might as well be a coin flip. I've found Sonnet 4.6 to be substantially dumber than 4.5. I'd rate Sonnet 4.5 a 10/10 at creative writing and 4.6 a 3/10.
My ChatGPT subscription just expired and I was about to get Claude instead, but I'm starting to rethink this.