I think this is one of many indicators that even though these models get "version upgrades," it's closer to switching to a different brain, one that may or may not understand or process things the way you like. Without a clear jump in performance, people test the new model and move back to the one they know works if the new one isn't better, or is actually worse.
Interesting to use a term like "brain" in the context of LLMs.