GPT-4 launched with 8k context. It hallucinated regularly. It was slow. One-shotting code was unheard of; you had to iterate and iterate. It fell over even on basic math problems.
GPT-5 thinking, on the other hand, is so capable that the average person wouldn't really be able to test its abilities. It's really only experts operating in their domain who can find its stumbling blocks.
I think that because we have seen these constant incremental updates, progress feels like a staircase with small steps, but if you really reflect and look back, you'll see the actual capability gap from 3.5 to 4 was way, way smaller than the gap from 4 to 5. This is echoed in benchmarks too: GPT-5 is solving problems wildly beyond what GPT-4 was capable of.
When I watch the work of coworkers or friends who have gone down these rabbit holes of customization, I always learn about some interesting new tools to use. Lately I've added atuin, fzf, and a few others to my Linux install.