My strange observation is that Gemini 2.5 Pro may be the best model overall for many use cases, but only on the first message of a chat. In other words, if it has all the context it needs and produces one output, it's excellent. The longer a chat goes on, the faster it degrades, which is strange because it has a much longer context window than other models. I've found a good way to use it: drop the entire huge context of a whole project (200k-ish tokens) into the chat window, ask one well-formed question, then kill the chat.
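That workflow can be scripted. A minimal sketch of the stateless "one shot per chat" pattern: concatenate the project into one context blob, attach a single question, and never reuse the conversation. The file extensions, delimiters, and the idea of wiring this into any particular LLM API are my assumptions, not something from the comment above.

```python
from pathlib import Path

def build_context(root: str, exts: tuple = (".py", ".md")) -> str:
    """Concatenate every matching file under root into one context blob."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in exts:
            # Label each file so the model can tell them apart
            parts.append(f"--- {path} ---\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)

def one_shot_prompt(context: str, question: str) -> str:
    """Build a single self-contained prompt: full context plus one question.
    Each call starts fresh -- no chat history is ever carried over."""
    return f"{context}\n\n--- QUESTION ---\n{question}"
```

The resulting string would be sent as the first (and only) message of a brand-new chat; to iterate, you edit the question and build a fresh prompt rather than replying in the same thread.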
This has been the same for every single LLM I've ever used: they're all terrible at long chats.
So terrible that I've stopped going beyond two messages in total. If it doesn't get it right on the first try, it becomes less and less likely to get it right with every message you add.
Better to always start fresh and iterate on the initial prompt instead.