This is more or less a natural evolution of LLMs... The thing is, where are my benefits as a developer?
If, for instance, Copilot charges 1 premium request for Claude and 1 premium request for GPT-5, even though GPT-5 is supposed to be (in terms of resource usage) on the level of GPT-4.1 (a free model), then (from my point of view) there is no gain.
So far, from a coding point of view, Claude still (often) does coding better. The comparison I made is that Claude feels like a senior dev with years of experience, whereas GPT-5 feels like an academic professor who is too focused on analytic presentation.
So while it's nice to see more competition in the market, I still rank (with Copilot):
Claude > Gemini > GPT-5 ... big gap ... GPT-4.1 (beast mode) > GPT-4.1
LLMs are following the same progression these days as GPUs or CPUs... big jumps at first, then things slow down; you get more power efficiency but only marginal improvements.
Where we will see benefits is specialized LLMs: for instance, Anthropic is doing a good job creating a programmer-focused LLM. But even those moats are starting to be challenged by Chinese (open-source) models, step by step.
GPT-5 simply follows a trend. And within a few months, Anthropic will release something that is probably not much of an improvement over 4.0 but cheaper, and probably better at tool usage. Then comes GPT-5.1 six months later, and ...
In my opinion, for a company with the funding that OpenAI has, GPT-5 needed to beat the competition with much more impact.
For example, I want the model to be able to take a basic rule and identify what subset of a given text fits the rule (e.g. find and extract all last names). 4o and 4.1 were decent, but still left a lot to be desired. o4-mini was pretty good in unambiguous cases. Getting a model that runs cheaper and is better at following instructions makes my product better and more profitable with a couple lines of code changed, as sketched below.
It's not emotionally revolutionary, but it hits a great sweet spot for a lot of business use cases.
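A minimal sketch of the kind of rule-following extraction task described above, using the official openai Python SDK. The model names, prompt, and helper function are illustrative assumptions, not the commenter's actual code; the point is that upgrading models is a one-line change.

```python
# Hypothetical sketch: apply a basic rule ("find all last names") to text.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODEL = "gpt-5-mini"  # assumed model name; swapping models is a one-line change


def extract_last_names(text: str) -> list[str]:
    """Ask the model to extract every last name, one per line."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {
                "role": "system",
                "content": "Extract every last name from the user's text. "
                           "Return one name per line, nothing else.",
            },
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip().splitlines()


print(extract_last_names("Alice Johnson met Bob Smith and Dr. Chen."))
```

If the cheaper model follows the rule more reliably, the only change needed is the `MODEL` constant, which is exactly the "couple lines of code" economics the comment describes.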
200k lines of code is a failure state. At this point you have lost control and can only make changes to the codebase through immense effort, and not at a tolerable pace.
Agentic code writers are good at producing a mess of this size, and at helping to shovel stuff around to make changes that are hard for humans due to the unusable state of the codebase.
If overgrown, barely manageable codebases are all a person has ever known, and they think it's normal that changes are hard, time-consuming, and need reams of code, I understand why they believe AI agents are useful as code writers. I think they do not have the foundation to tell mediocre code from good code.
I am extremely aware of the judgemental hubris of this comment. I'd not normally huff my own farts in public this obnoxiously, but I honestly feel it is useful for the "AI hater vs AI sucker" discussion to be honest about this type of emotion.
Each integration is hopefully only a few thousand lines of code, but if you have 50 integrations you can easily break 100k LOC just dealing with those. They just need to be encapsulated well, so that the integration cruft is isolated from the core business logic and they become relatively simple to reason about.
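A minimal sketch of that encapsulation, assuming a hypothetical payment-gateway integration: each vendor's cruft hides behind one small interface, so the core business logic never touches vendor-specific details. All names here are made up for illustration.

```python
# Hypothetical sketch: isolate integration cruft behind a narrow interface.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Invoice:
    """Core domain type; knows nothing about any vendor."""
    customer_id: str
    amount_cents: int


class PaymentIntegration(ABC):
    """The only surface the core business logic ever sees."""

    @abstractmethod
    def charge(self, invoice: Invoice) -> bool: ...


class AcmePayIntegration(PaymentIntegration):
    """Thousands of lines of vendor cruft (auth dances, retries,
    pagination, webhook quirks) would live behind this wall."""

    def charge(self, invoice: Invoice) -> bool:
        # Vendor-specific request building and error mapping go here.
        return True


def bill_customer(gateway: PaymentIntegration, invoice: Invoice) -> None:
    # Core business logic: simple to reason about, gateway is swappable.
    if not gateway.charge(invoice):
        raise RuntimeError(f"charge failed for {invoice.customer_id}")


bill_customer(AcmePayIntegration(), Invoice("c_42", 1999))
```

The 50th integration adds another `PaymentIntegration` subclass and nothing else; the line count grows, but the surface the rest of the codebase has to reason about does not.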