Deleted Comment
I don’t see how the answer can be both.
Three examples for you:
- our policy agent extracts all coverage limits and policy details into a data ontology. This saves 10-20 mins per policy. It is more accurate and consistent than our humans.
- our email drafting agent will pull all relevant context on an account whenever an email comes in. It will draft a reply or an email to someone else based on context and workflow. Over half of our emails are now sent without meaningfully modifying the draft, up from 20% two months ago. Hundreds of hours saved per week, now spent on more valuable work for clients.
- our certificates agent will note when a certificate of insurance is requested over email and automatically handle the necessary checks and follow-up options or resolution. Will likely save us around $500k this year.
We also now increasingly share prototypes as a way to discuss ideas, because the cost to vibe code something illustrative is very low and it’s often much higher fidelity to have the conversation with something visual than with a written document.
When I watch the work of coworkers or friends who have gone down these rabbit holes of customization, I always learn about some interesting new tools to use. Lately I've added atuin, fzf, and a few others to my Linux install.
GPT-4 launched with 8k context. It hallucinated regularly. It was slow. One-shotting code was unheard of, you had to iterate and iterate. It fell over even doing basic math problems.
GPT-5 thinking, on the other hand, is so capable that the average person wouldn't really be able to test its abilities. It's really only experts operating in their domain who can find its stumbling blocks.
I think the constant incremental updates create a staircase with small steps, but if you really reflect and look back, you'll see that the actual capability gap from 3.5 to 4 is way, way smaller than the gap from 4 to 5. This is echoed in benchmarks too: GPT-5 is solving problems wildly beyond what GPT-4 was capable of.
I suppose you could bake the limits into each service at deploy time, but that's still a lot of code to write to provide a good experience to a customer who is trying to not pay you money.
Not saying this is a good thing, but this feels about right to me.
Bravo, quaternions are the (only) way to go; the sooner UAV/UAS system designers realize this, the better.