Readit News
hitradostava commented on GPT5 is worse than 4.1-mini for text and worse than Sonnet 4 for coding    · Posted by u/hitradostava
canerdogan · 19 days ago
GPT-5 isn’t really a brand-new model in the way people think. From what I’ve seen, the goal was more about reducing costs and unifying the interface than releasing a totally different architecture. Under the hood it is still routing to models we already know, just picking what it thinks will give the “best” result for the request.

That can be fine for a lot of general use cases, but if you’re working in specific domains like coding agents or high-precision summarization, that routing can actually make results worse compared to sticking with a model you know performs well for your workload.

hitradostava · 19 days ago
That's not what OpenAI are claiming. They are claiming that there are two new flagship models and a router that routes between them.

"GPT‑5 is a unified system with a smart, efficient model that answers most questions, a deeper reasoning model (GPT‑5 thinking) for harder problems, and a real‑time router that quickly decides which to use"

hitradostava commented on GPT5 is worse than 4.1-mini for text and worse than Sonnet 4 for coding    · Posted by u/hitradostava
8thcross · 19 days ago
I tried it with cursor-agent, their cli - and it generated better code than expected. YMMV. It was more thoughtful and strategic than the other frontier models.
hitradostava · 19 days ago
Planning was OK for me: much slower than Sonnet, but comparable in quality. But some of the code it produces is just terrible. Maybe the routing layer sends some code-generation tasks to a much smaller model, but then I don't get why it's so slow!

The only thing that seems better to me is the parallel tool calling.

hitradostava commented on GPT5 is worse than 4.1-mini for text and worse than Sonnet 4 for coding    · Posted by u/hitradostava
softwaredoug · 19 days ago
I feel like they should have let GPT-5 overlap in experimental mode for a month or so. It took a while to get the kinks out of GPT-4 before people trusted it. Just switching it on is really hurting their brand.

The fact they didn’t do this makes me think their finances are in very bad shape.

hitradostava · 19 days ago
I agree, I just don't understand how the team at Cursor can say this:

“GPT-5 is the smartest coding model we've used. Our team has found GPT-5 to be remarkably intelligent, easy to steer, and even to have a personality we haven’t seen in any other model. It not only catches tricky, deeply-hidden bugs but can also run long, multi-turn background agents to see complex tasks through to the finish—the kinds of problems that used to leave other models stuck. It’s become our daily driver for everything from scoping and planning PRs to completing end-to-end builds.”

The cynic in me thinks that Cursor had to give positive PR in order to secure better pricing...

hitradostava commented on GPT5 is worse than 4.1-mini for text and worse than Sonnet 4 for coding    · Posted by u/hitradostava
cranberryturkey · 19 days ago
it solved a huge bug i've been struggling with.
hitradostava · 19 days ago
Had Sonnet 4 not been able to?
hitradostava commented on IronCalc – Open-Source Spreadsheet Engine   ironcalc.com/... · Posted by u/kaathewise
nhatcher · 10 months ago
Hey! This is my project! Amazed to see this here. I'll try to answer questions people might have
hitradostava · 10 months ago
Amazing project. The question I have is: why Rust? Is the compiled WASM significantly faster than JS?
hitradostava commented on GitHub cuts AI deals with Google, Anthropic   bloomberg.com/news/articl... · Posted by u/jbredeche
RheingoldRiver · 10 months ago
> Using LLM based tools effectively requires a change in workflow that a lot of people aren't ready to try

This is a REALLY good summary of it I think. If you lose your patience with people, you'll lose your patience with AI tooling, because AI interaction is fundamentally so similar to interacting with other people

hitradostava · 10 months ago
Exactly, and LLM based tools can be very frustrating right now. But if you view the tooling as a very fast junior developer with very broad but shallow knowledge, then you can develop a workflow which for many (but not all) tasks is much, much faster than writing code by hand.
hitradostava commented on GitHub cuts AI deals with Google, Anthropic   bloomberg.com/news/articl... · Posted by u/jbredeche
geysersam · 10 months ago
I'll take a stab at changing your mind.

AIs are not able to write Redis. That's not their job. AIs should not write complex high performance code that millions of users rely on. If the code does something valuable for a large number of people you can afford humans to write it.

AIs should write low value code that just repeats what's been done before but with some variations. Generic parts of CRUD apps, some fraction of typical frontends, common CI setups. That's what they're good at because they've seen it a million times already. That category constitutes most code written.

This relieves human developers of ballpark 20% of their workload and that's already worth a lot of money.

hitradostava · 10 months ago
In a couple of years' time I don't see why AI based tooling couldn't write Redis. Would you get a complete Redis produced with a single prompt? Of course not. But if extreme speed is what you want to optimize for, then the tooling needs to be given the right feedback loop to optimize for that.

I think the question to ask is: what do I do as a software engineer that couldn't be done by an AI based tool in a few years' time? The answer is scary, but exciting.

hitradostava commented on GitHub cuts AI deals with Google, Anthropic   bloomberg.com/news/articl... · Posted by u/jbredeche
ianbutler · 10 months ago
I'm actually very curious why AI use is such a bi-modal experience. I've used AI to move multi thousand line codebases between languages. I've created new apps from scratch with it.

My theory is that it comes down to the willingness to babysit and the modality. I'm perfectly fine telling the tool I use about its errors and working side by side with it like it was another person. At the end of the day it can belt out lines of code faster than I, or any human, can, and I can review code very quickly, so the overall productivity boost has been great.

It does fundamentally alter my workflow. I'm very hands off keyboard when I'm working with AI in a way that is much more like working with someone or coaching someone to make something instead of doing the making myself. Which I'm fine with but recognize many developers aren't.

I use AI autocomplete 0% of the time as I found that workflow was not as effective as me just writing code, but most of my most successful work using AI is a chat dialogue where I'm letting it build large swaths of the project a file or parts of a file at a time, with me reviewing and coaching.

hitradostava · 10 months ago
I agree with you and it's confusing to me. I do think there is a lot of emotion at play here, rather than cold rationality.

Using LLM based tools effectively requires a change in workflow that a lot of people aren't ready to try. Everyone can share their anecdote of how an LLM has produced stupid or buggy code, but there is way too much focus on where we are now, rather than the direction of travel.

I think existing models are already sufficient; it's just that we need to improve the feedback loop. A lot of the corrections / direction I give on LLM produced code could 100% be done by a better LLM agent. In the next year I can imagine tooling that:
- lets me interact fully via voice
- has a separate "architecture" agent that ensures any produced code is in line with the patterns in a particular repo
- automatically feeds compile and runtime errors back in and fixes them
- offers a refactoring workflow mode, where the aim is to first get tests written, then get the code working, and then get the code efficient, clean and consistent with repo patterns
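
To make the "errors fed back in automatically" idea concrete, here is a minimal sketch of that loop. It is only an illustration under stated assumptions: `run_candidate`, `feedback_loop` and `toy_fix` are hypothetical names, and `toy_fix` is a hard-coded stand-in for what would really be a call to an LLM agent.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_candidate(source: str) -> Optional[str]:
    """Compile and run the code; return None on success, else the error text."""
    try:
        exec(compile(source, "<candidate>", "exec"), {})
        return None
    except Exception:
        # Covers both compile-time (SyntaxError) and runtime errors.
        return traceback.format_exc()


def feedback_loop(
    source: str,
    fix: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Run the code, hand any error back to `fix`, and retry up to max_rounds."""
    for _ in range(max_rounds):
        error = run_candidate(source)
        if error is None:
            return source, True
        source = fix(source, error)
    return source, run_candidate(source) is None


def toy_fix(source: str, error: str) -> str:
    # Hypothetical stand-in for an LLM call: it only knows how to
    # repair the one undefined-variable error used in this demo.
    if "NameError" in error:
        return "total = 0\n" + source
    return source


fixed, ok = feedback_loop("total += 1\nassert total == 1", toy_fix)
```

In a real tool the `fix` callback would send the source and the traceback to a model, and `run_candidate` would more likely invoke the project's compiler and test suite in a sandbox than `exec` in-process.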

I'm excited by this direction of travel, but I do think it will fundamentally change software engineering in a way that is scary.

u/hitradostava

Karma: 64 · Cake day: August 9, 2023
About
Lead dev at addmaple.com