When I'm using LLMs, I know exactly what the code should be and am just using them to produce it faster, rather than to produce a whole feature (my Cursor rules are extremely extensive, focused on my personal architecture and code style, and shared across all my personal projects). When I try to use just the agent in Cursor, its output always needs significant modification and reorganization to meet my standards, even with the extensive rules I have set up.
Cursor appeals to me because those quality-of-life features don't take away the actual code-writing part, but instead augment it and get rid of some of the tedium.
- it's a GREAT oneshot coding model (on the pod we find out that they specifically finetuned it for oneshotting OAI SWE tasks, i.e. prioritized that over multiturn use)
- however it's comparatively let down by poorer integrations (e.g. no built-in browser, not-great GitHub integration - as TFA notes, "The current workflow wants to open a fresh pull request for every iteration, which means pushing follow-up commits to an existing branch is awkward at best." - yeah this sucks ass)
fortunately the integrations will only improve over time. i think the finding that you can run 60 concurrent Codex instances per hour is qualitatively different from Devin (5 concurrent) and Cursor (1, before the new "background agents").
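To make the concurrency point concrete: the workflow shift is from babysitting one agent to fanning out dozens of independent runs and collecting results. A minimal sketch of that pattern, assuming nothing about Codex's actual API (`run_agent_task` here is a hypothetical stand-in using asyncio, since real dispatch happens through OpenAI's web UI):

```python
import asyncio

async def run_agent_task(task_id: int) -> str:
    # Hypothetical stand-in for one independent agent run;
    # the sleep simulates the agent working in its own sandbox.
    await asyncio.sleep(0.01)
    return f"task-{task_id}: done"

async def fan_out(n: int) -> list[str]:
    # Launch n independent runs concurrently and wait for all of them.
    return await asyncio.gather(*(run_agent_task(i) for i in range(n)))

# 60 runs complete in roughly the time of one, because they don't block
# each other - the qualitative difference vs. one interactive agent.
results = asyncio.run(fan_out(60))
print(len(results))
```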
btw
> I haven't yet noticed a marked difference in the performance of the Codex model, which OpenAI explains is a descendant of GPT-3 and is proficient in more than 12 programming languages.
incorrect, it's an o3 finetune.
This is OpenAI's fault (and literally every AI company is guilty of the same horrid naming schemes). Codex was an old model based on GPT-3, but then they reused the same name for both their Codex CLI and this Codex tool...
I mean, just look at the updates to their own blog post, I can see why people are confused.
https://openai.com/index/openai-codex/
Edit:
Google just did it too. "Gemini Ultra" is both a model (https://deepmind.google/models/gemini/ultra/) and their new top-tier subscription plan (a la OpenAI's Pro plan). Why is this so difficult?