I'm working on a model that goes a step further and makes the distinction between thinking and code execution unnecessary altogether (it's all computation in the end). Unfortunately, I have no link to share yet.
I use Claude Code with Opus and had the same experience: I was pushing it hard to implement a complex test, and it gave me an empty test function with the test plan written inside a comment (lol).
I do want to try Gemini 2.5 Pro, but I don't know of a tool that would offer an experience comparable to Claude Code. Would it make sense to use it with Cursor? Do they limit the context?