There's basically no reason to run other open source models now that these are available, at least for non-multimodal tasks.
Example: partition a linked list in linear time. None of these models seems to grasp that `reverse`, or converting the whole list to a vector, is itself a linear operation and therefore off-limits here. When you tell them not to use those, they keep doing so and blatantly claim that they aren't. À la:
"You are right, ... . The following code avoids using `reverse`, ... :
[code that still uses reverse]"
And in languages like Python they will cheat, since Python's list is really an array, where random access is O(1).
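For reference, the task above can be done in one pass without `reverse` or an array. A minimal sketch (names `Node` and `partition` are made up for illustration, not from any model's output): build two sublists with tail pointers, then splice them together.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def partition(head, pivot):
    """Stably partition: nodes < pivot precede nodes >= pivot, O(n)."""
    less_head = less = Node(None)   # dummy head for the "< pivot" sublist
    geq_head = geq = Node(None)     # dummy head for the ">= pivot" sublist
    node = head
    while node is not None:         # single pass over the list
        if node.value < pivot:
            less.next = node
            less = node
        else:
            geq.next = node
            geq = node
        node = node.next
    geq.next = None                 # terminate the second sublist
    less.next = geq_head.next       # splice: "< pivot" then ">= pivot"
    return less_head.next
```

No call to `reverse`, no conversion to a vector, and the relative order within each half is preserved, which is exactly what the models in my experiments kept failing to produce.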
This means they only work well when you are doing something quite mainstream, where the sheer amount of training data provides a strong signal in the noise. But even there they often struggle. For example, I found them somewhat useful for Django work, but just as often they gave bullshit code, or it took a lot of back and forth to get something useful out of them.
I think it is embarrassing that, with sooo much training data, they are still unable to do much more than go by frequency in the training data when suggesting "solutions". They "learn" differently than a human being. When a human sees a new concept, they can often apply it later, even if that concept doesn't come up very often, as long as they remember it. But these LLMs seem to deem everything that isn't mainstream irrelevant.
Having to tack on top of that 2-4h of work per day is not normal, and again, it's probably unhealthy.
https://winbuzzer.com/2025/01/29/alibabas-new-qwen-2-5-max-m...
Alibaba is not a company whose culture is conducive to earnest acknowledgement that they are behind SOTA.
Or you can rent a newer one for $300/mo on the cloud.
edit: They did announce that smaller variants will be released.
"Today, we're announcing Qwen3-Coder, our most agentic code model to date. Qwen3-Coder is available in multiple sizes, but we're excited to introduce its most powerful variant first: Qwen3-Coder-480B-A35B-Instruct."