We just tested magistral-medium as a replacement for o4-mini in a user-facing feature that relies on JSON generation, where speed is critical.
Depending on the complexity of the JSON, o4-mini runs ranged from 50 to 70 seconds. In our initial tests, Mistral returned results in 34–37 seconds. The output quality was slightly lower but still remained acceptable for us.
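For anyone curious, this is roughly how we measure it. A minimal sketch, assuming both providers accept the OpenAI chat-completions protocol and a JSON response format; the prompt and the `magistral-medium-latest` model id are placeholders, not our production setup:

```python
import time
from openai import OpenAI

# Both providers speak the OpenAI chat-completions protocol.
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
mistral_client = OpenAI(
    base_url="https://api.mistral.ai/v1",  # Mistral's OpenAI-compatible endpoint
    api_key="YOUR_MISTRAL_KEY",
)

PROMPT = "Summarize the report as JSON with keys 'title' and 'items'."

def timed_json(client: OpenAI, model: str) -> tuple[str, float]:
    """Request JSON output and return (body, elapsed seconds)."""
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        response_format={"type": "json_object"},  # ask for well-formed JSON
    )
    return resp.choices[0].message.content, time.perf_counter() - start

for client, model in [
    (openai_client, "o4-mini"),
    (mistral_client, "magistral-medium-latest"),  # assumed model id
]:
    body, secs = timed_json(client, model)
    print(f"{model}: {secs:.1f}s, {len(body)} chars")
```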
We’ll continue testing, but the early results are promising. I'm glad to see Mistral prioritizing speed over raw power; there’s definitely a need for that.
I was recently working on a user-facing feature using self-hosted Gemma 27B with vLLM and was getting fully formed JSON results in ~7 seconds (and even that I’d like to optimize further). Obviously the size of the JSON matters, but I’d never use a reasoning model for this; they’re constantly circling back on themselves and just wasting compute.
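For context, the hookup is roughly this. A minimal sketch against a self-hosted vLLM OpenAI-compatible server; the schema, port, and Gemma model id are illustrative, and `guided_json` is vLLM's server-side extension for constrained JSON decoding:

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server
# (e.g. `vllm serve google/gemma-2-27b-it` on port 8000).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Illustrative schema; constrained decoding guarantees the output parses.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "items": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "items"],
}

resp = client.chat.completions.create(
    model="google/gemma-2-27b-it",  # assumed id for "Gemma 27B"
    messages=[{"role": "user", "content": "Extract the title and items."}],
    extra_body={"guided_json": schema},  # vLLM-specific structured-output knob
)
print(resp.choices[0].message.content)  # JSON conforming to the schema
```

Because decoding is constrained token by token, a plain instruct model returns parseable JSON on every call with no retry loop, which is part of why it's fast.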
I haven’t really found a super convincing use case for reasoning models yet, other than a chat-style interface or an assistant to bounce ideas off of.