Their new pro model seemed to just trade off fluid intelligence and creativity for performance on closed-ended coding tasks (and hence benchmarks), which unfortunately seems to be a general pattern in LLM development now.
We're really getting close to the point where local models are good enough to handle practically every task that most people need to get done.
RL is more data-efficient but that may not be relevant now that we can just use Deepseek-R1's responses as the training data.
There are a lot of use cases in business where what's needed is just a basic, reasonable-ish forecast. I actually think this new model is neat because it completely dispenses with the pretense that we're doing something really serious and methodologically rigorous; we're really just looking at a basic curve fit that squares with human intuition.
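To make "basic curve fit" concrete, here's a minimal sketch of the kind of thing I mean: fit a low-order polynomial trend with numpy and extrapolate it. The series, degree, and horizon are made up purely for illustration.

    import numpy as np

    # Toy monthly series (made-up numbers, purely illustrative)
    y = np.array([112, 118, 121, 130, 128, 135, 141, 139, 148, 152, 158, 163], dtype=float)
    t = np.arange(len(y))

    # The "basic curve fit": a quadratic trend through the history
    trend = np.poly1d(np.polyfit(t, y, deg=2))

    # Extrapolate 6 periods ahead and eyeball whether it looks reasonable
    future_t = np.arange(len(y), len(y) + 6)
    forecast = trend(future_t)
    print(forecast.round(1))

That's the whole "methodology", and for a lot of business questions it's honestly close to what people actually want.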
For those interested in transformers for time series, I recommend reading this paper: https://arxiv.org/pdf/2205.13504. There is also plenty of other research showing that transformer-based time series models generally underperform much simpler alternatives like boosted trees.
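For anyone who wants to try the simpler baseline themselves, a minimal sketch of a boosted-trees forecaster on lagged features might look like the following. The choice of sklearn's HistGradientBoostingRegressor, the synthetic series, and the lag count are all just assumptions for illustration, not what any particular paper used.

    import numpy as np
    from sklearn.ensemble import HistGradientBoostingRegressor

    def make_lag_features(series, n_lags):
        # Each row of X holds the previous n_lags values; y is the next value
        X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
        y = series[n_lags:]
        return X, y

    # Toy series (illustrative only): a noisy sine wave
    rng = np.random.default_rng(0)
    series = np.sin(np.arange(200) / 8.0) + rng.normal(0, 0.1, 200)

    n_lags = 12
    X, y = make_lag_features(series, n_lags)
    model = HistGradientBoostingRegressor().fit(X, y)

    # One-step-ahead forecast from the last n_lags observations
    next_value = model.predict(series[-n_lags:].reshape(1, -1))
    print(next_value)

A baseline like this is cheap to fit and easy to backtest, which is exactly why it's such a strong point of comparison in that literature.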
After looking further, it seems like this startup is both publishing academic research promoting these models and selling them to businesses, which seems like a conflict of interest to me.