If you've built your value on promising imminent AGI, then this sort of thing is purely a distraction, and you wouldn't even be considering it... unless you knew you weren't actually about to deliver AGI.
Right now OAI's synthetic data pipeline is very heavily weighted to 1-on-1 conversations.
But models are being deployed into multi-user spaces that OAI doesn't have access to.
If you look at where their products are headed right now, this is very much the right move.
Expect TikTok-style media formats.
And as unbelievable as you may think the title to be, I can pretty much guarantee you'll find it much more believable by the end of the post.
Evans definitely had issues with both his methods and his analysis. For example, the "snake goddess" is holding snakes remarkably similar to wooden snake props found in Egypt 300 years earlier.
But this article is pretty damn empty of actual substance.
Do we really have to choose between wave and particle? What does the "particle" model bring to the table that a localized (wavelength-sized) wave/vibration could not?
But in order to track state changes from free agents, the engine converts that geometry into discrete units when you get close to it.
This duality of continuous foundation becoming discrete units around the point of observation/interaction is not the result of dueling models, but a unified system.
I sometimes wonder if we'd struggle with interpreting QM the same way if it weren't for a kind of paradigm blindness: the standard interpretations all predate the advances in modeling information systems.
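The engine analogy above can be made concrete with a toy sketch. Everything here is invented for illustration (the field, the grid resolution); the point is only that a single continuous function underlies the system, and discrete units exist just around the point of observation:

```python
import math

def field(x):
    """Continuous 'foundation': a smooth wave defined everywhere."""
    return math.sin(2 * math.pi * x)

def observe(center, radius=1.0, step=0.25):
    """Near a point of observation/interaction, sample the continuous
    field into discrete units the engine can track individually."""
    n = int(2 * radius / step)
    return [round(field(center - radius + i * step), 3) for i in range(n + 1)]

# Away from any observer the field is just a function (continuous);
# discrete units exist only around the interaction point.
units = observe(center=0.0)
print(len(units))  # 9 discrete units around the observation point
```

There's no "duel" between the two descriptions in this sketch: the discrete list is derived on demand from the one continuous function.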
Most changes occur with a quick back and forth about top level choices in chat.
Followed by me grabbing the appropriate interfaces and files for context so Sonnet doesn't hallucinate APIs, and then code that I'll glance over and, around half the time, suggest one or more further changes to.
It's been successful enough that I'm currently thinking about how to adjust best practices to make things even smoother for that workflow, like aggregating package interfaces into a single file for context, and keeping notes that encourage more verbose commenting in a file I can also provide as context on each generation.
Human-centric best practices aren't always the best fit, and it's finally good enough to start rethinking those for myself.
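For the interface-aggregation idea, a minimal sketch assuming a Python codebase, using the stdlib `ast` module to pull out signatures and first docstring lines so an entire package's surface fits in one context file (paths and naming here are hypothetical):

```python
import ast
from pathlib import Path

def extract_interface(source: str) -> list[str]:
    """Return def/class header lines plus the first docstring line,
    skipping implementation bodies entirely."""
    out = []
    src_lines = source.splitlines()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            out.append(src_lines[node.lineno - 1].strip())
            doc = ast.get_docstring(node)
            if doc:
                out.append('    """' + doc.splitlines()[0] + '"""')
    return out

def aggregate(package_dir: str) -> str:
    """Concatenate every module's interface into one context file."""
    chunks = []
    for path in sorted(Path(package_dir).rglob("*.py")):
        chunks.append(f"# --- {path} ---")
        chunks.extend(extract_interface(path.read_text()))
    return "\n".join(chunks)
```

(This only grabs the `def` line itself, so multi-line signatures get truncated; it's a starting point for pasting into a chat context, not a real tool.)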
But making up fake minority stereotype bingo cards may have been the worst idea I've ever seen in AI to date.
But there is more: a key thing with LLMs is that their ability to help, as a tool, varies vastly with your communication ability. The prompt is king: it can make those models 10x better than they are with a lazy one-liner question. Drop your files into the context window; ask very precise questions that explain the background. They work great for exploring what's at the borders of your knowledge. They are also great at doing boring tasks for which you can provide perfect guidance (but that would still take you hours). The best LLMs out there (in my case just Claude Sonnet 3.5, I must admit) are able to accelerate you.
Using a few messages to get them out of "I aim to be direct" AI assistant mode gets much better overall results for the rest of the chat.
Haiku is actually incredibly good at high level systems thinking. Somehow when they moved to a smaller model the "human-like" parts fell away but the logical parts remained at a similar level.
Like if you were taking meeting notes from a business strategy meeting and wanted insights, use Haiku over Sonnet, and thank me later.
That is an unreasonable assumption. In the case of LLMs it seems wasteful to collapse a point in latent space into a single random token and lose information. In fact, I think in the near future it will be the norm for MLLMs to "think" and "reason" without outputting a single "word".
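A toy sketch of the difference, with random matrices standing in for the model (nothing here is a real architecture): decoding to a token each step projects the state onto a tiny codebook, while a latent loop keeps the full state.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                          # toy latent dimension
W = rng.standard_normal((d, d)) / np.sqrt(d)   # toy "reasoning" transition
E = rng.standard_normal((4, d))                # toy 4-token embedding table

def decode(h):
    """Collapse a latent state to the nearest token id (lossy)."""
    return int(np.argmax(E @ h))

def token_loop(h, steps):
    """Chain-of-thought style: decode a token, re-embed it, repeat.
    Everything outside the chosen token's embedding is discarded."""
    for _ in range(steps):
        h = np.tanh(W @ E[decode(h)])
    return h

def latent_loop(h, steps):
    """'Silent' reasoning: feed the full latent state straight back."""
    for _ in range(steps):
        h = np.tanh(W @ h)
    return h
```

Each `token_loop` step squeezes the state through a 4-entry codebook; `latent_loop` preserves all `d` dimensions between steps, which is exactly the information the token round-trip throws away.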
> Whether AI reasoning is “real” reasoning or just a mirage can be an interesting question, but it is primarily a philosophical question. It depends on having a clear definition of what “real” reasoning is, exactly.
It is not a "philosophical" (by which the author probably means "practically inconsequential") question. If the whole reasoning business is just rationalization of pre-computed answers, or simply a means of doing extra computation because every token provides only a fixed amount of computation to update the model's state, then it doesn't make much sense to focus on improving the quality of chain-of-thought output from a human point of view.
If Othello-GPT can build a board in latent space given just the moves, can an exponentially larger transformer build a reasoner in its latent space given a significant number of traces?
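The Othello-GPT board result came from training linear probes on hidden states. Here's a toy sketch of that probing technique, with a fixed random network standing in for the trained transformer and a single synthetic "board" bit as the latent fact to recover (all data and dimensions are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_in, d_hid = 500, 16, 64

# Synthetic setup: "move sequences" X, and one latent "board" bit that
# is a function of them; a fixed random network plays the role of the
# trained model producing hidden states H.
X = rng.standard_normal((n, d_in))
board = (X[:, 0] > 0).astype(float)                       # latent fact
H = np.tanh(X @ rng.standard_normal((d_in, d_hid)) / np.sqrt(d_in))

# Linear probe: least-squares readout (with a bias column) from the
# hidden states to the board bit. High accuracy means the fact is
# linearly represented in latent space.
Hb = np.hstack([H, np.ones((n, 1))])
w, *_ = np.linalg.lstsq(Hb, board, rcond=None)
accuracy = ((Hb @ w > 0.5) == (board > 0.5)).mean()
```

The probe is deliberately weak (linear), so a high accuracy is evidence the structure exists in the representation rather than being computed by the probe itself.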