This format, or similar formats, seem to be the standard now. I was just reading the "Lessons from Building Manus"[1] post, and they discuss the Hermes format[2], which seems similar in being pseudo-XML.

My initial thought was how hacky the whole thing feels, but it works, and something so simple giving rise to complex behaviour (like coercing specific tool selection in the Manus post) is quite elegant.

Also, as an aside, it is good that each standard tag appears to be a single token in the OpenAI repo.

[1] https://manus.im/blog/Context-Engineering-for-AI-Agents-Less...

[2] https://github.com/NousResearch/Hermes-Function-Calling
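For anyone who hasn't seen it, the rendered conversation looks roughly like this; a sketch based on the published harmony spec (the role, channel, and tag names come from the spec, the message content is made up):

```python
# Harmony-style rendering sketch. Each <|...|> tag is a single special
# token; channels separate chain of thought from the user-facing answer.
rendered = (
    "<|start|>user<|message|>What is 2+2?<|end|>"
    # Chain of thought goes to the 'analysis' channel...
    "<|start|>assistant<|channel|>analysis<|message|>"
    "Trivial arithmetic: 2 + 2 = 4.<|end|>"
    # ...and the answer shown to the user goes to 'final'.
    "<|start|>assistant<|channel|>final<|message|>4<|return|>"
)
print(rendered)
```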
Prediction: GPT-5 will use a consortium of models for parallel reasoning, possibly including their OSS versions, each using different 'channels' from the harmony spec.
I have a branch of llm-consortium where I was noodling with giving each member model a role. The only problem is that it's expensive to evaluate these ideas, so I put it on hold. But maybe now, with OSS models being cheap, I can try it on those.
I tested a consortium of Qwens on the Brainfuck test and it solved it, while the single models failed.
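For the curious, the core loop behind a consortium is tiny. A minimal sketch, assuming a hypothetical query_model helper (a stand-in for whatever client library you use, not the llm-consortium API):

```python
import concurrent.futures

def query_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in: send prompt to model, return its reply."""
    raise NotImplementedError  # wire up your own client here

def consortium(prompt: str, members: list[str], judge: str) -> str:
    # Fan the same prompt out to every member model in parallel.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda m: query_model(m, prompt), members))
    # Have a judge model arbitrate between the candidate answers.
    ballot = "\n\n".join(f"[{i}] {a}" for i, a in enumerate(answers))
    pick = query_model(
        judge,
        f"Question:\n{prompt}\n\nCandidates:\n{ballot}\n\n"
        "Reply with only the index of the best answer.",
    )
    return answers[int(pick.strip())]
```

The members can be several copies of the same model at high temperature or entirely different models; the judge just needs to be decent at comparing answers.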
Yesterday I gave a presentation on the role of harmony in AI, as a matter of philosophical interest. I'd previously written a large literature review on the concept of harmony (here: https://www.sciencedirect.com/science/article/pii/S240587262...). If you are curious, the slides are here: bit.ly/ozora2025
I assume they are using the concept of harmony to refer to the consistent response format? Or is the name a nod to their intention for an open-weights release?
> The format enables the model to output to multiple different channels for chain of thought, and tool calling preambles along with regular responses
That's pretty cool and seems like a logical next step in structuring AI outputs. We started out with a stream of plaintext; in the future perhaps we'll have complex typed output.

Humans also emit many channels of information simultaneously. Our speech, tone of voice, body language, our appearance - it all has an impact on how our information is received by others.
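To make 'complex typed output' concrete, here is one hypothetical shape it could take: a response object with one typed field per channel instead of a single undifferentiated string (channel names borrowed from harmony; none of this is any particular API):

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    name: str
    arguments: dict

@dataclass
class Response:
    analysis: str        # chain of thought, not shown to the user
    commentary: str      # tool-call preambles and status notes
    final: str           # the user-facing answer
    tool_calls: list[ToolCall] = field(default_factory=list)
```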
Same here - all those links are either broken or asking for auth. Classic case of announcing something before the infrastructure is ready.
This kind of coordination failure is surprisingly common with AI releases lately. Remember when everyone was trying to access GPT-4 on launch day? Or when Anthropic's Claude had those random outages during their big announcements?
Makes you wonder if they're rushing to counter Google's Genie 3 news and got caught with their pants down during the GitHub outage. The timing seems too coincidental.
At least when it does go live, having truly open weights models will be huge for the community. Just wish they'd test their deployment pipeline before hitting 'publish' on the blog post.
Pardon me, but are you thinking that this method is superior to mixture of experts? What are your thoughts?
MoEs are a single model. An 'expert' is a subset of the network's weights (typically alternative feed-forward blocks) that a router picks for each token; only the chosen experts run, which makes inference faster. A consortium is a type of parallel reasoning that uses multiple models, the same or different, to generate parallel responses and find the best one.
All models have a jagged frontier with weird skill gaps. A consortium can bridge those gaps and increase performance on the frontier.
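To make the contrast concrete, the MoE routing step looks roughly like this; a generic top-k gating sketch in numpy (not any specific model's code):

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=2):
    # Router scores: one logit per expert for this token.
    logits = x @ gate_w
    # Keep only the k highest-scoring experts.
    top = np.argsort(logits)[-k:]
    # Softmax over just the selected experts' logits.
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()
    # Only the chosen experts actually run: that's the speed win.
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

# Tiny demo with random weights: 8 experts, 2 active per token.
d, n = 4, 8
rng = np.random.default_rng(0)
experts = [(lambda W: (lambda v: W @ v))(rng.normal(size=(d, d))) for _ in range(n)]
print(moe_layer(rng.normal(size=d), rng.normal(size=(d, n)), experts))
```

A consortium, by contrast, runs whole separate models end to end and arbitrates between their finished answers.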
I wish someone would extract the Grok Heavy prompts to confirm, but I guess those jailbreakers don't have the $200 sub.
- https://openai.com/index/introducing-gpt-oss/
- https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7...
https://www.bleepingcomputer.com/news/artificial-intelligenc...