While the sub-modules differ (MHA vs GQA, SwiGLU vs GeLU, Mixture-of-Depths, etc.), the core signal propagation in Llama, Gemini, and Claude relies on that additive residual stream.
My point here is that DeepSeek's mHC challenges that fundamental additive assumption by introducing learnable weighted scaling factors to the residual path itself.
Two key takeaways from the reproduction:
Unconstrained Hyper-Connections really do explode (7x amplification even at 10M scale).
I hit a nasty "stream persistence" bug where my tensors were the right shape, but the architecture was functionally broken.
This is Part 1 (10M scale). Part 2 (scaling to 1B on A100s) is coming later this week. Happy to answer questions about the implementation.
Building a commercial product means you pay money (or something they equally value) to people to do your bidding. You don't have to worry about politics, licensing, and all the usual FOSS-related drama. You pay them to set their opinions aside and build what you want, not what they want (and if that doesn't work, it just means you need to offer more money).
In this case it's a company that believes they can make a "good" package manager they can sell/monetize somehow and so built that "good" package manager. Turns out it's at least good enough that other people now like it too.
This would never work in a FOSS world because the project will be stuck in endless planning as everyone will have an opinion on how it should be done and nothing will actually get done.
Similar story with systemd - all the bitching you hear about it (to this day!) is the stuff that would've happened during its development phase had it been developed as a typical FOSS project and ultimately made it go nowhere - but instead it's one guy that just did what he wanted and shared it with the world, and enough other people liked it and started building upon it.
Rust's development used a structured, community RFC process—endless planning by your definition. The result was a famously well-designed toolchain that the entire community praises. FOSS didn't hold it back; it made it good.
So no, commercial backing isn't the only way to ship something good. FOSS is more than capable to ship great software when done right.
[1] https://news.ycombinator.com/item?id=46787165
[1]: https://openai.com/index/openai-appoints-retired-us-army-gen...