Loading comment...
Loading parent story...
Loading comment...
Thank you for sharing on HN!
---
EDIT: The hierarchical composition and routing aspects of this work vaguely remind me of https://github.com/glassroom/heinsen_routing/ but it has been a while since I played with that. UPDATE: After spending a bit more time on the OP, it's different, but the ideas are related, like routing based on similarity.
> 1. H-Nets scale better with data than state-of-the-art Transformers with BPE tokenization, while learning directly from raw bytes. This improved scaling is even more pronounced on domains without natural tokenization boundaries, like Chinese, code, and DNA.
> 2. H-Nets can be stacked together to learn from deeper hierarchies, which further improves performance.
> 3. H-Nets are significantly more robust to small perturbations in input data like casing, showing an avenue for creating models that are more robust and aligned with human reasoning.
> 1. H-Nets scale better with data than state-of-the-art Transformers with BPE tokenization, while learning directly from raw bytes. This improved scaling is even more pronounced on domains without natural tokenization boundaries, like Chinese, code, and DNA.
> 2. H-Nets can be stacked together to learn from deeper hierarchies, which further improves performance.
> 3. H-Nets are significantly more robust to small perturbations in input data like casing, showing an avenue for creating models that are more robust and aligned with human reasoning.
In specific, I'm really proud of "spec driven development", which is based on the internal processes that software development teams at Amazon use to build very large technical projects. Kiro can take your basic "vibe coding" prompt, and expand it into deep technical requirements, a design document (with diagrams), and a task list to break down large projects into smaller, more realistic chunks of work.
I've had a ton of fun not just working on Kiro, but also coding with Kiro. I've also published a sample project I built while working on Kiro. It's a fairly extensive codebase for an infinite crafting game, almost 95% AI coded, thanks to the power of Kiro: https://github.com/kirodotdev/spirit-of-kiro
I wrote more about Spec Driven AI development here: https://lukebechtel.com/blog/vibe-speccing
Loading parent story...
Loading comment...
In fact in Gu's blog post (linked in a post below) it's mentioned that they created a Mamba model that used this in place of the tokenizer.