Readit News
lukebechtel commented on Hierarchical Modeling (H-Nets)   cartesia.ai/blog/hierarch... · Posted by u/lukebechtel
cubefox · 8 months ago
As Mamba didn't make it, will H-Nets replace Transformers?
lukebechtel · 8 months ago
It's meant to replace the BPE tokenizer piece, so it isn't a full Language Model by itself.

In fact, Gu's blog post (linked in a comment below) mentions that they created a Mamba model that used this in place of the tokenizer.
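To make the tokenizer-replacement point concrete, here's a toy contrast (my own sketch, not H-Net code): a byte-level model consumes raw UTF-8 bytes directly, so there's no BPE vocabulary to learn or maintain.

```python
# Illustrative only: a byte-level model's input is just the UTF-8 bytes.
text = "H-Nets 分层"  # mixed-script input with no obvious token boundaries

raw_bytes = list(text.encode("utf-8"))  # the model's input sequence
print(len(raw_bytes), raw_bytes[:5])

# A BPE tokenizer, by contrast, maps the same text to IDs from a learned,
# fixed vocabulary; H-Net instead learns its own chunk boundaries end-to-end.
```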

marviel commented on Hierarchical Modeling (H-Nets)   cartesia.ai/blog/hierarch... · Posted by u/lukebechtel
gdiamos · 8 months ago
How does it handle images?
marviel · 8 months ago
it mentions native multimodality somewhere in either the arXiv paper or the post -- seems like it might handle images well?
lukebechtel commented on Hierarchical Modeling (H-Nets)   cartesia.ai/blog/hierarch... · Posted by u/lukebechtel
cs702 · 8 months ago
I've only skimmed the paper, but it looks interesting and credible, so I've added it to my reading list.

Thank you for sharing on HN!

---

EDIT: The hierarchical composition and routing aspects of this work vaguely remind me of https://github.com/glassroom/heinsen_routing/ but it has been a while since I played with that. UPDATE: After spending a bit more time on the OP, it's different, but the ideas are related, like routing based on similarity.

lukebechtel · 8 months ago
No problem! I'm still parsing it myself, but it seems promising in theory, and the result curves are impressive.
lukebechtel commented on Hierarchical Modeling (H-Nets)   cartesia.ai/blog/hierarch... · Posted by u/lukebechtel
lukebechtel · 8 months ago
> H-Net demonstrates three important results on language modeling:

> 1. H-Nets scale better with data than state-of-the-art Transformers with BPE tokenization, while learning directly from raw bytes. This improved scaling is even more pronounced on domains without natural tokenization boundaries, like Chinese, code, and DNA.

> 2. H-Nets can be stacked together to learn from deeper hierarchies, which further improves performance.

> 3. H-Nets are significantly more robust to small perturbations in input data like casing, showing an avenue for creating models that are more robust and aligned with human reasoning.
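A rough intuition for point 2 (this is my own sketch with made-up names, not code from the paper): each stacked level groups its input into coarser chunks, so deeper stacks see longer-range structure.

```python
# Hypothetical sketch of "stacking" hierarchies; fixed-size chunks stand in
# for H-Net's *learned* dynamic boundaries.
def chunk(seq, size):
    """Group a sequence into chunks of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

level0 = list(b"hierarchical modeling")   # raw bytes
level1 = chunk(level0, 4)                 # byte chunks (~subword-like units)
level2 = chunk(level1, 3)                 # chunks of chunks (~word/phrase-like)
print(len(level0), len(level1), len(level2))
```

Each level operates on a shorter sequence than the one below it, which is where the improved scaling intuition comes from.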

marviel commented on Kiro: A new agentic IDE   kiro.dev/blog/introducing... · Posted by u/QuinnyPig
NathanKP · 8 months ago
Hello folks! I've been working on Kiro for nearly a year now. Happy to chat about some of the things that make it unique in the IDE space. We've added a few powerful things that I think make it a bit different from other similar AI editors.

Specifically, I'm really proud of "spec driven development", which is based on the internal processes that software development teams at Amazon use to build very large technical projects. Kiro can take your basic "vibe coding" prompt and expand it into deep technical requirements, a design document (with diagrams), and a task list that breaks large projects into smaller, more realistic chunks of work.

I've had a ton of fun not just working on Kiro, but also coding with Kiro. I've also published a sample project I built while working on Kiro. It's a fairly extensive codebase for an infinite crafting game, almost 95% AI coded, thanks to the power of Kiro: https://github.com/kirodotdev/spirit-of-kiro
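As a rough illustration of the spec-driven workflow described above (my own mock data, not Kiro's actual artifacts or API): a one-line prompt gets expanded into three artifacts before any code is written.

```python
# Hypothetical example of the prompt -> spec expansion; contents are invented.
prompt = "build an infinite crafting game"

spec = {
    "requirements": [  # deep technical requirements derived from the prompt
        "combine two items to produce a new item",
        "persist discovered items per player",
    ],
    "design": "client/server split with a crafting service (diagrams elided)",
    "tasks": [  # the large project broken into smaller, realistic chunks
        "define item schema",
        "implement combine endpoint",
        "add persistence layer",
    ],
}

print(len(spec["tasks"]))  # implementation then proceeds task by task
```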

marviel · 8 months ago
nice! Can't agree more on Vibe Speccing.

I wrote more about Spec Driven AI development here: https://lukebechtel.com/blog/vibe-speccing

marviel commented on Usability barriers for liquid types [pdf]   catarinagamboa.github.io/... · Posted by u/azhenley
marviel · 8 months ago
I've often wanted something like "liquid types" but didn't know what it was called! Thanks for this

u/marviel

Karma: 947 · Cake day: November 3, 2015
About
lukebechtel.com

Say hi : luke (at) lukebechtel.com

Current:
- Founder @ Reasonote (https://reasonote.com)
- AI/ML Consulting @ Positive Sum Products (https://positivesumproducts.com)

Former:
- Principal AI/ML Eng. @ Regscale (https://regscale.com)
- Dir. of Eng. @ Revaly (now Cadchat) (Acq. in 2023)
- Cofounder @ Collider Inc. (Acq. in 2022)
