juank10 commented on Breaking Quadratic Barriers: A Non-Attention LLM for Ultra-Long Context Horizons   arxiv.org/abs/2506.01963... · Posted by u/PaulHoule
juank10 · 3 months ago
Funnily enough, the code was deleted from the repo, but it can still be seen in the commit history. It's about what you would expect from the paper :D

On the general topic of non-attention LLMs, I recommend checking out MesaNet [1], Rodimus [2], Gated DeltaNet [3], and Mamba2 [4]. They are currently SOTA.

However, I have yet to see a compelling non-attention-based model that achieves good performance on code, math, reasoning, or multi-turn QA tasks. I do not think we are getting rid of attention soon; the ability to look back over the full history seems crucial for certain tasks.

[1] https://arxiv.org/abs/2506.05233

[2] https://arxiv.org/abs/2410.06577

[3] https://arxiv.org/abs/2412.06464

[4] https://arxiv.org/abs/2405.21060
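To make the trade-off concrete, here is a toy numpy sketch (my own illustration, not any of these papers' actual update rules) of why these models escape the quadratic barrier: unnormalized linear attention can be computed either by re-reading the whole history at each step (O(T^2)) or by folding the history into a fixed-size state updated in O(1) per step. The gated/delta variants above replace the plain running sum with a learned decay and update rule, which is also why they can struggle to "look back" at exact past tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 4  # toy sequence length and head dimension
Q = rng.standard_normal((T, d))
K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))

def quadratic_readout(Q, K, V):
    # Naive causal form: each step looks back over all past keys,
    # so total work grows as O(T^2).
    out = np.zeros_like(V)
    for t in range(len(Q)):
        scores = K[: t + 1] @ Q[t]       # similarity to every past key
        out[t] = scores @ V[: t + 1]     # weighted sum of past values
    return out

def recurrent_readout(Q, K, V):
    # Equivalent linear-time form: fold history into a fixed d x d
    # state S = sum_i k_i v_i^T, updated in O(d^2) per step.
    # Gated variants add learned decay to this update; the plain
    # running sum here is the simplest special case.
    S = np.zeros((d, d))
    out = np.zeros_like(V)
    for t in range(len(Q)):
        S += np.outer(K[t], V[t])        # constant-size memory update
        out[t] = Q[t] @ S                # read out without revisiting history
    return out

# Both forms compute the same thing, only the cost profile differs.
assert np.allclose(quadratic_readout(Q, K, V), recurrent_readout(Q, K, V))
```

The fixed-size state is the whole point: memory and per-token compute stay constant no matter how long the context gets, at the cost of lossy compression of the past.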
