I don't know what's so special about this paper.
- They claim to use MLA to reduce the KV cache by 90%. Yeah, Deepseek invented that for Deepseek V2 (and it's also in V3, Deepseek R1, etc.) - rough sketch of the idea after this list.
- They claim to use a hybrid linear attention architecture. So does Deepseek V3.2, and that was weeks ago. Or Granite 4, if you want to go even further back. Or Kimi Linear. Or Qwen3-Next. (Second sketch below shows what "hybrid" usually means here.)
- They claim to save a lot of money by not doing a full pre-train run for millions of dollars. Well, so did Deepseek V3.2... Deepseek hasn't done a full $5.6M pretraining run since Deepseek V3 in 2024. Deepseek R1 is just a $294k post-train on top of the expensive V3 pretrain run. Deepseek V3.2 is just a hybrid linear attention post-train run - I don't know the exact price, but it's probably just a few hundred thousand dollars as well.
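For reference, the MLA trick in the first bullet boils down to caching one small latent vector per token instead of the full per-head K/V. A minimal sketch of that idea (prefill path only, with made-up dimensions, not the paper's or Deepseek's actual config):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of MLA-style latent KV compression. The point: at decode time you
# cache one small latent vector per token instead of full per-head K and V,
# which is where the ~90% KV-cache saving comes from.

class LatentKVAttention(nn.Module):
    def __init__(self, d_model=4096, n_heads=32, head_dim=128, d_latent=512):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, head_dim
        self.q_proj = nn.Linear(d_model, n_heads * head_dim, bias=False)
        self.kv_down = nn.Linear(d_model, d_latent, bias=False)  # compress to latent
        self.k_up = nn.Linear(d_latent, n_heads * head_dim, bias=False)
        self.v_up = nn.Linear(d_latent, n_heads * head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * head_dim, d_model, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        latent = self.kv_down(x)  # (b, t, d_latent) -- this is all you'd cache
        k = self.k_up(latent).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = self.v_up(latent).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))

# Per-token cache: 512 floats (latent) vs 2 * 32 * 128 = 8192 floats (full K+V),
# roughly a 94% reduction under these illustrative dimensions.
```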
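And "hybrid linear attention" (second bullet) generally just means interleaving cheap linear-attention layers with occasional full-attention layers. A rough sketch of that pattern - my own illustration, not any specific model's layer layout:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearAttention(nn.Module):
    """Kernelized linear attention (non-causal here for brevity):
    O(T) time and a fixed-size state instead of a growing KV cache."""
    def __init__(self, d_model=1024, n_heads=8):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.out = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        q, k = F.elu(q) + 1, F.elu(k) + 1                 # positive feature map
        kv = torch.einsum("bhtd,bhte->bhde", k, v)        # fixed-size summary
        denom = torch.einsum("bhtd,bhd->bht", q, k.sum(dim=2)) + 1e-6
        out = torch.einsum("bhtd,bhde,bht->bhte", q, kv, 1.0 / denom)
        return self.out(out.transpose(1, 2).reshape(b, t, d))

class HybridBlock(nn.Module):
    """One residual block; the token mixer is either linear or full attention."""
    def __init__(self, d_model, full_attn):
        super().__init__()
        self.full_attn = full_attn
        self.norm = nn.LayerNorm(d_model)
        self.mixer = (nn.MultiheadAttention(d_model, 8, batch_first=True)
                      if full_attn else LinearAttention(d_model))

    def forward(self, x):
        h = self.norm(x)
        if self.full_attn:
            h, _ = self.mixer(h, h, h, need_weights=False)  # masking omitted
        else:
            h = self.mixer(h)
        return x + h

# e.g. full softmax attention in every 4th layer, linear attention elsewhere
model = nn.Sequential(*[HybridBlock(1024, full_attn=(i % 4 == 3)) for i in range(12)])
y = model(torch.randn(2, 64, 1024))  # (2, 64, 1024)
```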
Hell, GPT-5, o3, o4-mini, and gpt-4o are all post-trains on top of the same expensive pre-train run (the one behind gpt-4o in 2024). That's why they all have the same knowledge cutoff date.
I don't really see anything new or interesting in this paper that Deepseek V3.2 hasn't already sort of done (just at a bigger scale). Not exactly the same, but is there anything genuinely new here that's not in Deepseek V3.2?
From Zebra-Llama's arXiv page: Submitted on 22 May 2025
I would think making sure outside payment links aren't scams will be more expensive than that, because checking once isn't sufficient. Scammers will update the target of such links, so you can't just check this at app submission time. You'll also have to keep checking from around the world, from different IP address ranges, outside California business hours, etc., because scammers are smart enough to use that kind of info to decide whether to show their scammy page.
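To make the "checking once isn't sufficient" point concrete, here's a hypothetical sketch of the kind of recurring, multi-vantage-point check that implies (the proxy endpoints and flow are placeholders I made up, not a real service):

```python
import hashlib
import requests

# Re-fetch a developer's external payment link from several vantage points and
# flag it if the final destination diverges (a common sign of cloaking).
# You'd run this repeatedly, not just at submission time.

VANTAGE_PROXIES = {
    "us-residential": "http://proxy.example-us.net:8080",   # placeholder
    "eu-datacenter": "http://proxy.example-eu.net:8080",    # placeholder
    "asia-mobile": "http://proxy.example-asia.net:8080",    # placeholder
}

def fingerprint(url: str, proxy: str) -> tuple[str, str]:
    """Return (final_url, content_hash) as seen through one vantage point."""
    resp = requests.get(
        url,
        proxies={"http": proxy, "https": proxy},
        timeout=15,
        allow_redirects=True,
        headers={"User-Agent": "Mozilla/5.0 (compliance-check)"},
    )
    return resp.url, hashlib.sha256(resp.content).hexdigest()

def looks_cloaked(url: str) -> bool:
    """Flag the link if different vantage points land on different pages."""
    results = [fingerprint(url, proxy) for proxy in VANTAGE_PROXIES.values()]
    final_urls = {final for final, _ in results}
    return len(final_urls) > 1  # divergent redirects => send to manual review
```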
Also, even if it becomes ‘only’ hundreds of dollars, I'd guess only large companies will be able to afford offering an option for outside payments.
https://store.epicgames.com/en-US/news/introducing-epic-web-...