Readit News
humblyCrazy commented on Qwen3-Next · qwen.ai/blog?id=4074cca80... · Posted by u/tosh
jychang · 3 months ago
Coolest part of Qwen3-Next, in my opinion (after the linear attention parts), is that they do MTP (multi-token prediction) without adding another un-embedding matrix.

DeepSeek R1 also has an MTP layer (layer 61): https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/mod...

But DeepSeek R1 adds embed_tokens and shared_head.head tensors, each of shape [129280, 7168]; together that is about 2GB at FP8.

Qwen3-Next doesn't have that: https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct/blob...

So it saves about 2GB of active parameters for MTP, which is a Big Deal. This is one of the changes that significantly speeds up inference.
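
A minimal PyTorch sketch of the difference, with made-up module names (illustration only, not the actual Qwen3-Next or DeepSeek code):

  # Sketch; names are hypothetical, not from the real sources.
  import torch.nn as nn

  VOCAB, HIDDEN = 129280, 7168  # sizes from the DeepSeek R1 config linked above

  class MTPHeadUntied(nn.Module):
      # DeepSeek-R1 style: the MTP layer carries its own embed/un-embed tensors
      def __init__(self):
          super().__init__()
          self.embed_tokens = nn.Embedding(VOCAB, HIDDEN)   # extra [129280, 7168]
          self.head = nn.Linear(HIDDEN, VOCAB, bias=False)  # extra [129280, 7168]

  class MTPHeadTied(nn.Module):
      # Qwen3-Next style (as described above): reuse the base model's matrices
      def __init__(self, base_embed: nn.Embedding, base_head: nn.Linear):
          super().__init__()
          self.embed_tokens = base_embed  # shared, zero new parameters
          self.head = base_head           # shared, zero new parameters

  extra_params = 2 * VOCAB * HIDDEN
  print(f"untied MTP adds {extra_params / 1e9:.2f} GB at FP8 (1 byte/param)")
  # untied MTP adds 1.85 GB at FP8, i.e. the "about 2GB" above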

humblyCrazy · 3 months ago
How is MTP different from Medusa heads? Also, does this mean the model comes "natively" with speculative decoding? Meaning, if I use this model in vLLM, its throughput should be higher because it is already doing MTP, so it should be able to take advantage of speculative decoding?
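
For context, a rough conceptual sketch of the structural difference; all names and sizes are hypothetical (toy dimensions, heavily simplified), not any engine's real code:

  import torch
  import torch.nn as nn

  HIDDEN, VOCAB, K = 512, 32000, 2  # toy sizes; real models are far larger

  class MedusaHeads(nn.Module):
      # Medusa: K independent lightweight heads read the SAME final hidden
      # state and guess tokens t+1..t+K in parallel, with no token feedback.
      def __init__(self):
          super().__init__()
          self.heads = nn.ModuleList(
              nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.SiLU(),
                            nn.Linear(HIDDEN, VOCAB, bias=False))
              for _ in range(K))

      def forward(self, h):                        # h: [batch, HIDDEN]
          return [head(h) for head in self.heads]  # K draft distributions at once

  class MTPModule(nn.Module):
      # MTP (DeepSeek-V3/Qwen3-Next style, simplified): a genuine extra
      # transformer layer that consumes the previous hidden state together
      # with the embedding of the token just drafted, so drafts form a
      # causal chain rather than independent parallel guesses.
      def __init__(self, shared_embed: nn.Embedding, shared_head: nn.Linear):
          super().__init__()
          self.proj = nn.Linear(2 * HIDDEN, HIDDEN)
          self.block = nn.TransformerEncoderLayer(HIDDEN, nhead=8, batch_first=True)
          self.embed, self.head = shared_embed, shared_head  # tied, as above

      def forward(self, h, tok):                   # h: [batch, HIDDEN], tok: [batch]
          x = self.proj(torch.cat([h, self.embed(tok)], dim=-1))
          h_next = self.block(x.unsqueeze(1)).squeeze(1)
          return self.head(h_next)                 # draft logits for the next step

Either way, the drafts still have to be verified by the full model, so any throughput win in an engine like vLLM depends on that engine actually wiring the MTP weights up as a draft model; it is not automatic.
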
humblyCrazy commented on The Agent2Agent Protocol (A2A) · developers.googleblog.com... · Posted by u/meetpateltech
humblyCrazy · 8 months ago
I don't understand how it is different from MCP. The blog just says "A2A is an open protocol that complements Anthropic's Model Context Protocol (MCP), which provides helpful tools and context to agents." There is no example or anything on how it complements it.

u/humblyCrazy

Karma: 5 · Cake day: December 7, 2023