I don't understand how it is different from MCP. The blog just says "A2A is an open protocol that complements Anthropic's Model Context Protocol (MCP), which provides helpful tools and context to agents." There is no example or anything of how it complements it.
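For what it's worth, my read of the two specs is that MCP connects an agent to tools and context, while A2A connects agents to other agents. A rough sketch of the two JSON-RPC requests (the method names are from the published specs as I understand them; all the payload values are made up for illustration):

    # MCP: an agent calls a tool exposed by an MCP server
    mcp_request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "get_weather",           # hypothetical tool
            "arguments": {"city": "Paris"},
        },
    }

    # A2A: an agent hands a task to another agent
    a2a_request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tasks/send",
        "params": {
            "id": "task-123",                # hypothetical task id
            "message": {
                "role": "user",
                "parts": [{"type": "text", "text": "Plan a trip to Paris"}],
            },
        },
    }

So presumably "complements" means one agent uses MCP to reach its tools and A2A to reach peer agents, but I agree the blog never spells that out.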
Deepseek R1 also has an MTP layer (layer 61) https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/mod...
But Deepseek R1 adds embed_tokens and shared_head.head tensors, each [129280, 7168], which together come to about 2GB at FP8 (rough math below).
Qwen3-Next doesn't have that: https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct/blob...
So it saves a few GB in active parameters for MTP, which is a Big Deal. This is one of the changes that helps significantly speed up inference.
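Quick back-of-the-envelope check on the 2GB figure, assuming FP8 means 1 byte per weight and both extra tensors share the [129280, 7168] shape:

    # size of DeepSeek R1's two extra MTP tensors at 1 byte/param (FP8)
    vocab, hidden = 129280, 7168
    per_tensor = vocab * hidden      # 926,679,040 params, ~0.93 GB each
    total = 2 * per_tensor           # embed_tokens + shared_head.head
    print(f"{total / 1e9:.2f} GB")   # -> 1.85 GB, i.e. "about 2GB"

So ~1.85GB for the pair, which matches the ballpark above.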