And for that matter, what is
>mix’n’match capability in Gemma 3n to dynamically create submodels
It seems like mixture-of-experts taken to the extreme, where you actually create an entire submodel instead of routing per token?
> Gemma 3n leverages a Google DeepMind innovation called Per-Layer Embeddings (PLE) that delivers a significant reduction in RAM usage.
Like you I’m also interested in the architectural details. We can speculate but we’ll probably need to wait for some sort of paper to get the details.