aurohacker commented on GLM-4.7: Advancing the Coding Capability (z.ai/blog/glm-4.7...) · Posted by u/pretext
mft_ · 3 months ago
I’m never clear, for these models with only a proportion of parameters active (32B here), to what extent this reduces the RAM a system needs, if at all?
aurohacker · 3 months ago
Great answers here: for MoE there’s a compute saving but no memory saving, even though the network is super-sparse. It turns out there’s a paper on predicting in advance which experts the next few layers will use: "Accelerating Mixture-of-Experts language model inference via plug-and-play lookahead gate on a single GPU". As to its efficacy, I’d love to know...
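For intuition, here’s a minimal sketch (hypothetical names and shapes, not from the paper) of why top-k routing cuts FLOPs per token while every expert’s weights stay resident in memory:

    # Minimal MoE sketch: memory scales with n_experts (all weights
    # allocated up front), compute scales with top_k (each token runs
    # through only its routed experts). Names/shapes are illustrative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyMoE(nn.Module):
        def __init__(self, d_model=64, n_experts=8, top_k=2):
            super().__init__()
            # All expert weights live in memory regardless of routing.
            self.experts = nn.ModuleList(
                nn.Linear(d_model, d_model) for _ in range(n_experts))
            self.gate = nn.Linear(d_model, n_experts)
            self.top_k = top_k

        def forward(self, x):  # x: (tokens, d_model)
            scores = self.gate(x)
            w, idx = scores.topk(self.top_k, dim=-1)  # per-token routing
            w = F.softmax(w, dim=-1)
            out = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e
                    if mask.any():  # only routed tokens hit this expert
                        out[mask] += w[mask, k, None] * expert(x[mask])
            return out

The lookahead-gate idea, as I read it, is to predict idx for upcoming layers early enough to prefetch just those experts, which is what would turn the sparsity into an actual memory/transfer win on a single GPU.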
aurohacker commented on Meta Superintelligence Labs' first paper is about RAG (paddedinputs.substack.com...) · Posted by u/skadamat
aurohacker · 5 months ago
Figure 1 in the paper is all about the encoder and how the context and query are packaged and sent to the decoder. I wish it were more complete...
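For what it’s worth, a generic sketch of the usual packaging (my assumption of a typical layout, not the paper’s actual Figure 1 scheme): retrieved passages are tagged and concatenated ahead of the query before being handed to the decoder:

    # Generic RAG packaging sketch -- assumed layout, not the paper's.
    def build_rag_input(query: str, passages: list[str]) -> str:
        # Tag each retrieved passage so the decoder can attribute answers;
        # keep the query last, nearest the generation position.
        docs = "\n".join(f"[doc {i}] {p}" for i, p in enumerate(passages, 1))
        return f"{docs}\n\nQuestion: {query}\nAnswer:"

    print(build_rag_input("What does the encoder emit?",
                          ["Compressed context embeddings ..."]))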

u/aurohacker

Karma: 3 · Cake day: August 14, 2025
About
ml practitioner