I'm no expert on these MoE models with "a total of 389 billion parameters and 52 billion active parameters".
Do hobbyists stand a chance of running this model (quantized) at home?
For example, on something like a PC with 128GB (or 512GB) RAM and one or two RTX 3090 24GB VRAM GPUs?
At 4-bit quantization, weights take about 1GB per 2 billion parameters (0.5 bytes/parameter), so this model needs roughly 195GB for weights alone. You will want 256GB RAM and at least one GPU. If you only have one server and one user, it's the full parameter count that has to fit in memory. (If you have multiple GPUs/servers and many users in parallel, you can shard the experts and route requests so each GPU/server only needs to hold roughly the active parameter count. That's why it's cheaper at scale.)
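A back-of-the-envelope sketch of that sizing, assuming plain 4-bit weights (0.5 bytes/parameter) and ignoring KV cache, activations, and framework overhead; the parameter counts are the ones quoted in the question:

```python
# Rough memory footprint for a 4-bit quantized MoE model.
# Assumes 0.5 bytes per parameter; real runtimes add KV cache and other overhead.

def weight_size_gb(params_billion: float, bits_per_param: float = 4.0) -> float:
    """Approximate weight footprint in GB for a given parameter count and bit width."""
    bytes_per_param = bits_per_param / 8.0
    return params_billion * bytes_per_param  # billions of params * bytes each = GB

total_b = 389    # total parameters (billions), from the question
active_b = 52    # active parameters per token (billions), from the question

print(f"4-bit total weights:  ~{weight_size_gb(total_b):.0f} GB  (fits in 256GB RAM, not 128GB)")
print(f"4-bit active weights: ~{weight_size_gb(active_b):.0f} GB  (read per generated token)")
```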
Going from 525GB/s to 1000GB/s of memory bandwidth will double the tokens per second (TPS) at best, which is still quite low for large LLMs.
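For a sense of where that TPS ceiling comes from, here is a rough bandwidth-bound estimate. It assumes each generated token has to stream the 4-bit active weights (~26GB for 52B active parameters) from memory once, so it is an idealized upper bound, not a benchmark:

```python
# Theoretical ceiling on single-user decode speed when generation is memory-bandwidth bound.
# Real throughput will be noticeably lower once KV cache reads, expert routing, and
# compute overhead are included.

def max_tokens_per_sec(bandwidth_gb_s: float,
                       active_params_billion: float = 52.0,
                       bits_per_param: float = 4.0) -> float:
    """Tokens/s ceiling if every byte of active weights is read once per token."""
    gb_per_token = active_params_billion * bits_per_param / 8.0
    return bandwidth_gb_s / gb_per_token

for bw in (525, 1000):
    print(f"{bw:>4} GB/s -> at most ~{max_tokens_per_sec(bw):.0f} tokens/s")
# 525 GB/s gives ~20 tokens/s at best, 1000 GB/s ~38 tokens/s: roughly 2x, as noted above.
```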