Sure, all the slop code projects I produce get MIT licensed on public repos. It wasn't mine to begin with, so I wouldn't prevent anyone from using it.
Obviously an RTX 5090 with 32GB of VRAM is even better, but they cost around $2000, if you can find one.
What's interesting about this Strix Halo system is that it has 128GB of RAM that is accessible (or mostly accessible) to the CPU/GPU/APU. This means that you can run much larger models on this system than you possibly could on a 3090, or even a 5090. The performance tests tend to show that the Strix Halo's memory bandwidth is a significant bottleneck though. This system might be the most affordable way of running 100GB+ models, but it won't be fast.
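For a rough sense of why bandwidth is the limit here, a back-of-envelope sketch: during decoding, each generated token has to stream roughly all of the weights it uses from memory once, so bandwidth divided by bytes read per token gives a ceiling on tokens per second. The bandwidth figures below are assumptions on my part (theoretical peaks as I understand them, not measurements from this system), and real throughput will come in lower.

```python
def decode_ceiling_tps(bandwidth_gb_s: float, weights_read_gb: float) -> float:
    """Rough upper bound on decode tokens/s, assuming every generated token
    streams `weights_read_gb` of weights from memory exactly once."""
    if weights_read_gb <= 0:
        raise ValueError("weights_read_gb must be positive")
    return bandwidth_gb_s / weights_read_gb


# ASSUMPTION: theoretical peak memory bandwidth in GB/s, not measured values.
ASSUMED_BANDWIDTH_GB_S = {
    "Strix Halo": 256.0,
    "RTX 3090": 936.0,
}

# A dense model with about 100 GB of weights only fits on the Strix Halo box.
print(f"Strix Halo, 100 GB dense model: "
      f"{decode_ceiling_tps(ASSUMED_BANDWIDTH_GB_S['Strix Halo'], 100.0):.2f} t/s ceiling")

# A hypothetical 20 GB model fits on both, so the bandwidth gap shows directly.
for name, bandwidth in ASSUMED_BANDWIDTH_GB_S.items():
    print(f"{name}, 20 GB model: {decode_ceiling_tps(bandwidth, 20.0):.1f} t/s ceiling")
```

This ignores KV cache reads, compute limits and MoE routing (an MoE model only reads its active experts per token, so its ceiling is much higher), so treat it as an order-of-magnitude estimate only.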
That gives us a total TDP of around 150W and 48 GB of VRAM, and we can run Qwen 3 Coder 30B A3B at 4-bit quantization with up to 32k context at around 60-70 t/s with Ollama. I also tried out vLLM, but surprisingly the performance wasn't much better (it might pull ahead under heavier concurrent load). Sharing this as a data point because the setup is similar.
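If anyone wants to reproduce a t/s number like this, here is a minimal sketch of how it can be read from Ollama's HTTP API: a non-streaming `/api/generate` response includes `eval_count` (tokens generated) and `eval_duration` (in nanoseconds). The host, the model tag and the `measure` helper are my assumptions for illustration, not what was used for the numbers above.

```python
import json
import urllib.request


def tokens_per_second(response: dict) -> float:
    """Decode speed from an Ollama /api/generate response.

    eval_count is the number of generated tokens and eval_duration is the
    time spent generating them, in nanoseconds."""
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] / (eval_duration_ns / 1e9)


def measure(model: str, prompt: str,
            host: str = "http://localhost:11434",  # ASSUMPTION: default local Ollama
            num_ctx: int = 32768) -> float:
    """Hypothetical helper: run one non-streaming generation and return t/s."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return tokens_per_second(json.load(resp))
```

Usage would be something like `measure("qwen3-coder:30b", "Write a quicksort in Python.")`, where the model tag is an assumption and should match whatever `ollama list` shows locally. A single request only measures single-stream speed, so it says nothing about the concurrent-load case where vLLM might do better.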
Honestly it's a really good model, even good enough for some basic agentic use (e.g. with Aider, RooCode and so on). MoE seems the way to go for somewhat limited hardware setups.
Of course, I'm not recommending L4 cards, because they have a pretty steep price tag. Most consumer cards feel a bit power hungry, and you'll probably need more than one to fit decent models, though being able to game on the same hardware sounds pretty nice. But speaking of getting more VRAM, the Intel Arc Pro B60 can't come soon enough (if they don't insanely overprice it), especially the 48 GB variety: https://www.maxsun.com/products/intel-arc-pro-b60-dual-48g-t...