ericdotlee (u/ericdotlee)

ericdotlee commented on Don't Buy These GPU's for Local AI Inference aiflux.substack.com/p/don... · Posted by u/ericdotlee

ericdotlee · 6 months ago

With the recent release of Qwen-3 Omni I've decided to put together my first local machine. As much as I just want to pick up a beelink and flash it with Omarchy I think I want a bit more horsepower.

However, the internet seems littered with "clever" loca ai monstrosities that gang together 4-6 ancient nVidia GPU's (priced today to seem like overpriced e-waste) to get lackluster performance from piles of nVidia m60's and P100's? In 2025 this kind of seems like a waste or just bad advice to use hardware this old?

Curious if this find seems like a good source of info regarding staying away from Intel and AMD GPU's for local inference? Might do some training but right now more interested in light RAG and maybe some local coding.

Hoping to build something before the holiday season to keep my office warm with GPU's :).

Thanks!

ericdotlee commented on 25L Portable NV-linked Dual 3090 LLM Rig reddit.com/r/LocalLLaMA/c... · Posted by u/tensorlibb

elsombrero · 6 months ago

On my 2x 3090s I am running glm4.5 air q1 and it runs at ~300pp and 20/30 tk/s works pretty well with roo code on vscode, rarely misses tool calls and produces decent quality code.

I also tried to use it with claude code with claude code router and it's pretty fast. Roo code uses bigger contexts, so it's quite slower than claude code in general, but I like the workflow better.

this is my snippet for llama-swap

``` models: "glm45-air": healthCheckTimeout: 300 cmd: | llama.cpp/build/bin/llama-server -hf unsloth/GLM-4.5-Air-GGUF:IQ1_M --split-mode layer --tensor-split 0.48,0.52 --flash-attn on -c 82000 --ubatch-size 512 --cache-type-k q4_1 --cache-type-v q4_1 -ngl 99 --threads -1 --port ${PORT} --host 0.0.0.0 --no-mmap -hfd mradermacher/GLM-4.5-DRAFT-0.6B-v3.0-i1-GGUF:Q6_K -ngld 99 --kv-unified ```

ericdotlee · 6 months ago

What is llama-swap?

Been looking for more details about software configs on https://llamabuilds.ai

ericdotlee commented on 25L Portable NV-linked Dual 3090 LLM Rig reddit.com/r/LocalLLaMA/c... · Posted by u/tensorlibb

jacquesm · 6 months ago

To be fair though, the 4090 and 5090 are much easier capable of saturating PCI express than the 3090 is, even at 4 lanes per card the 3090 rarely manages to saturate the links, it still handsomely pays off to split down to 4 lanes and add more cards.

I used:

https://c-payne.com/

Very high quality and manageable prices.

ericdotlee · 6 months ago

I've purchase 16 of these - cpayne is great! Hope he finds a US distributor to help with tariffs a bit!

ericdotlee commented on 25L Portable NV-linked Dual 3090 LLM Rig reddit.com/r/LocalLLaMA/c... · Posted by u/tensorlibb

Tepix · 6 months ago

OK, here's my quick critique of the article (having built a similar AM4-based system in 2023 for 2300€):

1) [I thought] The page is blocking cut & paste. Super annoying!

2) The exact mainboard is not specified exactly. There are 4 different boards called "ASUS ROG Strix X670E Gaming" and some of them only have one PCIe x16 slot. None of them can do PCIe x8 when using two GPUs.

3) The shopping link for the mainboard leads to the "ASUS ROG Strix X670E-E Gaming" model. This model can use the 2nd PCIe 5.0 port at only x4 speeds. The RTX 3090 can only do PCIe 4.0 of course so it will run at PCIe 4.0 x4. If you choose a desktop mainboard for having two GPUs, make sure it can run at PCIe x8 speeds when using both GPU slots! Having NVLink between the GPUs is not a replacement for having a fast connection between the CPU+RAM and the GPU and its VRAM.

4) Despite having a last-modified date of September 22nd, he is using his rig mostly with rather outdated or small LLMs and his benchmarks do not mention their quantization, which makes them useless. Also they seem not to be benchmarks at all, but "estimates". Perhaps the headline should be changed to reflect this?

ericdotlee · 6 months ago

Any reason you wouldn't opt for the 4090 or 5090?

ericdotlee commented on Is the Nvidia RTX 3060 the best value for entry level local AI? aiflux.substack.com/p/loc... · Posted by u/ericdotlee

ericdotlee · 6 months ago

Hi, I'm looking to transition from renting GPU's from RunPod to hosting some models locally - specifically qwen-2.5 and some lightweight VLM's like Moondream. It looks like the RTX 3060 12gb is a relatively good option but I don't necessarily have a lot of experience with pc hardware, let alone used hardware.

Curious if anyone here has a similar config of 1-4 RTX 3060s? Trying to decide if picking up a few of these is a good value or if I should just continue renting cloud GPU's?