Internet to connect with the provider, install packages, and search.
It's not perfect but it's a start.
> If you spend twice what I did and go Nvidia you should have nearly no issues running any models.
I googled what a Radeon W7900 costs and the result on Amazon was €2800 apiece. You say "quad", so that's €11200 (and that's just the GPUs).
You also say "spend twice what I did", which would put the total hardware cost at around €25000.
Excuse me, but this is peak HN detachment from the experience of most people. You propose spending the cost of a car on hardware.
The average person will just pay Anthropic €20 or €100 per month and call it a day, for now.
I'm planning on writing a ROCm inference engine anyway, or at least contributing to the ROCm vLLM or SGLang implementations for my cards, since I'm interested in the field. Funnily enough, I wouldn't consider myself bullish on AI; I just want to really learn the field so I can evaluate where it's heading.
I spent about $10k on the cards, though the upgrades were piecemeal as I found them cheap. I still have to get custom water blocks for them, since the original W7900s (which are cheap) are triple-slot, so you can't fit four of them in any sort of workstation setup (I even looked at rack-mount options).
I bought a used Threadripper Pro WRX80 motherboard ($600), the cheapest TR Pro CPU for the board (3945WX, $150), and three 128GB DDR4-3200 sticks at $230 each before the craze, planning to populate all 8 channels if prices went down a bit. Each stick is now $900, more than I paid for all three combined ($730 with S&H and taxes), so the system is staying as-is until prices come down a bit.
For AI-assisted programming, the best value proposition by far is Gemini (free) as the orchestrator + opencode using either free models or Grok / MiniMax / GLM through their very cheap plans (for MiniMax or GLM), or OpenRouter, which is very cheap. You can also find some interesting providers like Cerebras, who get silly-fast token generation, which enables interesting use cases.
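If it helps, here's a minimal sketch of what the OpenRouter side looks like from a script. It assumes you have an OPENROUTER_API_KEY in your environment and uses the openai Python client against OpenRouter's OpenAI-compatible endpoint; the model slug is just an example, swap in whatever free or cheap model you prefer:

```python
# Minimal sketch: call a cheap model through OpenRouter's OpenAI-compatible API.
# Assumes: `pip install openai` and OPENROUTER_API_KEY set in the environment.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # OpenRouter speaks the OpenAI API
    api_key=os.environ["OPENROUTER_API_KEY"],
)

resp = client.chat.completions.create(
    model="z-ai/glm-4.6",  # example slug; pick any model listed on OpenRouter
    messages=[{"role": "user", "content": "Refactor this function to be pure: ..."}],
)
print(resp.choices[0].message.content)
```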
You went from a 5-minute signup (and 20-200 bucks per month) to probably weeks of research (or prior experience setting up workstations) and probably days of setup. Also probably a few thousand bucks in hardware.
I mean, that's great, but tech companies are a thing because convenience is a thing.
Even paying API pricing, it was significantly cheaper than the nearly $500 I had been paying monthly (I was spending about $100 a month combined between Claude Pro, ChatGPT Plus, and OpenRouter credits).
Only when I knew exactly the setup I wanted locally did I start looking at hardware. That part has been a PITA since I went with AMD for budget reasons, and it looks like I'll be writing my own inference engine soon, but I could have gone with Nvidia and had far fewer issues (for double the cost: dual Blackwells vs quad Radeon W7900s for 192GB of VRAM).
If you spend twice what I did and go Nvidia, you should have nearly no issues running any model. But using OpenRouter is super easy: there are always free models (Grok famously was free for a while), and there are very cheap and decent models.
None of this matters if you aren't paying for your AI usage out of pocket. I was, so Anthropic's and OpenAI's value proposition vs basically free Gemini + OpenRouter or local models is just not there for me.
Opus might currently be the best model out there, and CC might be the best tool among the commercial alternatives, but once someone switches to opencode + multiple model providers depending on the task, Anthropic is going to have a hard time winning them back, considering its pricing and locked-down ecosystem.
I went from Max 20x and ChatGPT Pro to Claude Pro and ChatGPT Plus + OpenRouter providers, and I have now cancelled Claude Pro and GPT Plus, keeping only Gemini Pro (super cheap) and using OpenRouter models + a local AI workstation I built, running MiniMax M2.1 and GLM 4.7. I use Gemini as the planner and my local models as the churners. Works great; the local models might not be as good as Opus 4.5 or Sonnet 4.7, but they are consistent, which is something I had been missing with all the commercial providers.
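For anyone curious what that planner/churner split looks like in practice, here's a rough sketch. It assumes both models are exposed through OpenAI-compatible endpoints (Gemini via its OpenAI-compatibility layer, the local model via something like llama.cpp or vLLM); the URLs and model names are placeholders, not a specific recommendation:

```python
# Rough sketch of a planner/churner split across two OpenAI-compatible endpoints.
# Assumptions: Gemini is reached through its OpenAI-compatibility layer, and the local
# model is served by vLLM/llama.cpp on localhost. Model names and URLs are placeholders.
import os
from openai import OpenAI

planner = OpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key=os.environ["GEMINI_API_KEY"],
)
churner = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

task = "Add retry-with-backoff to the HTTP client in utils/http.py"

# 1) The cheap/free frontier model writes the plan.
plan = planner.chat.completions.create(
    model="gemini-2.5-pro",  # placeholder model name
    messages=[{"role": "user", "content": f"Write a short implementation plan for: {task}"}],
).choices[0].message.content

# 2) The local model churns through the actual edits.
patch = churner.chat.completions.create(
    model="local-coder",  # whatever name your local server registered
    messages=[
        {"role": "system", "content": "You implement plans as unified diffs."},
        {"role": "user", "content": plan},
    ],
).choices[0].message.content
print(patch)
```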
He's also got ClearType on and set to RGB stripe even though the OLED is not RGB stripe (though to be fair, Windows doesn't really make it clear what each page of the ClearType tuner does).
But yeah, if you use a _tiny_ font and sit _really_ close to the screen, you see fringing. In practice for me, it's been unnoticeable.
The subpixel geometry on Samsung's QD-OLED needs very specific font configuration to be rendered correctly, and even then it just stops looking bad.
That should easily run an 8-bit (~360GB) quant of the model. It's probably going to be the first actually portable machine that can run it. Strix Halo doesn't come with enough memory (or bandwidth) to run it (you'd need almost 180GB for weights + context even at 4 bits), and there are no laptops available with the top-end (Max+ 395) chips, only mini PCs and a tablet.
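The back-of-the-envelope math behind those figures, assuming roughly 360B parameters (implied by the ~360GB 8-bit number; plug in the real count if you know it):

```python
# Back-of-the-envelope weight memory at different quantizations.
# Assumption: ~360B parameters, implied by the ~360GB 8-bit figure. KV cache and
# runtime buffers come on top of this and depend on context length.
params_billion = 360  # hypothetical; substitute the real parameter count

for bits in (16, 8, 4):
    weights_gb = params_billion * bits / 8  # 360e9 params * (bits/8) bytes ~= GB
    print(f"{bits}-bit quant: ~{weights_gb:.0f} GB for weights alone")
# 16-bit: ~720 GB, 8-bit: ~360 GB, 4-bit: ~180 GB
```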
Right now you only get the performance you want out of a multi-GPU setup.
What kind of hardware do you have to be able to run a performant GPT-OSS-120b locally?
There are many platforms out there that can run it decently.
AMD Strix Halo, Mac platforms, two (or three, if you don't have extra system RAM to spill into) of the new AMD Radeon AI PRO R9700 (32GB of VRAM, $1200 each), multi consumer GPU setups, etc.
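A rough way to compare those options: single-stream decode speed is mostly bounded by memory bandwidth divided by the bytes touched per token, and GPT-OSS-120B is a sparse MoE, so only the active parameters (roughly 5B) are read per token. The bandwidth figures below are approximate and just for illustration:

```python
# Crude upper-bound estimate of single-stream decode speed: bandwidth-bound generation
# reads roughly the active weights once per token. All numbers are approximations.
active_params_billion = 5.1   # approximate active parameters per token for GPT-OSS-120B
bytes_per_param = 0.5         # ~4-bit weights (MXFP4-ish), approximate

bytes_per_token_gb = active_params_billion * bytes_per_param  # GB read per generated token

platforms_gbps = {            # rough peak memory bandwidth figures, GB/s
    "Strix Halo (LPDDR5X)": 256,
    "M4 Max (unified memory)": 546,
    "Radeon AI PRO R9700 x2": 2 * 640,  # assumes tensor parallel splits weights across both
}

for name, bw in platforms_gbps.items():
    print(f"{name}: <= ~{bw / bytes_per_token_gb:.0f} tok/s (bandwidth-bound ceiling)")
```

Real-world numbers land well below these ceilings once you account for compute, KV-cache reads, and scheduling overhead, but the ranking between platforms tends to hold.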