For those talking about breakeven points and cheap cloud compute, you need to factor in the mental difference it makes running a test locally (which feels free) vs setting up a server and knowing you're paying per hour it's running. Even if the cost is low, I do different kinds of experiments knowing I'm not 'wasting money' every minute the GPU sits idle. Once something is working, then sure scaling up on cheap cloud compute makes sense. But it's really, really nice having local compute to get to that state.
Lots of people really underestimate the impact of that mental state and the activation energy it creates towards doing experiments - having some local compute is essential!
This. In the second article, the author touches on this a bit.
With a local setup, I often think, "Might as well run that weird xyz experiment overnight" (instead of letting it sit idle).
On a cloud setup, the opposite is often the case: "Do I really need that experiment, or can I shut down the server to save money?"
Makes a huge difference over longer periods.
For companies or if you just want to try a bit, then the cloud is a good option, but for (Ph.D.) researchers, etc., the frictionless local system is quite powerful.
I have the same attitude towards gym memberships - it really helps to know I can just go in for 30 minutes when I feel like it without worrying whether I’d be getting my money’s worth.
It depends on your local energy prices and your internet speed as well. It may actually be cheaper and faster to spin up a cloud instance. I bought an 80-core server to do some heavy lifting back in 2020. It doesn't even have a GPU, but it costs me around 4 euros per day to run. For that price I can keep a cloud GPU instance running. And even the boot-up time isn't any slower.
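For scale, what "4 euros per day" implies in average draw depends entirely on your tariff. A rough sketch, assuming a 0.35 EUR/kWh price (my assumption, not a figure from the comment):

    # Back-of-envelope: average draw implied by a daily electricity cost.
    # The 0.35 EUR/kWh tariff is an assumed placeholder; plug in your own.
    COST_PER_DAY_EUR = 4.0
    PRICE_PER_KWH_EUR = 0.35

    kwh_per_day = COST_PER_DAY_EUR / PRICE_PER_KWH_EUR   # ~11.4 kWh/day
    avg_watts = kwh_per_day * 1000 / 24                  # ~476 W average draw
    print(f"~{kwh_per_day:.1f} kWh/day, ~{avg_watts:.0f} W average")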
What cloud GPU instance are you talking about here? Most of the GPUs cost around 2 to 40 dollars an hour. I would love to know the provider who is offering one for 4 dollars a day...
That's a great point! I'd agree that just the extra emotional motivation from having your own thing is worth a ton. I get part of the way there by having a large-RAM, no-GPU box, so that things are slow but at least possible for random small one-offs.
I was thinking of doing something similar, but I am a bit sceptical about how the economics of this work out. On vast.ai renting a 3x3090 rig is $0.6/hour. The electricity price of operating this in e.g. Germany is somewhere about $0.05/hour. If the OP paid 1700 EUR for the cards, the breakeven point would be around (haha) 3090 hours in, or ~128 days, assuming non-stop usage. It's probably cool to do that if you have a specific goal in mind, but to tinker around with LLMs and for unfocused exploration I'd advise folks to just rent.
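The arithmetic behind that breakeven figure, as a minimal sketch using the numbers above (treating EUR and USD as roughly equal for back-of-envelope purposes):

    # Breakeven: hours of non-stop use before owning beats renting,
    # using the comment's numbers: $0.60/h rental vs ~$0.05/h electricity
    # and ~1700 (EUR ~ USD) of up-front hardware cost.
    def breakeven_hours(hardware_cost: float, rent_per_hour: float, power_per_hour: float) -> float:
        savings_per_hour = rent_per_hour - power_per_hour
        return hardware_cost / savings_per_hour

    hours = breakeven_hours(hardware_cost=1700, rent_per_hour=0.60, power_per_hour=0.05)
    print(f"breakeven after ~{hours:.0f} hours (~{hours / 24:.0f} days of 24/7 use)")
    # -> breakeven after ~3091 hours (~129 days of 24/7 use)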
> On vast.ai renting a 3x3090 rig is $0.6/hour. The electricity price of operating this in e.g. Germany is somewhere about $0.05/hour.
Are you factoring in the varying power usage in that electricity price?
The electricity cost of operating locally will vary depending on actual system usage. When idle, it should be much cheaper. With cloud hosts, by contrast, you pay the same price whether the system is in use or not.
Plus, with cloud hosts reliability is not guaranteed. Especially with vast.ai, where you're renting other people's home infrastructure. You might get good bandwidth and availability on one host, but when that host disappears, you'd better hope you did a backup (which vast.ai charges for separately), and then you need to spend time restoring it to another, hopefully equally reliable, host, which can take hours depending on the amount of data and bandwidth.
I recently built an AI rig and went with 2x3090s, and am very happy with the setup. I evaluated vast.ai beforehand, and my local experience is much better, while my electricity bill is not much higher (also in EU).
With runpod/vast, you can request a set amount of time - generally, if I request from Western EU or North America, the availability is fine on the week-to-month timescale.
FWIW I find runpod's vast clone significantly better than vast, and there isn't really a price premium.
With the current more-or-less hard dependency on CUDA, and thus Nvidia hardware, it's about making sure you actually have that hardware available consistently.
I've had VERY hit-or-miss results with Vast.ai, and I'm convinced people are gaming their benchmark numbers, because when the rubber meets the road it's very clear performance isn't what's claimed. Then you still need to be able to actually get the machines...
Unless you are training, you never hit peak wattage. When inferring, the power draw is still minimal.
I'm running inference right now and drawing about 20% of peak. GPU 0 is drawing more because I use it as the main GPU. Idle draw sits at about 5%.
Device 0 [NVIDIA GeForce RTX 3060] PCIe GEN 3@16x RX: 0.000 KiB/s TX: 55.66 MiB/s
GPU 1837MHz MEM 7300MHz TEMP 43°C FAN 0% POW 43 / 170 W
GPU[|| 5%] MEM[|||||||||||||||||||9.769Gi/12.000Gi]
Device 1 [Tesla P40] PCIe GEN 3@16x RX: 977.5 MiB/s TX: 52.73 MiB/s
GPU 1303MHz MEM 3615MHz TEMP 22°C FAN N/A% POW 50 / 250 W
GPU[||| 9%] MEM[||||||||||||||||||18.888Gi/24.000Gi]
Device 2 [Tesla P40] PCIe GEN 3@16x RX: 164.1 MiB/s TX: 310.5 MiB/s
GPU 1303MHz MEM 3615MHz TEMP 32°C FAN N/A% POW 48 / 250 W
GPU[|||| 11%] MEM[||||||||||||||||||18.966Gi/24.000Gi]
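If you want actual numbers for your own box rather than eyeballing nvtop, one option is to poll nvidia-smi and integrate the draw into a cost estimate. A rough sketch; it assumes nvidia-smi is on PATH, and the 0.35 EUR/kWh tariff is a placeholder, not a figure from the thread:

    # Samples total GPU power draw once a second and accumulates an
    # electricity-cost estimate until interrupted with Ctrl-C.
    import subprocess
    import time

    PRICE_PER_KWH = 0.35  # EUR; placeholder, adjust to your tariff

    def total_power_watts() -> float:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
            text=True,
        )
        return sum(float(line) for line in out.splitlines() if line.strip())

    energy_wh = 0.0
    try:
        while True:
            energy_wh += total_power_watts() / 3600.0  # one 1-second sample -> watt-hours
            time.sleep(1)
    except KeyboardInterrupt:
        print(f"{energy_wh:.1f} Wh used, ~{energy_wh / 1000 * PRICE_PER_KWH:.3f} EUR")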
When you computed the breakeven point, did you factor in that you still own the cards and can resell them? I bought my 3090s for $1000, and after a year I think they'd go for more on the open market if I resold them now.
I just made a clone of diskprices.com for GPUs specifically for AI training, and it has a power and depreciation calculator: https://gpuprices.us
You can expect a GPU to last 5 years. So for a 128-day breakeven you are only looking at about 7% utilization. If you are doing training runs, I think you are going to beat that easily.
P.S. Coincidentally or not, shortly after it got mentioned on Hacker News, Best Buy ran out of both RTX 4090s and RTX 4080s. They used to top the chart. Turns out that at decent utilization they win due to the electricity costs.
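A quick sanity check of that utilization math, taking the 5-year lifespan and the ~128-day breakeven above as given; the 1000 resale figure is just illustrative, echoing the sibling comment:

    # Utilization over a 5-year lifespan needed to hit the ~128-day breakeven,
    # plus the effect of eventual resale (the 1000 figure is illustrative).
    LIFESPAN_DAYS = 5 * 365
    BREAKEVEN_DAYS = 128

    print(f"{BREAKEVEN_DAYS / LIFESPAN_DAYS:.1%} utilization needed")  # -> 7.0%

    def breakeven_days(cost, resale, rent_per_h=0.60, power_per_h=0.05):
        return (cost - resale) / (rent_per_h - power_per_h) / 24

    print(f"~{breakeven_days(1700, 1000):.0f} days if the cards later resell for 1000")  # -> ~53 days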
The current economics is a lowball to get customers; it's absolutely not going to be the market price once commercial interests have locked in their products.
But if you're just goofing around and not planning to create anything production-worthy, it's a great deal.
Well, almost. GPUs have not been depreciating. The cost of 3090s and 4090s has gone up. Folks are selling them for what they paid or even more. With the recent 40-series SUPER cards from Nvidia, I'm not expecting any new releases for a year. AMD & Intel still have a ways to go before major adoption. Startups are buying up consumer cards. So I sadly expect prices to stay more or less the same.
He can use these cards for 128 days non-stop and then resell them, recouping almost the full purchase price since OP bought them cheap. Buying doesn't mean you have to use the GPUs until they're worth 0; yes, there's some risk of a GPU dying, but c'mon... Renting is money you will never see again.
This is the new startup from George Hotz. I would like him to succeed, but I'm not so optimistic about their chances of selling a $15k box that is most likely less than $10k in parts. Most people would do much better buying second-hand 3090s or similar and connecting them into a rig.
Not necessarily. I'm not sure about AMD GPUs in general, but he tweeted that AMD supports linking all six together. If that's the case, then 6 of those XTXs should crush 6 3090s. Techies like us will definitely build rather than buy; businesses, however, will definitely buy rather than build.
People complain about the "Nvidia tax". I don't like monopolies and I fully support the efforts of AMD, Intel, Apple, anyone to chip away at this.
That said as-is with ROCm you will:
- Absolutely burn hours/days/weeks getting many (most?) things to work at all. If you get it working, you need to essentially "freeze" the configuration, because an upgrade means doing it all over again.
- In the event you do get it to work, you'll realize performance is nowhere near what the hardware specs suggest.
- Throw up your hands and go back to CUDA.
Between what it takes to get ROCm to work and the performance issues the Nvidia tax becomes a dividend nearly instantly once you factor in human time, less-than-optimal performance, and opportunity cost.
Nvidia says roughly 30% of their costs are on software. That's what you need to do to deliver something that's actually usable in the real world. With the "Nvidia tax" they're also reaping the benefit of the ~15 years they've been sinking resources into CUDA.
Resistance is useless! Let's just accept our fate and toe the line. Why feel bad about paying essentially double, or getting half the compute for our money, when we can just choose the easy route, accept our fate, and feed the monopoly a little more money so they can charge us even more? Who needs competition!
Which is not unreasonable for that amount of hardware.
You have to ask yourself if you want to drop that kind of money on consumer GPUs, which launched late 2022. But then again, with that kind of money you are stuck with consumer GPUs either way, unless you want to buy Ada workstation cards for 6k each and those are just 4090s with p2p memory enabled. Hardly worth the premium, if you don't absolutely have to have that.
Somewhat tangential question, but I'm wondering if anyone knows of a solution (or Google search terms for this):
I have a 3U Supermicro server chassis that I put an AM4 motherboard into, but I'm looking at upgrading the motherboard so that I can run ~6 3090s in it. I don't have enough physical PCIe slots/brackets in the chassis (7 expansion slots), so I either need to try some complicated liquid cooling setup to make the cards single-slot (I don't want to do this), or I need to get a bunch of riser cables and mount the GPUs above the chassis. Is there a JBOD-equivalent enclosure for PCIe cards? I don't really think I can run the risers out the back of the case, so I'll likely need to take off/modify the top panel somehow. What I'm picturing in my head is basically a 3U-to-6U case conversion, but I'm trying to minimize cost (let's say $200 for the chassis/mount component) as well as not have to cut metal.
You'll need something like EPYC/Xeon CPUs and motherboards, which not only have many more PCIe lanes but also allow bifurcation. Once you have that, you can get bifurcated risers and run many GPUs. These risers use normal cables, not the typical gamer PCIe risers, which are pretty hard to arrange. You won't get this for just $200, though.
For the chassis, you could try a 4U Rosewill like this: https://www.youtube.com/watch?v=ypn0jRHTsrQ, though I'm not sure if 6 3090s would fit. You're probably better off getting a mining chassis; it's easier to set up and cool, and also cheaper, unless you plan on putting them in a server rack.
I really enjoy and am inspired by the idea that people like Dettmer (and probably this Samsja person) are the spiritual successors to homebrew hackers in the 70s and 80s. They have pretty intimate knowledge of many parts of the whole goddamn stack, from what's going on in each hardware component, to how to assemble all the components into a rig, up to all the software stuff: algorithms, data, orchestration, etc.
Am also inspired by embedded developers for the same reason
For large-VRAM models, what about selling one of the 3090s and putting the money towards an NVLink bridge and a motherboard with two x16 PCIe slots (preferably spaced so you don't need riser cables)?
IME NVLink would be overkill for this. Model parallelism means you only need bandwidth to transfer the intermediate activations (/gradients + optimizer state) at the seams, and inference is generally slow enough that even PCIe x8 won't be a bottleneck.
Full riser cables like they used don't impact performance. Hanging everything off an open-air frame is IMO better; it keeps everything cooler, not just the GPUs but also the motherboard and surrounding components. With only two 24 GB GPUs they are not going to be able to run larger models. You can't experiment with 70B models without offloading to CPU, which is super slow. The best models are 70B+.
48 GB suffices for 4-bit inference and QLoRA training of a 70B model. ~80 GB allows you to push it to 8-bit (which is nice, of course), but full-precision finetuning is completely out of reach either way.
Though you're right, of course, that PCIe will totally suffice for this case.
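For anyone wondering where those numbers come from, a back-of-envelope weight-memory estimate for a 70B-parameter model (weights only; KV cache, activations, and framework overhead push real usage higher):

    # Weight memory for a 70B-parameter model at common precisions.
    # Weights only -- real usage is higher once KV cache and overhead are added.
    PARAMS = 70e9
    BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

    for name, b in BYTES_PER_PARAM.items():
        print(f"{name}: ~{PARAMS * b / 2**30:.0f} GiB of weights")
    # fp16: ~130 GiB, int8: ~65 GiB, int4: ~33 GiB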
I've been slowly expanding my HTPC/media server into a gaming server and a box for running LLMs (and possibly diffusion models?) locally for playing around with. I think it's becoming clear that the future of LLMs will be local!
My box has a Gigabyte B450M, Ryzen 2700X, 32GB RAM, Radeon 6700XT (for gaming/streaming to steam link on Linux), and an "old" Geforce GTX 1650 with a paltry 6GB of RAM for running models on. Currently it works nicely with smaller models on ollama :) and it's been fun to get it set up. Obviously, now that the software is running I could easily swap in a more modern NVidia card with little hassle!
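If anyone wants to reproduce that kind of setup, the quickest sanity check is to hit the local Ollama server over its HTTP API. A minimal sketch, assuming the default port 11434 and that some small model has already been pulled (the model tag below is just an example):

    # Minimal request against a local Ollama server (default port 11434).
    # Replace the model tag with whatever `ollama pull` fetched on your machine.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3:8b", "prompt": "Say hello in five words.", "stream": False},
        timeout=120,
    )
    print(resp.json()["response"])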
I've also been eyeing the B450 Steel Legend as a more capable board for expansion than the Gigabyte board; this article gives me some confidence that it is a solid board.
Agreed on reliability and data transfer, that's a good point.
Out of curiosity, what do you use a 2x3090 rig for? Bulk, non-time-sensitive inference on down-quantized models?
Is there a go-to card for low-memory (1-2B parameter) models?
Something with much better flops/$ but purposely crippled with low memory.
- if I have it locally, I'll play with it
- if not, I won't (especially with my data)
- if I have something ready for a long run I may or may not want to send it somewhere (it's not going to be on 3090s for sure if I send it)
- if I have a requirement to have something public, I'd probably go for per-usage pricing with e.g. [0].
[0] https://www.runpod.io/serverless-gpu
Unfortunately my CFO (a.k.a. Wife) does not share the same understanding.
(not really, but it is a joke I read someplace and I think it applies to a lot of couples).
vast.ai is basically a clearinghouse; they are not doing some VC subsidy thing.
In general, community clouds are not suitable for commercial use.
https://tinygrad.org/
https://twitter.com/__tinygrad__/status/1760988080754856210
People complain about the "Nvidia tax". I don't like monopolies and I fully support the efforts of AMD, Intel, Apple, anyone to chip away at this.
That said as-is with ROCm you will:
- Absolutely burn hours/days/weeks getting many (most?) things to work at all. If you get it working you need to essentially "freeze" the configuration because an upgrade means do it all over again.
- In the event you get it to work at all you'll realize performance is nowhere near the hardware specs.
- Throw up your hands and go back to CUDA.
Between what it takes to get ROCm to work and the performance issues the Nvidia tax becomes a dividend nearly instantly once you factor in human time, less-than-optimal performance, and opportunity cost.
Nvidia says roughly 30% of their costs are on software. That's what you need to do to deliver something that's actually usable in the real world. With the "Nvidia tax" they're also reaping the benefit of the ~15 years they've been sinking resources into CUDA.
You have to ask yourself if you want to drop that kind of money on consumer GPUs, which launched late 2022. But then again, with that kind of money you are stuck with consumer GPUs either way, unless you want to buy Ada workstation cards for 6k each and those are just 4090s with p2p memory enabled. Hardly worth the premium, if you don't absolutely have to have that.
I have a 3U supermicro server chassis that I put an AM4 motherboard into, but I'm looking at upgrading the Mobo so that I can run ~6 3090s in it. I don't have enough physical PCIE slots/brackets in the chassis (7 expansion slots), so I either need to try to do some complicated liquid cooling setup to make the cards single slot (I don't want to do this), or I need to get a bunch of riser cables and mount the GPU above the chassis. Is there like a JBOD equivalent enclosure for PCIE cards? I don't really think I can run the risers out the back of the case, so I'll likely need to take off/modify the top panel somehow. What I'm picturing in my head is basically a 3U to 6U case conversion, but I'm trying to minimize cost (let's say $200 for the chassis/mount component) as well as not have to cut metal.
For the chassis, you could try a 4U rosewill like this: https://www.youtube.com/watch?v=ypn0jRHTsrQ, not sure if 6 3090s would fit though. You're probably better off getting a mining chassis, it's easier to setup and cool down, also cheaper, unless you plan on putting them in a server rack.
They do make single-slot GPU waterblocks, but they want something like $400 or more for each one.
I would prefer a tutorial on how to do this.