Last I looked, NVidia worked well, and AMD was horrible. Right now, it looks like the major limiting factor (if you don't care about a ≈3x difference in performance, which I don't) is RAM. More is better, and good models need >10GB, while LLMs can be up to 350GB.
* Intel Arc A770 has 16GB for <$300. I have no idea about compatibility with Hugging Face, Blender, etc.
* NVidia 4060 Ti has 16GB for <$500. 100% compatible with everything.
* Older NVidia cards (e.g. Pascal era) can be had with 24GB for <$300 used, without display outputs. It's not clear how their CUDA compute capability lines up with what modern tools need, or how well things work without a display output.
* Several cards may or may not work together. I'm not sure.
Is there any way to figure this stuff out, and what's reasonable / practical / easy? Something which explains CUDA compute levels, vendor compatibility, multi-card compatibility, and all that jazz. It'd be nice to have a generic enough guide to understand both pro and amateur use, e.g.:
- A770 x21, if someone got it working, could handle Facebook's OPT-175B for <$10k via Alpa (see the rough sizing sketch after this list). That brings it into "rich hobbyist" or "justifiable business expense" range. Not clear if that's practical.
- Kids learning AI would have a much easier time if the hardware were cheaper (e.g. A770)
- "General compute" also includes things like Blender or accelerating rendering in kdenlive, etc.
- Etc.
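To put rough numbers on the sizing above, here's a back-of-the-envelope sketch; the parameter counts and precisions are illustrative assumptions, not exact figures for any specific setup:

```python
# Back-of-the-envelope VRAM sizing for model weights only
# (activations, KV cache, and framework overhead add more on top).
def weights_gib(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 2**30

# OPT-175B in fp16: ~326 GiB of weights alone.
print(f"OPT-175B fp16: {weights_gib(175, 2):.0f} GiB")

# 21 x A770 (16 GB each) pools ~336 GB of VRAM, so fp16 weights
# only just fit, with very little headroom for activations.
print(f"21 x 16 GB cards: {21 * 16} GB total")

# For contrast: a 7B model quantized to 4 bits fits easily on one 16 GB card.
print(f"7B model at 4-bit: {weights_gib(7, 0.5):.1f} GiB")
```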
This stuff is getting useful to a broader and broader audience, but it's confusing.
This is sorta _the_ guide on GPUs for DL and has a great decision tree https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni...
Personally, I'm limited to an RTX 2080 for my personal projects at the moment, and I find the constraint pretty rewarding. It forces me to find alternatives to the huge models, and you'd be surprised what you can eke out when you pour in the time to tweak models. Of course, good data is also paramount.
Across vendors, Nvidia still dominates. People are adding support for other vendors to ML libraries via (second-class, imo) alternate backends, but expect to be patient if you're waiting for the day when there's healthy competition.
IMO, I'd say: if you can save up for it, get a 4090; if you can save up for half a 4090, get a 3090 - I've seen many going for $600-800 now. If you can only save up for half a 3090, I'm not sure - it depends on whether you prefer speed or VRAM. If it were me, I'd pick more VRAM first.
re: compute capability, you can see here:
- which GPUs have what cc: https://developer.nvidia.com/cuda-gpus
- what cc comes with what features: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index....
I think the main qualitative change (beyond bigger numbers in the spec) for an end user of machine learning libraries from 8.6 -> 8.9 (ie 3090 -> 4090) is this line:
> 4 mixed-precision Fourth-Generation Tensor Cores supporting fp8, fp16, __nv_bfloat16, tf32, sub-byte and fp64 for compute capability 8.9 (see Warp matrix functions for details)
ie new precisions will be built into eg pytorch with hardware-level/tensor-core support
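If it helps, here's a minimal sketch (assuming a CUDA build of PyTorch) for checking what your own card reports; the feature notes in the comments are my own rough summary, not an official table:

```python
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(i)
        print(f"{torch.cuda.get_device_name(i)}: compute capability {major}.{minor}")
    # Rough feature notes (my own summary, not an official table):
    #   7.5 (Turing/20-series): fp16 tensor cores, no bf16
    #   8.6 (Ampere/30-series): bf16 + tf32 tensor cores
    #   8.9 (Ada/40-series):    adds fp8 tensor cores
    print("bf16 supported on current device:", torch.cuda.is_bf16_supported())
else:
    print("No CUDA device visible to this PyTorch build")
```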
edit: btw you probably ought to stick to a consumer gpu (ie not professional) if you want it to be generally versatile while also easy to use at home.
Cards like the A770 are awesome, but barely even support raster drivers on DirectX. Your best bang-for-buck options are going to be Nvidia-only for now, with a few competing AMD cards that have fast-tracked Pytorch support.
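Whichever vendor you land on, a quick sketch to confirm which accelerator backend your PyTorch build actually sees; the XPU branch is an assumption about newer builds / Intel's extension, so treat it as a sketch rather than gospel:

```python
import torch

if torch.cuda.is_available():
    # Covers both CUDA and ROCm builds; torch.version.hip is set on ROCm.
    backend = "ROCm" if torch.version.hip else "CUDA"
    print(f"{backend} device: {torch.cuda.get_device_name(0)}")
elif hasattr(torch, "xpu") and torch.xpu.is_available():
    # Intel GPUs (e.g. Arc A770) show up as "xpu" devices on newer builds.
    print(f"XPU device: {torch.xpu.get_device_name(0)}")
else:
    print("CPU only - no accelerator visible to this build")
```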
I would say start with the 3060 for 250 bucks, and if you're still loving it after a couple months, drop 10x more on a quadro.
My only word of advice is to get Docker set up and install the NVIDIA Container Toolkit to pass your GPU through to Docker images -- the package management for all these Python AI tools is a hell-scape, especially if you want to try a bunch of different things.
> re: compute capability, you can see here:
My key question is much more pragmatic:
1) If I grab a random model from Hugging Face, will it accelerate?
2) If I run Blender, kdenlive, or DaVinci Resolve, will it accelerate?
Is there a line where things break?
I definitely prefer more VRAM to more speed. As an occasional user, speed doesn't really matter. Things working does.
Probably; it depends more on how you configure the inference software. Most software that supports acceleration targets CUDA or cuBLAS first, so you should be good.
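For question 1, a minimal sketch of what that usually looks like in practice, assuming a CUDA build of PyTorch plus the transformers package ("gpt2" is just a stand-in model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # stand-in; swap in whatever you pull from the Hub
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

inputs = tokenizer("Hello, my GPU is", return_tensors="pt").to(device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```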
> If I run Blender, kdenlive, or DaVinci Resolve, will it accelerate?
Yep. If you're running Linux, some distros might be a little iffy about shipping the proprietary/accelerated versions of this software, but most are fine. The Flatpak versions should all have Nvidia acceleration working out of the box if you do run into issues with your distro's packages.
> Is there a line where things break?
Yes, but you can avoid it by choosing smaller quantizations and giving yourself a few gigs of VRAM headroom. In my experience, it's always better to select a model smaller than you need so you're not risking an OOM crash (I've got a 3070ti).
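As a sketch of the "smaller quantization plus headroom" approach, assuming the transformers, accelerate, and bitsandbytes packages plus a CUDA card; the model name is just an illustration:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "facebook/opt-6.7b"  # illustrative; pick something well below your VRAM
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute also works on pre-Ampere cards
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",  # can spill layers to CPU RAM instead of OOM-crashing
)

# ~6.7B params at 4 bits is roughly 3.4 GB of weights,
# which leaves headroom even on 8-12 GB cards.
print(f"{torch.cuda.memory_allocated() / 2**30:.1f} GiB allocated")
```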
Lotta other great advice in this thread, though! Good luck picking something out.
That site is a goldmine for perf benchmarks; I use it whenever I want a rough comparison of GPU performance across models for 3D / animation / gaming uses. Even though it's Blender-specific, I'm pretty confident the results carry over to other applications in the same class.
Unfortunately, only NN software is more or less standardized, so in many cases you can just choose the best fit for your budget; everything else can be tightly coupled not merely to one brand but to one specific model. For example, I've seen software that works on an Nvidia 960 (I'm not sure about the 1060) but doesn't work on a 2060 - for some reason, the developers avoid that series.
There is almost no way you will make back the $5k for a 40GB+ VRAM card, so just save yourself all the hassle and go for something that ticks all the rest of your boxes.
Non-CUDA cards may be ok if you have very simple requirements, but I'd expect many hours of debugging if you want something that's not ready to go out of the box.
With that aforementioned A6000 ($5k retail) you'd have to use it for at least six thousand hours to break even on the cloud cost.
[1] https://lambdalabs.com/service/gpu-cloud#pricing
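For the break-even math, a back-of-the-envelope sketch; the hourly rate here is my own assumption, not a quote from [1]:

```python
# Rough buy-vs-rent break-even; the cloud rate is an assumption, not a quote from [1].
card_price = 5000   # A6000 retail, USD
cloud_rate = 0.80   # assumed USD per GPU-hour for a comparable instance
print(f"Break-even after ~{card_price / cloud_rate:.0f} GPU-hours")  # ~6250 hours
```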
Something people forget, too, is that if you have no Nvidia GPUs locally at all, you'll need to spend a significant amount of time setting up a new node, copying data, and debugging in your cloud instance each time you want to do something, all while being charged for it. It's a big productivity boost for me to develop locally and then scale to the cloud once something smaller-scale is working.
Emotionally, it doesn't. The problem is if I own something, I'll use it freely. If I rent a GPU, I'll be stressing and counting pennies. In practice, I'll use it less.
On the whole, I'd rather buy even if it costs more, because I'll use it, and in the long term, that pays dividends.
That's not everyone. That's me.
I'd love to have others here try it out and give me some feedback on how I could make it useful. It's only a couple weeks in but already seems valuable to me. What am I missing?
AMD is too funky for most still. I have an Mi60 that won't load drivers due to missing PSP (Platform Security Processor) firmware on the GPU…
vast.ai has the best prices I've seen, but it comes with limitations, and it's still not the best deal for an entire month.
It could also be a low power cap - I had a Dell C4140 for a bit with 220V power supplies on 120V power, which basically locked the whole thing to ~50% of the max power cap per GPU.
So you can probably expect similar results:
https://old.reddit.com/r/Amd/comments/15t0lsm/i_turned_a_95_...
HN https://news.ycombinator.com/item?id=37162762
One big disadvantage of older Turing cards: no bfloat16. But if you run a quantized/mixed-precision model or QLoRA, it doesn't hurt as much.
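A quick way to check, assuming a CUDA build of PyTorch (fp16 is the usual fallback on Turing):

```python
import torch

# Turing (compute capability 7.5) lacks native bfloat16; Ampere (8.0+) has it.
if torch.cuda.is_available():
    dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
    print(f"Using {dtype} for mixed precision on {torch.cuda.get_device_name(0)}")
```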