https://www.reddit.com/r/LocalLLaMA/
Search for terms like hardware build, running large models, multiple GPUs, etc. Many people there run multiple consumer GPUs, and there are probably threads about running multiple A100s.
HuggingFace might have tutorials, too.
Warning: If it's A100s, most people say to just rent them from the cloud as needed, because the upfront cost is very high. If they'd sit idle most of the time, owning them isn't cost effective. Some were using services like vast.ai to rent them more cheaply.
from https://www.asacomputers.com/nvidia-l40s-48gb-graphics-card....
nvidia l40s 48gb graphics card Our price: $7,569.10*
Not arguing against 'great', but the cost efficiency is questionable: for about 10% of that price you can get two used 3090s. The nice thing about LLMs is that their layers run sequentially, so they parallelize easily: the model can be split into several sub-models, one per GPU (pipeline parallelism). With big batches, 2, 3, 4... GPUs should then improve throughput roughly proportionally, and splitting also makes it possible to run a bigger model on low-end hardware.
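To make the splitting idea concrete, here's a minimal PyTorch sketch (my own illustration, not from the thread): a toy layer stack is cut in half, each half placed on its own device, and activations are moved between them in the forward pass. It assumes two GPUs and falls back to CPU when they aren't present; real frameworks (e.g. Hugging Face Accelerate's device_map) automate the same layer-wise placement.

```python
import torch
import torch.nn as nn

# Assumption: two GPUs available; fall back to CPU otherwise so the
# sketch still runs on a single-device machine.
dev0 = torch.device("cuda:0" if torch.cuda.device_count() >= 1 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() >= 2 else "cpu")

class SplitModel(nn.Module):
    """Toy 'LLM': a stack of linear layers split across two devices."""

    def __init__(self, hidden=256, layers=8):
        super().__init__()
        half = layers // 2
        # First half of the layers lives on device 0...
        self.part1 = nn.Sequential(
            *[nn.Linear(hidden, hidden) for _ in range(half)]
        ).to(dev0)
        # ...second half on device 1.
        self.part2 = nn.Sequential(
            *[nn.Linear(hidden, hidden) for _ in range(layers - half)]
        ).to(dev1)

    def forward(self, x):
        # Run the first sub-model, then ship activations to the
        # second device and finish there (pipeline-style split).
        x = self.part1(x.to(dev0))
        return self.part2(x.to(dev1))

model = SplitModel()
out = model(torch.randn(4, 256))
print(out.shape)  # torch.Size([4, 256])
```

Each GPU only needs to hold its own half of the weights, which is why the split lets a bigger model fit; the proportional throughput gains on big batches come from keeping both halves busy on different micro-batches at once.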