There are a variety of other LLM inference implementations that can run on CPU as well.
[0] - https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#su...
[1] - https://docs.vllm.ai/en/v0.6.1/getting_started/cpu-installat...
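As a concrete illustration of the CPU route, here is a minimal sketch using the llama-cpp-python bindings; the model path and generation parameters are placeholders, not anything taken from the linked docs.

```python
# Minimal CPU-only inference sketch with llama-cpp-python.
# pip install llama-cpp-python
from llama_cpp import Llama

# model_path is a placeholder; point it at any local quantized GGUF file.
llm = Llama(
    model_path="./models/model-q4_k_m.gguf",
    n_ctx=4096,      # context window
    n_threads=8,     # CPU threads used for inference
    n_gpu_layers=0,  # 0 = run entirely on CPU
)

out = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```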
What model can I run on 1 TB of RAM, and at how many tokens per second?
For instance, Nvidia's Nemotron (Llama 3.1) quantized: at what speed? I'll get a GPU too, but I'm not sure how much VRAM I need for the best value for the buck.
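For this kind of sizing question, a rough rule of thumb is weight memory ≈ parameter count × bytes per parameter at the chosen quantization, plus headroom for the KV cache and runtime buffers. The sketch below is back-of-envelope math under those assumptions, not a benchmark, and the 20% overhead factor is a guess.

```python
# Back-of-envelope memory estimate for quantized LLM weights.
# Assumes memory ~= params * bytes_per_param, inflated by ~20%
# for KV cache and runtime buffers (a rough heuristic).

def weight_memory_gb(params_billions: float, bits_per_param: float,
                     overhead: float = 1.2) -> float:
    bytes_total = params_billions * 1e9 * (bits_per_param / 8) * overhead
    return bytes_total / 1e9

# e.g. a 70B-parameter model at 4-bit quantization:
print(f"{weight_memory_gb(70, 4):.0f} GB")   # ~42 GB -> fits comfortably in 1 TB RAM
# the same model at fp16:
print(f"{weight_memory_gb(70, 16):.0f} GB")  # ~168 GB
```

Tokens per second is a different question: CPU inference is largely memory-bandwidth bound, so throughput depends more on your RAM bandwidth than on core count, and real numbers vary enough that benchmarking on your own hardware is the only reliable answer.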