pajeets commented on Ask HN: I have 24 core server with 1TB of DDR4 RAM, what should I run?    · Posted by u/pajeets
kkielhofner · 10 months ago
llama.cpp and others can run purely on CPU[0]. Even production-grade serving frameworks like vLLM[1] can.

There are a variety of other LLM inference implementations that can run on CPU as well.

[0] - https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#su...

[1] - https://docs.vllm.ai/en/v0.6.1/getting_started/cpu-installat...
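For a rough sense of what CPU-only inference looks like in practice, here's a minimal sketch using the llama-cpp-python bindings (my assumption; the thread only mentions llama.cpp itself). The model path and thread count are placeholders, not anything from the thread:

```python
# Minimal CPU-only inference sketch with llama-cpp-python
# (pip install llama-cpp-python). The GGUF path below is a placeholder;
# any quantized model that fits in RAM should work.
from llama_cpp import Llama

llm = Llama(
    model_path="/models/llama-3.1-70b-instruct-q4_k_m.gguf",  # hypothetical path
    n_ctx=4096,    # context window
    n_threads=24,  # match the server's physical core count
)

out = llm("Explain in one sentence why CPU inference is RAM-bound:", max_tokens=128)
print(out["choices"][0]["text"])
```

On CPU the bottleneck is usually memory bandwidth rather than core count, so a heavily quantized model will generally give more tokens per second than a full-precision one.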

pajeets · 10 months ago
wait, this is crazy

what model can I run on 1TB, and how many tokens per second?

for instance, Nvidia Nemotron Llama 3.1 quantized, at what speed? I'll get a GPU too, but I'm not sure how much VRAM I need for the best value for the buck

pajeets commented on Ask HN: I have 24 core server with 1TB of DDR4 RAM, what should I run?    · Posted by u/pajeets
brodouevencode · 10 months ago
Chrome with 12 tabs open
pajeets · 10 months ago
gonna need quantum computing once you breach into the mid 20s
pajeets commented on Ask HN: I have 24 core server with 1TB of DDR4 RAM, what should I run?    · Posted by u/pajeets
evanjrowley · 10 months ago
Try llama.cpp with the biggest LLM you can find.
pajeets · 10 months ago
need a 3090 at least for that
pajeets commented on Ask HN: I have 24 core server with 1TB of DDR4 RAM, what should I run?    · Posted by u/pajeets
GiorgioG · 10 months ago
Dokku
pajeets · 10 months ago
so 4 vCPUs per customer + 40 gigs of RAM?

u/pajeets

Karma: 188 · Cake day: August 16, 2024
About
work in law enforcement and enjoy tech, coding, history