There are a variety of other LLM inference implementations that can run on CPU as well.
[0] - https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#su...
[1] - https://docs.vllm.ai/en/v0.6.1/getting_started/cpu-installat...
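As a concrete illustration of the CPU route, here is a minimal sketch using the llama-cpp-python bindings; the model path and generation parameters are placeholders, not anything taken from the linked docs.

```python
# Minimal CPU-only inference sketch with llama-cpp-python.
# pip install llama-cpp-python
from llama_cpp import Llama

# model_path is a placeholder; point it at any local quantized GGUF file.
llm = Llama(
    model_path="./models/model-q4_k_m.gguf",
    n_ctx=4096,      # context window
    n_threads=8,     # CPU threads used for inference
    n_gpu_layers=0,  # 0 = run entirely on CPU
)

out = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```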
What model can I run on 1 TB of RAM, and at how many tokens per second?
For instance, Nvidia's Nemotron (Llama 3.1) quantized: at what speed? I'll get a GPU too, but I'm not sure how much VRAM I need for the best value for the buck.
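For this kind of sizing question, a rough rule of thumb is weight memory ≈ parameter count × bytes per parameter at the chosen quantization, plus headroom for the KV cache and runtime buffers. The sketch below is back-of-envelope math under those assumptions, not a benchmark, and the 20% overhead factor is a guess.

```python
# Back-of-envelope memory estimate for quantized LLM weights.
# Assumes memory ~= params * bytes_per_param, inflated by ~20%
# for KV cache and runtime buffers (a rough heuristic).

def weight_memory_gb(params_billions: float, bits_per_param: float,
                     overhead: float = 1.2) -> float:
    bytes_total = params_billions * 1e9 * (bits_per_param / 8) * overhead
    return bytes_total / 1e9

# e.g. a 70B-parameter model at 4-bit quantization:
print(f"{weight_memory_gb(70, 4):.0f} GB")   # ~42 GB -> fits comfortably in 1 TB RAM
# the same model at fp16:
print(f"{weight_memory_gb(70, 16):.0f} GB")  # ~168 GB
```

Tokens per second is a different question: CPU inference is largely memory-bandwidth bound, so throughput depends more on your RAM bandwidth than on core count, and real numbers vary enough that benchmarking on your own hardware is the only reliable answer.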