What does your hardware setup look like?
Not the user above, but I use the iOS app PrivateLLM when I need offline access or want to run uncensored models. I use kappa-3-phi-abliterated; models under 6B usually work without crashing. With Ollama on my Mac Mini (base M4, not M4 Pro, with 24GB RAM) I can run 7B models, and on the Mac I can also expose an API.
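For anyone curious, hitting that API is just an HTTP POST. Here's a rough sketch in Python, assuming Ollama is serving on its default port 11434 and you've already pulled a model (mistral is just an example tag):

```python
import requests

# Minimal sketch: query a local Ollama server on its default port.
# "mistral" is an example 7B model tag; substitute whatever you've pulled.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": "Why is the sky blue?", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```

Swap localhost for the Mini's LAN address and you can use it from your phone or another machine on the network.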
Funnily enough, the Mac has almost the same processor as my iPhone 16 Pro, so it's really just a RAM constraint, and of course PrivateLLM doesn't let you host an API.
An M4 Pro would do much better due to the extra RAM and the larger GPU.