CharlieRuan commented on Local LLM inference – impressive but too hard to work with   medium.com/@aazo11/local-... · Posted by u/aazo11
ijk · 8 months ago
There are two general categories of local inference:

- You're running a personal hosted instance. Good for experimentation and personal use, though there's a tradeoff versus renting a cloud server.

- You want to run LLM inference on client machines (i.e., you aren't directly supervising it while it is running).

I'd say that the article is mostly talking about the second one. Doing the first one will get you familiar enough with the ecosystem to handle some of the issues he ran into when attempting the second (e.g., exactly which model to use). But the second has a bunch of unique constraints--you want things to just work for your users, after all.

I've done in-browser neural network stuff in the past (back when using TensorFlow.js was a reasonable default choice) and based on the way LLM trends are going I'd guess that edge device LLM will be relatively reasonable soon; I'm not quite sure that I'd deploy it in production this month but ask me again in a few.

Relatively tightly constrained applications are going to benefit more than general-purpose chatbots: pick a small model that's relatively good at your task and train it on enough of your data, and you can get a 1B or 3B model with acceptable performance, let alone the 7B ones being discussed here. It absolutely won't replace ChatGPT (though small models are getting closer to replacing ChatGPT 3.5). But if you've got a specific use case that will hold still long enough to deploy a model, it can definitely give you an edge versus relying on the APIs.

I expect games to be among the first to try this: per-player-action API costs murder per-user revenue, most gaming devices already have some form of GPU, and most games ship as apps, so bundling a few more GB is, if not reasonable, at least not unprecedented.
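To make the cost argument concrete, here's a rough back-of-envelope sketch. Every number in it (actions per session, tokens per action, price per million tokens) is an illustrative assumption, not a measured figure from any real game or API:

```python
# Back-of-envelope: recurring hosted-API cost per player for a game that
# routes every LLM-backed player action through an API.
# All numbers below are illustrative assumptions, not measured figures.

def monthly_api_cost_per_player(actions_per_session: int,
                                sessions_per_month: int,
                                tokens_per_action: int,
                                usd_per_million_tokens: float) -> float:
    """Monthly API cost for one player, given per-action token usage."""
    tokens = actions_per_session * sessions_per_month * tokens_per_action
    return tokens * usd_per_million_tokens / 1_000_000

# Hypothetical NPC-dialogue game: 200 LLM-backed actions per session,
# 20 sessions a month, ~600 tokens (prompt + completion) per action,
# at an assumed $1 per million tokens.
cost = monthly_api_cost_per_player(200, 20, 600, 1.0)
print(f"${cost:.2f} per player per month")  # → $2.40 per player per month
```

A couple of dollars per player per month is a recurring cost that scales with engagement, whereas a bundled on-device model costs roughly nothing per action after the one-time download, which is why the economics favor local inference for high-interaction games.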

CharlieRuan · 8 months ago
Curious what are some examples of "per-player-action API costs" for games?
CharlieRuan commented on Universal LLM Deployment Engine with ML Compilation   blog.mlc.ai/2024/06/07/un... · Posted by u/ruihangl
CharlieRuan · 2 years ago
From first-hand experience, the all-in-one framework really helps reduce engineering effort!

u/CharlieRuan · Karma: 3 · Cake day: July 24, 2023