Could not load the model because Error: ArtifactIndexedDBCache failed to fetch: https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-ML...
Same thing with Mistral 7B, again after a supposedly full download:
Could not load the model because Error: ArtifactIndexedDBCache failed to fetch: https://huggingface.co/mlc-ai/Mistral-7B-Instruct-v0.2-q4f16...
Maybe it's memory? If so, it would be good for the error to say so. I'm on a 32GB system, btw.
When are people going to realize that their interactions with AIs are likely being analyzed/characterized, and that at some point, that analysis will be monetized?
Also, if you click the "New Chat" button while an answer is generating, I think some of the output gets fed back into the model. It causes some weird output [0], but it was kind of cool/fun. Here is a video of it as well [1]. I almost think this should be some kind of special mode you can run. I'd be interested to know what causes the bug: is it just the existing output sent as input, or a subset of it? It might be fun to watch a chatbot just randomly hallucinate, especially on a local model.
[0] https://cs.joshstrange.com/07kPLPPW
[1] https://cs.joshstrange.com/4sxvt1Mc
EDIT: Looks like calling `engine.resetChat()` while it's generating will do it. I'm not sure why it errors after a while (maybe it runs out of output tokens?), but it would be cool to have this run until you stop it, automatically changing every 10-30 seconds or so.
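For anyone who wants to play with the "run until you stop it" idea, here is a rough sketch of a timer that calls `resetChat()` at a random interval in the 10-30 second range. Only `resetChat()` comes from the comment above. The `ResettableEngine` interface and the `pickDelayMs` and `startPeriodicReset` helpers are hypothetical names made up for illustration, and whether resetting mid-generation is supported behavior at all is an open question.

```typescript
// Assumed minimal engine shape; only resetChat() is taken from the thread.
interface ResettableEngine {
  resetChat(): void | Promise<void>;
}

// Pick a delay in [minMs, maxMs). `random` is injectable so the choice is testable.
function pickDelayMs(
  minMs: number,
  maxMs: number,
  random: () => number = Math.random
): number {
  return minMs + Math.floor(random() * (maxMs - minMs));
}

// Hypothetical helper: keeps calling resetChat() at random intervals
// until the returned stop function is called.
function startPeriodicReset(
  engine: ResettableEngine,
  minMs: number = 10000,
  maxMs: number = 30000
): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    if (stopped) return;
    try {
      await engine.resetChat();
    } catch (err) {
      // The thread reports errors after a while; swallow them so the loop continues.
    }
    // Re-check in case stop() was called while resetChat() was pending.
    if (!stopped) {
      timer = setTimeout(tick, pickDelayMs(minMs, maxMs));
    }
  };

  timer = setTimeout(tick, pickDelayMs(minMs, maxMs));

  return (): void => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Usage would be something like `const stop = startPeriodicReset(engine);` once a generation is running, then `stop()` when you have seen enough.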
Chat history shouldn't be hard to add with localStorage and IndexedDB.
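A minimal sketch of the local storage half, assuming the history is small enough to keep as one JSON string under a single key. The key name, the `ChatMessage` shape, and the helper names are all made up for illustration and are not taken from the app. `KeyValueStore` is just the subset of the Web Storage API used here, so `window.localStorage` should satisfy it in a browser, and `MemoryStore` is a stand-in for anywhere it is not available.

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// Subset of the Web Storage API; window.localStorage matches this shape.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const HISTORY_KEY = "chat-history"; // hypothetical key name

// Read the saved history, falling back to an empty list on missing or corrupt data.
function loadHistory(store: KeyValueStore): ChatMessage[] {
  const raw = store.getItem(HISTORY_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as ChatMessage[]) : [];
  } catch (err) {
    return []; // unparseable entry: start fresh rather than crash
  }
}

// Append one message and persist the whole list again.
function appendMessage(store: KeyValueStore, message: ChatMessage): ChatMessage[] {
  const history = loadHistory(store).concat([message]);
  store.setItem(HISTORY_KEY, JSON.stringify(history));
  return history;
}

// In-memory stand-in for environments without localStorage.
class MemoryStore implements KeyValueStore {
  private data: Record<string, string> = {};

  getItem(key: string): string | null {
    const value = this.data[key];
    return typeof value === "string" ? value : null;
  }

  setItem(key: string, value: string): void {
    this.data[key] = value;
  }
}
```

Rewriting the whole list on every message is fine for short chats. Long histories would probably be better off in IndexedDB, since localStorage is synchronous and has a small quota.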
Just wrappers all the way down
I tried this on my M1 and ran Llama 3, I think it's the quantized 8B version. It ran at around 4-5 tokens per second, which was way faster than I expected in my browser.
Would be interesting if there was a web browser that managed the download/install of models so you could go to a site like this, or any other LLM site/app and it detects whether or not you have models, similar to detecting if you have a webcam or mic for a video call. The user can click "Allow" to allow use of GPU and allow running of models in the background.