abi commented on Ask HN: Founders who offer free/OS and paid SaaS, how do you manage your code?    · Posted by u/neya
echoangle · 2 years ago
This but unironically
abi · 2 years ago
Exactly.
abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
raylad · 2 years ago
After the model is supposedly fully downloaded (about 4GB) I get:

Could not load the model because Error: ArtifactIndexedDBCache failed to fetch: https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-ML...

Also on Mistral 7B again after supposedly full download:

Could not load the model because Error: ArtifactIndexedDBCache failed to fetch: https://huggingface.co/mlc-ai/Mistral-7B-Instruct-v0.2-q4f16...

Maybe memory? But if so it would be good to say so. I'm on a 32GB system btw.

abi · 2 years ago
I’ve experienced that issue as well. Clearing the cache and redownloading seemed to fix it for me. It’s an issue with the upstream library tvmjs that I need to dig deeper into. You should be totally fine on a 32GB system.
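For anyone hitting the same error, here is a rough sketch of clearing the cached model from the browser so it re-downloads cleanly. The database name patterns are guesses, not web-llm's documented cache names, and `indexedDB.databases()` isn't available in every browser:

```javascript
// Hypothetical helper: pick out IndexedDB databases that look like
// model caches. The name patterns are assumptions.
function findModelCaches(dbNames) {
  return dbNames.filter((name) => /webllm|tvmjs|model/i.test(name));
}

// Browser-only: delete the matching databases so the model re-downloads.
async function clearModelCaches() {
  const dbs = await indexedDB.databases(); // [{ name, version }, ...]
  for (const { name } of findModelCaches(dbs.map((d) => d.name))) {
    await new Promise((resolve, reject) => {
      const req = indexedDB.deleteDatabase(name);
      req.onsuccess = resolve;
      req.onerror = () => reject(req.error);
    });
  }
}
```

The same thing can be done by hand from DevTools under Application → IndexedDB.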
abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
KennyBlanken · 2 years ago
Given user data is folded back into the models, there is a snowball's chance in hell that I would input stuff I'd talk to a therapist about.

When are people going to realize that their interactions with AIs are likely being analyzed/characterized, and that at some point, that analysis will be monetized?

abi · 2 years ago
Use secret llama in an incognito window. Turn off the Internet and close the window when done.
abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
joshstrange · 2 years ago
Very cool! I wish there was chat history.

Also if you click the "New Chat" button while an answer is generating I think some of the output gets fed back into the model, it causes some weird output [0] but was kind of cool/fun. Here is a video of it as well [1], I almost think this should be some kind of special mode you can run. I'd be interested to know what the bug causes, is it just the existing output sent as input or a subset of it? It might be fun to watch a chat bot just randomly hallucinate, especially on a local model.

[0] https://cs.joshstrange.com/07kPLPPW

[1] https://cs.joshstrange.com/4sxvt1Mc

EDIT: Looks like calling `engine.resetChat()` while it's generating will do it, but I'm not sure why it errors after a while (maybe runs out of tokens for output? Not sure) but it would be cool to have this run until you stop it, automatically changing every 10-30 seconds or so.

abi · 2 years ago
Thanks for the bug report. Yeah, it’s a bug with not resetting the state properly when new chat is clicked. Will fix tomorrow.
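One way to avoid feeding partial output back in is to interrupt generation before resetting. A sketch, assuming the engine exposes `interruptGenerate()` and `resetChat()` (as web-llm does) plus a `generating` flag this sketch adds as its own bookkeeping:

```javascript
// Sketch of a "New Chat" handler that stops any in-flight generation
// before resetting, so no partial output leaks into the next chat.
class ChatController {
  constructor(engine) {
    this.engine = engine;   // assumed to have interruptGenerate/resetChat
    this.generating = false; // set true while a response streams in
  }

  async newChat() {
    if (this.generating) {
      await this.engine.interruptGenerate(); // stop the in-flight stream
      this.generating = false;
    }
    await this.engine.resetChat(); // safe: nothing left mid-generation
  }
}
```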

Chat history shouldn’t be hard to add with localStorage and IndexedDB.
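A minimal sketch of the localStorage half, with the (de)serialization kept pure so corrupted history degrades to an empty list; the storage key and message shape are made up:

```javascript
const HISTORY_KEY = "secret-llama-history"; // hypothetical storage key

// Pure serialization helpers, independent of the browser storage API.
function serializeHistory(conversations) {
  return JSON.stringify(conversations);
}

function deserializeHistory(raw) {
  try {
    return JSON.parse(raw) ?? [];
  } catch {
    return []; // missing or corrupted history: start fresh
  }
}

// Browser-only wiring: persist after each completed model response.
function saveHistory(conversations) {
  localStorage.setItem(HISTORY_KEY, serializeHistory(conversations));
}

function loadHistory() {
  return deserializeHistory(localStorage.getItem(HISTORY_KEY));
}
```

localStorage is typically capped around 5 MB per origin, so long histories are a better fit for IndexedDB, as the reply suggests.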

abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
low_tech_punk · 2 years ago
abi · 2 years ago
Yes. Web-llm is a wrapper of tvmjs: https://github.com/apache/tvm

Just wrappers all the way down

abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
NikhilVerma · 2 years ago
This is absolutely wonderful, I am a HUGE fan of local first apps. Running models locally is such a powerful thing I wish more companies could leverage it to build smarter apps which can run offline.

I tried this on my M1 and ran LLama3, I think it's the quantized 7B version. It ran with around 4-5 tokens per second which was way faster than I expected on my browser.

abi · 2 years ago
Appreciate the kind words :)
abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
_akhe · 2 years ago
Amazing work, feels like a step forward for LLM usability.

Would be interesting if there was a web browser that managed the download/install of models so you could go to a site like this, or any other LLM site/app and it detects whether or not you have models, similar to detecting if you have a webcam or mic for a video call. The user can click "Allow" to allow use of GPU and allow running of models in the background.

abi · 2 years ago
Window AI (https://windowai.io/) is an attempt to do something like this with a browser extension.
abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
andrewfromx · 2 years ago
i asked it "what happens if you are bit by a radio active spider?" and it told me all about radiation poisoning. Then I asked a follow up question: "would you become spiderman?" and it told me it was unable to become anything but an AI assistant. I also asked if time machines are real and how to build one. It said yes and told me! (Duh, you use a flux capacitor, basic physics.)
abi · 2 years ago
Try to switch models to something other than tinyllama (default only because it’s the fastest to load). Mistral and Llama 3 are great.
abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
bastawhiz · 2 years ago
My pixel 6 was able to run tinyllama and answer questions with alarming accuracy. I'm honestly blown away.
abi · 2 years ago
This is amazing. Thanks both for sharing your stories. Made my day.
abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
dannyw · 2 years ago
This need wasn’t super prevalent in the pre-LLM days. It’s rare to have a multi-GB blob that should be commonly used across sites.
abi · 2 years ago
Well, it should be possible to just drag and drop a file/folder
