abi commented on Ask HN: Founders who offer free/OS and paid SaaS, how do you manage your code?    · Posted by u/neya
echoangle · 2 years ago
This but unironically
abi · 2 years ago
Exactly.
abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
raylad · 2 years ago
After the model is supposedly fully downloaded (about 4GB) I get:

Could not load the model because Error: ArtifactIndexedDBCache failed to fetch: https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-ML...

Also on Mistral 7B again after supposedly full download:

Could not load the model because Error: ArtifactIndexedDBCache failed to fetch: https://huggingface.co/mlc-ai/Mistral-7B-Instruct-v0.2-q4f16...

Maybe memory? But if so it would be good to say so. I'm on a 32GB system btw.

abi · 2 years ago
I’ve experienced that issue as well. Clearing the cache and redownloading seemed to fix it for me. It’s an issue with the upstream library tvmjs that I need to dig deeper into. You should be totally fine on a 32GB system.
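For anyone hitting the same error, here is a rough sketch of clearing the cached model from the browser so it re-downloads cleanly. The database name patterns are guesses, not web-llm's documented cache names, and `indexedDB.databases()` isn't available in every browser:

```javascript
// Hypothetical helper: pick out IndexedDB databases that look like
// model caches. The name patterns are assumptions.
function findModelCaches(dbNames) {
  return dbNames.filter((name) => /webllm|tvmjs|model/i.test(name));
}

// Browser-only: delete the matching databases so the model re-downloads.
async function clearModelCaches() {
  const dbs = await indexedDB.databases(); // [{ name, version }, ...]
  for (const { name } of findModelCaches(dbs.map((d) => d.name))) {
    await new Promise((resolve, reject) => {
      const req = indexedDB.deleteDatabase(name);
      req.onsuccess = resolve;
      req.onerror = () => reject(req.error);
    });
  }
}
```

The same thing can be done by hand from DevTools under Application → IndexedDB.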
abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
KennyBlanken · 2 years ago
Given user data is folded back into the models, there is a snowball's chance in hell that I would input stuff I'd talk to a therapist about.

When are people going to realize that their interactions with AIs are likely being analyzed/characterized, and that at some point, that analysis will be monetized?

abi · 2 years ago
Use secret llama in an incognito window. Turn off the Internet and close the window when done.
abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
joshstrange · 2 years ago
Very cool! I wish there was chat history.

Also if you click the "New Chat" button while an answer is generating I think some of the output gets fed back into the model, it causes some weird output [0] but was kind of cool/fun. Here is a video of it as well [1], I almost think this should be some kind of special mode you can run. I'd be interested to know what the bug causes, is it just the existing output sent as input or a subset of it? It might be fun to watch a chat bot just randomly hallucinate, especially on a local model.

[0] https://cs.joshstrange.com/07kPLPPW

[1] https://cs.joshstrange.com/4sxvt1Mc

EDIT: Looks like calling `engine.resetChat()` while it's generating will do it, but I'm not sure why it errors after a while (maybe runs out of tokens for output? Not sure) but it would be cool to have this run until you stop it, automatically changing every 10-30 seconds or so.

abi · 2 years ago
Thanks for the bug report. Yeah, it’s a bug with not resetting the state properly when new chat is clicked. Will fix tomorrow.
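One way to avoid feeding partial output back in is to interrupt generation before resetting. A sketch, assuming the engine exposes `interruptGenerate()` and `resetChat()` (as web-llm does) plus a `generating` flag this sketch adds as its own bookkeeping:

```javascript
// Sketch of a "New Chat" handler that stops any in-flight generation
// before resetting, so no partial output leaks into the next chat.
class ChatController {
  constructor(engine) {
    this.engine = engine;   // assumed to have interruptGenerate/resetChat
    this.generating = false; // set true while a response streams in
  }

  async newChat() {
    if (this.generating) {
      await this.engine.interruptGenerate(); // stop the in-flight stream
      this.generating = false;
    }
    await this.engine.resetChat(); // safe: nothing left mid-generation
  }
}
```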

Chat history shouldn’t be hard to add with localStorage and IndexedDB.
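A minimal sketch of the localStorage half, with the (de)serialization kept pure so corrupted history degrades to an empty list; the storage key and message shape are made up:

```javascript
const HISTORY_KEY = "secret-llama-history"; // hypothetical storage key

// Pure serialization helpers, independent of the browser storage API.
function serializeHistory(conversations) {
  return JSON.stringify(conversations);
}

function deserializeHistory(raw) {
  try {
    return JSON.parse(raw) ?? [];
  } catch {
    return []; // missing or corrupted history: start fresh
  }
}

// Browser-only wiring: persist after each completed model response.
function saveHistory(conversations) {
  localStorage.setItem(HISTORY_KEY, serializeHistory(conversations));
}

function loadHistory() {
  return deserializeHistory(localStorage.getItem(HISTORY_KEY));
}
```

localStorage is typically capped around 5 MB per origin, so long histories are a better fit for IndexedDB, as the reply suggests.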

abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
low_tech_punk · 2 years ago
abi · 2 years ago
Yes. Web-llm is a wrapper of tvmjs: https://github.com/apache/tvm

Just wrappers all the way down

abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
NikhilVerma · 2 years ago
This is absolutely wonderful, I am a HUGE fan of local first apps. Running models locally is such a powerful thing I wish more companies could leverage it to build smarter apps which can run offline.

I tried this on my M1 and ran LLama3, I think it's the quantized 7B version. It ran with around 4-5 tokens per second which was way faster than I expected on my browser.

abi · 2 years ago
Appreciate the kind words :)
abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
_akhe · 2 years ago
Amazing work, feels like a step forward for LLM usability.

Would be interesting if there was a web browser that managed the download/install of models so you could go to a site like this, or any other LLM site/app and it detects whether or not you have models, similar to detecting if you have a webcam or mic for a video call. The user can click "Allow" to allow use of GPU and allow running of models in the background.

abi · 2 years ago
Window AI (https://windowai.io/) is an attempt to do something like this with a browser extension.
abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
andrewfromx · 2 years ago
i asked it "what happens if you are bit by a radio active spider?" and it told me all about radiation poisoning. Then I asked a follow up question: "would you become spiderman?" and it told me it was unable to become anything but an AI assistant. I also asked if time machines are real and how to build one. It said yes and told me! (Duh, you use a flux capacitor, basic physics.)
abi · 2 years ago
Try to switch models to something other than tinyllama (default only because it’s the fastest to load). Mistral and Llama 3 are great.
abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
bastawhiz · 2 years ago
My pixel 6 was able to run tinyllama and answer questions with alarming accuracy. I'm honestly blown away.
abi · 2 years ago
This is amazing. Thanks both for sharing your stories. Made my day.
abi commented on Show HN: I built a free in-browser Llama 3 chatbot powered by WebGPU   github.com/abi/secret-lla... · Posted by u/abi
dannyw · 2 years ago
This need wasn’t super prevalent in the pre-LLM days. It’s rare to have a multi-GB blob that should be commonly used across sites.
abi · 2 years ago
Well, it should be possible to just drag and drop a file/folder
