Readit News
nkaz123 commented on ChronJob: Automate tasks by having someone from the future come back and do it   chron-job.vercel.app/... · Posted by u/nkaz123
nkaz123 · 5 months ago
Inspired by Bill and Ted’s Excellent Adventure (https://www.youtube.com/watch?v=GiynF8NQzgo), I created an alternative to AI agents: to get a task done, simply provide a future worker with details like a description, time, and place. Then, assuming time travel is invented, someone will come back and do it, paid from the interest accrued on the original payment.
nkaz123 commented on Show HN: Pi-C.A.R.D, a Raspberry Pi Voice Assistant   github.com/nkasmanoff/pi-... · Posted by u/nkaz123
ghnws · a year ago
I didn't see a mention of languages in the readme. Does this understand languages other than English?
nkaz123 · a year ago
There are two models in use here: Whisper tiny for transcribing audio, and then Llama 3 for responding.

Whisper tiny is multilingual (though I am using the English-specific variant), and I believe Llama 3 is technically capable of multilingual output, but I'm not sure of any benchmarks.

I think it could be made better, but for now the focus is English. I'll add this to the readme though. Thanks!
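The two-stage pipeline described above (speech-to-text, then an LLM reply) can be sketched roughly as follows. The function names and the idea of passing the models in as plain callables are my own illustrative assumptions, not the repo's actual API:

```python
# Minimal sketch of one voice-assistant turn:
# 1) transcribe audio with a speech-to-text model (e.g. Whisper tiny),
# 2) feed the transcript to an LLM (e.g. Llama 3) for the reply.
# Both stages are injected as callables so models can be swapped freely.

def assistant_turn(audio, transcribe, respond):
    """Run one request/response turn of the assistant."""
    text = transcribe(audio)   # speech -> text
    if not text.strip():
        return ""              # nothing intelligible heard, stay silent
    return respond(text)       # text -> LLM reply
```

With this shape, switching the English-only Whisper variant for the multilingual one is just a different `transcribe` callable.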

nkaz123 commented on Show HN: Pi-C.A.R.D, a Raspberry Pi Voice Assistant   github.com/nkasmanoff/pi-... · Posted by u/nkaz123
ethagnawl · a year ago
I'm looking forward to trying this. Hopefully this gains traction, as (AFAIK) an open, reliable, flexible, privacy focused voice assistant is still sorely needed.

About a year ago, my family was really keen on getting an Alexa. I don't want Bezos spy devices in our home, so I convinced them to let me try making our own. I went with Mycroft on a Pi 4 and it did not go well. The wake word detection was inconsistent, the integrations were lacking and I think it'd been effectively abandoned by that point. I'd intended to contribute to the project and some of the integrations I was struggling with but life intervened and I never got back to it. Also, thankfully, my family forgot about the Alexa.

nkaz123 · a year ago
I'm really inspired reading this and hope it can help! I'm planning to put more work into this. I have a few rough demos of it in action on youtube (https://www.youtube.com/watch?v=OryGVbh5JZE) which should give you an idea of the quality of it at the moment.
nkaz123 commented on Show HN: Pi-C.A.R.D, a Raspberry Pi Voice Assistant   github.com/nkasmanoff/pi-... · Posted by u/nkaz123
squarefoot · a year ago
+1. That's the #1 feature I want in any "assistant".

A question: does it run only on the Pi5 or other (also non Raspberry Pi) boards?

nkaz123 · a year ago
I think so. I plan to update the README in the coming days with more info, but realistically this is something you could run on your laptop too, barring some changes to how it accesses the microphone and camera. I assume the same is true for other boards.

The only thing that might pose an issue is the total RAM needed for whatever LLM is responsible for responding to you, but there's a wide variety of models available on Ollama, Hugging Face, etc. that can work.
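As a rough back-of-envelope for that RAM constraint (my numbers, not from the thread): a quantized model needs roughly parameter count times bytes per weight, plus some overhead for the KV cache and runtime buffers:

```python
def approx_model_ram_gb(params_billions, bits_per_weight, overhead=1.2):
    """Rough RAM estimate for running a quantized LLM.

    params_billions: model size in billions of parameters
    bits_per_weight: e.g. 4 for a 4-bit quantization, 16 for fp16
    overhead: fudge factor for KV cache and runtime buffers
    """
    bytes_total = params_billions * 1e9 * (bits_per_weight / 8)
    return bytes_total * overhead / 1e9

# Llama 3 8B at 4-bit quantization comes out near 4.8 GB, which can fit
# an 8 GB Pi 5; the same model at fp16 (~19 GB) would not.
```

The 1.2 overhead factor is a guess; real usage varies with context length and runtime.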

nkaz123 commented on Show HN: Pi-C.A.R.D, a Raspberry Pi Voice Assistant   github.com/nkasmanoff/pi-... · Posted by u/nkaz123
knodi123 · a year ago
Me too, but I bricked mine when flashing the BIOS. Just a fluke, nothing to be done about it.
nkaz123 · a year ago
I watched the demo, to be honest if I saw it sooner I probably would have tried to start this as a fork from there. Any idea what the issue was?
nkaz123 commented on Show HN: Pi-C.A.R.D, a Raspberry Pi Voice Assistant   github.com/nkasmanoff/pi-... · Posted by u/nkaz123
dasl · a year ago
What latency do you get? I'd be interested in seeing a demo video.
nkaz123 · a year ago
It fully depends on the model and how much conversational context you provide, but if you keep things to a bare minimum, it's under ~5 seconds from message received to starting the response with Llama 3 8B. I'm also using a vision language model, https://moondream.ai/, but that takes around 45 seconds, so the next idea is to take a more basic image captioning model, insert its output into the context, and try to cut that time down even more.

I also tried using Vulkan, which is supposedly faster, but the times were a bit slower than plain CPU inference with llama.cpp.
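The caption-into-context idea mentioned above might look something like this; the prompt template and function name are my own sketch, not the project's code:

```python
def build_prompt(user_text, caption=None):
    """Assemble the LLM prompt, optionally prepending an image caption.

    A fast captioning model describes the camera frame once, and its
    output is spliced into the text context, so the slow vision
    language model call can be skipped entirely.
    """
    parts = []
    if caption:
        parts.append("The camera currently sees: " + caption + ".")
    parts.append(user_text)
    return "\n".join(parts)
```

The trade-off is that the LLM only ever "sees" the caption text, so anything the captioner misses is lost, but a captioner that runs in a few seconds beats a 45-second VLM call for most queries.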

nkaz123 commented on Show HN: Pi-C.A.R.D, a Raspberry Pi Voice Assistant   github.com/nkasmanoff/pi-... · Posted by u/nkaz123
orthecreedence · a year ago
Just configure it to respond to "Computer" and you're good to go.
nkaz123 · a year ago
The wake word detection is an interesting problem here. As you can see in the repo, I have a lot of mis-heard versions of the wake word in place, in this case "Raspberry". Since the system heats up fast you need a fan, and with the microphone on a USB port right next to the fan, I needed something distinct; "computer" wasn't cutting it.

Changing the transcription model to something a bit better, or moving the mic away from the fan, could improve this.
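Matching against a list of mis-heard versions can be as simple as a normalized substring check; the variant list below is illustrative, not the repo's actual list:

```python
# Hypothetical transcriptions the speech model tends to produce for
# the wake word "raspberry" over fan noise (illustrative examples).
WAKE_VARIANTS = {"raspberry", "rasberry", "razzberry", "rest berry"}

def heard_wake_word(transcript, variants=WAKE_VARIANTS):
    """Return True if any known wake-word variant appears in the transcript."""
    text = transcript.lower().strip()
    return any(variant in text for variant in variants)
```

Growing the variant set is cheap; the real fix, as noted above, is cleaner audio or a better transcription model.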

nkaz123 commented on Show HN: Pi-C.A.R.D, a Raspberry Pi Voice Assistant   github.com/nkasmanoff/pi-... · Posted by u/nkaz123
harwoodr · a year ago
I see that a speaker is in the hardware list - does this speak back?
nkaz123 · a year ago
Yes! I'm currently using https://espeak.sourceforge.net/, so it isn't especially fun to listen to.

Additionally, since I'm streaming the LLM response, it won't take long for your reply to start. Because it speaks a chunk at a time, there are occasionally partial words spoken momentarily. How long you wait also depends, of course, on which model you use and the context size.
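One way to avoid those partial-word artifacts is to buffer the streamed tokens and only hand complete clauses to the TTS engine. This is a sketch of that buffering, with `speak()` standing in for the espeak call; the delimiter set is an assumption:

```python
def stream_to_tts(token_stream, speak, delimiters=".,!?;:"):
    """Buffer streamed LLM tokens and speak only complete clauses.

    Tokens accumulate until a clause-ending delimiter arrives, so the
    TTS engine never receives a fragment of a word mid-stream.
    """
    buffer = ""
    for token in token_stream:
        buffer += token
        if buffer and buffer[-1] in delimiters:
            speak(buffer.strip())
            buffer = ""
    if buffer.strip():  # flush whatever remains when the stream ends
        speak(buffer.strip())
```

This adds latency of one clause before the first audio, but every utterance handed to espeak is a whole phrase.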
