On key press, start recording microphone to /tmp/dictate.mp3:
# Save up to 10 mins. Minimize buffering. Save pid
ffmpeg -f pulse -i default -ar 16000 -ac 1 -t 600 -y -c:a libmp3lame -q:a 2 -flush_packets 1 -avioflags direct -loglevel quiet /tmp/dictate.mp3 &
echo $! > /tmp/dictate.pid
On key release, stop recording, transcribe with whisper.cpp, trim whitespace, and print to stdout:
# Stop recording
kill $(cat /tmp/dictate.pid)
# Transcribe
whisper-cli --language en --model $HOME/.local/share/whisper/ggml-large-v3-turbo-q8_0.bin --no-prints --no-timestamps /tmp/dictate.mp3 | tr -d '\n' | sed 's/^[[:space:]]*//;s/[[:space:]]*$//'
I keep these in a dictate.sh script and bind to press/release on a single key. A programmable keyboard helps here. I use https://git.sr.ht/%7Egeb/dotool to turn the transcription into keystrokes. I've also tried ydotool and wtype, but they seem to swallow keystrokes.
bindsym XF86Launch5 exec dictate.sh start
bindsym --release XF86Launch5 exec echo "type $(dictate.sh stop)" | dotoolc
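Put together, the press/release halves above can live in one script. A minimal sketch of dictate.sh, assuming the same paths and flags as the commands above:

```shell
#!/bin/sh
# dictate.sh - sketch of the push-to-talk wrapper described above.
# Model path and ffmpeg flags are taken from the earlier commands; adjust to taste.
MODEL="$HOME/.local/share/whisper/ggml-large-v3-turbo-q8_0.bin"

dictate() {
  case "$1" in
    start)
      # Record the microphone until killed (or 10 minutes pass).
      ffmpeg -f pulse -i default -ar 16000 -ac 1 -t 600 -y \
        -c:a libmp3lame -q:a 2 -flush_packets 1 -avioflags direct \
        -loglevel quiet /tmp/dictate.mp3 &
      echo $! > /tmp/dictate.pid
      ;;
    stop)
      # Stop the recorder, then transcribe and trim whitespace.
      kill "$(cat /tmp/dictate.pid)" 2>/dev/null
      whisper-cli --language en --model "$MODEL" \
        --no-prints --no-timestamps /tmp/dictate.mp3 \
        | tr -d '\n' | sed 's/^[[:space:]]*//;s/[[:space:]]*$//'
      ;;
    *)
      echo "usage: dictate.sh start|stop" >&2
      ;;
  esac
}

dictate "$@"
```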
This gives a very functional push-to-talk setup. I'm very impressed with https://github.com/ggml-org/whisper.cpp. Transcription quality with large-v3-turbo-q8_0 is excellent IMO, and a Vulkan build is very fast on my 6600XT. It takes about 1s for an average sentence to appear after I release the hotkey.
I'm keeping an eye on the NVidia models, hopefully they work on ggml soon too. E.g. https://github.com/ggml-org/whisper.cpp/issues/3118.
Whisper on Windows, the openai-whisper package, doesn't have these q8_0 models; it has something like 8 models, and i always get an error about Triton cores (something about timestamping, i guess), which Windows doesn't have. I've transcribed >1000 hours of audio with this setup, so i'm used to the workflow.
But to anyone complaining, I want to know: when was the last time you pulled out a profiler? When was the last time you saw anyone use a profiler?
People asking for performance aren't pissed you didn't write Microsoft Word in assembly; we're pissed it takes 10 seconds to open a fucking text editor.
I literally timed it on my M2 Air: 8s to open and another 1s to get a blank document. Meanwhile (neo)vim took 0.1s; it's so fast I can't click my stopwatch fast enough to time it properly. And I'm not going to bother checking, because the race isn't even close.
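For what it's worth, the stopwatch test is easy to script so it's repeatable. A rough sketch; `nvim --headless +q` is my assumption for flags that start the editor and immediately exit, and it requires GNU date for `%N`:

```shell
# Print how long a command takes to start and exit, in milliseconds.
startup_ms() {
  local t0 t1
  t0=$(date +%s%N)                 # nanoseconds since epoch (GNU date)
  "$@" >/dev/null 2>&1             # run the command, discard its output
  t1=$(date +%s%N)
  echo $(( (t1 - t0) / 1000000 ))  # elapsed, in ms
}

startup_ms nvim --headless +q
```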
I'm (we're) not pissed that the code isn't optimal; I'm pissed because it's slower than dialup. So take that Knuth quote you love about optimization and do what he actually suggested: grab a fucking profiler. It is more important than your Big O.
i've managed to track powershell in windows terminal taking forever to fully launch down to "100% gpu usage in the powershell process", but i'd really like to know what it thinks it's doing.
also: 4 seconds to blank document in Word. the splash screen is most of that. notepad++ ~2 seconds. notepad.exe slightly over 1 second before the text content loads. Edge: 2 seconds to page render from startup. Oh and powershell loads instantly if i switch to the "old" terminal, but the old terminal doesn't have tabs, so that's a non-starter. "forever" above means 45-60 seconds.
It would be philanthropy (not-for-profit), but i imagine if you got a few big names behind it (some from around here, some not), you could probably assemble a fair group of developers, translators, UI experts, API experts, hardware bug experts, etc. Like a think tank that outputs bugfixes and incremental improvements to software, the sort that reduce electricity usage; you could get some government funding out of that, i bet. How much did Britain pay for that "delete old emails to save water" marketeering?
I guess the downside is meta, google, microsoft, et al will benefit the most, but whatever.
Also someone the other day mentioned something like this, where users could subscribe for $50 to pursue legal avenues - sorry i don't have the link, but:
> What we actually need is a Consumer Protection Alliance that is made by and funded by people who want protection from this and are willing to pay for the lawyers needed to run all of the cases and bring these cases before a judge over and over and over again until they win.
> This would mean people like you and me and a million others of us paying $20-$50/month out of pocket to hire people to sue companies that do this [...]
Investors want otherwise.
The software that the synology uses on the backend is open source, so you could set this all up with proxmox or a debian server or something, too.
you need to ensure your cameras support either direct access inside the network, or ONVIF or something like that. IDK, i don't use it anymore, but i did for a good long while, with wifi and wired IP cameras. My synology had a "license" for 12 cameras, but lightning took it out (something about a bunch of ethernet cables in trees), and my new synology doesn't have enough licenses to bother with.
anyhow, just thought you should know - "software" "NVR" is available, and has been for over a decade.
This advice surprises me. Having had one foot in the classical music world when I was younger, I can say there are absolutely music skills that take many years, if not decades, to get to 90% on. And those who have put the work in are absolutely and obviously competent.
Similarly, when I'm working with someone who started off as a machinist, then a designer, then went to school and became an engineer, I find it baffling to think that I can absorb 90% of their knowledge in 6 weeks.
i've written and recorded about a dozen hours' worth of music in my life, and i assuredly did not go to school for it. The quote is about education, not practice. It also mentions "half-assed job", which is what you get in "six weeks" of work.
In a hypothetical "master dump", a mix of all the dumps ever leaked, you'd expect dozens if not more entries for every "real person" out there. Think about how many people had a yahoo account, then how many had several yahoo accounts, and then multiply by the hundreds of leaks out there. I can see the number getting into the billions easily, just because of how many accounts people have on the many platforms that got hacked in the past ~20 years.
Sure, 99% of those won't be active accounts anymore, but the passwords used serve as a signal, at least for "what kinds of passwords do people use". There's lots to be learned about wordwordnumber, wordnumbernumber, and so on.
i had a plan to do statistical studies of some password dumps to try and make a "compressed password list" that could generate password guesses on the fly. i forgot why i didn't do it, but i'm sure it's because the "model" - the statistical dataset from which the program would generate output - wouldn't really be that much smaller; at least not with my poor maths skills.
I'm assuming that someone who really knew what they were doing could get it down to maybe 15%-20% of the full password list. I doubt i could do better than just compressing the dataset and extracting it on the fly.
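As a toy version of that signal extraction, you can map each password to a character-class mask (hashcat-style) and count mask frequencies; the top masks are exactly the wordwordnumber-type patterns. A sketch; the sample passwords here are made up:

```shell
# Map each password to a mask: l = lowercase, u = uppercase,
# d = digit, s = everything else, then count how common each mask is.
printf '%s\n' 'password1' 'Summer2023' 'hunter2' 'p@ssw0rd' \
  | sed -e 's/[a-z]/l/g' -e 's/[A-Z]/u/g' -e 's/[0-9]/d/g' -e 's/[^lud]/s/g' \
  | sort | uniq -c | sort -rn
```

On a real dump, the mask table is tiny compared to the password list itself, which is the "compressed model" idea in miniature.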
I have one and have found it to be quite easy to hunt down ham repeaters that you can get to transmit more or less non-stop... but relatively hard to use for intermittent transmitters.
I need to see if I can figure out how to plug in my GNSS compass output, because inferring orientation from motion requires an awful lot of moving around and is less reliable than I'd like.
also the "kraken" may be $700, but there was the KerberosSDR/HydraSDR, which was much cheaper. Furthermore, trunking is usually done within the bandwidth of a typical SDR, so it doesn't really obfuscate things as much as one would think. Also, i bought one; not to find repeaters, but to find trolls who were using repeaters. I'd monitor the input frequency to the repeater, apparently the same thing Mitnick would do.
there were trunking scanners in the late 90s / early 2000s, as well. my neighbor had one.
Would you be bothered if someone was following you all day, making recordings every time you didn't signal for exactly 5000ms before a lane change, or for 300' before a turn? How about not stopping for a full three seconds, or driving 100' farther in the left lane than necessary?
three felonies a day