I've been using this for a while (the voice input, not their keyboard) and it's so refreshing to be able to just speak and have the output come out as fully formed, well punctuated sentences with proper capitalization.
This looks great! I've been wanting to drop the Swipe keyboard ever since I saw sneaky ads on it (like me typing "Google Maps" and getting "Bing Maps" as a "suggestion").
It uses VAD and processes after it detects no speech for 3 seconds, so only batch. Examples for integrating with Android apps? Like apps that can use it? Pretty much any app that uses Android's SpeechRecognizer class if you set Transcribro as the user-selected speech recognizer or if the app uses Transcribro explicitly. For example, Google Maps uses the user-selected speech recognizer when it doesn't detect Google's speech services on the system.
I did some core work on TTS at Google, at several layers, and I've never quite understood what people mean by streaming vs. not.
In each and every case I'm familiar with, streaming means "send the whole audio thus far to the inference engine, inference it, and send back the transcription"
I have a Flutter library that does the same flow as this (though via ONNX, so I can cover all platforms), and Whisper + Silero is ~identical to the interfaces I used at Google.
If the idea is streaming is when each audio byte is only sent once to the server, there's still an audio buffer accumulated -- its just on the server.
Seems like Gboard is incompatible with it. Is there a good enough open source alternative to Gboard in 2024 that has smooth glide-typing and a similar layout?
Not sure what I'm doing wrong, but I tried installing it on a GrapheneOS device with Play Services installed and nothing happened. When I pushed the mic button, it changed to look pressed for a second, and went back to normal. Nothing happened when I spoke. Tried holding it down while speaking. Still nothing.
I'm very interested in using this, but I can't even find a way to try to troubleshoot it. I'm not finding usage instructions, never mind any kind of error messages. It just doesn't do anything.
This is especially interesting to me because the screenshot on the repo is from Vanadium, which strongly suggests to me that it's from a GrapheneOS device itself.
You're correct I do use GrapheneOS. Hm do you have the global microphone toggle off? There's an upstream issue that causes SpeechRecognizer implementations to silently fail when the microphone toggle is off. You may have to force-stop Transcribro after turning it on.
I didn't think I did, but cycling it a couple times and restarting did fix! Great guess!
The thing I'm tripping over now is just that I keep pressing the button more than once when I'm done speaking because it's not clear that it registered the first time. If it could even just stay "pressed" or something while it processes the text, I think that would make it clearer. Any third state for the button would do I think.
I looked in the GitHub issues and there's a closed issue for F-droid inclusion. The author states that F-droid "Doesn't meet their requirements" but doesn't elaborate. I wonder what F-droid is missing that they need so much?
F-Droid only packages open-source software and rebuilds it from source, while installing from Accrescent would move all trust to the developer, even if the license changes to proprietary.
I understand that the author trusts itself more than F-Droid, but as a user the opposite seems more relevant.
Years ago it got sent to the cloud, but as long as you have an iPhone from the past few years it's on-device.
In each and every case I'm familiar with, streaming means "send the whole audio thus far to the inference engine, inference it, and send back the transcription"
I have a Flutter library that does the same flow as this (though via ONNX, so I can cover all platforms), and Whisper + Silero is ~identical to the interfaces I used at Google.
If the idea is streaming is when each audio byte is only sent once to the server, there's still an audio buffer accumulated -- its just on the server.
https://github.com/Helium314/HeliBoard
https://github.com/openboard-team/openboard
https://github.com/rkkr/simple-keyboard (guessing, since AOSP Keyboard works and this is a fork)
Not open source: https://www.microsoft.com/en-us/swiftkey
Does not have glide/swipe (reserved for symbols), but I just installed and giving it a shot: https://github.com/Julow/Unexpected-Keyboard
It does have glide typing, even.though I don't use it.
It rather uses long-tap to access multiple symbols, and can be split or pushed to a corner on devices with a big screen.
I'm very interested in using this, but I can't even find a way to try to troubleshoot it. I'm not finding usage instructions, never mind any kind of error messages. It just doesn't do anything.
This is especially interesting to me because the screenshot on the repo is from Vanadium, which strongly suggests to me that it's from a GrapheneOS device itself.
https://github.com/soupslurpr/Transcribro/issues/3
The thing I'm tripping over now is just that I keep pressing the button more than once when I'm done speaking because it's not clear that it registered the first time. If it could even just stay "pressed" or something while it processes the text, I think that would make it clearer. Any third state for the button would do I think.
Looking forward to using this! Thanks!
I would pay for an app that did this.
This is an unaffiliated version looks like https://apps.apple.com/us/app/live-transcribe/id1471473738
I understand that the author trusts itself more than F-Droid, but as a user the opposite seems more relevant.
I see the features listed[0] which seems like a reasonable feature set, but nothing unusual afaict.
If there has been a lot of hype can you tell me what people find compelling about it?
[0] https://accrescent.app/
Deleted Comment