Transcribro: On-device Accurate Speech-to-text

Documentation severely lacking. I wanted to know whether this does streaming or only batch, as well as examples for integrating with Android apps.

soupslurpr · 2 years ago

It uses VAD and processes after it detects no speech for 3 seconds, so only batch. Examples for integrating with Android apps? Like apps that can use it? Pretty much any app that uses Android's SpeechRecognizer class if you set Transcribro as the user-selected speech recognizer or if the app uses Transcribro explicitly. For example, Google Maps uses the user-selected speech recognizer when it doesn't detect Google's speech services on the system.
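For anyone wondering what "VAD, then process after 3 seconds of no speech" amounts to, here is a minimal sketch in Python. It is not Transcribro's code: the `segment_utterances` name, the 30 ms frame length, and the `is_speech` callback (standing in for a real VAD such as Silero) are all assumptions made for illustration. Only the 3-second silence timeout comes from the comment above.

```python
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")


def segment_utterances(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool],  # hypothetical VAD decision per frame
    frame_ms: int = 30,                  # assumed frame length
    silence_timeout_ms: int = 3000,      # the 3 seconds mentioned above
) -> List[List[Frame]]:
    """Group frames into utterances that would each be transcribed as one batch.

    An utterance starts at the first speech frame and is closed once
    silence_timeout_ms of continuous non-speech has accumulated. The
    trailing silence frames are kept in the utterance for simplicity.
    """
    utterances: List[List[Frame]] = []
    current: List[Frame] = []
    silence_ms = 0
    heard_speech = False

    for frame in frames:
        if is_speech(frame):
            heard_speech = True
            silence_ms = 0
            current.append(frame)
        elif heard_speech:
            silence_ms += frame_ms
            current.append(frame)
            if silence_ms >= silence_timeout_ms:
                # Endpoint reached: hand the whole utterance off as a batch.
                utterances.append(current)
                current, silence_ms, heard_speech = [], 0, False
        # Silence before any speech is simply dropped.

    if heard_speech:
        # Input ended mid-utterance; flush what was collected.
        utterances.append(current)
    return utterances
```

Each returned list is what would be passed to the recognizer in one go, which is why the user sees no text until the pause has elapsed.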

pants2 · 2 years ago

Considering it uses Whisper, it's probably not streaming

refulgentis · 2 years ago

I did some core work on TTS at Google, at several layers, and I've never quite understood what people mean by streaming vs. not.

In each and every case I'm familiar with, streaming means "send the whole audio thus far to the inference engine, run inference on it, and send back the transcription."

I have a Flutter library that does the same flow as this (though via ONNX, so I can cover all platforms), and Whisper + Silero is ~identical to the interfaces I used at Google.

If the idea is that streaming is when each audio byte is only sent to the server once, there's still an audio buffer accumulated -- it's just on the server.
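A rough sketch of that flow in Python, to make the point concrete. The `pseudo_stream` name and the `transcribe` callback are hypothetical stand-ins for whatever inference engine is used; this is not the API of Whisper, Silero, or any Google service.

```python
from typing import Callable, Iterable, Iterator


def pseudo_stream(
    chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],  # hypothetical: full audio in, text out
) -> Iterator[str]:
    """Yield a fresh partial transcript after every incoming audio chunk.

    Each partial is produced by re-running inference on ALL audio received
    so far, not just on the newest chunk.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)             # the accumulated audio buffer
        yield transcribe(bytes(buffer))  # re-transcribe everything so far
```

Whether this loop runs on the device or on a server, the buffer exists either way. The cost is that every partial result reprocesses all earlier audio, so total work grows with the square of the utterance length.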

Not sure what I'm doing wrong, but I tried installing it on a GrapheneOS device with Play Services installed and nothing happened. When I pushed the mic button, it changed to look pressed for a second, and went back to normal. Nothing happened when I spoke. Tried holding it down while speaking. Still nothing.

I'm very interested in using this, but I can't even find a way to try to troubleshoot it. I'm not finding usage instructions, never mind any kind of error messages. It just doesn't do anything.

This is especially interesting to me because the screenshot on the repo is from Vanadium, which strongly suggests to me that it's from a GrapheneOS device itself.

soupslurpr · 2 years ago

You're correct, I do use GrapheneOS. Hm, do you have the global microphone toggle off? There's an upstream issue that causes SpeechRecognizer implementations to silently fail when the microphone toggle is off. You may have to force-stop Transcribro after turning it on.

https://github.com/soupslurpr/Transcribro/issues/3

smeej · 2 years ago

I didn't think I did, but cycling it a couple times and restarting did fix it! Great guess!

The thing I'm tripping over now is just that I keep pressing the button more than once when I'm done speaking because it's not clear that it registered the first time. If it could even just stay "pressed" or something while it processes the text, I think that would make it clearer. Any third state for the button would do I think.

Looking forward to using this! Thanks!

james2doyle · 2 years ago

Looks similar to the new FUTO keyboard: https://voiceinput.futo.org/

iamjackg · 2 years ago

I've been using this for a while (the voice input, not their keyboard) and it's so refreshing to be able to just speak and have the output come out as fully formed, well punctuated sentences with proper capitalization.

I agree. No more "speaking punctuation". Just talk as normal and it comes out fully formed.

leobg · 2 years ago

Anything like that available for iOS?

crazygringo · 2 years ago

iOS already has on-device dictation built into the standard keyboard.

Years ago it got sent to the cloud, but as long as you have an iPhone from the past few years it's on-device.

brylie · 2 years ago

Aiko, mentioned elsewhere, includes a local copy of the OpenAI Whisper model: https://apps.apple.com/app/aiko/id1672085276

b33f · 2 years ago

Aiko is a free app for iOS and macOS that also uses Whisper for local speech-to-text.

gala8y · 2 years ago

There is also Sayboard (open-source, multiple languages): https://github.com/ElishaAz/Sayboard

kolme · 2 years ago

This looks great! I've been wanting to drop the Swipe keyboard ever since I saw sneaky ads on it (like me typing "Google Maps" and getting "Bing Maps" as a "suggestion").

yjftsjthsd-h · 2 years ago

But open source, which is a pretty big difference

grandma_tea · 2 years ago

FUTO and Transcribro are open source.

flax · 2 years ago

yewenjie · 2 years ago

Seems like Gboard is incompatible with it. Is there a good enough open source alternative to Gboard in 2024 that has smooth glide-typing and a similar layout?

SparkyMcUnicorn · 2 years ago

Any of these should work.

https://github.com/Helium314/HeliBoard

https://github.com/openboard-team/openboard

https://github.com/rkkr/simple-keyboard (guessing, since AOSP Keyboard works and this is a fork)

Not open source: https://www.microsoft.com/en-us/swiftkey

Does not have glide/swipe (reserved for symbols), but I just installed and giving it a shot: https://github.com/Julow/Unexpected-Keyboard

Grimblewald · 2 years ago

Unexpected keyboard is unexpectedly awesome. Looks a bit dated, but boy does it have some functionality packed into it.

nine_k · 2 years ago

My choice is https://github.com/AnySoftKeyboard/AnySoftKeyboard/

It does have glide typing, even though I don't use it.

It rather uses long-tap to access multiple symbols, and can be split or pushed to a corner on devices with a big screen.

lawgimenez · 2 years ago

This is cool, I get to read another Jetpack Compose codebase since I am halfway through migrating our app to Jetpack. So this helps a lot.

tmaly · 2 years ago

I wish there was something where I could transcribe iPhone voice memos to text.

I would pay for an app that did this.

cee_el123 · 2 years ago

Google has an app called Live Transcribe on Android, but there's no iPhone version.

This looks like an unaffiliated version: https://apps.apple.com/us/app/live-transcribe/id1471473738

hidelooktropic · 2 years ago

The microphone icon on the keyboard does this.

swyx · 2 years ago

is there an iPhone version of this? custom keyboard?

crancher · 2 years ago

Accrescent hype is comically overdone.

free_bip · 2 years ago

I looked in the GitHub issues and there's a closed issue for F-Droid inclusion. The author states that F-Droid "Doesn't meet their requirements" but doesn't elaborate. I wonder what F-Droid is missing that they need so much?

okso · 2 years ago

F-Droid only packages open-source software and rebuilds it from source, while installing from Accrescent would move all trust to the developer, even if the license changes to proprietary.

I understand that the author trusts themselves more than F-Droid, but as a user the opposite seems more relevant.

ementally · 2 years ago

Reason https://www.privacyguides.org/en/android/#f-droid

Link: https://github.com/soupslurpr/Transcribro/issues/9

mijoharas · 2 years ago

I only just saw it from this project.

I see the features listed[0] which seems like a reasonable feature set, but nothing unusual afaict.

If there has been a lot of hype can you tell me what people find compelling about it?

[0] https://accrescent.app/
