tbenst · 2 years ago
This is a super cool device. Note that the decoding is highly limited: they decode into one of five fixed sentences. That's easier than even five single words, for example, since longer sentences carry more information to distinguish them.

Unfortunately the media is blowing this way out of proportion, as the larynx alone does not contain sufficient information to decode silent speech.

If you also sense the lips, tongue, and jaw articulators, then general English decoding becomes possible with high accuracy (e.g. see our recent work here: https://x.com/tbenst/status/1767952614157848859). It's not in the preprint, but I've done experiments with only the larynx recorded, and performance is pretty abysmal on even a 10-word vocabulary; hence why they did a five-sentence task.
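To make the closed-set point concrete, here's a minimal sketch of why a fixed five-sentence task is easy: with 100 repetitions per sentence, even a nearest-centroid decoder over pooled sensor features works. Everything here (feature dimension, noise level, template construction) is invented for illustration and is not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each sentence is a fixed feature vector (think pooled
# strain-sensor statistics) plus per-repetition noise. Templates are made
# well-separated by construction -- that is the whole trick of a closed set.
n_classes, n_reps, dim = 5, 100, 8
templates = 3.0 * np.eye(n_classes, dim)
train = templates[:, None, :] + 0.1 * rng.normal(size=(n_classes, n_reps, dim))

# Average the 100 repetitions per sentence into one centroid each.
centroids = train.mean(axis=1)

def decode(features):
    """Return the index of the nearest sentence centroid."""
    return int(np.argmin(np.linalg.norm(centroids - features, axis=1)))
```

With only five well-separated classes this is near-perfect; an open vocabulary has no such fixed templates to match against, which is where larynx-only signals fall apart.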

ImHereToVote · 2 years ago
I bet if you listened to the feedback you could teach yourself to talk using the larynx and surrounding muscles.
jvanderbot · 2 years ago
Why can't the muscles of the larynx, and perhaps the chest/diaphragm, be monitored and mapped to vocal cord noises rather than full speech? Just put the noise in the throat and let the rest of the body make it work.
irviss · 2 years ago
> If you also sense the lips, tongue articulators, and jaw, then general English decoding becomes possible with high accuracy

A bit OT, but I see this frequently and I'm curious: why do English speakers (or is it just a US phenomenon?) tend to use the word "English" instead of "language", "linguistic", or one of its related words to refer to a general concept?

x1798DE · 2 years ago
Not OP, but as a native English speaker and former scientist (though not in this area), I would interpret "x does y on English tasks" to mean "we tested this in English and don't know if the effect generalizes to other languages".
roenxi · 2 years ago
I'd speculate English speakers are used to being part of a society where non-English speakers are present and politically important. It is polite not to assume that English = language. Even on the British Isles English isn't a universal thing. Let alone somewhere like America where it isn't even native.

"Language" just doesn't mean "English". In Australia if someone is talking about "language" on its own I'd assume they're Aboriginal advocates.

khazhoux · 2 years ago
This is your misperception.

In the instances where a person says "English" in this kind of context, it catches your attention and you infer that the person is an English-speaker, and possibly American.

But when a person uses the generic word "language", you don't notice it.

This leads you to believe that English speakers "tend to use the word English," when that's not necessarily the case.

I don't know what this perceptual fallacy is called, but there's probably a word. In English :-)

atopal · 2 years ago
There are about 6000 spoken languages around the world with an extreme variety in how they produce meaning. How could you make sweeping statements about all of them?
johnisgood · 2 years ago
I have not noticed this. I just assume that they are specifically talking about a language, in this case: English.
croemer · 2 years ago
"Speaking" is a hyperbole. It allows you to say exactly 5 phrases with 95% accuracy, after repeating each sentence 100 times. In other words, it's totally useless. The sentences are so different that they can be distinguished almost entirely by length. I'm very surprised anyone thinks this is useful.

Excerpt: "A brief demonstration was made with five sentences that we had selected for training the algorithm (S1: “Hi Rachel, how you are doing today?”, S2: “Hope your experiments are going well!”, S3: “Merry Christmas!”, S4: “I love you!”, S5: “I don’t trust you.”). Each participant repeated each sentence 100 times for data collection."

I never read press releases from universities; they're always exaggerated.

Original study: https://www.nature.com/articles/s41467-024-45915-7

light_hue_1 · 2 years ago
Exactly. They took a neat device and wrapped it in a BS story that's wildly unscientific.

Nature and Science don't mind if people outright lie about what their research means as long as it gets hits. The paper is pretty much just as bad.

This is how mistrust for science slowly builds up when people publish obvious falsehoods.

zharknado · 2 years ago
Very cool! This is an insanely impressive sensor, but the proposed application is still in the dream phase.

> Going forward, the research team plans to continue enlarging the vocabulary of the device through machine learning and to test it in people with speech disorders.

They haven’t tried giving it to a person with a voice disorder. So it just might not work in that application at all. That will likely depend on the degree to which laryngeal muscles are implicated in a given person’s disorder.

That’s certainly a valid starting place for research purposes, but it’s very early days.

And I imagine you’ll need some very interesting cabling attached to a somewhat beefy device to actually run live inference from this data, plus to drive the speech synthesis.

khimaros · 2 years ago
i'm really excited to see progress being made in this space. subvocal speech recognition seems to be an underfunded area of research.

my sense is that it has the potential to make hands free interaction with our devices in public spaces less obnoxious and, consequently, more socially acceptable.

however, i notice that the article doesn't mention anything about dictionary size, which is a very important consideration for a tool of this kind.

thorum · 2 years ago
> The research team demonstrated the system’s accuracy by having the participants pronounce five sentences — both aloud and voicelessly — including “Hi, Rachel, how are you doing today?” and “I love you!” (…) Going forward, the research team plans to continue enlarging the vocabulary of the device through machine learning and to test it in people with speech disorders.

It's a proof of concept at this stage but very cool.

dontreact · 2 years ago
While subvocal is cool and would allow for speech in more places, something that’s earlier on the tech tree and that I would like to see is just robust lipreading.

I already am comfortable talking to my phone quietly using my AirPods while looking at my screen, but it seems like in loud public places the accuracy becomes unusable. I imagine it could be easily recovered by the additional signal of lipreading.

graphe · 2 years ago
https://en.m.wikipedia.org/wiki/Basic_english — this communicates English efficiently with 850 words. I don't think Basic English is any good, but I can see them making simplified English the lingua franca to boost 'literacy rates' in the future.
anonylizard · 2 years ago
This seems only useful to people who once had a voice and then lost it, because only then would they have a unified mapping of vocal cord movements to actual voices. People who are deaf or mute can't really use this.

It also basically mandates a patch on your throat, because there's no other way of detecting the vibrations.

I wonder if there are visual based ways, like sign->text, expression->text, that would benefit from the larger developments in LLMs. Like an LLM that has access to your conversation history, so when you give your smartphone camera a hand sign and a smile, it can guess and output an entire intended speech.

feverzsj · 2 years ago
How does it compare to an electrolarynx, which gives you a robotic voice?
joshspankit · 2 years ago
Is it just me, or does anyone else think it would be amazing to use upcoming voice assistants with something like this letting you “talk” silently?
ImHereToVote · 2 years ago
I'll take the whole lot.