Which is why we need to make this technology readily available and well known, so that people are more aware of it, don't trust everything they see, and look up sources.
Ah, who am I kidding, most people will still not fact check.
We're really in an era where laws and their enforcement will have a lot of catching up to do, very fast.
Fake historical proof, fake leaks, fake endorsements, fake ads... People couldn't be bothered to double check when it was a mere random text article on facetok; it's going to be so much worse...
I’ve been telling my friends that 5-10 years from now, the only thing that you’ll be able to ~100% trust is what happens in front of your eyes, in that very moment. You can elect to trust reliable news organizations to vet things for you, but even if you do, due to polarization a huge subset of the world will think you’ve been had and discard everything as fake.
Look at stuff like Sora, or all the new voice models coming out. Just a few days ago there was a high school athletic coach (!) arrested for cloning the principal’s voice and using it to say vile stuff. He only got caught because he used his own e-mail.
Now combine that with the fact that Microsoft’s new Phi-3-mini model approaches GPT-3.5 performance using 3.8 billion parameters, whilst GPT-3.5 reportedly uses 175 billion. And we’re only ~5 years of optimization into this tech.
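Back-of-the-envelope, taking those quoted parameter counts at face value (the 175B figure for GPT-3.5 is a widely repeated estimate, not an official spec):

```python
# Rough comparison of the quoted model sizes.
gpt35_params = 175e9      # widely repeated estimate for GPT-3.5
phi3_mini_params = 3.8e9  # quoted figure for Phi-3-mini

ratio = gpt35_params / phi3_mini_params
print(f"Phi-3-mini is ~{ratio:.0f}x smaller for roughly similar benchmark scores")
# → ~46x
```

A ~46x shrink in a couple of years is the part that matters for misuse: it moves "serious datacenter budget required" toward "runs on a laptop".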
We already know you cannot trust what you see with your eyes (check any "compare eye-witness reports with trusted video recordings" or watch Penn & Teller).
"I’ve been telling my friends that 5-10 years from now, the only thing that you’ll be able to ~100% trust is what happens in front of your eyes, in that very moment. You can elect to trust reliable news organizations to vet things for you,"
The time is now. Even the mainstream news organizations are probably only high-90s% reliable, as they've been caught selectively editing, not vetting sources or facts, and displaying biases.
Trust is a dependency for human existence. Not just for civilization, but also very small communities and basic exchange of ideas, goods, and services.
I cannot foretell how the risk of trust destruction from GenAI will unfold, but I'm optimistic our creativity will win out.
If you think about it, though, the window for there being something resembling objective universal truth has been a very very short period of human history. It really didn't exist before the internet and ubiquitous smartphones.
Before the internet, TV, radio, newspapers were our sources of truth beyond just trusting people in our immediate vicinity, and these were all heavily filtered by what stories they decided to run, the amount of detail they focused on, any human bias that crept into their reporting, etc. I'm not a "FAKE NEWS!!!" kind of guy, but one has always had to ingest news from these sources with some level of filtering in this regard, and understand that there might be other sides to the story, or whole stories of importance going unreported.
If we revert to subjecting images/video/audio clips to the same level of skepticism we had with random people informing us of pieces of news with no proof, then we're effectively just at the same level of objective universal truth as we had been for the overwhelming majority of human history.
I'm not arguing this is a good thing - just that it might have been a small and blissful island that some of us had the privilege of enjoying.
This is partly why I'm so fascinated and disgusted by trolls and astroturfers. They erode trust in a given forum, which degrades the quality of discourse because no one wants to invest time in untrustworthy discussions.
Sometimes I wish I could get an honest answer from trolls about what they hope to achieve, but of course that will never happen.
A digital audio file is not even close to being proof of anything. Even without voice cloning you can easily edit, clip, and compose audio into almost anything you want. It’s also not difficult to simply impersonate someone else’s manner of speaking with practice, something that is commonly done by both amateurs and professional actors. The only thing that changes is the ease with which this can be done, which should help everyone understand how unreliable such “proof” is.
Sounds like a remake of Sneakers is needed, with a fresh take on impersonation and social engineering, to remind people what's possible and potentially dangerous.
I don't know the political situation in your country, but all I will say is that the putin-aligned far right wing in mine uses plenty of fake quotes, deep fake videos, and things like that to propagate its ideas, and that its "followers" eat it up. Then you either have to let those untruths stand, or you spend all your energy fighting them / defending yourself, making you look guilty. And in the past 3-4 years (it exploded with covid), it's the followers themselves who now start those things.
As for AI, it moves this from "someone with some dedication can do it" to "anyone with a computer can do it".
These are big issues, but I would say that a bigger issue is the case where a spam caller has you on the line talking for ~10 seconds and then calls the bank or a family member as you.
Android and iOS should support real-time voice changers as the norm, with a quick-switch button on the dialer to disable it and an option to have it off for known contacts.
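A toy sketch of the kind of transform such a voice changer could apply: a naive resampling pitch shift on a PCM buffer. Purely illustrative (the function name is mine, not any Android/iOS API, and a real implementation would work on streaming frames and time-stretch to preserve duration):

```python
import numpy as np

def naive_pitch_shift(samples: np.ndarray, factor: float) -> np.ndarray:
    """Toy pitch shift: resample the buffer by `factor`.

    factor > 1 raises the pitch (and shortens the clip); a real voice
    changer would also time-stretch so the duration stays constant.
    """
    idx = np.arange(0, len(samples), factor)
    return np.interp(idx, np.arange(len(samples)), samples)

# 1 second of a 220 Hz sine at 16 kHz, standing in for a voice signal
sr = 16_000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 220 * t)

shifted = naive_pitch_shift(voice, 1.5)  # pitch up, clip gets shorter
print(len(voice), len(shifted))  # → 16000 10667
```

The point being: the signal processing is trivial; what's missing is OS-level integration and defaults, not technology.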
I've come around to the idea that the hype around criminal or bad actor uses of AI is the same as the hype around other uses. Some real uses will shake out but the delta between what's actually enabled by the tech and what was possible anyway is way smaller than people like to represent.
I'm not sure. Maybe I'm caught in the hype but I feel like possible is one thing, scalable is another.
We're at the point now where, for example, a leak of phone numbers + address books would enable a high-quality, large-scale automated impersonation pipeline that many here could put together in a weekend.
Install a compromised app (or just be a one-time user of a newly compromised app), then answer the phone for a minute, and then everyone you know receives a somewhat believable phone call to empty their wallets / your accounts?
I am not talking about criminals or similar; for that, I agree with you.
I am talking about the political. There has been a MASSIVE explosion of fakeness since covid, coupled with people who have completely lost their ability to actually check information, and anything that makes it easier for anyone to propagate it is dangerous in my view.
We've reached the point where "check by yourself / do your own research" has moved from what the proponent of the truth used to tell people, to what the other side now uses to lend as much value to their facetok videos and whatnot (I am speaking in the general sense).
It's not a very consistent pearl-clutch either: would they deny all access to computers because terrorists will use them to fire heat-seeking missiles?
I think it's just a lack of imagination: we can do this funny, offensive, misleading thing right now, but it can surely be used in other ways; you'd just have to let your mind wander for one extra moment after your social-media-induced reaction fixation episode. This could be a depression hack: hearing your own voice in different accents motivating you.
>We're really in an era where laws and their enforcement will have a lot of catching up to do very fast.
I mean, there are existing laws that would apply; there are laws against scamming or whatever bad stuff you can do with this. The world might need a law, though, to force media to label AI-generated content, and IMO a law where media and users would take responsibility for their content.
What I mean is: if, say, as a random user I claim something in a media post like "eating shit cures you of covid", I should either be forced to add "btw I am not taking any responsibility for my comment and I am not a medic/lawyer/expert", or I should be forced to pay damages if someone sues me for my bullshit claims.
So cloning voices should be legal; scamming people is illegal.
Yeah, not the best title/name. On a more meta note, I sometimes feel like HN comments are increasingly Reddit-style headline reactions, with little investigation of TFA or peering into the tech itself.
What is a legitimate use case for this? I can think of a hundred applications for deceiving others but struggle to come up with a scenario where one would want their voice cloned or reproduced.
You're recording a podcast and want to tweak some of your own words, without the hassle of re-recording.
You're an indie game developer, and want to have vibrant NPCs with their unique voices and dialogue powered by an LLM.
You're producing a movie, and want to tweak certain lines of dialogue, with the consent of the talent.
You suffer from health conditions and are gradually losing your voice, but you still want to communicate.
There are certainly legitimate use cases of this technology. I personally believe illegitimate use cases overshadow the legitimate use cases, but I don't think it's fair to say there are no legitimate applications.
We should strictly regulate the use of this technology by criminalizing abuse; not by banning it altogether (which is pretty hard in the case of software and small models).
> You're producing a movie, and want to tweak certain lines of dialogue, with the consent of the talent.
The latest agreement to end the last round of strikes was to prevent this very thing.
Of your list, the medical condition, giving someone their real voice instead of a Hawking voice, would be the most legit reason. Everything else reflects a skewed sense of what's morally acceptable, as I think those uses are shady.
You may not be trying hard at all, then. The first thing I thought of was cloning your voice to use in real-time translation. I can probably think of several others mentioned in comments below, but this is a 100% always-useful, never-nefarious (assuming perfect translation not being maliciously used) application.
I have a friend with a paralysed larynx who is often using his phone or a small laptop to type in order to communicate. I know he would love it if it was possible to take old recordings of him speaking and use that to give him back "his" voice, at least in some small measure.
Unfortunately I have yet to see something that can do this and provide a voice model that you could plug into Android TTS and/or Windows which are what he uses.
Fixing small errors in narration, voiceovers, or other recorded content.
Translation of recorded content with the original voice into new languages.
Comedy as long as it’s obvious that it’s a fake.
Actually intentionally selling your voice to be the voice of some text to speech product. Maybe I want Alexa to have the voice of Danny Devito, as long as he’s ok with it and getting paid.
My wife has been sick all week and has to communicate over text because her voice has gone. We’ve been talking about making voice clones of ourselves for situations like this. Some people never regain their voices so preserving them before they lose it is super valuable.
I imagine training people, and having everything I say be available in any language, matching my tonality, and being able to reach a global audience. I'm very much looking forward to this.
Indie Gamedevs can do their own Voice Acting?
See also indie film, same use case.
Actor dies / is hit by a bus before a work is finished - create a few more lines posthumously (it'll be in the fine print of the contract that you allow voice & image fakes in the event you're not able to do them).
Satire, Pranks, and alleged pranks (stuff that makes folk laugh).
Where are the best places to keep up with all of this? I'm very interested in this area as I want to use these tools to create things with and my own voice isn't great for this.
Speech-to-speech seems like it might be better than TTS for getting it to sound more natural. I've played around with some tools like RVC etc, but I feel like there are maybe a lot of great AI workflows I am missing amongst all the AI noise; it's the interesting workflows and people doing interesting things with AI that I am more interested in.
Awful lot of doomsaying and drama in here. What makes this release so bad compared to the existing voice cloning AI methods that have been publicly available for ~1 year already?
I really can’t wait until voice cloning means we get a version of audiobooks read in the author’s voice. Of course it will never be quite as good as them reading it themselves, but I think the author’s voice adds something that voice actors can’t: they appear to be too generic and too affected in their pronunciation for me to connect with.
What the author adds, if they're not also a trained/well-practiced voice actor, is that their inflection exactly matches how they meant the words in the book to be spoken/understood.
AI isn't going to be able to do that. As good as it may get, it won't be able to read the mind of the author. It's going to be even more generic than a human reader.
Exactly, the improvement will be in rerecording terrible readings into something enjoyable or at least inoffensive. That and personalization so you can choose the voice that you prefer.
Odd, because I actually worry about this. I don't see why you'd want your books read by the author. Trained Voice Actors do a much better job, and can modulate their voices based on tone.
Autobiographies? Fine, but those are usually read by their authors anyway.
I was hoping it would be voice transfer so the voice actor would give all the intonation and emotion and the AI would take that and make it sound like the author. Reading text with AI is getting better but yes it’ll be worse for a long time.
I have nearly no desire to have my book read by the author. They are good at writing, and an audiobook is not simply “reading” the words on the page. Maybe something like Descript that the author can use to tweak pronunciation after it’s narrated, but I don’t want the author’s voice.
I would like to train a model on Allyson Johnson’s voice (she narrated the Honor Harrington books) and then use that to re-narrate the 1-2 books in one of the spinoffs (I think it was in the Saganami Island series?) where they used a different narrator (who was horrible).
I also might be interested in using it to clean up the Wheel of Time series where, while it’s the same 2 narrators, they change the pronunciation of various names/words book-to-book, “Moghedien” being the one that stands out most. They pronounce it at least 3 different ways:
* Mo-gid-e-on
* Mo-ga-dean
* Mog-a-din
I think I'd prefer to have options for each audiobook. I have favorite narrators, and find others unlistenable. There are also thousands and thousands of books that will never otherwise be turned into audio format unless an AI is used.
"Athletic director used AI to frame principal with racist remarks in fake audio clip, police say"
https://apnews.com/article/ai-artificial-intelligence-princi...
I want to get off Mr Bones’ wild ride.
Won't this just be a return to the historical norm?
Prior to photography being invented, there was no guarantee that any retelling of events (whether spoken, written or drawn) was true.
It will be weird for people alive today, but it doesn't seem that risky from a societal perspective.
We're in for a fun and wild ride.
5-10 years from now we may have vision pro gen 5, or whatever system achieves market dominance, between our eyes and what happens in front of us.
the ride never ends
https://en.wikipedia.org/wiki/Thou_shalt_not_bear_false_witn...
but instead they discuss who I've married. sigh.
anyhow: I share your optimism.
In courts of law, digital tapes have been frequently used as evidence.
https://www.justice.gov/archives/jm/criminal-resource-manual...
Now it's easy for everyone.
I haven't heard someone snarkily say "it's on the internet so it must be true" in like 12 years. Let's bring that back.
I tried it and I ended up sounding American, not my usual dulcet Lancashire tones. Absolutely nothing like me.
VoiceShopAi can convert from young to old, male or female, or into any country's accent.
found via https://github.com/metame-ai/awesome-audio-plaza which tracks most things in the voice space as they come up.
Audiobooks could have voices read by characters rather than a single narrator faking it. (If even)
You have a cold but still want to give a speech without coughing.
Low bandwidth transmission of audio: transmit just the text and use local voice model to replay it.
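The savings are easy to estimate; a rough sketch, assuming ~150 spoken words per minute and a 16 kbps voice codec (both ballpark assumptions of mine):

```python
# One minute of speech, two ways over the wire.
words_per_min = 150   # ballpark speaking rate
avg_word_bytes = 6    # ~5 chars + a space, ASCII
text_bytes = words_per_min * avg_word_bytes  # bytes of plain text

codec_kbps = 16       # a typical low-bitrate voice codec setting
audio_bytes = codec_kbps * 1000 // 8 * 60    # bytes of encoded audio

print(f"text: {text_bytes} B, audio: {audio_bytes} B, "
      f"~{audio_bytes // text_bytes}x saving")
# → text: 900 B, audio: 120000 B, ~133x saving
```

Two orders of magnitude, before even compressing the text, which is why "ship the transcript, synthesize locally" is attractive for constrained links.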
Talk to your loved ones after they’re gone.
Hilarity and comedy.
Ok, no, that's bad. Have you seen Black Mirror?
Voice loss.
Do you have any recommendations on twitter of who is good to follow?
Particularly interested in people doing interesting things with AI; I already subscribe to the normal AI newsletters, such as Ben's Bites etc.