Which is why we need to make this technology readily available and well known, so that people are more aware of it, don't trust everything they see, and look up sources.
Ah, who am I kidding, most people will still not fact check.
We're really in an era where laws and their enforcement will have a lot of catching up to do, very fast.
Fake historical proof, fake leaks, fake endorsements, fake ads... People couldn't be bothered to double check when it was a mere random text article on facetok; it's going to be so much worse...
I’ve been telling my friends that 5-10 years from now, the only thing that you’ll be able to ~100% trust is what happens in front of your eyes, in that very moment. You can elect to trust reliable news organizations to vet things for you, but even if you do, due to polarization a huge subset of the world will think you’ve been had and discard everything as fake.
Look at stuff like Sora, or all the new voice models coming out. Just a few days ago there was a high school athletic coach (!) arrested for cloning the principal’s voice and using it to say vile stuff. He only got caught because he used his own e-mail.
Now combine that with the fact that Microsoft’s new Phi-3-mini model approaches GPT-3.5 performance using 3.8 billion parameters, whilst GPT-3.5 reportedly uses 175 billion. And we’re only ~5 years of optimization into this tech.
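Back-of-the-envelope, taking those quoted parameter counts at face value (the 175B figure for GPT-3.5 is a widely repeated estimate, not an official spec):

```python
# Rough comparison of the quoted model sizes.
gpt35_params = 175e9      # widely repeated estimate for GPT-3.5
phi3_mini_params = 3.8e9  # quoted figure for Phi-3-mini

ratio = gpt35_params / phi3_mini_params
print(f"Phi-3-mini is ~{ratio:.0f}x smaller for roughly similar benchmark scores")
# → ~46x
```

A ~46x shrink in a couple of years is the part that matters for misuse: it moves "serious datacenter budget required" toward "runs on a laptop".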
We already know you cannot trust what you see with your eyes (check any "compare eye-witness reports with trusted video recordings" or watch Penn & Teller).
"I’ve been telling my friends that 5-10 years from now, the only thing that you’ll be able to ~100% trust is what happens in front of your eyes, in that very moment. You can elect to trust reliable news organizations to vet things for you,"
The time is now. Even the mainstream news organizations are probably only high-90s% reliable, as they've been caught selectively editing, not vetting sources or facts, and displaying biases.
Trust is a dependency for human existence. Not just for civilization, but also very small communities and basic exchange of ideas, goods, and services.
I cannot foretell how the risk of trust destruction from GenAI will unfold, but I'm optimistic our creativity will win out.
If you think about it, though, the window for there being something resembling objective universal truth has been a very very short period of human history. It really didn't exist before the internet and ubiquitous smartphones.
Before the internet, TV, radio, newspapers were our sources of truth beyond just trusting people in our immediate vicinity, and these were all heavily filtered by what stories they decided to run, the amount of detail they focused on, any human bias that crept into their reporting, etc. I'm not a "FAKE NEWS!!!" kind of guy, but one has always had to ingest news from these sources with some level of filtering in this regard, and understand that there might be other sides to the story, or whole stories of importance going unreported.
If we revert to subjecting images/video/audio clips to the same level of skepticism we had with random people informing us of pieces of news with no proof, then we're effectively just at the same level of objective universal truth as we had been for the overwhelming majority of human history.
I'm not arguing this is a good thing - just that it might have been a small and blissful island that some of us had the privilege of enjoying.
This is partly why I'm so fascinated and disgusted by trolls and astroturfers. They erode trust in a given forum, which degrades the quality of discourse because no one wants to invest time in untrustworthy discussions.
Sometimes I wish I could get an honest answer from trolls about what they hope to achieve, but of course that will never happen.
A digital audio file is not even close to being proof of anything. Even without voice cloning you can easily edit, clip, and compose audio into almost anything you want. It’s also not difficult to simply impersonate someone else’s manner of speaking with practice, something that is commonly done by both amateurs and professional actors. The only thing that changes is the ease with which this can be done, which should help everyone understand how unreliable such “proof” is.
Sounds like a remake of Sneakers is needed, with a fresh take on impersonation and social engineering, to remind people what's possible and potentially dangerous.
I don't know the political situation in your country, but all I will say is that the putin-aligned far right wing in mine uses plenty of fake quotes, deep fake videos, and things like that to propagate its ideas, and that its "followers" eat it up. Then you either have to let those untruths stand, or you spend all your energy fighting them / defending yourself, making you look guilty. And in the past 3-4 years (it exploded with covid), it's the followers themselves who now start those things.
As for AI, it moves this from "someone with some dedication can do it" to "anyone with a computer can do it".
These are big issues, but I would say that a bigger issue is the case where a spam caller has you on the line talking for ~10 seconds and then calls the bank or a family member as you.
Android and iOS should support real-time voice changers as the norm, with a quick-switch button on the dialer to disable it and an option to have it off for known contacts.
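A toy sketch of the kind of transform such a voice changer could apply: a naive resampling pitch shift on a PCM buffer. Purely illustrative (the function name is mine, not any Android/iOS API, and a real implementation would work on streaming frames and time-stretch to preserve duration):

```python
import numpy as np

def naive_pitch_shift(samples: np.ndarray, factor: float) -> np.ndarray:
    """Toy pitch shift: resample the buffer by `factor`.

    factor > 1 raises the pitch (and shortens the clip); a real voice
    changer would also time-stretch so the duration stays constant.
    """
    idx = np.arange(0, len(samples), factor)
    return np.interp(idx, np.arange(len(samples)), samples)

# 1 second of a 220 Hz sine at 16 kHz, standing in for a voice signal
sr = 16_000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 220 * t)

shifted = naive_pitch_shift(voice, 1.5)  # pitch up, clip gets shorter
print(len(voice), len(shifted))  # → 16000 10667
```

The point being: the signal processing is trivial; what's missing is OS-level integration and defaults, not technology.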
I've come around to the idea that the hype around criminal or bad actor uses of AI is the same as the hype around other uses. Some real uses will shake out but the delta between what's actually enabled by the tech and what was possible anyway is way smaller than people like to represent.
I'm not sure. Maybe I'm caught in the hype but I feel like possible is one thing, scalable is another.
We're at the point now where, for example, a leak of phone numbers + address books would enable a high-quality, large-scale automated impersonation pipeline that many here could put together in a weekend.
Install a compromised app (or just be a one-time user of a newly compromised app), then answer the phone for a minute, and then everyone you know receives a somewhat believable phone call to empty their wallets / your accounts?
I am not talking about criminals or similar; for that, I agree with you.
I am talking about the political. There has been a MASSIVE explosion of fakeness since covid, coupled with people who have completely lost their ability to actually check information, and anything that makes it easier for anyone to propagate it is dangerous in my view.
We've reached the point where "check by yourself / do your own research" has moved from what the proponent of the truth used to tell people, to what the other side now uses to lend as much value to their facetok videos and whatnot (I am speaking in the general sense).
It's not a very consistent pearl-clutch either: would they deny all access to computers because terrorists will use them to fire heat-seeking missiles?
I think it's just a lack of imagination: we can do this funny, offensive, misleading thing right now, but it can surely be used in other ways; you'd just have to let your mind wander for one extra moment after your social-media-induced reaction fixation episode. This could be a depression hack: hearing your own voice in different accents motivating you.
>We're really in an era where laws and their enforcement will have a lot of catching up to do very fast.
I mean, there are existing laws that would apply; there are laws against scamming or whatever bad stuff you can do with this. The world might need a law, though, to force media to label AI-generated content, and IMO a law where media and users would take responsibility for their content.
What I mean is: if, say, as a random user I claim something in a media post like "eating shit cures you of covid", I should either be forced to add "btw I am not taking any responsibility for my comment and I am not a medic/lawyer/expert", or I should be forced to pay damages if someone sues me for my bullshit claims.
So cloning voices should be legal; scamming people is illegal.
Yeah, not the best title/name. On a more meta note, I sometimes feel like HN comments are increasingly Reddit-style headline reactions, with little investigation of TFA or peering into the tech itself.
What is a legitimate use case for this? I can think of a hundred applications for deceiving others but struggle to come up with a scenario where one would want their voice cloned or reproduced.
You're recording a podcast and want to tweak some of your own words, without the hassle of re-recording.
You're an indie game developer, and want to have vibrant NPCs with their unique voices and dialogue powered by an LLM.
You're producing a movie, and want to tweak certain lines of dialogue, with the consent of the talent.
You suffer from health conditions and are gradually losing your voice, but you still want to communicate.
There are certainly legitimate use cases of this technology. I personally believe illegitimate use cases overshadow the legitimate use cases, but I don't think it's fair to say there are no legitimate applications.
We should strictly regulate the use of this technology by criminalizing abuse; not by banning it altogether (which is pretty hard in the case of software and small models).
> You're producing a movie, and want to tweak certain lines of dialogue, with the consent of the talent.
The latest agreement to end the last round of strikes was to prevent this very thing.
Of your list, the medical condition, giving someone their real voice instead of a Hawking voice, would be the most legit reason. Everything else reflects a skewed sense of what's morally acceptable, as I think those uses are shady.
You may not be trying hard at all, then. The first thing I thought of was cloning your voice to use in real-time translation. I can probably think of several others mentioned in comments below, but this is a 100% always-useful, never-nefarious (assuming perfect translation not being maliciously used) application.
I have a friend with a paralysed larynx who is often using his phone or a small laptop to type in order to communicate. I know he would love it if it was possible to take old recordings of him speaking and use that to give him back "his" voice, at least in some small measure.
Unfortunately I have yet to see something that can do this and provide a voice model that you could plug into Android TTS and/or Windows which are what he uses.
Fixing small errors in narration, voiceovers, or other recorded content.
Translation of recorded content with the original voice into new languages.
Comedy as long as it’s obvious that it’s a fake.
Actually intentionally selling your voice to be the voice of some text to speech product. Maybe I want Alexa to have the voice of Danny Devito, as long as he’s ok with it and getting paid.
My wife has been sick all week and has to communicate over text because her voice has gone. We’ve been talking about making voice clones of ourselves for situations like this. Some people never regain their voices so preserving them before they lose it is super valuable.
I imagine training people, and having everything I say be available in any language, matching my tonality, and being able to reach a global audience. I'm very much looking forward to this.
Indie Gamedevs can do their own Voice Acting?
See also indie film, same use case.
Actor dies / is hit by a bus before a work is finished - create a few more lines posthumously (it'll be in the fine print of the contract that you allow voice & image fakes in the event you're not able to do them).
Satire, Pranks, and alleged pranks (stuff that makes folk laugh).
Where are the best places to keep up with all of this? I'm very interested in this area as I want to use these tools to create things with and my own voice isn't great for this.
Speech-to-speech seems like it might be better than TTS for getting it to sound more natural. I've played around with some tools like RVC etc, but I feel like there are maybe a lot of great AI workflows I am missing amongst all the AI noise; it's the interesting workflows and people doing interesting things with AI that I am more interested in.
Awful lot of doomsaying and drama in here. What makes this release so bad compared to the existing voice cloning AI methods that have been publicly available for ~1 year already?
I really can’t wait until voice cloning means we get a version of audiobooks read in the author’s voice. Of course it will never be quite as good as them reading it themselves, but I think the author’s voice adds something that voice actors can’t: they appear to be too generic and too affected in their pronunciation for me to connect with.
What the author adds, if they're not also a trained/well-practiced voice actor, is that their inflection exactly matches how they meant the words in the book to be spoken/understood.
AI isn't going to be able to do that. As good as it may get, it won't be able to read the mind of the author. It's going to be even more generic than a human reader.
Exactly, the improvement will be in rerecording terrible readings into something enjoyable or at least inoffensive. That and personalization so you can choose the voice that you prefer.
Odd, because I actually worry about this. I don't see why you'd want your books read by the author. Trained Voice Actors do a much better job, and can modulate their voices based on tone.
Autobiographies? Fine, but those are usually read by their authors anyway.
I was hoping it would be voice transfer so the voice actor would give all the intonation and emotion and the AI would take that and make it sound like the author. Reading text with AI is getting better but yes it’ll be worse for a long time.
I have nearly no desire to have my book read by the author. They are good at writing, and an audiobook is not simply “reading” the words on the page. Maybe something like Descript that the author can use to tweak pronunciation after it’s narrated, but I don’t want the author’s voice.
I would like to train a model on Allyson Johnson’s voice (she narrated the Honor Harrington books) and then use that to re-narrate the 1-2 books in one of the spinoffs (I think it was in the Saganami Island series?) where they used a different narrator (who was horrible).
I also might be interested in using it to clean up the Wheel of Time series where, while it’s the same 2 narrators, they change the pronunciation of various names/words book-to-book, “Moghedien” being the one that stands out most. They pronounce it at least 3 different ways:
* Mo-gid-e-on
* Mo-ga-dean
* Mog-a-din
I think I'd prefer to have options for each audiobook. I have favorite narrators, and find others unlistenable. There are also thousands and thousands of books that will never otherwise be turned into audio format unless an AI is used.
"Athletic director used AI to frame principal with racist remarks in fake audio clip, police say"
https://apnews.com/article/ai-artificial-intelligence-princi...
I want to get off Mr Bones’ wild ride.
Won't this just be a return to the historical norm?
Prior to photography being invented, there was no guarantee that any retelling of events (whether spoken, written or drawn) was true.
It will be weird for people alive today, but it doesn't seem that risky from a societal perspective.
We're in for a fun and wild ride.
5-10 years from now we may have vision pro gen 5, or whatever system achieves market dominance, between our eyes and what happens in front of us.
the ride never ends
https://en.wikipedia.org/wiki/Thou_shalt_not_bear_false_witn...
but instead they discuss who I've married. sigh.
anyhow: I share your optimism.
In courts of law, digital tapes have been frequently used as evidence.
https://www.justice.gov/archives/jm/criminal-resource-manual...
Now it's easy for everyone.
I haven't heard someone snarkily say "it's on the internet so it must be true" in like 12 years. Let's bring that back.
I tried it and I ended up sounding American, not my usual dulcet Lancashire tones. Absolutely nothing like me.
VoiceShopAi can convert from young to old, male or female, or into any country's accent.
found via https://github.com/metame-ai/awesome-audio-plaza which tracks most things in the voice space as they come up.
Audiobooks could have voices read by characters rather than a single narrator faking it. (If even)
You have a cold but still want to give a speech without coughing.
Low bandwidth transmission of audio: transmit just the text and use local voice model to replay it.
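The savings are easy to estimate; a rough sketch, assuming ~150 spoken words per minute and a 16 kbps voice codec (both ballpark assumptions of mine):

```python
# One minute of speech, two ways over the wire.
words_per_min = 150   # ballpark speaking rate
avg_word_bytes = 6    # ~5 chars + a space, ASCII
text_bytes = words_per_min * avg_word_bytes  # bytes of plain text

codec_kbps = 16       # a typical low-bitrate voice codec setting
audio_bytes = codec_kbps * 1000 // 8 * 60    # bytes of encoded audio

print(f"text: {text_bytes} B, audio: {audio_bytes} B, "
      f"~{audio_bytes // text_bytes}x saving")
# → text: 900 B, audio: 120000 B, ~133x saving
```

Two orders of magnitude, before even compressing the text, which is why "ship the transcript, synthesize locally" is attractive for constrained links.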
Talk to your loved ones after they’re gone.
Hilarity and comedy.
Ok, no, that's bad. Have you seen Black Mirror?
Voice loss.
Do you have any recommendations on twitter of who is good to follow?
Particularly interested in people doing interesting things with AI; I already subscribe to the normal AI newsletters, such as Ben's Bites etc.