Readit News
Posted by u/chaeeunlee9611 a year ago
Show HN: Supertone Shift – AI powered Real-time voice changer (product.supertone.ai/shif...)
Supertone's Shift offers real-time voice changing technology. It lets users immediately switch to any selected voice. Just pick a voice and begin speaking. Shift is suited for VTubers, content creators, and gamers, as well as anyone who wishes to accurately express their chosen persona's voice. Try out Supertone Shift now. >> https://product.supertone.ai/shift
watersb · a year ago
Very interesting!

I would like some clarity on the Terms of Service clause 4:

> The content created using Supertone Shift remains your property. However, by using our Services, you grant Supertone a worldwide, non-exclusive, royalty-free license to use, reproduce, adapt, and display content solely for the purpose of operating and improving Supertone Shift. This license does not grant Supertone any rights to sell or distribute your content.

Does Supertone Shift need the user content in order to further improve the product during the beta period?

Or does it need the user content in normal operation (for example, running the conversion on remote servers vs local processing)?

I can see some hesitation from people if you're recording everything they say, and keeping that recording for an indefinite period of time.

I can appreciate that there may be a problem enforcing a "Don't use our product for evil" clause if you can't review usage.

The challenge here seems overwhelming.

weinzierl · a year ago
The phrasing is pretty standard; the important part is the middle sentence. Often it includes irrevocable, transferable and sublicensable as well.

That being said, I hate the "remains your property" part. It's just fluff that changes nothing, but distracts from the following sentence.

Gormo · a year ago
The reason why this is standard is because functionally, anything that receives data from a user, hosts it, and transmits it to third parties is engaging in distribution of copyrighted content. Without a grant of license, pretty much every message board, social media platform, or any website or internet-based application that does anything with user-generated data could be exposed to copyright liability. You may note that this very site's legal declarations page includes an identical clause.

"Remains your property" is not fluff at all, and explicitly disclaims any ownership of rights associated with content you post, and equivalently indemnifies users against any liability for re-posting or re-using content they posted here, which they'd potentially be exposed to if they were assigning copyright to the hosting platform rather than just granting a license.

htrp · a year ago
Looks like Facebook's ToS:

we may need your data for some unspecified purpose ("AI model training") that we can't even dream of right now, so we'll just take all the rights

echelon · a year ago
There are dozens of other products in this category, including completely open source ones you can fine tune.

Commercial applications like Voice.Ai and Koe are real time and have celebrity and anime voices respectively.

The RVC ecosystem on GitHub has dozens of different real time open source voice changers. I haven't kept up with the SOTA, but they're incredible, fine tunable, and 100% local.

https://voice.ai

https://koe.ai

https://m.youtube.com/watch?v=zkaBK5erB2c

andoando · a year ago
I've tried making this exact product using all of these services, including using the GitHub repo Koe is based on.

They all use like 50% of my CPU to get real time. I was able to get actual low latency with Koe, but still massive CPU usage. And there's no community of models for it either.

Perhaps someone who really knows what they're doing could optimize these open source models, but it's not me.
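
For readers weighing the latency and CPU trade-off described above, here is a minimal sketch of the usual real-time budget arithmetic. The buffer size, sample rate, and processing time are assumptions for illustration, not measurements of Koe, RVC, or Shift.

```python
# Real-time audio budget: each buffer must be processed before the next one arrives.
# All numbers used below are hypothetical examples.

def buffer_duration_ms(frames: int, sample_rate_hz: int) -> float:
    """How much audio one buffer holds, in milliseconds."""
    return 1000.0 * frames / sample_rate_hz

def real_time_factor(processing_ms: float, frames: int, sample_rate_hz: int) -> float:
    """Processing time divided by audio duration; below 1.0 means the model keeps up."""
    return processing_ms / buffer_duration_ms(frames, sample_rate_hz)

# Assumed setup: 480-frame buffers at 48 kHz, 5 ms of inference per buffer.
budget_ms = buffer_duration_ms(480, 48_000)
rtf = real_time_factor(5.0, 480, 48_000)
print(budget_ms, rtf)
```

Under these assumed numbers the model keeps up, but it is busy for about half of every buffer period. Smaller buffers lower latency and shrink the time available per buffer, which is one reason low latency and low CPU usage pull against each other.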

IndySun · a year ago
You don't need to look far to understand those terms are standard, and by 'standard' see non-binding, or broad. It doesn't matter what they 'say' here, because you will only find Supertone abusing these 'terms' if someone at Supertone lets you know. Meanwhile your voice is syphoned off and used in any way their friends see fit, and no terms laid out here will be broken. As per other replies for standard software terms, see duplicitous.

Deleted Comment

autoexecbat · a year ago
It could at least have a time limit
fzaninotto · a year ago
Why does the Mac installer require admin rights and a restart? Giving admin rights to an installer requires trust in the vendor. Supertone Shift is just a newborn. I cancelled the installation because of that.

I would love to test the technology without the risk of damaging my computer!

desro · a year ago
I use the great, free, "Suspicious Package" app [0] to inspect installers like these.

In fact, it was Supertone Shift's installer that prodded me to seek it out (I happened to find and install Shift a couple of weeks ago).

In this case, it needs admin permissions to install to `/Library/Application Support` as well as `/Library/Audio`.

It needs to restart in order for the HAL driver to be loaded (this provides the virtual audio interface for using the app with Teams, Zoom, etc.)

The preinstall/postinstall scripts simply handle the app's directory in Application Support.

I decided it was safe enough, and had some fun playing with it. It contacts what it claims are licensing servers (when it starts), and won't start without it. It wanted to keep contacting those servers constantly, but blocking its network access via Little Snitch didn't prevent it from functioning. The network traffic was in the single-digit kilobyte range, so I felt reasonably confident no audio data was being looted.

[0] https://mothersruin.com/software/SuspiciousPackage/
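
The bandwidth reasoning above can be checked with back-of-the-envelope arithmetic. This is a rough sketch: the audio formats and the 9 KB budget are assumptions chosen for illustration (reading "single-digit kilobyte range" as a total), not measurements of what Shift sends.

```python
# How many seconds of speech would fit in a few kilobytes of network traffic?
# Formats and bitrates below are illustrative assumptions.

def pcm_bytes_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate_hz * (bit_depth // 8) * channels

def compressed_bytes_per_second(bitrate_kbps: float) -> float:
    """Data rate of a compressed stream, given its bitrate in kilobits per second."""
    return bitrate_kbps * 1000 / 8

def seconds_of_audio(budget_bytes: float, bytes_per_second: float) -> float:
    """Seconds of audio that fit in a given byte budget."""
    return budget_bytes / bytes_per_second

raw_rate = pcm_bytes_per_second(48_000, 16, 1)    # 16-bit mono at 48 kHz
codec_rate = compressed_bytes_per_second(16)      # assumed 16 kbit/s speech codec
budget = 9 * 1000                                 # assumed 9 KB of observed traffic

print(seconds_of_audio(budget, raw_rate))
print(seconds_of_audio(budget, codec_rate))
```

Under these assumptions, 9 KB holds well under a second of raw audio and only a few seconds of heavily compressed speech, which is consistent with the conclusion that a session's worth of voice was not being uploaded.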

moralestapia · a year ago
Thanks for this, I was very eager to try it out but this is always a deal breaker.
michaelmior · a year ago
This seems really cool and I can see some great use cases. But the marketing is very odd to me. It says it will let me express myself in a voice that is truly my own…but I can already do that with my natural voice. That seems more likely to be unique than what I would get by adjusting it in software.
themoonisachees · a year ago
I guess the wording is awkward, but as a trans person, I still resonate with it. I'm acutely aware it's not going to be "my voice", but neither is the one I have right now.
mintplant · a year ago
It's funny to me that we just had a big front-page thread full of HN users questioning the value of diversity, and then this thread where people struggle to figure out the obvious trans market for voice-changing software.
michaelmior · a year ago
Thanks for the explanation. This is definitely something I hadn't considered.
corytheboyd · a year ago
Outside of the trans use-case mentioned here, I could imagine some women gamers getting value out of this too. You kinda need voice comms to play some games properly, and not wanting to reveal yourself as a woman online, especially over voip, is completely reasonable. Because gamers are terrible. Something like this could make hiding that trivial, assuming the latency is accurate (would need to be very fast in some games)
baobabKoodaa · a year ago
Are gamers really more toxic towards women than men? I feel like switching gender will just switch one kind of toxicity to another.
sdfgtr · a year ago
That particular line is definitely directed towards people with gender identity issues.
itishappy · a year ago
Salesperson: You can test drive any car on the lot!

You: Why? I already own a 2002 Ford Escape...

I'm not trying to make fun of you, I think you actually have a unique and impressive perspective! I've always hated hearing my voice on answering machines, so if I could choose any voice I'd choose Chris Cornell or Morgan Freeman.

idiotsecant · a year ago
Pro tip: Some people do not consider their natural voice 'their' voice.
trashcluster · a year ago
If it were available as a VST plugin for DAWs, it would be even more useful than as standalone software. From skimming through the website, it seems that Supertone already makes a VST plugin, so it may be a matter of time before Shift becomes a VST too.
hollowayaegis · a year ago
Self plug, but I've been developing a local AI voice changing VST [1] (bring your own RVC models, or use built-ins). It works in DAWs in real time on modern Macs.

[1] https://audio.sunflower.industries

desro · a year ago
This looks cool and I've downloaded it. Clicking on the "free" tier on the subscription page brings you into Stripe checkout for the $6 tier, FYI.
jl6 · a year ago
Would it be possible to embed a watermark in the generated audio? Many people will use voice changing tech for honest purposes, but there will always be those acting to ruin it for the rest of us. There are just too many scenarios where faking your voice confers an illicit benefit.

I know watermarks are never foolproof, but they may deter casual misuse.
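
To make the watermark idea concrete, here is a toy sketch of one classic approach, a keyed spread-spectrum mark detected by correlation. It is an illustration under stated assumptions, not how Supertone or any shipping product does it: the function names, the key, the strength value, and the sine-wave stand-in for voice audio are all hypothetical.

```python
import math
import random

def keyed_sequence(key: int, n: int) -> list:
    """Pseudo-random +1/-1 sequence that only someone holding the key can regenerate."""
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]

def embed(samples: list, key: int, strength: float = 0.02) -> list:
    """Add the keyed sequence to the audio at low amplitude."""
    mark = keyed_sequence(key, len(samples))
    return [s + strength * m for s, m in zip(samples, mark)]

def detect_score(samples: list, key: int) -> float:
    """Average correlation with the keyed sequence; near `strength` if marked, near 0 if not."""
    mark = keyed_sequence(key, len(samples))
    return sum(s * m for s, m in zip(samples, mark)) / len(samples)

def is_marked(samples: list, key: int, strength: float = 0.02) -> bool:
    """Decide by thresholding the correlation halfway between 0 and `strength`."""
    return detect_score(samples, key) > strength / 2

# Stand-in for voice audio: one second of a 220 Hz tone sampled at 48 kHz.
host = [0.5 * math.sin(2 * math.pi * 220 * t / 48_000) for t in range(48_000)]
marked = embed(host, key=1234)

print(is_marked(marked, key=1234), is_marked(marked, key=9999), is_marked(host, key=1234))
```

A mark this simple would be audible as faint noise and would not survive lossy compression or resampling; practical schemes shape the mark to stay below the hearing threshold and add robustness, which is where the "never foolproof" caveat comes in.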

terhechte · a year ago
Curious Question: Given the low latency, does it run the computation on device or over the network? If on device, are there minimum CPU requirements?
catapart · a year ago
Very interested in this answer! I'd really like to see it on the website for any AI I'm considering. It's an entirely different proposition as to whether you're getting a utility or a service.
rcarmo · a year ago
I can see this being interesting for gamers and more whimsical pursuits, but I'm more curious about neural speech synthesis for both normal speech and singing--the first because there is a pretty strong demand for automated narration of training videos, and the second because of my music hobby--other than vocaloids and a few niche DAWs, I haven't found any nice Open Source tooling for the latter (the former I can mostly do with XTTSv2).
edwcross · a year ago
From what I found, XTTSv2 is released under the Coqui Public Model License, which explicitly disallows commercial usage: "This license allows only non-commercial use of a machine learning model and its outputs."

So, from what I understand, I cannot use it and then upload the training video to Youtube. Or can I?

margorczynski · a year ago
I guess if it is demonetized it should be ok? Or maybe not if your other content or activity is commercial, as even if the video in itself doesn't make money it would indirectly promote your other commercial activity.

Interesting legal problem.

bogwog · a year ago
Seems like we're getting closer and closer to Star Trek's universal translator