sashank_1509 · a year ago
I have played with it for 20 minutes and here’s my review:

1. The low latency responses do make a difference. It feels miles better than any other voice chat out there.

2. Its pronunciation is excellent and very human-like, but it is not quite there. Somehow I can tell instantly that it's a chatbot; it feels firmly in the uncanny valley.

3. On the same note, if I were on a call and there was a chatbot on the other side, I could tell instantly. It's a mix of the voice and the way it responds; it just does not sound like a human talking to you. I tried a bit to make it sound more human-like, asking it to stop trying so hard in conversation, to be briefer, etc., but I wouldn't say it made things better.

And so my final review is: it is a big achievement over anything out there, nothing else comes close, but it is like video game console graphics. You can instantly tell it's not the real thing, and because of that I find it harder to use than just typing to it.

achrono · a year ago
>Somehow I can tell instantly that it’s a chatbot [...] because of that I find it harder to use than just typing to it.

That to me is precisely the reason to still use it without hesitation, because once it starts getting very much human, I don't know if I want to use it unless I really have to.

I think there's a lot of merit in keeping it sounding just a little artificial so that it is easier to have some psychological distance from what is already an overly anthropomorphic experience.

In religion/religious studies, there is the occasional debate of whether or not deities are/ought to be anthropomorphic, and atheism of course finds the whole notion ridiculous. Considering that our hopes and dreams with AGI can often feel religious -- maybe it's time to take that same lens towards AI.

empath75 · a year ago
I generally like it, but anytime I bump up against the guidelines, which you do if you want to do basically anything fun at all (singing!), it is the most obnoxious experience, because it feels 10x worse coming from a fake smiling personality that sounds almost human than it does over text.
mewpmewp2 · a year ago
But would you want it to feel exactly like a real person in the first place? I think for that it would have to make itself far less articulate, etc., as well.
threeseed · a year ago
> I find it harder to use than just typing to it

Systems like this have existed since the '90s, e.g. Dragon, albeit far more rudimentary.

And the issues are exactly the same: (a) discoverability, (b) efficiency and (c) recoverability.

It is so much easier to have a screen with fixed options that you interact with, where you can easily see your journey and go back to fix any mistakes, versus our voice, which is the clunkiest, slowest, and least precise input method we have.

zurfer · a year ago
I understand how (a) discoverability and (c) recoverability are a problem, but what do you mean by (b) efficiency?

Most people talk faster than they can type.

infecto · a year ago
Voice is the future for certain interfaces. It's only clunky, slow, and imprecise because of the systems the voice is interacting with.
noahjk · a year ago
> Versus with our voice which is the clunkiest, slowest and least precise input method we have.

Some related issues I have:

- my thoughts always seem to be jumbled when talking to AI

- I rush to talk quickly because any pause seems to trigger a response

- I worry words or DSL I use won’t be interpreted properly

This all leads to a pretty poor voice experience for me, and I usually forget half of what I want to talk about.

elif · a year ago
That's merely because our conversational capability has become diminished.

I can't keep a conversation going with AI as easily as a person because of my poor skills, no fault of the AI.

I will improve over time, and there is no reason I won't be able to become as natural as Jean-Luc Picard telling his starship what to do.

corobo · a year ago
I think the uncanny valley feeling is going to be there no matter what they come up with. I, and therefore my brain, know the voice is coming from a soulless machine[1], so it'll always feel a little off.

My perfect voice assistant would sound like Auto from WALL-E, which is supposedly a blend of macOS's Ralph and Zarvox voices. Something along the lines of the following (bear in mind I just wrote this directly into the terminal and didn't spend any time actually blending them lol):

  say -v ralph -r 180 "I'm sorry Dave. I'm afraid I can't do that" & say -v zarvox -r 180 "I'm sorry Dave. I'm afraid I can't do that"
And yeah, I'm almost convinced that the whole voice interaction thing came about because they interact with the computer in Star Trek using voice commands, which is probably just because watching someone type everything into a keyboard would be some boring telly.

I assume there are folks that do use it and do like it, but do they like it more than just pressing buttons to do things? No worries of being misinterpreted or having to speak like a robot at Alexa because it's failed to turn the lights off 3 voice commands in a row now. It's awesome for accessibility, don't get me wrong, I'm talking in the sense of the primary and most commonly used interface.

[1] Not a criticism, fellow soulless machines.

tkgally · a year ago
I got access to the Advanced Voice mode a couple of hours ago and have started testing it. (I had to delete and reinstall the ChatGPT app on my iPhone and iPad to get it to work. I am a ChatGPT Plus subscriber.)

In my tests so far it has worked as promised. It can distinguish and produce different accents and tones of voice. I am able to speak with it in both Japanese and English, going back and forth between the languages, without any problem. When I interrupt it, it stops talking and correctly hears what I said. I played it a recording of a one-minute news report in Japanese and asked it to summarize it in English, and it did so perfectly. When I asked it to summarize a continuous live audio stream, though, it refused.

I played the role of a learner of either English or Japanese and asked it for conversation practice, to explain the meanings of words and sentences, etc. It seemed to work quite well for that, too, though the results might be different for genuine language learners. (I am already fluent in both languages.) Because of tokenization issues, it might have difficulty explaining granular details of language—spellings, conjugations, written characters, etc.—and confuse learners as a result.

Among the many other things I want to know is how well it can be used for interpreting conversations between people who don’t share a common language. Previous interpreting apps I tested failed pretty quickly in real-life situations. This seems to have the potential, at least, to be much more useful.

(reposted from earlier item that sank quickly)

throwaway13337 · a year ago
I'm in Europe and was able to access the feature with a VPN.

Surprising that there isn't a 'hey siri' for chatgpt yet. Obviously, that would make this sort of feature infinitely more useful. This is what monopoly gatekeeping looks like.

The limitations in this feature show the problems with both EU proactive regulation and US underregulation.

Bad regulation has become the biggest issue standing in the way of useful software for humans.

hentrep · a year ago
>Surprising that there isn't a 'hey siri' for chatgpt yet

Sort of a middleman approach and certainly not perfect, but you can invoke ChatGPT with Siri using Shortcuts.

https://help.openai.com/en/articles/7993358-chatgpt-ios-app-...

aaronharnly · a year ago
Interesting, do you know whether this can invoke Voice Mode?
terhechte · a year ago
How are you using it? I'm trying Mullvad as an iOS VPN and I didn't get it to work. Did you buy a separate subscription with a US credit card?

Nevermind, I deleted and re-installed the app on iOS while on VPN and now it works!

CubsFan1060 · a year ago
I dunno, it's a single button press on my phone to access it. That seems plenty fine to me.
modeless · a year ago
I tried asking it to practice Chinese with me. It claimed to be able to identify tones. I tested it by using the wrong tones on purpose and it said my pronunciation was "really great". Seems like it just praises you no matter what you do.
tkgally · a year ago
I tested its ability to detect issues in my nonnative pronunciation of Japanese and Russian. As in your case, it praised me more than I wanted, but it also provided pointed, appropriate feedback.

I was particularly impressed that it corrected the pitch accent of some words I said in Japanese. I speak Japanese fluently but, because I began learning as an adult, I have a foreign accent that I am unable to lose. One major component of my nonnative sound is my inadequate acquisition of the pitch accent system. Nobody ever corrects me in conversation and it would be annoying if they did. If, when I started learning Japanese forty years ago, I had had a bot that could hear and correct my pronunciation, I would have less of a foreign accent now.

Some prompt engineering is needed, though, to get rid of that excessive praise. In my next tests, I will just tell it not to praise me at all. That should work.

kanwisher · a year ago
It's pretty amazing. If you are curious, just try asking it to do a live translation with a friend who speaks another language; it's realtime and very seamless.
m3kw9 · a year ago
Some review bullet points:

1. It's a bit too agreeable. Example: "that's an excellent point" etc. every single time.

2. It understands surprisingly well. Example: from experience, when I explain something vaguely, my expectation is that it won't understand, but it does most of the time. That removes the frustration of needing to spell things out in much more detail.

3. It feels like talking to a real person, except that the AI talks in a sort of monotone. Example: it responds with a similar tone and level of excitement every time.

4. Very useful if you need to chat but don't want to chat with humans about some subjects, like ideas and explanations.

mnicky · a year ago
Interesting that the release comes a day after Google's new models [1]. Seems a bit like strategic timing :) Maybe they waited until a competitor released something so that they could upstage that release with theirs?

____

[1] Which, btw, I think deserve better sentiment. On benchmarks, the new Gemini Pro seems to be better than GPT-4o. It's just not so hyped...

martypitt · a year ago
> Advanced Voice is not yet available in the EU, the UK, Switzerland, Iceland, Norway, and Liechtenstein.

That's disappointing. I wonder if it's related to legal issues, technical issues, or just doing a phased rollout?

sunaookami · a year ago
Sam Altman only said this: https://x.com/sama/status/1838864011321872407

>except for jurisdictions that require additional external review

crimsoneer · a year ago
...I'm not at all clear what this external review would be for either the UK or Switzerland.
casualbob_uk · a year ago
The app announced I had access to advanced voice mode about 10 mins ago, gave me the guided tour and got me to choose a voice, and now it's gone again, with the message back that it's on its way.

During this time, I don't believe I actually had access to it, as it wouldn't hum, laugh, pick up on my voice tones, etc.

Got my hopes up!

raverbashing · a year ago
I initially thought this had to do with regulation as well, but people trying it over a VPN are complaining about latency, which makes me believe there's a technical (deployment regions) reason for it as well

(There may be language reasons as well, though it seems to work with other languages.)

How well does it work in the UK (and with its regional accents)?

IncreasePosts · a year ago
I doubt they would bother blocking the 1 user from Liechtenstein if it was about keeping the usage numbers low for performance reasons.
casualbob_uk · a year ago
I'm in the UK and it was just enabled for my Pro account about 10 mins ago.
isodev · a year ago
Or perhaps it’s just not that good with other languages and accents.
terhechte · a year ago
Did anyone figure out whether it works over a VPN?
coreyh14444 · a year ago
In Denmark with a Finnish Teams account. I've tried NordVPN, uninstalled, reinstalled, from my iPad and iPhone, and no luck.
crimsoneer · a year ago
Given it includes the UK, I assume it's GDPR (and probably linked to the provenance of training data?) rather than any new AI act stuff.
guappa · a year ago
When something is not available in EU, you instantly know they're up to no good.

edit: Sorry if privacy hurts your feelings, people who downvote me.

throwaway13337 · a year ago
Mario Draghi, former president of the European Central Bank, produced a huge report for the EU that was just published. He singled out overregulation as a key issue in European progress.

Specifically, the report called out GDPR as costing small businesses 'more than' 15% of their profits.

This is indeed quite a hurdle. Privacy isn't really the issue; what's missing is regulation that understands that complexity has a cost. We, as developers, should understand the deep, deep cost of complexity.

As a business owner that respects privacy a great deal, GDPR and regulation like it are still an immense hurdle - the cost in understanding and in doing things to the letter of the law is, I think, hard to grasp from the outside. Regulations occupy a huge amount of space in my head that was previously filled with making a better product.

PDF available here:

https://commission.europa.eu/topics/strengthening-european-c...

mewpmewp2 · a year ago
To give the benefit of the doubt, it could also just be about a review process; it might become available after a review.

E.g., one valid case for review would be storing your voice recordings in the cloud in a tokenized format.

Because the model is now taking your voice directly, presumably as tokens, it can't be immediately deleted, as opposed to speech-to-text, where the audio can be quickly converted to text and then deleted.

lynx23 · a year ago
> ... you instantly know they're up to no good.

You're referring to the EU here, right?

M4v3R · a year ago
It's definitely legal issues, probably the same reason Apple is not rolling out some of the new features like Apple Intelligence in iOS 18 to the EU. [0]

[0] https://www.macworld.com/article/2374452/apple-intelligence-...

nindalf · a year ago
> Apple Intelligence was only ever going to support American English this year anyway, with other languages coming in 2025 and beyond. That would somewhat limit its effectiveness in the EU to begin with ...

Their AI product wasn't ready for other languages, not even British English where the DMA definitely doesn't apply.

A person would have to be very naive to believe Apple directly here. They just want to generate bad press for the DMA.

jacooper · a year ago
I don't believe this; GPT-4o and ChatGPT are already available in the EU. Adding advanced voice mode doesn't change anything.