As a Google employee, I really don't want to be saying "Ok, Google" in my home all the time. It's totally possible for me to go to work in a subway that has Google ads, waste time on my Pixel, work at the Google office for eight hours, waste time on my Pixel, walk past the same ads on the way home, watch YouTube videos and do Google searches about random topics, and ask Google to set an alarm before I go to sleep. It's too much. :)
Agreed. And I have a small child. I _really_ don't want them to be forming "relationships" with brands by asking robot assistants named after corporations to do stuff.
Just wait (a few years, perhaps) until you buy or rent that sweet new self-driving car and try to get out of town for some rest and relaxation... you turned off the radio, but that doesn't matter, because Big G uses the windshield as a billboard looming into your personal space, beaming ads to the most captive audience that exists.
It's experiences like this that make me question why anyone would spend this much effort and money setting up a system that isn't as good as what it replaces.
I have a convenient, quiet, perfectly reliable single-bit computer on the wall in every room in my house. It costs $1, can be operated by a 2 year old, is perfectly intuitive, and the only downside I can think of is that occasionally it necessitates I be very slightly less lazy than I might otherwise be able to aspire to, i.e. when I need to go downstairs and turn off a forgotten light.
Considering my house is all using LED lights on hydro power, it's probably better for me to just leave a 5w light on all night than it is to install a Google Home setup here, in terms of my carbon footprint.
That is the key point, for me anyway. "Computer, all lights off" instead of getting out of bed when I've left something on is one of my lazy wins from voice control. Setting multiple timers without touching things with messy hands while faffing in the kitchen is another.
-- is Google misinterpreting your voice? E.g. does it hear a sound it thinks is "play" in the middle of your phrase?
-- or is it some weird statistical model that because of invisible and irrelevant correlations, sometimes concludes it's more likely you're asking for music? Like the song with that title is currently in the top 40, or was played by you in the past, or something?
-- is it because you don't speak with a 20-30 year old white male techbro bay area accented voice, so Google never even bothered testing whether it'd work for you?
(My brother in law is Australian living in the US, and has to use his "sarcastically fake American accent" to be understood on the phone. He, more politely than I, calls it his "phone voice". It's the same voice all my friends here use when parodying American stereotypes... I bet it works on Ok Google too.)
The new Android 11 power menu screen has been a life changer for me. I now control my lights with my phone 90% of the time since it's literally one power button click away, and I always have my phone on me.
It doesn't work with hue lights, so it's worthless to me. Besides, I just have an on/off widget on my home screen which is faster and more reliable than holding the power button.
My fucking Google Home can't even figure out how to play the auto playlist "My Likes" on YouTube Music.
Previously, I could say "Play my Thumbs Up" and it could do so on Google Play Music.
It keeps playing a song called "My Likes". Jesus fucking Christ, Google. If I say "Play my My Likes playlist" something random happens.
Do these guys even use their product? I'm just glad this album didn't come out before the forced migration.
EDIT: Okay, I went to verify it and this has to be the best instance of massive PEBKAC plus some UX donkeyness. The auto playlist is called "Your Likes" so I can get the Assistant to do the right thing by telling her to play her likes (Ok Google, play your likes). What the fuck man. But fine. At least I got it working.
I've suffered with this for months and now I find a solution in the few minutes after posting this.
This is my experience with any 'smart assistant' product that is or ever has been.
It's always frustrating but never particularly hard to find the special incantation that will invoke it to do the thing that you want it to. Overall though it's simply not worth the effort which is probably why I end up using these overwhelmingly complex devices only for their most mundane functions like timers and getting the weather.
Trying for anything moderately complex, and I might as well be asking the dog to do it for me.
My issue is that I found out the special incantations two years ago, and then they changed (I presume) something about the core language processing logic, and now none of that works.
For example I have Philips Hue lights behind the TV/Screen on my living room wall, and I use their "color loop" behind the screen when watching movies etc. The problem is that "TV", "Television" and "Screen" are semi-protected words, so "turn off tv lights" ends up with the TV being turned off 9/10 times. "We" compromised and those lights are now called "screen wall" lights
As for setting certain lights to "the color loop", what used to be a 90% success rate (the other 10% turning my lights to "the color blue/bloo(p)") will now set the lights of the room I'm currently in to the color loop, which is usually the living room, not the screen wall. Also, as recently as this summer I used to be able to set the whole house to "the color loop"; this feature recently disappeared. The color loop slowly and nearly imperceptibly fades the colors from red to green to blue etc over several minutes. It's technically part of "hue labs" but it's a "beta feature" that's been available in the product now for over three years so I would argue it is core functionality at this point.
Same here. I really wonder how these products sell. The most basic things don't work. And if they are confused, then for real. For about 2 years, when Siri happened to misunderstand the command "Call ..." it would answer "OK, calling you" and actually try to call my own number. This is so weird that it actually feels like someone wrote that piece of code to prank the user.
If these things would actually work, I'd definitely use one regularly. However, whenever I visit an Alexa owner, I realize after a few interactions that I really couldn't be bothered with this stuff.
I think the "taxi" problem is still around with Siri. Put any taxi organisation into your phonebook, and include "taxi" in the name. You will likely not be able to call it with Siri, since it insists on searching for taxis in your area. It's always the same bug. These things have absolutely no idea about the context. And some hand-crafted rules go haywire after a while, because apparently nobody reviews them. When I got my first iPhone (iOS 5) I put in my date of birth during configuration, and promptly noticed that the German speech synthesizer says Nineteenseventynine when I enter 1979. All other 4-digit numbers are fine, only 1979 is pronounced in English. So apparently someone put this exception in there for a completely bogus reason, and it stayed there. It is still there today, after 8 years.
A UI with next to zero discoverability and an incredibly broad input set ("all speech") must really work for most conceivable inputs, or only die-hard enthusiasts will keep trying.
> It's always frustrating but never particularly hard to find the special incantation that will invoke it to do the thing that you want it to.
not always. I used to use google play music to play music from my own library in the car. any time I asked it to play a moderately obscure artist, it would interpret that as whatever popular artist had a similar name. it would then play the radio station for that artist, since I didn't have the premium subscription. I found some success with spelling out the artist name letter by letter, but even that consistently failed for certain names.
also sometimes I would say "list albums by X" to help me remember the name of what I wanted. no matter what I tried, it would only list three albums "and others". who could want this behavior? if I ask you to list albums, yes I actually want to hear every single album name!
I'm now paying for YT music (since the free version apparently does not support android auto), and so far it works flawlessly. infuriating.
> It's always frustrating but never particularly hard to find the special incantation that will invoke it to do the thing that you want it to.
I think magical incantations is a perfect way to think about it. Using voice assistants feels more like the land of Harry Potter than the land of technology we live in. It's the flipside of “Any sufficiently advanced technology is indistinguishable from magic”.
It's like someone took the maddening random guesswork user experience of mid-80s text adventures, mixed in mediocre speech-to-text and decided to base an entire product category around it. I absolutely do not get it.
I think we need something extremely close to AGI for natural interfaces to work.
Similar story for self-driving cars: car driving helpers/assistants (lane keeping, etc.) are ok, self-driving cars will be a huge disaster until we are really close to AGI.
These are the things where getting 80-90% there isn't enough. We're smarter than chimps or other animals because we can cover the long tail of events.
My toddler wants to hear a song 1000x, I can't do something like "Play 5 little monkeys jumping on the bed on repeat or in a loop or 10x" I have to tell it each time.
I don’t even bother using them for timers anymore since it’s usually easier to do it on my watch. Voice assistants are limited to navigation requests while driving for me
It's all just a big charade from these companies, as if they ever work. Billions of dollars made from devices that don't even work, but people buy them because they've been led to believe they work. They're worse than useless, because they give you hope that they actually do what the companies say they do. What a sham. Maybe in another 10-20 years.
I can't understand why I should talk to any of my devices as long as they are as idiotic as they are now and, like you describe above, don't have the slightest idea about how to handle context.
That said Siri feels at least 100 times smarter than Google assistant to me, the below are actual (if somewhat anonymized) examples:
- Google suggestions when I look at the phone at 5am in the morning: "text random friend of a friend that I answered a question for over Telegram" or "call customers project manager". See https://erik.itland.no/tag:aifails for screenshots and more examples. In the years I had access to the feature it maybe helped me twice by pointing out it was time to leave for an appointment.
- Siri suggestions are mostly mundane (more or less predictably tells me when to leave for appointments, kids' soccer and hockey training etc, suggests picking up kids at kindergarten - although not consistently, suggests sending messages to my wife over our preferred messaging solution, tweeting, or if I drive 5 minutes down to the shopping center: that I should drive home the way I always do etc) but I have never caught it suggesting outright idiotic things like Google, and once this weekend it even suggested something semi-smart (a text message to my wife that was surprisingly close to one I could have written myself to tell her I was on my way home, including one of my rather unusual abbreviations and with good timing :-)
I tried the same and got frustrated. So I said “Siri shut up you piece of garbage” and she added “shut up you piece of garbage” to my grocery list for me. Very helpful.
Also Siri is constantly having problems knowing if I’m talking to my watch or my iPhone, even if my phone is in my pocket.
I don’t use Siri for much, and I certainly don’t have it always listening. But I did hack out a useful Shortcut to record my blood pressure and heart rate to Apple Health.
I say, “Add blood pressure <pause> 120 over 80 plus 60.” Then I use the shortcut to parse the string on the / and +, and record it to Health.
The hard part was finding delimiters that Siri would consistently record as a single character. That and realizing I needed a manual review step to make sure Siri didn’t happily pump garbage into my logs.
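For anyone curious what that parsing step looks like outside of Shortcuts, here is a rough Python sketch. It assumes the dictation arrives as text shaped like "120/80+60". The function name and the plausibility limits are made up for illustration and are not taken from the actual Shortcut; the range check is only a stand-in for the manual review step.

```python
import re
from typing import NamedTuple, Optional


class Reading(NamedTuple):
    systolic: int
    diastolic: int
    pulse: int


# Assumed transcription shape: "<systolic>/<diastolic>+<pulse>", e.g. "120/80+60".
_PATTERN = re.compile(r"^\s*(\d{2,3})\s*/\s*(\d{2,3})\s*\+\s*(\d{2,3})\s*$")


def parse_reading(text: str) -> Optional[Reading]:
    """Return a Reading, or None if the text does not look like a valid entry."""
    match = _PATTERN.match(text)
    if match is None:
        # The delimiters were not transcribed as single characters.
        return None
    systolic, diastolic, pulse = (int(group) for group in match.groups())
    # Hypothetical sanity limits so a misheard number is rejected, not logged.
    if not (70 <= systolic <= 250 and 40 <= diastolic <= 150 and 30 <= pulse <= 220):
        return None
    if diastolic >= systolic:
        return None
    return Reading(systolic, diastolic, pulse)
```

Returning `None` instead of a best guess is the point: anything that does not parse cleanly gets bounced back to a human rather than written to the log.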
Me: Hey Siri, add tomatoes and grapes to the shopping list in Pap-
Siri: I couldn’t find a shopping list, do you want me to create one?
Me: no. Hey Siri, add tomatoes and grap-
Siri: you have to select which app you want to continue <shows 6 apps, including Paprika. I tap Paprika> Sorry, Paprika has not implemented this function yet.
I use Google Keep for that :). Share it with my wife, we both have the widget in our phone desktops (or whatever the home screen is called). Works very nice.
I built my own with a raspberry pi, touch screen, and tiny webserver (on Hetzner). It doesn't use voice. It works perfectly in the kitchen, and perfectly on my phone while shopping.
I've been wondering whether I should re-enable siri and give it another shot. I use the todos app for that and from time to time it would be convenient to do it handsfree.
I see that I still have no reason to bother, it's going to frustrate me more than anything else (especially with how downhill voiceover has gone in 12).
I was with you until the end. I use Reminders lists that are shared with my wife, we have many many lists for different things. They sync to all my other devices so they're always within 2 seconds of reach. But yes, Siri really can't handle all this, I have to manage it "manually."
I wonder what the difference is. I not only am able to add stuff to my Groceries list 100% of the time but I also have multiple shopping lists so I can say "Add x to my Amazon list" or "Add x to my Costco list" and never have an issue.
My wife and I use Google Keep + assistants and they work fabulously. We have different lists: Costco, Amazon, Home Depot, Grocery, Alcohol, and the only time we ever have issues is when one of us says the wrong thing.
Maybe ask her “What did I say?” At least on iPhone, she’ll show you the textual version (and let you type to modify it). Because it sure sounds like she isn’t understanding the word “Groceries”.
My wife and I have been using an app called Todoist(no affiliation) and have been loving it. Groceries is a shared list either of us can contribute to or check off of at any time.
I don't use any of these "assistants", but curious if you responded with "Groceries List"? Knowing they work on keywords, to Siri, you may not be actually answering her question.
I have endless frustrations with Google Assistant / Google Now / Whatever they call it now. A few examples:
1. I have my phone set up to trust bluetooth in my car and unlock my phone. I get in the car and say "okay google, open spotify" -- this is so that it will continue playing what I was listening to before I left work.
"Okay", she says, and then tells me that she can't do that because my screen is locked. Sometimes this works, and sometimes it does not.
2. When I had Google Play Music it reliably would play random sub-par covers of songs rather than the original, even when I specified the artist.
3. Sometimes it decides to rely on screen input instead of audio controls. I can't do that while I'm driving.
4. It sometimes ends voice input too early or does voice input inconsistently. I've sent messages to my wife saying "I'm on my way home exclamation point" instead of "I'm on my way home!"
5. Commands which have worked for months suddenly stop working.
6. Sometimes my screen stays on, forever, after asking to play music. (OnePlus 7T, Android 10). This does not always happen.
7. Google: "Here's your message, send it?"
Me: Yes
Google: Sits there for a moment and pops up the results for "Yes" in the assistant.
My biggest gripe isn't what it can and can not do. It is the inconsistency that drives me up the wall. I am not a heavy user and most of my requests are because I wish for it to be hands-free in a car with bluetooth audio. I'm sure that this is a harder problem to solve than just me interacting with the phone, but it is a common use case.
> 5. Commands which have worked for months suddenly stop working.
This is my biggest gripe. Whatever magic voodoo ML they use is inconsistent, and it's not clear what level of abstraction this inconsistency is happening in.
What I want is the reliability of Google Assistant's speech to text parsing, combined with a firm, customizable interface. Something like If This Then That, where there are some default commands with a clear reliable command pattern: "send message to George Orwell, we live in your book", and commands can be added.
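To make the idea concrete, here is a minimal Python sketch of such a fixed command table sitting behind the speech-to-text step. Everything in it (`CommandTable`, `send_message`, the comma in the pattern) is hypothetical; it assumes the transcript arrives as plain text and is not how any real assistant works.

```python
import re
from typing import Callable, Dict, List, Optional, Pattern, Tuple

Handler = Callable[[Dict[str, str]], str]


class CommandTable:
    """User-editable list of exact command patterns, checked in order."""

    def __init__(self) -> None:
        self._rules: List[Tuple[Pattern[str], Handler]] = []

    def add(self, pattern: str, handler: Handler) -> None:
        self._rules.append((re.compile(pattern, re.IGNORECASE), handler))

    def dispatch(self, transcript: str) -> Optional[str]:
        text = transcript.strip()
        for regex, handler in self._rules:
            match = regex.fullmatch(text)
            if match:
                return handler(match.groupdict())
        # No rule matched: report failure instead of guessing an intent.
        return None


def send_message(slots: Dict[str, str]) -> str:
    # Stand-in for a real messaging call; it only describes what would be sent.
    return f"SEND to={slots['name']!r} body={slots['body']!r}"


commands = CommandTable()
commands.add(r"send message to (?P<name>[^,]+),\s*(?P<body>.+)", send_message)
```

With this, `commands.dispatch("send message to George Orwell, we live in your book")` yields a send action, while anything unrecognized yields `None`. The same phrase maps to the same action every time, which is the reliability being asked for.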
I'm sure when I had an iPhone 5 I had reliable skipping tracks and sending text messages via Siri when driving.
I use Android these days, but have stopped even attempting to use voice control when driving for all the reasons you've mentioned. It does almost feel like the functionality has gone backwards in recent years.
I don't even bother trying to use any sort of voice commands while in a car unless I'm stopped and the radio is off. Road noise makes voice recognition an impossible task.
This is a very interesting theory, I don't know if this was revealed somehow but considering how consistently terrible the guesses are on assistant devices... I wouldn't be surprised.
Why would a live version of a song pay less royalties than the studio version? Similarly for the instrumental version. The only one that seems like it would pay less royalties is maybe the cover, if the cover band has a weaker royalty rate with the provider.
I think the far more likely explanation is just that these home assistant products suck.
This is a larger GIGO problem with the music industry these days. It's not a shortcoming of voice assistants in particular. Ever use Spotify? Sometimes it seems that 9 out of 10 albums returned by any given search are live versions, heavily-doctored remastered editions, or remix collections.
I have a friend that works for one of the major assistants not made in Mountain View... there is explicit logic where it looks if the song title includes "live | cover | instrumental | etc" and tries to find a new version.
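A guess at what that kind of check could look like, as a small Python sketch. The keyword list is just the three words mentioned above, and the function names and fallback behaviour are my own assumptions, not the actual logic of any assistant.

```python
import re
from typing import List, Optional

# Only the three markers named in the comment; the real list is unknown.
_VARIANT = re.compile(r"\b(live|cover|instrumental)\b", re.IGNORECASE)


def is_variant(title: str) -> bool:
    """True if the title contains one of the variant keywords as a whole word."""
    return bool(_VARIANT.search(title))


def pick_track(requested_title: str, candidates: List[str]) -> Optional[str]:
    """Prefer the first candidate without a variant marker.

    If the request itself contains a marker, or every candidate is a
    variant, fall back to the top-ranked candidate.
    """
    if not candidates:
        return None
    if not is_variant(requested_title):
        for title in candidates:
            if not is_variant(title):
                return title
    return candidates[0]
```

Plain keyword matching like this also flags a title such as "Live and Let Die" as a variant, which hints at how a hand-written rule of this sort can go wrong.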
Think of the assistants as young children. As a reference, someone I know spent some time in Miami with his wife and 2 young kids. After some time, the young son told his dad that he wanted to go back to "yourhammy". The dad eventually decoded "yourhammy" was the kid's interpretation of Miami as My-hammy. Your Likes => My Likes reminded me of that story.
That is a great story. Honestly, I do intend to treat smart assistants that way. I can accept that they're imperfect and that they're tools that only work in certain ways. I can figure out a way to either make them be useful to me or abandon them if the way is too hard. I'm not asking for perfection.
The thing is predictability, though, and maybe handling the common use cases. It gets frustrating when they get worse. Kids, on the other hand, only get better at understanding you (though perhaps also better at frustrating you on purpose).
To put it simply, I'm happy to make myself perform incantations. I'll say "Ok Google, grooblepuff the bonkman" to get the thing to do the thing. This whole thing has made me understand why wizards and sorcerers chant Accio! and Sectumsempra! and shit like that because if they just said "Bring me my firebolt" no one knows how the AI that runs magic in the world would interpret that.
And you know someone who feels this strongly about the product is pretty bought into it. Like, if I didn't use it so much, I wouldn't be complaining this much.
Regressions in language understanding are common when children are acquiring deeper understanding of the rules of that language. For example a kid may start saying 'letted' for the past-tense of 'let', even if they had used it correctly before.
About 16 years ago, I worked in the IBM Solutions Experience Lab (with the smart kitchen and living room and stuff). One thing I did was to set up the "smart car" simulator, which connected with IBM's cloud stuff at the time, including their voice recognition. I'm testing this diddly-bob and say, "Turn on headlights." The simulacar honks its horn. Loudly.
Unfortunately, there was a tour going through the lab at the time. Some VPs from some company got to watch me honk the horn and then bang my head against the desk.
In the last 16 years, the state of the art has not advanced, as far as recognizing my speech goes. It still don't work.
Cunningham's Law states "the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer."
The concept is named after Ward Cunningham, father of the wiki. According to Steven McGeady, the law's author, Wikipedia may be the most well-known demonstration of this law.
Cunningham's Law can be considered the Internet equivalent of the French saying "prêcher le faux pour savoir le vrai" (preach the falsehood to know the truth). Sherlock Holmes has been known to use the principle at times (for example, in The Sign of the Four.)
I find that having a second set of eyes when you do it again usually forces the person to focus, whereas when it wasn't working before they probably had something else on the brain or were on autopilot. I like to call it the IT magic touch or job security.
Most "Ok Google" assistance feels like a gimmick to me, but here's something very simple I'd love to have working:
I watch a lot of YouTube on my phone when cooking, I even built a cardboard stand for my phone for this reason. What I want is for YT to respond to these voice commands:
- Pause video.
- Play video.
- Rewind 10 seconds.
- Skip the (expletive) ad <-- ok, I can understand why this one might not work.
Sadly, this doesn't work. And it's the only voice assistance I really need :(
I've been playing around with a Nest Hub Max, and with that you can pause/play a video by holding your hand like a stop sign in front of it. Kind of gimmicky, but occasionally useful.
Was also curious about your use cases on the phone (aside from ad skip), and they actually worked for me. I'm using a pixel 4, though wouldn't think that'd make a difference.
"Hey Google pause/unpause youtube" and "Hey Google skip/rewind 30 seconds on youtube" work on my android phone. So I don't know why you say it doesn't work.
YTM is completely worthless if you have/had a family plan with GPM. My kids basically lost all of their access to music as Youtube itself is not available to kids, no matter how hard you try or how often a parent enters their password.
In classic Internet tradition, you basically need to setup a shadow Google account where you lie about their age and add them to your family account anyway. Thanks Google!
Wait til you discover that the web version of YTM can't chromecast. Yep, Google's own music product can't communicate with Google's own music player product.
The "solution" is to cast the tab. Which means lower bitrate, no music controls, and if you cast to a display device, the whole tab screencasts etc.
I guess it's because nobody important uses the web anymore? Or something?
(Disclaimer: I work at Google. But not on YTM. Most Googlers I know have switched to Spotify, including myself.)
It really has. Google Play Music integrated so well with everything. YT Music has the massive advantage that there is so much more music on YouTube but damn, the integration is shambolic.
I thought the same but it literally doesn't work for me. Just tried it. It plays something random. I tried it on my phone to see what it was doing and it picked "my supermix" once and the song "my likes" the next time.
Outside of setting a timer, I've kinda given up on voice commands.
My tolerance for mistakes for simple commands that work sometimes / are the right command ... but don't work is ultra low.
Like how is it my Android phone will default to just googling "exact valid voice command letter for letter" (it's used in a commercial for cripes sake!) ... and not somehow notice that?
If you take a look at recent neural net papers for dialogue and question answering you'll see amazing things. It is really mind boggling they don't improve the commercial product, maybe it's still too expensive to run for the general public.
Similarly, they say English language Search is powered by Transformers. But when I want to perform searches it often switches the intent to something wrong. It's a blunt tool, not a precision instrument.
Given how they killed off Google Play Music and foisted YouTube Music on us, I have to conclude that Google hates music and wants to do everything they can to ruin the listening experience. It's as if no one involved in product design has even a passing interest in music.
When cancelling my YouTube Premium subscription one of the exit survey questions for "reason for cancellation" was "Unhappy with the YouTube Music App." So clearly there's some awareness of the unpopularity of YTM out there. However, it specifically asked about the app, whereas the real problem is the whole YTM service/experience on all platforms -- in particular the web app. Hopefully the cancellations and the survey results percolate to the right people.
"AI" (quoting since we aren't there yet) will have to deal with what Melanie Mitchell calls "the long tail problem" before we'll accept it as AI - https://youtu.be/NMUqvhuDZtQ . While she gives instances such as "how is an autonomous vehicle supposed to distinguish between a flock of birds on the road versus a small snowman versus a kid versus a rock versus a small animal.. all of which would warrant different responses, if the system can only respond if it has seen this stuff in training before." (paraphrasing) .. and she brings up the example of Teslas getting confused by salt lines laid prior to snow season. (I'm in India, so I wouldn't know such things were done either.)
With natural language and speech interfaces, we face such long tail problems too .. like the Burger King ad triggering a whopper lookup via "ok google".[1]
Come back home from work, I frequently say "Ok Google, text Mary 'home in 10'", only to be told that it can't do that because it doesn't have a "Home" number for "Mary."
I have to re-do it with "Ok Google, text Mary 'I'll be home in 10'", and growl a lot on the inside.
Reminds me of this voice command sketch I've watched recently (careful, loud): https://www.youtube.com/watch?v=4n-GAd33jew
Especially, since the problem wasn't the voice command in your case.
Conversely, I've been pleasantly surprised at how well voice recognition works with weird song/album titles nowadays. "Hey Siri, play 'Zombie by the Cranberries by Andrew Jackson Jihad' by AJJ on Spotify" works exactly as intended.
The assistant has gotten markedly worse at finding music since the home devices came out a few years ago. Songs I used to be able to find by describing vaguely it can now no longer find at all, and it gives me random indie artists for songs I actually know the title of unless I literally spell it out (and even then, it often fails). Woe be to you whose desired song only exists as a Youtube (but not Youtube Music) video. Something behind the scenes has changed, and it's just another reason never to trust Google to maintain their services.
It feels like a trope to even say it but it amazes me how badly Google have handled the transition from Google Play Music to YT Music. For me YT Music is an inferior experience in every single way.
Realistically, how many of the people working on Google Voice, Siri and Alexa do you think try to use their own service as much as possible day to day? Because just like you, I think it's evident they simply don't. They build for others, not for themselves, and they miss the point 50% of the time.
What we need are people building interfaces for themselves, not people building interfaces they think are good for others.
Apparently the new brand for speakers and displays with Google Assistant is now "Nest" -- as far as I can tell the Nest Mini and Nest Hub are the same hardware (?) as the Google Home Mini and the Google Home Hub. (And the "Nest Audio" is the new version of what was the original "Google Home".)
It's taken me 10+ attempts to get Google Home to wake me up to a radio station on weekday mornings. Even with the alarm supposedly set, it works maybe 30% of the time, so I end up setting alarms elsewhere too.
I'm astonished at how badly it works compared to Alexa, but sadly, Alexa no longer supports alarms via BBC Sounds, so it's not an option for me.
When Bart, Lisa, Maggie and Marge were at the Mt. Useful Visitors Center, Bart went to a statue of Smokey the Bear. Smokey said "Only WHO can prevent forest fires?" Bart then pressed the You button and Smokey said "You pressed you, referring to me, that is incorrect. The correct answer is you."
This used to really irritate me. Microsoft called folders things like “my pictures”. The awkward personalisation seemed so gross (I used a Mac). However Spotify’s “Your likes” is even worse, it sounds more like big brother giving me temporary access. I’m not sure how I ended up fixating on this.
I'm still surprised these things have become so ubiquitous. I remember thinking "these things will never catch on", but they caught on massively in spite of their flaws. I guess the ability to do an action without requiring hands is such a powerful draw that it outweighs all the pain that comes with it.
It's probably obvious that the only way these services would get improved is if they actually made money. Alexa is constantly being improved since it's a vector for revenue. Siri and Google Home and Cortana are tinker toys for engineering and R&D.
Google Play Music was so trash but I stuck with it because I had too many tracks sent by friends/my own on it (which Google Play would randomly delete altogether at times).
When they forced the transition to YouTube music, I gave up the service for good.
the google assistant is getting objectively worse and stupider. It worked better a couple years ago. The thing I hate the most about these opaque, closed box systems is the absolute dependence I have on them. I can't just freeze the version of software I'm running that is working. Nearly every app I have that works great, will at some point stop working and regress. Almost universally that is true. I despise the new era we're in where the user has zero control over what is running on their hardware.
Haha, I've lived a life that ranges from lacking toilets to living for free in beautiful apartments owned by movie stars. Trust me, I know my life is great.
But the way I see it is that any Google Assistant PM is going to see this for what it is: someone who is angry because they love not because they hate.
My wife used to always say "Hey Google Play Smooth Jazz" when we got a google home, so since it was hooked up to my spotify, I made a new playlist called "Smooth Jazz" that simply contained "Never Gonna Give You Up" by Rick Astley.
Dear Google/Alexa/Cortana/Siri product managers in the streaming audio space, here is what we actually want:
* "never play that song ever again"
* "add <artist> to my blocked list"
* "set the plays limit for Baby Shark to once per day"
* a feature that prevents interpretation of any phrase leading to the same result, if the response to that result led to a "stop!" or "for fuck's sake!" within the next ten seconds. i.e. stop doing the same incorrect thing over and over again if it's clear that it's not the desired outcome.
* "play that song I said I liked a few days ago"
* "shuffle my favourite songs"
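The "stop doing the same incorrect thing" item from the list above is simple enough to sketch. Here is a hedged Python version: the class name, the per-phrase keying, and the explicit `now` timestamps are all my own choices for illustration, and it assumes the assistant can produce a ranked list of candidate results for a phrase.

```python
import time
from typing import Dict, List, Optional, Set, Tuple


class RejectionMemory:
    """Remember results the user rejected shortly after they were played."""

    WINDOW_SECONDS = 10.0  # the "within the next ten seconds" rule

    def __init__(self) -> None:
        self._blocked: Dict[str, Set[str]] = {}
        self._last: Optional[Tuple[str, str, float]] = None  # phrase, result, time

    @staticmethod
    def _key(phrase: str) -> str:
        return phrase.lower().strip()

    def record_response(self, phrase: str, result: str, now: Optional[float] = None) -> None:
        stamp = time.monotonic() if now is None else now
        self._last = (self._key(phrase), result, stamp)

    def record_rejection(self, now: Optional[float] = None) -> None:
        """Call when the user says "stop!" (or something ruder)."""
        if self._last is None:
            return
        phrase, result, when = self._last
        stamp = time.monotonic() if now is None else now
        if stamp - when <= self.WINDOW_SECONDS:
            self._blocked.setdefault(phrase, set()).add(result)

    def choose(self, phrase: str, ranked_candidates: List[str]) -> Optional[str]:
        """Return the best-ranked candidate not previously rejected for this phrase."""
        blocked = self._blocked.get(self._key(phrase), set())
        for candidate in ranked_candidates:
            if candidate not in blocked:
                return candidate
        return None
```

So if "play my likes" first yields the song and gets a "stop!" four seconds later, the next identical request skips the song and falls through to the next candidate instead of repeating the mistake.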
We don't want to block explicit lyrics, some of our favourite songs have swear words in them. We want to block entire artists who are simply awful.
We don't want you to get the song wrong and be forced to yell "stop!" until you shut up. We want to make it clear that not only was that the wrong song, we also never want you to ever think it could possibly be the right song.
We want the magic of our kids being able to play their music on demand, but without the headaches that come from kids being total assholes most of the time.
We want you - our voice assistant - to assist us, not provoke us to rage.
You've made a useful tool. You've made a fun gimmick. But now it's time to make it a pleasant experience, because right now it just plain isn't.
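The "stop doing the same incorrect thing over and over" request above can be sketched in a few lines. This is only an illustration under stated assumptions: `RejectionMemory`, its method names, and the candidate list are all hypothetical, and nothing here reflects how any real assistant is built. The idea is just to remember which result a phrase produced, and to blacklist that pairing if the user interrupts within ten seconds.

```python
import time
from typing import List, Optional, Set, Tuple


class RejectionMemory:
    """Hypothetical helper: remembers (phrase, result) pairs the user rejected."""

    def __init__(self, window_seconds: float = 10.0) -> None:
        self.window_seconds = window_seconds
        self.rejected: Set[Tuple[str, str]] = set()
        self._last: Optional[Tuple[str, str, float]] = None  # phrase, result, time

    @staticmethod
    def _normalize(phrase: str) -> str:
        # Lowercase and collapse whitespace so near-identical phrases compare equal.
        return " ".join(phrase.lower().split())

    def record_response(self, phrase: str, result: str,
                        now: Optional[float] = None) -> None:
        """Call whenever the assistant acts on a phrase."""
        now = time.monotonic() if now is None else now
        self._last = (self._normalize(phrase), result, now)

    def record_interrupt(self, now: Optional[float] = None) -> None:
        """Call when the user says "stop!" (or something less polite)."""
        if self._last is None:
            return
        now = time.monotonic() if now is None else now
        phrase, result, timestamp = self._last
        # Only an interrupt shortly after the response counts as a rejection.
        if now - timestamp <= self.window_seconds:
            self.rejected.add((phrase, result))

    def pick(self, phrase: str, candidates: List[str]) -> Optional[str]:
        """Return the first candidate not previously rejected for this phrase."""
        key = self._normalize(phrase)
        for candidate in candidates:
            if (key, candidate) not in self.rejected:
                return candidate
        return None
```

With something like this, a "stop!" four seconds after the wrong song would make the next identical request fall through to the second-best guess, while a "stop!" half a minute later would be treated as ordinary playback control.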
Google has responded correctly to "never play that song again" for me in the past. Same with "hey google, fuck this song". I don't know if it explicitly "downvoted" the song or not, but the song was immediately skipped and I hadn't heard it any time soon after.
That said, I've refrained from using my google puck in recent months and it had been regressing pretty heavily when I did, so I can't guarantee that still works.
Yeah, I'm sure some providers do support some of my feature requests here. But definitely not a single one supports all of them (or I'd have moved to it!) Amazon Music, for example, supports this blocking only for songs in their pre-built playlists. It's little constraints like that that contribute to the frustration.
- stop giving me suggestions. Alternatively, don't give me the same suggestion again within a month of the last time you gave it to me.
- you've got the weather, that's great. Now stop repeating the same unwanted information (the current weather) when I ask for multiple forecasts (what's the weather for tomorrow and Saturday -> the weather tomorrow is... Currently it is... There is a weekend advisory in effect...The weather Saturday is... Currently it is... There is a wind advisory...)
- let me specify what information I want when I ask for the weather; it should be trivial to get the wind speed, UV index, or whatever other item I want every time I ask for current weather or forecast, not just if it's going to be unusual.
"OK Google" is the biggest problem I find with Google Home and Google Assistant so far.
Things like "Alexa" and "Siri" are short and seem more practical. When I say "Ok Google", I feel like I'm doing a mouth exercise. Multiply this extra effort by the number of times you need to say it in a single session. It is a pain.
This was an option on the Moto X second generation phone. I have no clue why Google decided to remove the feature of setting the wakeup phrase, but at the time they were actually touting it as an upcoming feature (I guess to get me to upgrade my phone).
So being forced to say "OK Google" is actually a regression from previous capability.
I’m no language expert but the difference I can notice is the word Siri comes from the tip of the tongue and feels a lot easier than google where G comes from the back of the mouth and is more awkward.
And Google Assistant doesn't? We quite often (~once a week, minimum) have our Home mini trigger off of some unrelated conversation. 50/50 on if "google" is ever mentioned, too.
I frequently reply to my kids with "Ok, go $do_a_thing" and I almost always trigger my phone, even if it's in my pocket. The kicker is that I use "Hey Google" when I want the assistant so the "Ok Google" form is not even needed.
My Google Mini gets triggered several times a night when I'm watching video. Maybe 1/3 of the time it's somebody saying "ok", but most of the time it seems to be random.
Vaguely reminiscent of when Marco V released a track called "C:\ del *.mp3"[0] out of frustration towards mp3 rips. Still amusing to me that he didn't quite get the command right.
The other day a song called S01E02.Return.Of.The.Arsonist.720p.HDTV.x264 by Blood Command appeared in my Spotify Discover Weekly. I thought someone messed up, but no...that's the name of song!
Instead of "C:\ del .mp3", it should be "del C:\ .mp3".
Edit: lol, I didn't know that we could format text with asterisks on HN. There should be an asterisk before each ".mp3", but as you can see, if I put them in it just italicizes the text in between them.
The default responses for assistants are getting much worse, to the point where a misheard phrase can do a lot of things you don’t want instead of just stopping immediately.
My favorite misfeature is babbling. Used to be you could just say “off” to stop a misinterpreted command, which worked fine because the assistant didn’t use to babble. Now though, its own rambling follow-ups interfere with its ability to even hear you desperately saying “no! off! stop! shut up! cancel!”. And my new favorite: every command response ending with an unsolicited “BY THE WAY: $thing_i_did_not_ask_for”.
80% of the time my lights would turn on.
20% of the time, I’d be greeted with: “Ok, playing ‘Turn on the Lights’ by Future on Spotify.”
And I’d stand there in the dark, listening to music I don’t like, questioning my life decisions.
Man. Seems kinda relentless.
Submit a PR that responds to "Ok Google, enough!" by degoogling your life including automatically submitting your resignation.
"Oh, let me demonstrate this. Ok Google, turn off the lights."
"Ok Google, turn on the lights now."
"Ok Google, mute."
"Ok Google, turn on the lights."
"Ok Google, turn on the lights, damn it."
"Ok Google, turn, on, the, lights---there you go. I swear it worked better yesterday."
Might as well just flip the switch myself, if I have to debate the assistant half of the time in total darkness.
"Alexa, ask Philips Hue to turn on the living room light"
vs
clap clap
I have a convenient, quiet, perfectly reliable single-bit computer on the wall in every room in my house. It costs $1, can be operated by a 2 year old, is perfectly intuitive, and the only downside I can think of is that occasionally it necessitates I be very slightly less lazy than I might otherwise be able to aspire to, i.e. when I need to go downstairs and turn off a forgotten light.
Considering my house is all using LED lights on hydro power, it's probably better for me to just leave a 5w light on all night than it is to install a Google Home setup here, in terms of my carbon footprint.
That is the key point, for me anyway. "Computer, all lights off" instead of getting out of bed when I've left something on is one of my lazy wins from voice control. Setting multiple timers without touching things with messy hands while faffing in the kitchen is another.
-- is Google misinterpreting your voice? E.g. does it hear a sound it thinks is "play" in the middle of your phrase?
-- or is it some weird statistical model that because of invisible and irrelevant correlations, sometimes concludes it's more likely you're asking for music? Like the song with that title is currently in the top 40, or was played by you in the past, or something?
(My brother in law is Australian living in the US, and has to use his "sarcastically fake American accent" to be understood on the phone. He, more politely than I, calls it his "phone voice". It's the same voice all my friends here use when parodying American stereotypes... I bet it works on Ok Google too.)
Hey Pal! for Edinburgh
Hey Jimmy! for Glasgow
Yo! for LA
Oi! for NY
Doode for SF
etc
Previously, I could say "Play my Thumbs Up" and it could do so on Google Play Music.
It keeps playing a song called "My Likes". Jesus fucking Christ, Google. If I say "Play my My Likes playlist" something random happens.
Do these guys even use their product? I'm just glad this album didn't come out before the forced migration.
EDIT: Okay, I went to verify it and this has to be the best instance of massive PEBKAC plus some UX donkeyness. The auto playlist is called "Your Likes" so I can get the Assistant to do the right thing by telling her to play her likes (Ok Google, play your likes). What the fuck man. But fine. At least I got it working.
I've suffered with this for months and now I find a solution in the few minutes after posting this.
It's always frustrating but never particularly hard to find the special incantation that will invoke it to do the thing that you want it to. Overall though it's simply not worth the effort which is probably why I end up using these overwhelmingly complex devices only for their most mundane functions like timers and getting the weather.
Trying for anything moderately complex, and I might as well be asking the dog to do it for me.
For example I have Philips Hue lights behind the TV/Screen on my living room wall, and I use their "color loop" behind the screen when watching movies etc. The problem is that "TV", "Television" and "Screen" are semi-protected words, so "turn off tv lights" ends up with the TV being turned off 9/10 times. "We" compromised and those lights are now called "screen wall" lights.
As for setting certain lights to "the color loop", what used to be a 90% success rate (the other 10% turning my lights to "the color blue/bloo(p)") will now set the lights of the room I'm currently in to the color loop, which is usually the living room, not the screen wall. Also, as recently as this summer I was able to set the whole house to "the color loop", but that feature has since disappeared. The color loop slowly and nearly imperceptibly fades the colors from red to green to blue etc over several minutes. It's technically part of "hue labs", but it's a "beta feature" that's been available in the product for over three years now, so I would argue it is core functionality at this point.
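For what it's worth, the "tv lights" collision is the kind of thing a longest-match rule would avoid. The sketch below is purely hypothetical: `resolve_device` and the device list are made up for illustration, and I have no idea how Google Assistant or Hue actually resolve names. It just prefers the registered device name that covers the most consecutive words of the utterance.

```python
from typing import Iterable, Optional


def resolve_device(utterance: str, device_names: Iterable[str]) -> Optional[str]:
    """Return the registered name with the longest word-for-word match, if any."""
    words = utterance.lower().split()
    best_name: Optional[str] = None
    best_length = 0
    for name in device_names:
        tokens = name.lower().split()
        n = len(tokens)
        if n == 0:
            continue
        # Check whether the name's words appear contiguously in the utterance.
        found = any(words[i:i + n] == tokens for i in range(len(words) - n + 1))
        # Prefer the name that explains more of what was said.
        if found and n > best_length:
            best_name, best_length = name, n
    return best_name


if __name__ == "__main__":
    devices = ["TV", "TV lights"]
    print(resolve_device("turn off tv lights", devices))
    print(resolve_device("turn off tv", devices))
```

Under this rule "turn off tv lights" would pick "TV lights" over "TV", and renaming the lights to "screen wall" would not be necessary.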
If these things would actually work, I'd definitely use one regularly. However, whenever I visit an Alexa owner, I realize after a few interactions that I really couldn't be bothered with this stuff.
I think the "taxi" problem is still around with Siri. Put any taxi organisation into your phonebook, and include "taxi" in the name. You will likely not be able to call it with Siri, since it insists on searching for taxis in your area. It's always the same bug. These things have absolutely no idea about the context. And some hand-crafted rules go haywire after a while, because apparently nobody reviews them. When I got my first iPhone (iOS 5) I put in my date of birth during configuration, and promptly noticed that the German speech synthesizer says "Nineteenseventynine" when I enter 1979. All other 4-digit numbers are fine; only 1979 is pronounced in English. So apparently someone put this exception in there for a completely bogus reason, and it stayed there. It is still there today, after 8 years.
not always. I used to use google play music to play music from my own library in the car. any time I asked it to play a moderately obscure artist, it would interpret that as whatever popular artist had a similar name. it would then play the radio station for that artist, since I didn't have the premium subscription. I found some success with spelling out the artist name letter by letter, but even that consistently failed for certain names.
also sometimes I would say "list albums by X" to help me remember the name of what I wanted. no matter what I tried, it would only list three albums "and others". who could want this behavior? if I ask you to list albums, yes I actually want to hear every single album name!
I'm now paying for YT music (since the free version apparently does not support android auto), and so far it works flawlessly. infuriating.
I think magical incantations is a perfect way to think about it. Using voice assistants feels more like the land of Harry Potter than the land of technology we live in. It's the flipside of “Any sufficiently advanced technology is indistinguishable from magic”.
Similar story for self-driving cars: car driving helpers/assistants (lane keeping, etc.) are ok, self-driving cars will be a huge disaster until we are really close to AGI.
These are the things where getting 80-90% there isn't enough. We're smarter than chimps or other animals because we can cover the long tail of events.
Meanwhile, here in Siri Land...
Me: Hey, Siri, add "tomatoes" to my Groceries list.
Siri: OK, Reapreducer. Which list should I add it to?
Me: Groceries.
Siri: OK, Reapreducer. Which list should I add it to?
Me: Groceries.
Siri: OK, Reapreducer. Which list should I add it to?
Me: Groceries.
Siri: OK, Reapreducer. Which list should I add it to?
Me: Groceries.
Siri: OK, Reapreducer. Which list should I add it to?
Me: Cancel.
I've gone back to paper grocery lists. They Just Work™.
That said Siri feels at least 100 times smarter than Google assistant to me, the below are actual (if somewhat anonymized) examples:
- Google suggestions when I look at the phone at 5am: "text random friend of a friend that I answered a question for over Telegram" or "call customer's project manager". See https://erik.itland.no/tag:aifails for screenshots and more examples. In the years I had access to the future it maybe helped me twice by pointing out it was time to leave for an appointment.
- Siri suggestions are mostly mundane (more or less predictably tells me when to leave for appointments, kids' soccer and hockey training etc, suggests picking up kids at kindergarten - although not consistently, suggests sending messages to my wife over our preferred messaging solution, tweeting, or if I drive 5 minutes down to the shopping center: that I should drive home the way I always do etc) but I have never caught it suggesting outright idiotic things like Google, and once this weekend it even suggested something semi-smart (a text message to my wife that was surprisingly close to one I could have written myself to tell her I was on my way home, including one of my rather unusual abbreviations and with good timing :-)
Also Siri is constantly having problems knowing if I’m talking to my watch or my iPhone, even if my phone is in my pocket.
Me: Ok, Google. Remind me to leave at 1 O'Clock.
Google: Ok. Do you want to save this? <shows preview of reminder, dings to indicate it's listening>
Me: Yes.
Google: Shows Google search results for "yes" and tosses the reminder.
I say, “Add blood pressure <pause> 120 over 80 plus 60.” Then I use the shortcut to parse the string on the / and +, and record it to Health.
The hard part was finding delimiters that Siri would consistently record as a single character. That and realizing I needed a manual review step to make sure Siri didn’t happily pump garbage into my logs.
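Roughly what the parsing and review step could look like, written in Python rather than as a Shortcut. Treat it as a sketch with assumptions: I'm guessing the third number is pulse, the plausibility ranges are invented stand-ins for the manual review, and `parse_reading` is a hypothetical name. It assumes Siri has already turned "over" into `/` and "plus" into `+`.

```python
import re
from typing import NamedTuple, Optional


class Reading(NamedTuple):
    systolic: int
    diastolic: int
    pulse: int  # assumption: the value after "+" is heart rate


# Expects the dictated text after transcription, e.g. "120/80+60".
_PATTERN = re.compile(r"^\s*(\d{2,3})\s*/\s*(\d{2,3})\s*\+\s*(\d{2,3})\s*$")


def parse_reading(dictated: str) -> Optional[Reading]:
    """Split on "/" and "+"; return None for anything that looks like garbage."""
    match = _PATTERN.match(dictated)
    if match is None:
        return None
    systolic, diastolic, pulse = (int(group) for group in match.groups())
    # Crude sanity checks (assumed ranges) so a misheard number is not logged.
    plausible = (
        systolic > diastolic
        and 50 <= systolic <= 300
        and 20 <= diastolic <= 200
        and 20 <= pulse <= 250
    )
    return Reading(systolic, diastolic, pulse) if plausible else None
```

Anything that comes back as `None` would go to the manual review step instead of straight into the log.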
> Hey Siri, set a timer for 8 minutes.
> Ok. 8 minutes and counting.
Siri: I couldn’t find a shopping list, do you want me to create one?
Me: no. Hey Siri, add tomatoes and grap-
Siri: you have to select which app you want to continue <shows 6 apps, including Paprika. I tap Paprika> Sorry, Paprika has not implemented this function yet.
Me: Hey Siri, addtomatoesandgrapestotheshoppinglistinPaprika
Siri: ok, I’ve added “tomatoesandgrapes” to the shopping list in Paprika 3
<phone flies out of the window>
Also fun - you say something, the words you said show up on the screen, and then change into completely different words.
Voice Recognition Elevator - ELEVEN!
https://www.youtube.com/watch?v=MNuFcIRlwdc
First aired in 2009
https://github.com/fredley/digital-black
I see that I still have no reason to bother, it's going to frustrate me more than anything else (especially with how downhill voiceover has gone in 12).
https://news.ycombinator.com/item?id=25280410
I don't use any of these "assistants", but curious if you responded with "Groceries List"? Knowing they work on keywords, to Siri, you may not be actually answering her question.
1. I have my phone set up to trust bluetooth in my car and unlock my phone. I get in the car and say "okay google, open spotify" -- this is so that it will continue playing what I was listening to before I left work.
"Okay", she says, and then tells me that she can't do that because my screen is locked. Sometimes this works, and sometimes it does not.
2. When I had Google Play Music it reliably would play random sub-par covers of songs rather than the original, even when I specified the artist.
3. Sometimes it decides to rely on screen input instead of audio controls. I can't do that while I'm driving.
4. It sometimes ends voice input too early or does voice input inconsistently. I've sent messages to my wife saying "I'm on my way home exclamation point" instead of "I'm on my way home!"
5. Commands which have worked for months suddenly stop working.
6. Sometimes my screen stays on, forever, after asking to play music. (OnePlus 7T, Android 10). This does not always happen.
7. Google: "Here's your message, send it?" Me: Yes Google: Sits there for a moment and pops up the results for "Yes" in the assistant.
My biggest gripe isn't what it can and can not do. It is the inconsistency that drives me up the wall. I am not a heavy user and most of my requests are because I wish for it to be hands-free in a car with bluetooth audio. I'm sure that this is a harder problem to solve than just me interacting with the phone, but it is a common use case.
This is my biggest gripe. Whatever magic voodoo ML they use is inconsistent, and it's not clear what level of abstraction this inconsistency is happening in.
What I want is the reliability of Google Assistant's speech to text parsing, combined with a firm, customizable interface. Something like If This Then That, where there are some default commands with a clear reliable command pattern: "send message to George Orwell, we live in your book", and commands can be added.
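A minimal sketch of what that "firm, customizable interface" could look like once the speech-to-text part has produced a transcript. Everything here is hypothetical (`CommandTable`, the `{slot}` template syntax, the handler), and it is not how If This Then That or Google Assistant work; it only shows that fixed command patterns with named slots are easy to match deterministically.

```python
import re
from typing import Callable, List, Optional, Pattern, Tuple

Handler = Callable[..., str]


class CommandTable:
    """Hypothetical registry of fixed command templates with named slots."""

    def __init__(self) -> None:
        self._commands: List[Tuple[Pattern[str], Handler]] = []

    def add(self, template: str, handler: Handler) -> None:
        """Register a template such as 'send message to {name}, {body}'."""
        # re.split with a capture group alternates literal text and slot names.
        parts = re.split(r"\{(\w+)\}", template)
        regex = ""
        for index, part in enumerate(parts):
            if index % 2 == 0:
                regex += re.escape(part)        # literal text must match exactly
            else:
                regex += f"(?P<{part}>.+?)"     # slot becomes a named group
        self._commands.append((re.compile(f"^{regex}$", re.IGNORECASE), handler))

    def dispatch(self, transcript: str) -> Optional[str]:
        """Run the first matching command; return None if nothing matches."""
        text = transcript.strip()
        for pattern, handler in self._commands:
            match = pattern.match(text)
            if match:
                return handler(**match.groupdict())
        return None


if __name__ == "__main__":
    table = CommandTable()
    table.add("send message to {name}, {body}",
              lambda name, body: f"to={name!r} body={body!r}")
    print(table.dispatch("send message to George Orwell, we live in your book"))
```

The point of returning `None` on no match is that the assistant could then say "I don't know that command" instead of guessing, which is the predictability being asked for.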
I use Android these days, but have stopped even attempting to use voice control when driving for all the reasons you've mentioned. It does almost feel like the functionality has gone backwards in recent years.
This isn't great.
Ask for War Pigs? Here’s the live version
Ask for Come Sail Away? Here’s a terrible cover
Ask for Magic Stick? Here’s an instrumental
I swear it just picks the version of the song that pays the least royalties and plays that instead of the right one...
This is a very interesting theory, I don't know if this was revealed somehow but considering how consistently terrible the guesses are on assistant devices... I wouldn't be surprised.
I think the far more likely explanation is just that these home assistant products suck.
Karaoke time! You can do this.
I am sure they're working on improving it. We've not yet reached late stage capitalism with voice assistants.
The thing is predictability, though, and maybe handling the common use cases. It gets frustrating when they get worse. Kids, on the other hand, only get better at understanding you (though perhaps also better at frustrating you on purpose).
To put it simply, I'm happy to make myself perform incantations. I'll say "Ok Google, grooblepuff the bonkman" to get the thing to do the thing. This whole thing has made me understand why wizards and sorcerers chant Accio! and Sectumsempra! and shit like that because if they just said "Bring me my firebolt" no one knows how the AI that runs magic in the world would interpret that.
And you know someone who feels this strongly about the product is pretty bought into it. Like, if I didn't use it so much, I wouldn't be complaining this much.
Unfortunately, there was a tour going through the lab at the time. Some VPs from some company got to watch me honk the horn and then bang my head against the desk.
In the last 16 years, the state of the art has not advanced, as far as recognizing my speech goes. It still doesn't work.
The answer to any technical problem will present itself within 30 seconds (sometimes minutes) of asking "Hey, can you take a look at this?"
Cunningham's Law states "the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer."
The concept is named after Ward Cunningham, father of the wiki. According to Steven McGeady, the law's author, Wikipedia may be the most well-known demonstration of this law.
Cunningham's Law can be considered the Internet equivalent of the French saying "prêcher le faux pour savoir le vrai" (preach the falsehood to know the truth). Sherlock Holmes has been known to use the principle at times (for example, in The Sign of the Four.)
I watch a lot of YouTube on my phone when cooking, I even built a cardboard stand for my phone for this reason. What I want is for YT to respond to these voice commands:
- Pause video.
- Play video.
- Rewind 10 seconds.
- Skip the (expletive) ad <-- ok, I can understand why this one might not work.
Sadly, this doesn't work. And it's the only voice assistance I really need :(
[1] https://github.com/kalliope-project/kalliope
Was also curious about your use cases on the phone (aside from ad skip), and they actually worked for me. I'm using a pixel 4, though wouldn't think that'd make a difference.
OK Google, install Newpipe.apk
In classic Internet tradition, you basically need to set up a shadow Google account where you lie about their age and add them to your family account anyway. Thanks Google!
The "solution" is to cast the tab. Which means lower bitrate, no music controls, and if you cast to a display device, the whole tab screencasts etc.
I guess it's because nobody important uses the web anymore? Or something?
(Disclaimer: I work at Google. But not on YTM. Most Googlers I know have switched to Spotify, including myself.)
Even Gmail, can’t trust that it won’t be deprecated in the future.
If you say "OK Google play my likes on youtube music" it will play `Your likes`.
If you forget the `on youtube music` part, even if your default player is set to YouTube Music, it will play the `my likes` song.
My tolerance for mistakes on simple commands is ultra low, whether they only work sometimes or are the right command and still don't work.
Like how is it my Android phone will default to just googling "exact valid voice command letter for letter" (it's used in a commercial for cripes sake!) ... and not somehow notice that?
Similarly, they say English language Search is powered by Transformers. But when I want to perform searches it often switches the intent to something wrong. It's a blunt tool, not a precision instrument.
With natural language and speech interfaces, we face such long tail problems too .. like the Burger King ad triggering a whopper lookup via "ok google".[1]
[1] https://www.nytimes.com/2017/04/12/business/burger-king-tv-a...
I start using it. I have two or three commands I do regularly.
One of the commands stops working. I stop using it forever.
I have to re-do it with "Ok Google, text Mary 'I'll be home in 10'", and growl a lot on the inside.
Inferior is much better than "we'll delete your music library on an undisclosed date in December".
Maybe not Apple because their fanatics will apologize for any wrong doing.
Wonder if these AI assistants will lead to the development of a new language.
Exterminatum Horix Abracadabra (Siri, play something nice).
What we need are people building interfaces for themselves, not people building interfaces they think are good for others.
I'm astonished at how badly it works compared to Alexa, but sadly, Alexa no longer supports alarms via BBC Sounds, so it's not an option for me.
When Bart, Lisa, Maggie and Marge were at the Mt. Useful Visitors Center, Bart went to a statue of Smokey the Bear. Smokey said "Only WHO can prevent forest fires?" Bart then pressed the You button and Smokey said "You pressed you, referring to me, that is incorrect. The correct answer is you."
This used to really irritate me. Microsoft called folders things like “my pictures”. The awkward personalisation seemed so gross (I used a Mac). However Spotify’s “Your likes” is even worse, it sounds more like big brother giving me temporary access. I’m not sure how I ended up fixating on this.
That said, I end up frustrated with Alexa more often than satisfied.
I get the impression that many Google employees do not use the products they work on.
When they forced the transition to YouTube music, I gave up the service for good.
Now only Google can hear my every word via cellphone and Nest.
And look how incompetent Google is. I don't even get creepy suggestions on ads since I deleted Facebook.
This is a question that I too ask myself daily with a lot of the software that I deal with and depend on.
Why didn't Google ever change it?!
https://www.droid-life.com/2014/12/04/moto-x-tip-use-a-whist...
How is that much different from "Hey Google"? Both are 3 syllables. Alexa is too, though I agree just saying the name flows better.
[0] https://www.discogs.com/MarcoV-Cdelmp3-Solarize/release/1334...
https://en.wikipedia.org/wiki/List_of_Mr._Robot_episodes