I was the PM on this feature (and these are my views, not Google's, as usual). The truth is actually much more banal: most people are just really positive over email.
The model is trained to offer reply suggestions that have the highest chance of being accepted, from a large whitelist of the most common short replies. The whitelist contains many negative options. We're optimizing for click through rate. That's it. There's no editorial judgement, definitely not "‘no’ struck someone as too negative and they had to take it out."
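In other words, the selection described here is just top-k by predicted acceptance. A purely illustrative sketch (the scoring function and the scores below are invented, not Google's actual model):

```python
# Illustrative sketch only, not Google's implementation. Assumes some
# model has already assigned each whitelisted reply a predicted
# probability of being clicked; here those scores are hard-coded.
def suggest_replies(predicted_ctr, k=3):
    """Return the k whitelisted replies with the highest predicted
    click-through rate. Note there is no sentiment filter: negative
    options are in the pool, they just rarely score highest."""
    return sorted(predicted_ctr, key=predicted_ctr.get, reverse=True)[:k]

scores = {
    "Sounds good!": 0.31,
    "Will do.": 0.24,
    "Thanks!": 0.22,
    "No, sorry.": 0.04,
    "That doesn't work for me.": 0.03,
}
print(suggest_replies(scores))  # the three highest-scoring replies
```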
We actually experimented with intentionally inserting more negative options to increase diversity. Doing this reliably causes a hit to our metrics.
Discovering this made me pretty happy about the world. Most people are generally pretty friendly to one another (at least over email!).
> We're optimizing for click through rate. That's it. There's no editorial judgement...
I think Google in general needs to start taking seriously the idea that this is an editorial judgement.
There’s nothing necessarily wrong with optimising for click-through rate, but doing so will skew the options you offer to your users in specific ways. Those outcomes are still Google’s responsibility, even if they result from choices not under Google’s direct control, because Google made the editorial choice to use click-through rate as its primary metric.
(It’s relatively innocuous in this particular case, but when Google uses click-through to choose between content offered up by other people, Google is setting up feedback loops between consumers, content producers and the algorithm that mediates between the two which can trap the system as a whole in pernicious local attractors in the phase space of possible choices that may be impossible to escape.)
Precisely. Worth noting that the same phenomenon of "pernicious local attractors" is responsible for a whole slew of discriminatory behaviors. Blacks are more likely to go to prison! Therefore we'd better frisk this black guy extra thoroughly...
The truth may be banal, but the impact is not. That's the problem with technology. So often, there's no ill intent in the design decisions, but at scale, the effects can be harmful, even massively harmful.
> We're optimizing for click through rate. That's it. There's no editorial judgement...
Something that all of us as technologists need to learn is that this IS an editorial judgement. We do not get to disclaim responsibility just because we delegated that responsibility to an algorithm. It is we who delegated it, we who chose the algorithm and the metrics, and we who are responsible.
"We're optimizing for click through rate" is simply not good enough in 2019.
We could claim to be naive in 2000, perhaps even 2010. But today? We all know we're playing with fire now, and it doesn't much matter that of course we didn't MEAN to burn the house down. What matters is that we don't give lit matches to children and we know how to not set the house on fire.
I agree, and good point on the broader meaning of editorial judgement. If we had seen any negative impact such as the FB/YouTube examples, we would absolutely take that into account. I haven't seen anything remotely like that yet with regard to smart reply, but I would want to be the first to see it if it exists.
I read GP as saying they watched the system's behavior and decided that going by click rates gave the best results. I'd say that is the editorial judgment, in the sense that you're talking about.
If they just set things up to go by click rates and assumed that whatever results occurred must necessarily be preferable, then that would be disclaiming responsibility in the way you describe, but GP's version doesn't sound like that.
It is good enough; 2019 doesn't mean we don't reflect user behavior.
Not every recommender is a moral agent. Auto-suggest and conspiracy videos aren't the same thing.
And technically, yes, there's no such thing as a non-editorialized recommender, but this feels like a pointless nit, since there's a world of difference between trying to be a reflection vs. intentionally not reflecting based on a human-made moral judgement.
What even is the moral editorialization in auto-suggest? It's obvious we don't want conspiracy videos pushed front and center ahead of educational topics, but how on earth do you balance the number of "yes" vs "no" suggestions? That's clearly going to be a completely made-up number. If you don't have a clear problem to fight, it's probably best not to editorialize just to say you did.
Pretty sure Gmail is a product, not a life-giving service; they still tend to optimise for people continuing to use their service. "The societal effects of X technology" seems to be the new scare-tactic buzzword. It's an automated email response used for emails that don't merit an actual response; the slippery-slope fallacy doesn't apply here.
I'd apply that instead to their language-autocomplete feature (the grayed-out text that appears when you start typing, which you can press Tab to insert into your email), which is scary in the way it tries to shepherd your language.
From my personal experience, I think the metrics produced the correct outcome here. I'm not going to click one of those things unless it gets the tone right. So if you present one 'yes', one 'no', and one 'maybe' (for example), then you only have one chance to nail the tone (because obviously the only candidate is the one giving the answer I want to give), and the odds are pretty low. If, on the other hand, 95% of all replies are a yes, giving three 'yes' options gives you three chances to get the tone right, 95% of the time.
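The back-of-the-envelope arithmetic behind that argument, with invented numbers (assume 95% of recipients want to say yes, and any single suggested phrasing matches their tone with probability 0.4):

```python
# Invented numbers for illustration only.
p_yes = 0.95   # fraction of users who want to answer "yes"
p_tone = 0.4   # chance a single suggested phrasing nails the tone

# Strategy A: one yes, one no, one maybe. Whatever the user wants to
# say appears exactly once, so there is a single tone attempt.
p_click_mixed = p_tone

# Strategy B: three "yes" options. The 95% who want to say yes get
# three independent tone attempts; the rest get nothing usable.
p_click_all_yes = p_yes * (1 - (1 - p_tone) ** 3)

print(p_click_mixed, round(p_click_all_yes, 4))  # 0.4 vs 0.7448
```

So under these toy assumptions the all-yes strategy nearly doubles the click rate, with no editorial intent required.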
Or you could take this as an opportunity to present first choice of content, then choice of tone. That's two clicks, but it would be a lot more useful if I could pick between yes/no/maybe, and once chosen it would present me with a list of phrases for that choice.
One thing I can definitely understand is that most negative responses have to be carefully crafted, usually polite, and highly dependent on the relationship you have with the other person. One-liner positives are often considered polite since you're just accepting their request; one-liner negatives are almost never polite.
While it's exceedingly commonplace to fire off a one-line e-mail such as "Yes I can" or "Sure that works" it's very uncommon to respond with just "No I can't" or "No that doesn't work". More likely you'd respond "That doesn't work but how about ..." or "That doesn't work because ..."
> Discovering this made me pretty happy about the world. Most people are generally pretty friendly to one another (at least over email!).
Another possibility is that modern wage laborers are forced to wear a mask of positivity and say "yes, absolutely!" or "awesome!" to everything, because they are terrified of losing their jobs and not being able to keep their necks above water, in a world where fewer and fewer people own most of everything.
Perhaps you yourself are wearing this mask of positivity, when you come here and say that this makes you "pretty happy about the world".
The road to hell is paved with good intentions (and blindly following simplistic metrics).
> wage laborers are forced to wear a mask of positivity
RSA Animate made a short animation about how a culture of forced positivity in white-collar jobs contributed to the late-2000s financial meltdown.
I can't even figure out what the author thinks is wrong with the feature.
The post is mainly about services that push users into accepting things (like data sharing or microphone access), but the only example raised is your "suggested replies" feature, which has no apparent connection to what the post criticizes.
I'm sure you considered whether optimizing for click-through rate at the expense of diversity of response "moods" (or whatever) was the right tradeoff. Why did you ultimately decide it was?
We also take qualitative metrics pretty seriously, and found that this experience was well received on a more subjective level, with high CSAT in particular. It surpassed the success criteria we set out for the feature, and I'm happy about where it landed. Of course it's not going to be ideal for everyone, so we have a setting to disable it.
To be clearer on our metrics: in addition to qualitative measures there's also a precision/recall tradeoff, so saying CTR alone was an oversimplification. We have coverage targets, but that doesn't really impact the positivity of the suggestions.
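To illustrate what a coverage-vs-precision tradeoff means in a suggestion system like this (toy numbers, not Google's actual definitions or figures):

```python
# Toy numbers, invented for illustration.
emails = 1000    # emails where suggestions could be offered
shown = 620      # emails where the model was confident enough to show any
accepted = 93    # shown suggestion sets where the user clicked one

coverage = shown / emails      # how often we offer suggestions at all
precision = accepted / shown   # how often an offered suggestion is used

print(f"coverage={coverage:.2f}, precision={precision:.2f}")
# Raising the confidence threshold trades coverage down for precision up,
# independently of how positive or negative the suggestions are.
```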
As someone who tends to write overly verbose emails, I find this feature to be useful, in that it reminds me how often "Will do." is an entirely appropriate, succinct response.
Is your model trained globally, or does it learn individual users' styles?
I could imagine that globally the most common replies are three forms of "yes" but for any individual user the most common replies are two forms of "yes" and one "no", simply because there's less diversity in one user's replies than in the entire population.
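A toy version of that intuition, with invented reply logs (the global and per-user "models" are simulated here as simple frequency counts, which is of course much cruder than the real thing):

```python
from collections import Counter

# Invented reply logs for two hypothetical users.
logs = {
    "alice": ["Yes!", "Yes!", "No, sorry."],
    "bob": ["Sounds good!", "Sounds good!", "Will do.", "Will do.",
            "Thanks!", "Thanks!"],
}

# Globally, "No, sorry." is crowded out of the top three...
all_replies = [r for replies in logs.values() for r in replies]
global_top = [r for r, _ in Counter(all_replies).most_common(3)]

# ...but in alice's personal distribution it still makes the cut.
alice_top = [r for r, _ in Counter(logs["alice"]).most_common(3)]

print(global_top)  # no negative option in the global top three
print(alice_top)   # includes "No, sorry."
```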
There is important variation. Some of us looked around and found how to turn it off, immediately. We are the ones who get a different experience from the rest.
That this is the only other choice available is also significant, particularly in reference to other cases where they have chosen to make turning something off impossible.
Taking away the extremely popular "classic layout" is an example. It wasn't costing them anything, but getting people used to accepting arbitrary changes forced on them has worked out very well for Apple, and Google has to want some of that. Each passive acquiescence makes the next easier, until it becomes automatic.
Facebook deemphasizing clickbait content reliably hurts their metrics. YouTube deemphasizing extremist videos reliably hurts their metrics. This metrics-driven product development was totally understandable (and I even tried to do it) up until a few years ago when we simply got too good at it and it ripped the world apart precisely along the seams where the human brain has bugs that cause these things to increase metrics.
Email prefill replies are a silly and relatively harmless example to make this case, but the point is that every last button on every product we use is caked in this, and we're realizing too late that what the metrics identify as "what humans want" often doesn't coincide with what's good for humans.
People may not say no as often, but one no can be very powerful. It's harder to say no. Billions of people nudged towards saying yes is not nothing.
This whole feature is fucking terrible. I hate it. Please provide a way to turn it off.
For that matter, everything about the Gmail redesign is terrible. It has gimmicky crap like this I don’t want, nothing new I do want, is slow as molasses, and is extremely buggy and unreliable. It is constantly making me wait (sometimes minutes on a slow connection) to do actions which used to be fast, constantly misinterpreting my inputs and weirdly scrolling my page around, it takes unreasonable amounts of CPU/memory, and I have lost text I was typing several times. I frequently end up wanting to punch my computer when using the current Gmail, something which never used to happen.
Sometimes I get fed up enough to use the plain html version. But that is not really satisfactory either.
Bringing back the Gmail of circa 10 years ago would be a huge improvement.
Edit: apparently the suggested responses feature can be disabled in settings. That’s something at least.
This model could exploit known or unknown bugs in the human psyche, optimizing for click-through rate today, or something else tomorrow, while bypassing a conscious human decision between "yes" and "no".
Imagine a light switch in your house that doesn't go ON and OFF but has a bunch of different options every day, depending on what the model is trained to do or who the PM for this "feature" is.
It's ultimately manipulation, a loss of control, an unwinnable fight against a learning machine that gets what it wants.
For me anyway, most of my Gmail emails are sent to people I am very familiar with (my wife, friends, etc.), so the language I use will often contain personal jokes or colloquialisms unique to us or our community. This feature would be so much more useful if it was trained on my own email. The click-through rate would be much higher then, at least for me.
> [...] experimented with [...] negative options [...] causes a hit to our metrics.
I have a very bad opinion of this feature overall, but now I am curious: how did you measure people wanting to say no and not finding a quick reply? And then how did you measure the quality of the "no" options available in the test set?
> We're optimizing for click through rate.

And so was Hitler in ~1923. Optimizing something that affects how other people act (the same happens on YouTube; see Veritasium: "My Video Went Viral" https://www.youtube.com/watch?v=fHsa9DqmId8) leads to bad outcomes (radicalization, a race to the bottom, catering to the lowest common denominator, etc.).
Another explanation: "no" responses require more information.
If you are saying yes, then that's basically all you need to say. If you're saying no, you usually also provide a reason, so you probably wouldn't use any of the short, generic "no" responses Gmail would come up with.
This was my initial reaction. Every time I use the suggested replies feature it is in the affirmative, because that is an easy response. No takes some wordsmithing I'd rather do myself.
Agreed, there are dozens of ways to say no, and choosing the right one for a particular email is a much more important choice than what phrase you use to say yes.
I’m a little... confused. Is this blog post informed by one data point, from which it extrapolates a set of conclusions / reasons / motives? I can understand the sentiments of the article and poster, but I am not convinced of the premise that Gmail always prepares affirmative responses, and I’d love to see more data on that point specifically.
I'm also not convinced. I frequently see Gmail suggestions that I would not consider positive, for instance: "I'm not interested." "What is this about?" "Who are you?"
I use Gmail and I value privacy; I pay for G Suite, though.
I tried all of them (Fastmail, ProtonMail, Tutanota, Runbox, and a few others), and none of them came close to Gmail. So I figured, what the hell, G Suite is around the same price and it gets me 30 GB of cloud storage as well.
I wish Google was a little clearer on where the privacy starts and ends, though. For example, if you want to use Google Home with a G Suite account, you have to enable a bunch of tracking. I know this has to do with the fact that Google wants to separate business from personal use, but really, I don’t want two e-mails.
I used to use G Suite (and still do at work) but switched to Fastmail for my private stuff, and it feels faster and better in every possible way, with the exception of not having Google Docs or an equivalent.
Fastmail is also based out of Australia. There are dimensions to hypocrisy, not all of which are relevant from every perspective; product choice is complexly multi-dimensional. A better response might involve asking why such a proclaimed privacy-conscious consumer remains with Gmail.
I suppose you are talking about "Assistance and Access Bill"? I don't see how this relates to this discussion, which is about your email provider using your email text for commercial profits.
That said, if you prefer the US government to the Australian government, I am sure there are privacy-preserving email providers in the US as well. And there is also the Swiss ProtonMail, outside of both US and Australian jurisdiction, but it is more expensive.
They could be paying for Gmail (through G Suite). AFAIR the ToS for G Suite claims they won't harvest your data.
Anecdotally, I have my private mail at Fastmail and my work mail at GMail and I've never seen an ad related to work emails (or private emails ofc, but that was to be expected)
This is kind of meta, because I’ve turned this autocomplete feature off, I’m sure of it. Did I just do it on my phone? Did my wifi blip so the AJAX didn’t work? I certainly didn’t turn it on.
This strikes home hard for me as a pervasive problem. So many tech companies conveniently "forget" about user preferences all the time.
For example, on my Kobo e-reader, I'm positive I had disabled auto-update. And yet, one day a few weeks ago it auto-updated, and the new version stopped displaying side-loaded .epub files (from Project Gutenberg). No rollback, no appeal. The seller's 2-year warranty has recently expired, so now I essentially have a modestly expensive semi-brick that will only let me read the two titles I purchased via the Kobo store, and nothing else.
I have an issue with something similar. I often get emails from email lists where I'm pretty sure I have previously unsubscribed. But I can never be sure if I actually did and they're ignoring it, or if I unsubscribed from something else.
This is the new superstition of the digital age: instead of saying "huh, that's funny" when some autocomplete suggestions strike you as odd, invent a myth about your personal relationship to the gods of computing (the big tech firms) to explain it.
There are a lot of things about the computers we use that we simply don't understand unless someone tells us (because we weren't involved and don't have access to the code) and yet I guess many people want to pretend that they know what's going on?
As a non-native English speaker, I find these sentences very confusing. I'm not used to English in an informal setting, and they seem like overly informal answers with implied meanings.
They're relatively informal, but they all generally mean "yes" and I would not be against using a response like this when replying to a friend or close acquaintance.
"We're optimizing for click through rate" is how we got the proliferation of misinformation on Facebook. It's how we got https://twitter.com/chrislhayes/status/1037831503101579264. It's how we got Pizzagate.
It's easy to say that something isn't good enough if you don't elaborate on what the good-bad axis is.
It almost sounds like human nature gets in the way of revenue, so it has to be manipulated into something more business friendly.
We won’t consciously make that choice but it often seems to be the consequence.
What dangers do you see in this specific example? That people will be too lazy to type "no" if they don't agree?
It's healthy to say no sometimes.
"Smile or Die" https://www.youtube.com/watch?v=u5um8QWWRvo
I know it's always the first measurement that springs to mind, but I can assure you there are others.
This way you can get other desired behaviours without negatively affecting your metrics.
To put it another way:
The choice of metric is also a design decision.
> Doing this reliably causes a hit to our metrics.
I take this as one more piece of evidence that thinking in terms of rates and metrics makes you blind to the user experience.
Gmail's "basic HTML" version still works. I use it myself.
(I'd guess they looked at people's manually typed responses too, though?)
I don’t believe you are contributing to the downfall of civilization.
- "Let's meet sometime!"
- "Good idea, we really should meet in person!"
and then nothing happens.
It's ironic coming from a person with an @gmail.com email. Fastmail's service is $5/month, but they chose the free one, which harvests their data, instead.

If even the privacy advocates aren't following their own advice, I doubt freemium is going to be over anytime soon.
Hanlon's razor.