I was the PM on this feature (and these are my views, not Google's, as usual). The truth is actually much more banal: most people are just really positive over email.
The model is trained to offer reply suggestions that have the highest chance of being accepted, from a large whitelist of the most common short replies. The whitelist contains many negative options. We're optimizing for click through rate. That's it. There's no editorial judgement, definitely not "‘no’ struck someone as too negative and they had to take it out."
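In other words, the selection described here is just top-k by predicted acceptance. A purely illustrative sketch (the scoring function and the scores below are invented, not Google's actual model):

```python
# Illustrative sketch only, not Google's implementation. Assumes some
# model has already assigned each whitelisted reply a predicted
# probability of being clicked; here those scores are hard-coded.
def suggest_replies(predicted_ctr, k=3):
    """Return the k whitelisted replies with the highest predicted
    click-through rate. Note there is no sentiment filter: negative
    options are in the pool, they just rarely score highest."""
    return sorted(predicted_ctr, key=predicted_ctr.get, reverse=True)[:k]

scores = {
    "Sounds good!": 0.31,
    "Will do.": 0.24,
    "Thanks!": 0.22,
    "No, sorry.": 0.04,
    "That doesn't work for me.": 0.03,
}
print(suggest_replies(scores))  # the three highest-scoring replies
```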
We actually experimented with intentionally inserting more negative options to increase diversity. Doing this reliably causes a hit to our metrics.
Discovering this made me pretty happy about the world. Most people are generally pretty friendly to one another (at least over email!).
> We're optimizing for click through rate. That's it. There's no editorial judgement...
I think Google in general needs to start taking seriously the idea that this is an editorial judgement.
There’s nothing necessarily wrong with optimising for click-through rate, but doing so will skew the options you offer to your users in specific ways. Those outcomes are still Google’s responsibility, even if they result from choices not under Google’s direct control, because Google made the editorial choice to use click-through rate as its primary metric.
(It’s relatively innocuous in this particular case, but when Google uses click-through to choose between content offered up by other people, Google is setting up feedback loops between consumers, content producers and the algorithm that mediates between the two which can trap the system as a whole in pernicious local attractors in the phase space of possible choices that may be impossible to escape.)
Precisely. Worth noting that the same phenomenon of "pernicious local attractors" is responsible for a whole slew of discriminatory behaviors. Blacks are more likely to go to prison! Therefore we'd better frisk this black guy extra thoroughly...
The truth may be banal, but the impact is not. That's the problem with technology. So often, there's no ill intent in the design decisions, but at scale, the effects can be harmful, even massively harmful.
> We're optimizing for click through rate. That's it. There's no editorial judgement...
Something that all of us as technologists need to learn is that this IS an editorial judgement. We do not get to disclaim responsibility just because we delegated that responsibility to an algorithm. It is we who delegated it, we who chose the algorithm and the metrics, and we who are responsible.
"We're optimizing for click through rate" is simply not good enough in 2019.
We could claim to be naive in 2000, perhaps even 2010. But today? We all know we're playing with fire now, and it doesn't much matter that of course we didn't MEAN to burn the house down. What matters is that we don't give lit matches to children and we know how to not set the house on fire.
I agree, and good point on the broader meaning of editorial judgement. If we had seen any negative impact such as the FB/YouTube examples, we would absolutely take that into account. I haven't seen anything remotely like that yet with regard to smart reply, but I would want to be the first to see it if it exists.
I read GP as saying they watched the system's behavior and decided that going by click rates gave the best results. I'd say that is the editorial judgment, in the sense that you're talking about.
If they just set things up to go by click rates and assumed that whatever results occurred must necessarily be preferable, then that would be disclaiming responsibility in the way you describe, but GP's version doesn't sound like that.
It is good enough; 2019 doesn't mean we don't reflect user behavior.
Not every recommender is a moral agent. Auto-suggest and conspiracy videos aren't the same thing.
And technically, yes, there's no such thing as a non-editorialized recommender, but this feels like a pointless nit, since there's a world of difference between trying to be a reflection vs. intentionally not reflecting based on a human-made moral judgement.
What even is the moral editorialization in auto-suggest? It's obvious we don't want conspiracy videos pushed front and center ahead of educational topics, but how on earth do you balance the number of "yes" vs "no" suggestions? That's clearly going to be a completely made-up number. If you don't have a clear problem to fight, it's probably best not to editorialize just to say you did.
Pretty sure Gmail is a product, not a life-giving service; they still tend to optimise for people continuing to use their service. "The societal effects of X technology" seems to be the new scare-tactic buzzword. It's an automated email response used for emails that don't merit an actual response; the slippery-slope fallacy doesn't apply here.
I'd apply that instead to their language-autocomplete feature (the grayed-out text that appears when you start typing, which you can press Tab to insert into your email), which is scary in the way it tries to shepherd your language.
From my personal experience, I think the metrics produced the correct outcome here. I'm not going to click one of those things unless it gets the tone right. So if you present one 'yes', one 'no', and one 'maybe' (for example), then you only have one chance to nail the tone (because obviously the only candidate is the one giving the answer I want to give), and the odds are pretty low. If, on the other hand, 95% of all replies are a yes, giving three 'yes' options gives you three chances to get the tone right, 95% of the time.
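The back-of-the-envelope arithmetic behind that argument, with invented numbers (assume 95% of recipients want to say yes, and any single suggested phrasing matches their tone with probability 0.4):

```python
# Invented numbers for illustration only.
p_yes = 0.95   # fraction of users who want to answer "yes"
p_tone = 0.4   # chance a single suggested phrasing nails the tone

# Strategy A: one yes, one no, one maybe. Whatever the user wants to
# say appears exactly once, so there is a single tone attempt.
p_click_mixed = p_tone

# Strategy B: three "yes" options. The 95% who want to say yes get
# three independent tone attempts; the rest get nothing usable.
p_click_all_yes = p_yes * (1 - (1 - p_tone) ** 3)

print(p_click_mixed, round(p_click_all_yes, 4))  # 0.4 vs 0.7448
```

So under these toy assumptions the all-yes strategy nearly doubles the click rate, with no editorial intent required.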
Or you could take this as an opportunity to present first choice of content, then choice of tone. That's two clicks, but it would be a lot more useful if I could pick between yes/no/maybe, and once chosen it would present me with a list of phrases for that choice.
One thing I can definitely understand is that most negative responses have to be carefully crafted, usually polite, and highly dependent on the relationship you have with the other person. One-liner positives are often considered polite since you're just accepting their request; one-liner negatives are almost never polite.
While it's exceedingly commonplace to fire off a one-line e-mail such as "Yes I can" or "Sure that works" it's very uncommon to respond with just "No I can't" or "No that doesn't work". More likely you'd respond "That doesn't work but how about ..." or "That doesn't work because ..."
> Discovering this made me pretty happy about the world. Most people are generally pretty friendly to one another (at least over email!).
Another possibility is that modern wage laborers are forced to wear a mask of positivity and say "yes, absolutely!" or "awesome!" to everything, because they are terrified of losing their jobs and not being able to keep their necks above water, in a world where fewer and fewer people own most of everything.
Perhaps you yourself are wearing this mask of positivity, when you come here and say that this makes you "pretty happy about the world".
The road to hell is paved with good intentions (and blindly following simplistic metrics).
> wage laborers are forced to wear a mask of positivity
RSA Animate made a short animation about how a culture of forced positivity in white-collar jobs contributed to the late-2000s financial meltdown.
I can't even figure out what the author thinks is wrong with the feature.
The post is mainly about services that push users into accepting things (like data sharing or microphone access), but the only example raised is your "suggested replies" feature, which has no apparent connection to what the post criticizes.
I'm sure you considered whether optimizing for click-through rate at the expense of diversity of response "moods" (or whatever) was the right tradeoff. Why did you ultimately decide it was?
We also take qualitative metrics pretty seriously, and found that this experience was well received on a more subjective level, with high CSAT in particular. It surpassed the success criteria we set out for the feature, and I'm happy about where it landed. Of course it's not going to be ideal for everyone, so we have a setting to disable it.
To be clearer on our metrics: in addition to qualitative measures there's also a precision/recall tradeoff, so saying CTR alone was an oversimplification. We have coverage targets, but that doesn't really impact the positivity of the suggestions.
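To illustrate what a coverage-vs-precision tradeoff means in a suggestion system like this (toy numbers, not Google's actual definitions or figures):

```python
# Toy numbers, invented for illustration.
emails = 1000    # emails where suggestions could be offered
shown = 620      # emails where the model was confident enough to show any
accepted = 93    # shown suggestion sets where the user clicked one

coverage = shown / emails      # how often we offer suggestions at all
precision = accepted / shown   # how often an offered suggestion is used

print(f"coverage={coverage:.2f}, precision={precision:.2f}")
# Raising the confidence threshold trades coverage down for precision up,
# independently of how positive or negative the suggestions are.
```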
As someone who tends to write overly verbose emails, I find this feature to be useful, in that it reminds me how often "Will do." is an entirely appropriate, succinct response.
Is your model trained globally, or does it learn individual users' styles?
I could imagine that globally the most common replies are three forms of "yes" but for any individual user the most common replies are two forms of "yes" and one "no", simply because there's less diversity in one user's replies than in the entire population.
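A toy version of that intuition, with invented reply logs (the global and per-user "models" are simulated here as simple frequency counts, which is of course much cruder than the real thing):

```python
from collections import Counter

# Invented reply logs for two hypothetical users.
logs = {
    "alice": ["Yes!", "Yes!", "No, sorry."],
    "bob": ["Sounds good!", "Sounds good!", "Will do.", "Will do.",
            "Thanks!", "Thanks!"],
}

# Globally, "No, sorry." is crowded out of the top three...
all_replies = [r for replies in logs.values() for r in replies]
global_top = [r for r, _ in Counter(all_replies).most_common(3)]

# ...but in alice's personal distribution it still makes the cut.
alice_top = [r for r, _ in Counter(logs["alice"]).most_common(3)]

print(global_top)  # no negative option in the global top three
print(alice_top)   # includes "No, sorry."
```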
There is important variation. Some of us looked around and found how to turn it off, immediately. We are the ones who get a different experience from the rest.
That this is the only other choice available is also significant, particularly in reference to other cases where they have chosen to make turning something off impossible.
Taking away the extremely popular "classic layout" is an example. It wasn't costing them anything, but getting people used to accepting arbitrary changes forced on them has worked out very well for Apple, and Google has to want some of that. Each passive acquiescence makes the next easier, until it becomes automatic.
Facebook deemphasizing clickbait content reliably hurts their metrics. YouTube deemphasizing extremist videos reliably hurts their metrics. This metrics-driven product development was totally understandable (and I even tried to do it) up until a few years ago when we simply got too good at it and it ripped the world apart precisely along the seams where the human brain has bugs that cause these things to increase metrics.
Email prefill replies are a silly and relatively harmless example to make this case, but the point is that every last button on every product we use is caked in this, and we're realizing too late that what the metrics identify as "what humans want" often doesn't coincide with what's good for humans.
People may not say no as often, but one no can be very powerful. It's harder to say no. Billions of people nudged towards saying yes is not nothing.
This whole feature is fucking terrible. I hate it. Please provide a way to turn it off.
For that matter, everything about the Gmail redesign is terrible. It has gimmicky crap like this I don’t want, nothing new I do want, is slow as molasses, and is extremely buggy and unreliable. It is constantly making me wait (sometimes minutes on a slow connection) to do actions which used to be fast, constantly misinterpreting my inputs and weirdly scrolling my page around, it takes unreasonable amounts of CPU/memory, and I have lost text I was typing several times. I frequently end up wanting to punch my computer when using the current Gmail, something which never used to happen.
Sometimes I get fed up enough to use the plain html version. But that is not really satisfactory either.
Bringing back the Gmail of circa 10 years ago would be a huge improvement.
Edit: apparently the suggested responses feature can be disabled in settings. That’s something at least.
This model could exploit known or unknown bugs in the human psyche, optimizing for click-through rate today, or something else tomorrow, while bypassing a conscious human decision between "yes" and "no".
Imagine a light switch in your house that doesn't go ON and OFF but has a bunch of different options every day, depending on what the model is trained to do or who the PM for this "feature" is.
It's ultimately manipulation, a loss of control, an unwinnable fight against a learning machine that gets what it wants.
For me anyway, most of my Gmail emails are sent to people I am very familiar with (my wife, friends, etc.), so the language I use will often contain personal jokes or colloquialisms unique to us or our community. This feature would be so much more useful if it was trained on my own email. The click-through rate would be much higher then, at least for me.
> [...] experimented with [...] negative options [...] causes a hit to our metrics.
I have a very bad opinion of this feature overall, but now I am curious: how did you measure people wanting to say no and not finding a quick reply? And then how did you measure the quality of the "no" options available in the test set?
> We're optimizing for click through rate.

And so was Hitler in ~1923. Optimizing something that affects how other people act (the same happens on YouTube; see Veritasium: "My Video Went Viral" https://www.youtube.com/watch?v=fHsa9DqmId8) leads to bad outcomes (radicalization, a race to the bottom, catering to the lowest common denominator, etc.).
Another explanation: "no" responses require more information.
If you are saying yes, then that's basically all you need to say. If you're saying no, you usually also provide a reason, so you probably wouldn't use any of the short, generic "no" responses Gmail would come up with.
This was my initial reaction. Every time I use the suggested replies feature it is in the affirmative, because that is an easy response. No takes some wordsmithing I'd rather do myself.
Agreed, there are dozens of ways to say no, and choosing the right one for a particular email is a much more important choice than what phrase you use to say yes.
I’m a little... confused. Is this blog post informed by one data point, from which it extrapolates a set of conclusions / reasons / motives? I can understand the sentiments of the article and poster, but I am not convinced of the premise that Gmail always prepares affirmative responses, and I’d love to see more data on that point specifically.
I'm also not convinced. I frequently see Gmail suggestions that I would not consider positive, for instance: "I'm not interested." "What is this about?" "Who are you?"
I use Gmail and I value privacy; I pay for G Suite, though.
I tried all of them (Fastmail, ProtonMail, Tutanota, Runbox, and a few others), and none of them came close to Gmail. So I figured, what the hell, G Suite is around the same price and it gets me 30 GB of cloud storage as well.
I wish Google was a little clearer on where the privacy starts and ends, though. For example, if you want to use Google Home with a G Suite account, you have to enable a bunch of tracking. I know this has to do with the fact that Google wants to separate business from personal use, but really, I don’t want two e-mails.
I used to use G Suite (and still do at work) but switched to Fastmail for my private stuff, and it feels faster and better in every possible way, with the exception of not having Google Docs or an equivalent.
Fastmail is also based out of Australia. There are dimensions to hypocrisy, not all of which are relevant from every perspective; product choice is complexly multi-dimensional. A better response might involve asking why such a proclaimed privacy-conscious consumer remains with Gmail.
I suppose you are talking about "Assistance and Access Bill"? I don't see how this relates to this discussion, which is about your email provider using your email text for commercial profits.
That said, if you prefer the US government to the Australian government, I am sure there are privacy-preserving email providers in the US as well. And there is also the Swiss ProtonMail, outside of both US and Australian jurisdiction, but it is more expensive.
They could be paying for Gmail (through G Suite). AFAIR the ToS for G Suite claims they won't harvest your data.
Anecdotally, I have my private mail at Fastmail and my work mail at GMail and I've never seen an ad related to work emails (or private emails ofc, but that was to be expected)
This is kind of meta, because I’ve turned this autocomplete feature off, I’m sure of it. Did I just do it on my phone? Did my wifi blip so the AJAX didn’t work? I certainly didn’t turn it on.
This strikes home hard for me as a pervasive problem. So many tech companies conveniently "forget" about user preferences all the time.
For example, on my Kobo e-reader, I'm positive I had disabled auto-update. And yet, one day a few weeks ago it auto-updated, and the new version stopped displaying side-loaded .epub files (from Project Gutenberg). No rollback, no appeal. The seller's 2-year warranty has recently expired, so now I essentially have a modestly expensive semi-brick that will only let me read the two titles I purchased via the Kobo store, and nothing else.
I have an issue with something similar. I often get emails from email lists where I'm pretty sure I have previously unsubscribed. But I can never be sure if I actually did and they're ignoring it, or if I unsubscribed from something else.
This is the new superstition of the digital age: instead of saying "huh, that's funny" when some autocomplete suggestions strike you as odd, invent a myth about your personal relationship to the gods of computing (the big tech firms) to explain it.
There are a lot of things about the computers we use that we simply don't understand unless someone tells us (because we weren't involved and don't have access to the code) and yet I guess many people want to pretend that they know what's going on?
As a non-native English speaker, I find these sentences very confusing. I'm not used to English in an informal setting, and they seem like overly informal answers with implied meanings.
They're relatively informal, but they all generally mean "yes" and I would not be against using a response like this when replying to a friend or close acquaintance.
"We're optimizing for click through rate" is how we got the proliferation of misinformation on Facebook. It's how we got https://twitter.com/chrislhayes/status/1037831503101579264. It's how we got Pizzagate.
It's easy to say that something isn't good enough if you don't elaborate on what the good-bad axis is.
It almost sounds like human nature gets in the way of revenue, so it has to be manipulated into something more business friendly.
We won’t consciously make that choice but it often seems to be the consequence.
What dangers do you see in this specific example? That people will be too lazy to type "no" if they don't agree?
It's healthy to say no sometimes.
"Smile or Die" https://www.youtube.com/watch?v=u5um8QWWRvo
I know it's always the first measurement that springs to mind, but I can assure you there are others.
This way you can get other desired behaviours without negatively affecting your metrics.
To put it another way:
The choice of metric is also a design decision.
> Doing this reliably causes a hit to our metrics.
I take this as one more piece of evidence that thinking in terms of rates and metrics makes you blind to the user experience.
Gmail's "basic HTML" version still works. I use it myself.
(I'd guess they looked at people's manually typed responses too, though?)
I don’t believe you are contributing to the downfall of civilization.
- "Let's meet sometime!"
- "Good idea, we really should meet in person!"
and then nothing happens.
It's ironic coming from a person with an @gmail.com email. Fastmail's service is $5/month, but they chose the free one, which harvests their data, instead.

If even the privacy advocates aren't following their own advice, I doubt freemium is going to be over anytime soon.
Hanlon's razor.