(Note the thread displays differently now because Twitter have changed their cropping algorithm)
Originally @colinmadland was trying to post examples of how Zoom's virtual background had removed his black colleague's head. However, when he posted the side-by-side images (with heads) on Twitter, Twitter always cropped out his colleague and showed only him, even if he horizontally swapped the image. So, while trying to talk about an apparently racist algorithm in Zoom, he was scuppered by an apparently racist algorithm in Twitter.
The web version shows Mitch, but the app shows a blank white area (which is at the center of the image, meaning it didn't try to crop to one of the faces). I'm on iOS.
That example is from last September, so it doesn't say anything about whether this has improved. They probably generate the crop once, when the tweet is posted.
The Zoom example is a racist algorithm. It was built with a data set that led it to produce different results for different skin colours.
The Twitter example was not a racist algorithm. It would consistently pick one head over the other, but that had nothing to do with skin colour. It might prefer the black head for some pairs, and the white head for other pairs.
In the second example people anthropomorphised the algorithm. They assumed that any example of a preference for one image was due to racial bias, and it was easy to keep feeding it images until they found an input that confirmed this assumption.
I find that calling it a `racist algorithm` doesn't really help unless the behaviour was intentional. This is a case of poor training data, the same as Google's image classification messing up its tags.
Plenty of racism in humans isn't malicious, either, but is just a byproduct of bad training data. The outcome is bad regardless of what was intended, and it's the outcome that matters.
Let's say your company decides to use AI to assist in hiring, and it turns out the algorithm used is biased when it comes to candidates' race. If there is a disparate impact[1] on protected classes in hiring that's unrelated to job performance, intentions don't matter in the eyes of labor law; what matters are the effects.
[1] https://en.wikipedia.org/wiki/Disparate_impact
"OK, we need pictures of human faces; luckily I've got all these white people here!"
On edit: it was racist in result, in that it empowers a racist system, but it was not racist in intention, as in the people gathering the training data probably didn't say "hey, how can we empower a racist system with this?"
Racism as a concept has evolved in meaning. It used to only include the most severe intentional cases of bigoted behavior, whereas now it also includes less obvious biases that lead to preventable but not necessarily intentional instances of everyday prejudice and bigotry.
I for one am happy we have unneutered the word from having to clear a bar so high that it wouldn't apply to most bigotry. But it is also unfortunate for people who have not caught on and believe calling a thing racist is a damning statement of evil intent, because it really is not anymore. Or for those who insist on the meaning of words remaining static forever.
So, I can choose to see only un-cropped images on my TL, and the author can see a preview of the algorithm's crop before they tweet -- but a glaring omission is simply exposing a crop tool to the author. The model works by choosing a point on which to center the crop. Why can't you give users a UI to do the same? "Tap a focal point in the image, or let our robot decide!"
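To make the "tap a focal point" idea concrete, here's a minimal sketch of centering a fixed-size crop on a user-chosen point and clamping it to the image bounds; the function name and parameters are illustrative assumptions, not Twitter's actual API.

```python
def crop_around_focal_point(img_w, img_h, crop_w, crop_h, fx, fy):
    """Top-left corner of a crop_w x crop_h window centered as close as
    possible to the tapped focal point (fx, fy), clamped to the image."""
    left = min(max(fx - crop_w // 2, 0), img_w - crop_w)
    top = min(max(fy - crop_h // 2, 0), img_h - crop_h)
    return left, top

# e.g. a 1200x900 photo, a 600x335 preview crop, focal point near the right edge
print(crop_around_focal_point(1200, 900, 600, 335, 1100, 200))  # (600, 33)
```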
The blog post mentions several times how ML might not be the right choice for cropping; but their conclusion was...to keep using ML for cropping. I hope someone got a nice bonus for building the model!
> but their conclusion was...to keep using ML for cropping
My takeaway from the article was that their conclusion was to remove cropping from the product, starting incrementally on iOS. (Cropping was removed for me on Android recently as well.) That seems like the opposite of "keep using ML for cropping"?
I can't really see any downside, besides maybe a little bit of developer time, to allowing users to see a preview of the crop and optionally override it. It's done all the time in other places.
It's probably a bit harder at Twitter's unique scale. They have an incredibly high throughput of new posts, and a large portion of those posts include one to four images that need cropping.
Image cropping algorithms are hard. When we made our first one for reddit, it used this algorithm:
Find the larger dimension of the image. Remove either the first or last row/column of pixels, based on which had less entropy. Keep repeating until the image was a square.
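For illustration, here's a minimal sketch of that trimming loop in Python; the grayscale input, histogram-based entropy, and function names are my assumptions rather than reddit's actual implementation.

```python
import numpy as np

def strip_entropy(strip):
    # Shannon entropy of one row/column of 8-bit grayscale pixel values
    counts = np.bincount(strip.astype(np.uint8).ravel(), minlength=256)
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

def entropy_crop_to_square(img):
    """Repeatedly trim the lower-entropy edge along the longer axis until square."""
    img = np.asarray(img)
    while img.shape[0] != img.shape[1]:
        if img.shape[0] > img.shape[1]:            # too tall: compare top vs bottom row
            if strip_entropy(img[0, :]) <= strip_entropy(img[-1, :]):
                img = img[1:, :]                   # drop the top row
            else:
                img = img[:-1, :]                  # drop the bottom row
        else:                                      # too wide: compare left vs right column
            if strip_entropy(img[:, 0]) <= strip_entropy(img[:, -1]):
                img = img[:, 1:]                   # drop the left column
            else:
                img = img[:, :-1]                  # drop the right column
    return img
```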
The most notable "bias" of this algorithm was the male gaze problem identified in the article. Women's breasts tended to have more entropy than their face, so the algorithm focused on that since it was optimized for entropy. To solve the problem, we added software that allowed the user to choose their thumbnail, but not a lot of users used it or even realized they could.
I assume they've since upgraded it to use more AI with actual face detection and so on, but at the time, doing face detection on every image was computationally infeasible.
Breasts shouldn't have more entropy than the face. Perhaps the reason is that the breasts are in the middle of the picture, so the face ends up being compared against the bottom rows more frequently?
Why not? Shirts might have flashy patterns, differently colored fabrics, alternating skin and shirt. On a row-by-row basis I can see the chest area being more entropic than a face with an even skin tone.
edit: I googled "woman" and selected random pictures which showed the whole upper body, entropy summed over each row to the right: https://imgur.com/a/oVB57gu
And even just trying to access a post/thread on mobile: I already clicked the link, then I have to click once more to say "yes, I want to do the browser thing I explicitly chose to do", then another time still to actually show more than half a screen of content.
I don't think the claim is that the behaviour is caused by "male gaze", but rather that the outcome of always focusing the cropping around any visible cleavage is functionally identical.
Whether or not it's unsupervised, whether or not it's sexist, it seems that a thumbnail focusing on a person's face rather than their breasts is typically going to be more desirable. Depending on context, of course.
"We began testing a new way to display standard aspect ratio photos... without the saliency algorithm crop. The goal of this was to give people more control over how their images appear while also improving the experience of people seeing the images in their timeline. After getting positive feedback on this experience, we launched this feature to everyone."
So the solution all along was to give users the ability to crop their own photos. Why wasn't this the original way of doing things?
Instead of forcing a complicated algorithm into the Twitter experience, it seems to me that the solution all along was just to let users do what they do best-- make tweets for themselves. This incident strikes me as a major failing of AI: We are so eager to shoehorn AI/ML into our products that we lose sight of what actually makes users happy.
What’s really remarkable is that giving users the ability to manually crop would be an amazing way to gather data on optimal cropping, which they could have used to train their model down the road. I can only imagine how much more time and money went into gathering eye tracking data.
If you were trying to build real bias into your cropping algorithm, I would suspect that training it on what the average, unconsciously biased user thinks is the best crop nearly guarantees it.
> Why wasn't this the original way of doing things?
Someone wanted to do a feature so they could get promoted. Probably with some mumbo jumbo about how it reduces the number of clicks to create a tweet and thus increases revenue.
> One of our conclusions is that not everything on Twitter is a good candidate for an algorithm, and in this case, how to crop an image is a decision best made by people.
This seems like it should have been a foregone conclusion. What was the driving force in the first place to think cropping images with an AI model was desirable? Seems like ML was a solution looking for a problem here, and I'm glad they've realised that.
Right but... we've been cropping images in web applications since... y'know, pretty much ever. Using ML to do this was always pretty ridiculous overkill. Give the users an image cropper, and be done with it.
All those examples show a large improvement. Of course, they might have cherrypicked images with large improvements for a blog post advertising the feature. But still, it illustrates why people would think it's a good idea.
Of course they don't seem to consider the idea of not cropping at all.
I'm more forgiving about corporate jargon than most. A lot of it really does help optimize communication for the situations you encounter in corporate work.
But "learnings" is literally, exactly, just a synonym for "lessons." Can we not?
I disagree: "Sharing lessons..." would mean "here is an educational resource that we have created, as teachers, for an audience of students". I think "Lessons learned..." is closer to what you mean to suggest, and "Learnings..." is more concise (this is from Twitter, after all).
It's a neologism, or possibly the resurrection of a long unused form - I don't know exactly how it came about, but I agree completely that one of the meanings of "lesson" is "a thing which has been learned".
In my experience, there's a tendency toward folksiness in certain varieties of corpspeak that causes rejection of "formal-sounding" terms and repurposing of "plainer" forms to create new words, hence lessons = learnings, protégé = mentee, and so on.
/rant but I feel like talking about percentage points of difference is always hard for humans. For example:
> In comparisons of men and women, there was an 8% difference from demographic parity in favor of women.
would have been clearer (and more correct) as "an 8 percentage-point difference from demographic parity". That 8 pp difference though is a 16% "relative" difference (58/50), or more starkly "The algorithm chose the woman almost 40% more often" (58/42 => 1.38). That said, the diagram in the post [1] is much easier for humans to parse and say "wow, that looks pretty far off!".
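To spell out the arithmetic behind those three framings (a quick sketch assuming the 50/50 parity baseline and the 58/42 split quoted above):

```python
parity = 0.50
women = 0.58               # 8 percentage points above parity
men = 1.0 - women          # 0.42

pp_diff = (women - parity) * 100             # percentage-point difference from parity
relative_diff = (women / parity - 1) * 100   # "relative" difference vs parity
more_often = (women / men - 1) * 100         # how much more often the woman was chosen
print(round(pp_diff, 1), round(relative_diff, 1), round(more_often, 1))  # 8.0 16.0 38.1
```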
tl;dr: A number like 8% sounds like "no big deal", but 8 percentage points (on each side) is a big deal!
[1] https://cdn.cms-twdigitalassets.com/content/dam/blog-twitter...
> In comparisons of black and white individuals, there was a 4% difference from demographic parity in favor of white individuals.
It's hard to believe that the bias was only 4% - there were a lot of people testing with images that they sourced themselves, and the preference for white people seemed much closer to 80-20.
The paper authors mention that their training data is from Wikidata (pictures of celebrities). I wonder if the types of photos in that dataset are meaningfully representative of the kinds of photos that people usually post to Twitter.
> It's hard to believe that the bias was only 4% - there were a lot of people testing with images that they sourced themselves, and the preference for white people seemed much closer to 80-20.
It's very easy to believe the bias was near zero, given that you are citing highly motivated people on Twitter cherrypicking from thousands of examples, and it's a little baffling that you find that more credible than controlled, systematic experimentation. Note, for example, the extremely striking fact that the fuss completely missed the other bias they found, which was several times larger; that shows how totally useless people on social media are for testing these things, and how they can conjure up "80-20" biases which don't exist.
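For contrast, a controlled check looks something like the sketch below: sample image pairs systematically, record which face the crop keeps, and compare the selection rate against parity with an error bar. This is a rough illustration under my own assumptions (a 50% parity baseline and a normal-approximation confidence interval), not the methodology from Twitter's paper.

```python
import math

def parity_gap(picks):
    """picks: 1 if the white face was kept by the crop, 0 if the black face was,
    over systematically sampled (not cherrypicked) image pairs."""
    n = len(picks)
    rate = sum(picks) / n
    gap = rate - 0.5                                       # e.g. 0.54 -> 4 pp from parity
    half_width = 1.96 * math.sqrt(rate * (1 - rate) / n)   # ~95% CI half-width
    return gap, half_width

# A handful of viral examples tells you very little:
print(parity_gap([1] * 10 + [0] * 2))     # gap ~0.33, but the CI half-width is ~0.21
# Thousands of sampled pairs shrink the interval enough to resolve a 4 pp gap:
print(parity_gap([1] * 540 + [0] * 460))  # gap 0.04, CI half-width ~0.03
```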
> highly motivated people on Twitter cherrypicking
One of the referenced threads that identified the issue happened upon it by accident, while highlighting a surprising experience in another product (Zoom). Believe it or not, people who care about this stuff are not looking for things to complain about; we're tired and overwhelmed. And I would hope that people who, upon discovering a vulnerability, find and catalogue the ways it can be exploited would be celebrated here.
I admit that it wasn't especially rigorous testing, but I personally tested this along with other people I knew. I used real photos from my camera and my wife's (we are of different races), featuring photos of ourselves, friends, and family.
I of course hope that the systems I use aren't racist against my loved ones. I am motivated to confirm whether or not they are, but I didn't go on to parlay my findings into an essay for clout. I gained nothing from doing this, except the knowledge that Twitter was suckier than I knew.
https://twitter.com/colinmadland/status/1307111816250748933
It was widely covered in the press at the time https://www.theguardian.com/technology/2020/sep/21/twitter-a...
Cute puppy nose -> click -> porn ad.
Someone wrote and tested this algorithm, and either:
a) didn't test it on pictures of women, or,
b) didn't notice that it cropped breasts rather than faces, or,
c) didn't think that was a problem.
If they had noticed and cared, this wouldn't be the approach in use.
Clearly there is human-derived input in the system (otherwise what's the point? Just crop randomly).
Aha, perhaps that's the problem then.
Twitter crops photos to fit their preview formats. It seems like an obvious improvement to show people's faces when cropping, etc.
https://blog.twitter.com/engineering/en_us/topics/infrastruc...