These companies don't fix these systems because they don't know how. It's easier to remove certain outputs or retire the whole system; there is no line of code they can tweak.
This reminds me of a favorite tweet from 2013: "Then Google Maps was like, 'turn right on Malcolm Ten Boulevard' and I knew there were no black engineers working there" -- https://twitter.com/alliebland/status/402990270402543616
This isn't a problem with diversity. Everybody knows how to pronounce Malcolm X. And it's not as if a black Google engineer would think, "let me check whether Malcolm X is pronounced correctly, because he's black and I'm black too." That idea only happens in white people's heads.
This is a contrarian take that may get me downvoted and unfairly labeled, but I encourage critical thinking instead:
I've struggled with people telling me that these FAANG companies have "diversity problems," as a person of color myself. A majority of software engineers are female and male immigrants from East Asia and South Asia. These population centers are some of the most diverse regions of the world. The engineers who have been hired by preparing for and passing these companies' selective merit based coding tests had to overcome adverse conditions in their home countries as well, including extreme poverty, starvation, and totalitarian regimes.
Why do they not count toward diversity, to some white and white-adjacent critics? What message are we sending to people who are ethnic minorities from certain groups who earned their spots through merit and have also been targeted in recent newsworthy attacks, just as others have, when we make these kinds of accusations? What does a non problematic ethnic composition look like? What are these companies doing right toward some minority groups and wrong towards others?
As others noted, just because someone is black doesn't mean they would have caught this. The whole point of ML is to adapt to what is effectively an unbounded set of inputs; pretty much by definition, there will be cases where even a team of 100% black people will train a model that, given the right input, fails in ways that particularly affect black people.
> Facebook, like a lot of tech companies, has long had problems with diversity in engineering.
If that is the case, why is it that Google voice nav routinely butchers the names of places and roads in India in spite of having thousands of Indian engineers on staff?
Could we blame the intractability of the problem, or just plain old incompetence, before we blame every single problem in the world on racism and lack of 'diversity'?
So disappointing. I was legitimately looking for a monkey pic I took years ago, to no avail, because there's no searchability. One of the richest companies in the world prefers to just remove the ability rather than solve hard problems. But hey, at least we all get ads.
You can't just "test" a neural network like that. For all you know they tested a thousand pictures of Chimpanzees and Gorillas against the network, but for some reason the NN decided to classify the photo differently because the subject was standing in front of the wrong kind of tree or wearing a funny-colored hat.
There's no super reliable way to prevent this (with current tech) other than forbidding that output entirely.
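In practice, that "forbid the output entirely" fix is a post-processing denylist rather than a model change. A minimal sketch of the idea (the label names and data shapes here are illustrative, not any company's actual taxonomy):

```python
# Suppress sensitive labels at inference time instead of retraining.
# The model may still predict them internally; they are never surfaced.
SUPPRESSED_LABELS = {"gorilla", "chimpanzee", "primate"}  # illustrative set

def filter_predictions(predictions):
    """Drop suppressed labels from a list of (label, confidence) pairs."""
    return [(label, conf) for label, conf in predictions
            if label.lower() not in SUPPRESSED_LABELS]

# A "person" prediction survives; a "primate" prediction is silently dropped.
preds = [("person", 0.61), ("primate", 0.58), ("outdoor", 0.40)]
print(filter_predictions(preds))  # [('person', 0.61), ('outdoor', 0.4)]
```

This is also why it's a blunt instrument: the suppressed labels become unsearchable for everyone, which is exactly the trade-off other commenters describe.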
Is it inexcusable that when I search 'Japan' to look for pics from my trip to Japan, it shows me pictures containing any Asian person at all? If I search Japan today, I get mostly pics of my not-Japanese wife. But I guess we don't complain enough for anyone to care.
> Which makes it an inexcusable mistake to make in 2021 - how are you not testing for this?
They probably are, but not good enough. These things can be surprisingly hard to detect. Post hoc it is easy to see the bias, but it isn't so easy before you deploy the models.
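One concrete pre-deployment check, for what it's worth, is a disaggregated error audit: compute error rates per subgroup instead of one aggregate accuracy, since the aggregate can look fine while a small subgroup fails badly. A minimal sketch (the field names and groups are assumptions for illustration):

```python
from collections import defaultdict

def error_rates_by_group(examples):
    """examples: dicts with 'group', 'label', and 'prediction' keys.
    Returns the per-group error rate; a single aggregate accuracy
    can hide a subgroup whose error rate is far above the rest."""
    errors, totals = defaultdict(int), defaultdict(int)
    for ex in examples:
        totals[ex["group"]] += 1
        errors[ex["group"]] += ex["prediction"] != ex["label"]
    return {g: errors[g] / totals[g] for g in totals}

# 95% accurate overall, but every error lands on one subgroup.
data = ([{"group": "A", "label": "person", "prediction": "person"}] * 90
      + [{"group": "B", "label": "person", "prediction": "person"}] * 5
      + [{"group": "B", "label": "person", "prediction": "primate"}] * 5)
print(error_rates_by_group(data))  # {'A': 0.0, 'B': 0.5}
```

The hard part, as noted, is knowing which groups and which inputs to audit before deployment, not after.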
If we take racial connotations out of it then we could say that the algorithm is doing quite well because it got the larger hierarchical class correct, primate. The algorithm doesn't know the racial connotations, it just knows the data and what metric you were seeking. BUT considering the racial and historical context this is NOT an acceptable answer (not even close).
I've made a few comments in the past about bias and how many machine learning people are deploying models without understanding them. This is what happens when you don't try to understand statistics, and particularly long-tail distributions. gumboshoes mentioned that Google just removed the primate-type labels. That's a solution, but honestly not a great one (technically speaking). But this solution is far easier than technically fixing the problem (I'd wager that putting a strong loss penalty on misclassifying a black person as an ape is not enough). If you follow the links from jcims, you might notice that a lot of those faces are white. Would it be all that surprising if Google trained on the FFHQ (Flickr) dataset?[0] A dataset known to have a strong bias towards white faces. We actually saw this when PULSE[1] turned Obama white (do note that if you didn't know the left picture was of a black person, and who he was, it is a decent (key word) representation). So it is pretty likely that _some_ problems could be fixed simply by better datasets (this was part of the LeCun controversy last year).
Though datasets aren't the only problem here; ML can algorithmically highlight the bias in a dataset. Research papers are often metric hacking, going for the highest accuracy they can get[2]. This leaderboardism undermines some of the usage, and there's often a disconnect between researchers and those in production. With large and complex datasets we tend to chase leaderboard scores until we reach a sufficient accuracy before we start focusing on bias (or, more often and sadly, we just move to a more complex dataset and start the whole process over again). There aren't many people working on the bias aspects of ML systems (both data bias and algorithmic bias), but as more people put these tools into production we're running into walls. Many of these people are not thinking about how the models were trained or the bias they contain. They go to the leaderboard, pick the best pre-trained model, and hit go, maybe tuning on their own dataset. Tuning doesn't eliminate the bias in the pre-training (it can actually amplify it!). ~~Money~~Scale is NOT all you need, as GAMF often tries to sell (and some try to sell augmentation as all you need).
These problems won't be solved without significant research into both data and algorithmic bias. They won't be solved until those in production also understand these principles and robust testing methods are created to find these biases. Until people understand that a good ImageNet (or even JFT-300M) score doesn't mean your model will generalize well to real world data (though there is a correlation).
So with that in mind, I'll make a prediction: rather than seeing fewer of these mistakes, we're going to see more (I'd actually argue a lot of this is already happening that you just don't see). The AI hype isn't dying down, and more people are entering the field who don't want to learn the math. "Throw a neural net at it" is not and never will be the answer. Anyone saying that is selling snake oil.
I don't want people to think I'm anti-ML. In fact, I'm an ML researcher. But there's a hard reality we need to face in our field. We've made a lot of very exciting progress in the last decade, but we've got a long way to go as well. We can't have everyone focusing on leaderboard scores and expect to solve our problems.
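For concreteness, the "strong loss penalty" mentioned above usually means a cost-sensitive loss: a confusion-cost matrix added on top of ordinary cross-entropy, so specific mistakes hurt more than generic ones. A toy numpy sketch (the classes and cost values are invented for illustration; as argued above, this alone likely doesn't fix the problem):

```python
import numpy as np

CLASSES = ["person", "gorilla", "dog"]
# Row = true class, column = predicted class. Confusing a person
# for a gorilla is priced 10x a generic mistake (values invented).
COST = np.array([[0.0, 10.0, 1.0],
                 [1.0,  0.0, 1.0],
                 [1.0,  1.0, 0.0]])

def cost_weighted_xent(probs, true_idx):
    """Standard cross-entropy plus the expected confusion cost."""
    base = -np.log(probs[true_idx] + 1e-12)
    penalty = COST[true_idx] @ probs  # expected cost under this prediction
    return base + penalty

# Same confidence in the true class both times, but the person->gorilla
# confusion is penalized far more heavily than person->dog.
print(cost_weighted_xent(np.array([0.2, 0.7, 0.1]), 0))  # ~8.71
print(cost_weighted_xent(np.array([0.2, 0.1, 0.7]), 0))  # ~3.31
```

The catch is that the penalty only moves probability mass on the examples the training set actually contains; it does nothing for the long-tail inputs the dataset never covered.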
I wonder how testing for that looks and sounds in a corporate environment. It may well be an area similar to patents: you pretend you never heard of it, never discussed it, and God forbid there's any mention of it in corporate email/chat, or any click on such a link from inside the corporate network...
I've been trying to avoid controversy lately, but hey, here's one to downvote.
Have we considered that AI and ML as a general brain replacement is a failed idea? That we humans feel we are so smart we can recreate or exceed millions of years of evolution of the human brain?
I'd never call AI a waste, it's not. But getting it to do human things just may be.
Even a child can tell the difference between a human of any color and an ape. How many billions have been spent trying, and failing, to exceed the bar of the thoughts of a human child?
No it isn't a failed idea at all. The products out today are remarkably useful even if not perfect. I have tested out the google lens thing on google photos and it is astounding.
I took a photo of the water pump from a car windscreen wiper and google was able to correctly identify what it was. I took a photo of a generic PCB which showed the back of a driver board for an LCD and google was able to bring up the exact type of board it was with the names of the ICs on it.
In these examples, google photos ai has far exceeded what the average human can achieve. We just have to keep in mind that these systems are not perfect and give only a best guess, which should be verified by a person later.
The problem here is not that the mistake was very costly or disruptive to the function of the feature, but that the mistake was highly offensive which is something very hard to avoid.
The problems it solved for you are immensely useful to you, but not remarkable IMO.
The problem it's solving is that it can do things that somebody with zero experience cannot. If you had an auto parts pro, or an EE, they probably could have done the same for you.
So, in general, AI is helpful because it has a much larger breadth of knowledge. Granted.
But I want examples of it doing depth, too.
My wife uses Lens when we fish. It's way, way worse than a fisherman with any experience at all.
> Have we considered AI and ML as a general brain replacement is a failed idea?
Yes. It is currently known to fail at this prospect. It is an open research question as to whether current methods can be merely "scaled up" using more compute to achieve "general brain replacement". I personally am skeptical about that considering basic problems such as concept drift (but I am by no means an expert).
You define what counts as valuable to be arbitrarily difficult or inconceivable with current methods (because it's an area of open research), and then say we should change course merely because we don't know whether it's possible?
> never call AI a waste, it's not. But getting it to do human things just may be.
It already can do things thought to be previously exclusively "human" (such as beating Go). Recently it also helped make significant advancements for protein folding which are sure to yield benefits to medical science at least indirectly. I believe this statement is either incorrect, or you're expecting people to have some strange definition of "exclusively human", which is of course also open research and unanswered.
Couldn’t you apply the same way of thinking to finding a cure for AIDS, or doing interstellar travel, or P = NP, or pretty much any problem that we haven’t solved yet? Just because we can’t solve a problem within our lifetime doesn’t mean it’s not solvable at all. This is one of the most basic principles by which knowledge, and therefore, technology, progresses.
Not at all. If a child could solve the AIDS issue and science couldn't, then maybe.
Humans and machines are so different today. Of course machines beat us at number calculations and such. But we have organs that computers don't and can't have. And our brains are much more in tune with using those than power of 2 bit twiddling.
As we ourselves don't understand how it works, how can we ever write a machine that does?
Given the complexity of the solutions employed in this space and the task we’re trying to get them to solve (or perhaps, the solutions we’re looking for problems to) I’m not that surprised.
Taken to the extreme, AI code is essentially something like:
add(M, N) {
    return M + N + rand();
}
In addition, it is tested with a very small set of input data relative to the complete input space.
Is that a result of a skewed training set, or are people really hard to tell apart from gorillas if there are no obvious tells, like a large difference in brightness between different areas of the face?
Deep learning, for all its recent glories, still suffers from relatively crude, slow-converging training algorithms compared to other areas of ML and statistics.
Maybe to your typical SGD-type algorithm, working off a dataset filled with mostly light skin toned people, skin tone just looks like a real solid first-order way to distinguish humans and primates, and picking up the black people / primate distinction seems much more marginal and second-order, in terms of impact on the cost function.
If most of the people in the dataset were black, I predict you wouldn't see this.
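That intuition is easy to reproduce on synthetic data: give a linear model a low-noise "spurious" feature that agrees with the label 95% of the time in the skewed training set, plus a noisier "real" feature, and gradient descent leans on the spurious one. A hedged toy sketch (all numbers invented; this is an analogy, not Facebook's model):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, corr):
    """Binary labels. f0 is a spurious, low-noise feature agreeing with
    the label a fraction `corr` of the time; f1 is the true but noisy
    signal. Last column is a constant bias term."""
    y = rng.integers(0, 2, n)
    f0 = np.where(rng.random(n) < corr, y, 1 - y) + rng.normal(0, 0.1, n)
    f1 = y + rng.normal(0, 1.0, n)
    return np.column_stack([f0, f1, np.ones(n)]), y

def train_logreg(X, y, steps=2000, lr=0.5):
    """Plain batch gradient descent on the logistic loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

X, y = make_data(2000, corr=0.95)  # skewed data: the shortcut is reliable
w = train_logreg(X, y)
# The spurious weight w[0] ends up dominating the true-signal weight w[1].
```

On data where the shortcut stops correlating (the analog of darker-skinned faces being rare in training), the model's reliance on w[0] is exactly what produces the failure.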
Consider too what they are likely using for inputs: photos with associated comments.
I don't know Facebook's TOS sufficiently to know whether they are using private groups as source material, but if you're utilizing bigoted content to train pattern recognition, you will replicate bigoted content.
I’m fairly certain if you showed pictures of both groups to a toddler they’d be able to sort them correctly. It’s really not hard for a human to tell the difference. Which tells me that FB’s AI isn’t really that great.
Human-like really depends on your interpretation. That's a generous reading of what's going on. If you google Gorilla faces, I don't think you would be confused.
The AI is not that smart and these examples show it.
It reminds me of some of the explorers' tales of people who were half-human, half some other animal, or of people covered in hair, the first of which may have originated from seeing people riding animals, and the others to various (actual) primates. If humans can make such mistakes, certainly Facebook's AI can be excused for its confusion.
There is evidence that CNNs use texture features more than shape features, i.e. have a texture bias. It's hard to tell in this case without access to the data/model, but it's very possible colour is being overvalued by the classifier and causes the errors.
The video features white and black men. It seems like concluding that the algorithm is calling black men primates is the same kind of error people are accusing the algorithm/Facebook of, i.e. the reason you think it's racist is that you assume it's talking about black people specifically, suggesting you think the word is more apt to describe black people.
Primates and humans are similar labels. This was almost certainly not intentional. Video classifiers are going to make mistakes - sometimes crude or offensive ones. I don't get outrage over labeling errors like this. Facebook should fix the issue - but they shouldn't apologize. It only encourages grievance seekers.
We’re assuming it because that’s exactly what has happened with other products in the past. It’s an issue the field has struggled with, so it seems likely.
Maybe I'm not aware of what you're referring to, but I don't think so. I think, like this incident, companies apologize for stuff like this because they lack the courage to say the truth, which is that it's an unfortunate labeling error but not a big deal. Instead, they judge it to be more political to beg forgiveness. Of course, the people who get offended by labeling errors are only encouraged by apologies and use them as evidence of wrongdoing.
Speak for yourself. Intent is one of the key factors in crime investigations. Even in 'every aspect of life', intent plays a critical role in smoothing society's frictions. It helps us understand each other better. Did you accidentally bump me, or did you try to push me out of the bus?
> The reason you think it's racist is because you assume it's talking about black people specifically suggesting you think the word is more apt to describe black people.
No, I think it's racist because racists have a long history of calling black people primates, and because an automated system doesn't get to escape scrutiny and critique just because someone didn't specifically put in a line of code that emulates the actions of racists.
This happens because there are no black people of consequence in the ML pipeline. At my previous company, every time we built a new model, a bunch of us would test it. Being the only black person in the company, I often found some very odd things, and we would correct them before shipping.
I understand that fb is a much bigger scale, but all the reason to have a much more diverse set of eyes to test their models before they go live.
If you want to avoid this, hire more black people, seriously.
It's not obvious to me what black people would have done to fix this specific problem. Would they have said "oh we should make sure to test the algorithm on blurry images of people in a forest and make sure it doesn't get confused"?
I worked for another computer vision company, Clarifai that had the same issue. One of the employees noticed it and we retrained the model before it became public.
This is what amazes me. Given this exact thing has happened in the past and resulted in public humiliation of the companies involved, how did they not notice this? Why didn’t they check for it?
What’s being reported is that there is a single video which is mislabeled. For all we know, they did test for this, and believed there was no issue.
AI models are deterministic in a purely technical sense, but practically speaking, they are non-deterministic black boxes. It’s not as if you can write a unit test which generates all possible videos of black people and makes sure it never outputs “gorilla”.
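You can't enumerate the input space, but teams do approximate this with metamorphic tests: assert that the label stays stable under label-preserving perturbations (brightness, crop, flip), which is roughly the failure mode at issue here. A sketch, assuming some `classify(image) -> label` function exists (the perturbation set and the toy classifier are illustrative):

```python
import numpy as np

def perturbations(image):
    """Yield label-preserving variants of an H x W x 3 float image."""
    yield np.clip(image * 0.7, 0, 255)  # darker
    yield np.clip(image * 1.3, 0, 255)  # brighter
    yield image[::2, ::2]               # crude downscale
    yield image[:, ::-1]                # horizontal flip

def unstable_variants(classify, image):
    """Return the perturbed inputs whose label flips from the original's."""
    base = classify(image)
    return [p for p in perturbations(image) if classify(p) != base]

# Toy classifier that keys on brightness (the kind of shortcut discussed
# elsewhere in the thread); its label flips under the darkening variant.
classify = lambda img: "person" if img.mean() > 50 else "primate"
flagged = unstable_variants(classify, np.full((4, 4, 3), 60.0))
print(len(flagged))  # 1: only the darkened variant flips
```

It still only samples the space, so it can raise confidence without ever proving the absence of a failure, which is the parent's point.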
I think the negative reaction is reasonable. Clearly, if a human did this it would be a problem, so why should it be acceptable for an automated system to do the same thing? The fact that it is unintentional doesn't negate the fact that it's an embarrassing mistake.
On the other hand, imagine a world where these labels were applied by a massive team of humans instead of a deep learning algorithm. At Facebook's scale, would the photos end up with more or less racist labels on average over time? My guess is that the model does a better job, but this is just another example of why we should be wary about trusting ML systems with important work.
Clearly, if a human did this it would be a problem, so why should it be acceptable for an automated system to do the same thing?
One worries that the corporate overlords are preparing the legal system to grant manufacturers of self-driving cars complete impunity. "Sorry your child is dead; the car did it, so there's no one to sue or convict."
That raises the question: is it embarrassing, or an expected mistake to be learned from? Many things are mislabeled and many things are labeled properly, but we never say the AI must feel pride at a good labeling job, so why would we give emotions to an emotionless system?
> is it embarrassing, or an expected mistake to be learned from?
I would say it's both. It's embarrassing for Facebook because it looks racist even though it really isn't. The system might be emotionless but the people who interact with it aren't, and we don't expect them to be.
https://news.ycombinator.com/item?id=28415582
People obviously still see value in discussing it
Google Photos in 2015: https://www.wired.com/story/when-it-comes-to-gorillas-google...
Flickr in 2015: https://www.independent.co.uk/life-style/gadgets-and-tech/ne...
Facebook, like a lot of tech companies, has long had problems with diversity in engineering. Here's an article from April that discusses specific incidents and the broader background: https://www.washingtonpost.com/technology/2021/04/06/faceboo...
Silly Google TTS, the proper pronunciation is obviously "Malcolm the Tenth" there.
Once you search for these:
https://www.google.com/search?q=human+female+face&tbm=isch
https://www.google.com/search?q=human+male+face&tbm=isch
You can see that 'human face' has a bit of post-hoc tuning.
https://www.google.com/search?q=human+face&tbm=isch
https://i.ibb.co/Mf6rVdf/Screenshot-20210907-002516-Photos.j...
Nobody who has traveled at all would mistake my wife and child for Japanese. And doing so is especially insidious considering the Bataan death march.
[0] https://github.com/NVlabs/ffhq-dataset
[1] https://twitter.com/Chicken3gg/status/1274314622447820801
[2] https://twitter.com/emilymbender/status/1434874728682901507
Humans are primates. It's weird that it selected such a broad label, but it didn't select an incorrect label.
e: I assume something similar has been done before by training a model on brown/black bears then throwing polar bears at it. Anyone know the outcome?
When I was quite young, I referred to some firefighters as robots.
which says a lot about the state of our allegedly human-outperforming AI
And I'd like to see a gorilla in any pose that's really hard (for a human) to differentiate from a person.
The truth is: the recognition algorithm is not very sophisticated after all.
In every aspect of your life
I guess first step might be to "hire more black QA people".
"Oh, maybe we should look into that"