Content classification and moderation at scale is hard. I wish this article had provided more concrete statistics.
> The group, formed in December, is private and relatively small (57 members) but is still active, with four posts in the last month. We attempted to reach the administrator by email but did not receive a response.
It’s not really honest to expect Facebook to correctly classify every single, tiny group created on the entire website.
> It’s not really honest to expect Facebook to correctly classify every single, tiny group created on the entire website.
I agree, but that's what Facebook said it would do. IMO the onus is on Facebook here. They promised to do something they surely knew they couldn't do. I suspect it's because the alternative is admitting that they don't really have any control over their recommendation algorithm, and the logical conclusion to that is to stop recommending things. But they don't ever want to do that because their engagement numbers would tank.
I don’t think it’s reasonable to interpret their statement to mean that their user-submitted content classification would be without fault. I think people are forgetting the scale of Facebook, or even the scale of the global internet population.
If Facebook made a change to stop recommending groups that had been categorized as political, that’s a fair fulfillment of their statement. If some users are miscategorizing their user-created groups and some of those aren’t caught by automated filters and some of those are slipping into recommendations somewhere, we’re starting to play a game of “gotchas”.
What do people actually want from Facebook? 100% perfect categorization of 100% user-generated content is impossible, and I think most people on HN understand that. So is there some degree of “good enough” that would be acceptable, or is this the type of issue that will generate outrage as long as someone can find an isolated exception somewhere? If it’s the latter, I think we’re bound to wear out the patience of reasonable people following along.
> Why is it reasonable for Facebook to promote content it hasn't classified?

For the same reason it's reasonable for Hacker News to promote user-submitted links that have been voted up by users before being individually vetted by moderators.
Do we really want to go down the road of restricting websites from sharing user-generated content? That’s a non-starter for the free internet.

It is when a billionaire and a company of chosen few can wield so much power over people everywhere.
If moderating their content is necessary to maintain a healthy platform (in terms that our society determines), then their inability to moderate that content means they are too big and need to be cut down to a manageable size.
What real good has Facebook done? Even the few examples that can be mustered don't compete with the ongoing damage it does to our society. If they can't handle these problems at this scale (global X-billion users scale) then they should scale down.
As a society we don't restrict fundamental civil rights such as freedom of speech for individuals or corporations just because it might cause some vaguely defined "damage". The legal bar is higher than that, as it should be.
We can demand companies do whatever we, as a society, deem important to do.
If their scale and cost structure makes complying with that demand "impossible", then it's completely reasonable to propose that it's their scale and business model that are the problem, not the nature of the demand.
> It’s not really honest to expect Facebook to correctly classify every single, tiny group created on the entire website
I think it's entirely reasonable[0]. I'm sick and tired of this bogus idea that large websites are simply too big to be moderated by human beings. It's bullshit propaganda invented by companies that don't want to spend the money to do it.

[0] https://www.nytimes.com/2021/04/28/business/facebook-earning...
The issue is that FB is not doing a good job - still recommending “Progressive Democrats of Nevada,” “Michigan Republicans”, “Bernie Sanders for President 2020,” “Liberty lovers for Ted Cruz,” “Philly for Elizabeth Warren”. Sure, it's non-trivial, but if these fall through the cracks, then you're not trying hard enough.
My guess is that there was a mixture of:
1) This not being a priority for FB
2) FB Eng / PM being too clever for its own good and not doing the obvious thing of substring matching ("not scalable"; a naive version is sketched after this list). (Arguably a part of (1) - if it's really important, you'd have a team maintaining the list of banned strings.)
3) Just a plain old bug (although arguably this is a subcategory of (1)).
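To make that concrete, here is a minimal sketch of the "obvious thing" - a banned-substring check over group names. The keyword list and the wiring are hypothetical illustrations, not Facebook's actual list or pipeline:

```python
# Toy banned-substring check over group names. The keyword list is an
# illustrative assumption, not Facebook's actual list or pipeline.
POLITICAL_KEYWORDS = [
    "democrat", "republican", "for president",
    "bernie sanders", "ted cruz", "elizabeth warren",
]

def looks_political(group_name: str) -> bool:
    """Flag a group whose lowercased name contains any banned substring."""
    name = group_name.lower()
    return any(keyword in name for keyword in POLITICAL_KEYWORDS)

# Every group name cited in the article would be caught even by this:
for name in ["Progressive Democrats of Nevada", "Michigan Republicans",
             "Liberty lovers for Ted Cruz", "Philly for Elizabeth Warren"]:
    print(name, "->", looks_political(name))  # all True
```

A list like this needs constant curation and has obvious false positives (a "Democratic Republic of the Congo expats" group would be flagged), which is presumably where the "not scalable" objection comes from - but it would have caught every example in the article.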
Then maybe they shouldn't allow everyone to create these groups on their platform.
If I run a music venue, and every weekend I sell out 1,000 seats... BUT every weekend 1 person shoots another person... they're not going to let me keep having concerts. I can't throw my hands up and say "well there are just too many people at this event, I can't check them all for guns"
Have we reached a decade of "AI will fix it" yet, or are we still a couple of years off?
I feel like so many issues around content moderation have been waved off by the big tech companies as being solved by algorithms, and yet time and time again it's been made very clear that algorithms alone aren't going to solve these problems.
I used to think it was because the engineers and managers involved believed so deeply in the potential, but now I'm pretty sure it's because higher-ups know the alternative (e.g. hiring humans, or abandoning money-generating algorithmic feeds) would be expensive, and they'll throw anything and everything they can at the wall to avoid having to do it.
Content moderation at Facebook scale is really, really hard. Even for real people.
Again I hate to defend Facebook, but your comment is very misleading by ignoring that Facebook HAS hired humans to help solve the content moderation problem. A lot of them.
The clear implication here is surely that 15,000 is not enough. Which is maybe not surprising when Facebook claims to have 2.85 billion active users. By that metric 15k isn't really all that much.
> Content moderation at Facebook scale is really, really hard. Even for real people.
Maybe you could even argue that it's impossible. The question then is what to do about it. One answer could be "stop using recommendation algorithms you do not control", but that would harm Facebook's profits.
Yes, 15,000 is a lot. But more than 2 billion people is a lot more. Wikipedia employs far fewer people on content moderation, yet it seems to fare just fine. Throwing this number around as proof that Facebook is "making an effort" is an attempt to keep their toxic business model alive at an acceptable cost. How many hires would they save if they changed the timeline to a chronological one instead of the current one? It would be much harder to capture as much attention, but it would also be less addictive, which is a net good for every human except Facebook as a business.
That being the case, it seems like Facebook in its current form shouldn't exist. It has no natural right to exist, and the harms clearly outweigh any benefits that could be argued.
> I don't love defending Facebook, but as of last year, they employed 15 THOUSAND content moderators
If all the discussions that happen on Facebook happened in cafes, libraries, and schools, there would be far more "moderators" in those spaces. 15,000 is nothing for a business with 2.85 billion users; that is roughly one moderator for every 190,000 users.

Great. Keep going. They clearly need many thousands more - and they can afford it[0].

[0] https://www.nytimes.com/2021/04/28/business/facebook-earning...
One of my extended family members works for a FAANG company (not Facebook) on a team that moderates user-generated content. They are actually using AI to great effect. They have a large amount of manual review and moderation as well, but having AI augment the human review, and continuously retraining it on human overrides of AI decisions, makes the system much more efficient.
What many people don't understand is that even if Facebook employed human moderators to individually review every group, there would still be isolated examples of policy violations slipping through. Human moderators are far from perfect. They can and do make mistakes. At scale, you get some moderators who don't actually care about doing the work, so they start letting things slip through instead of reviewing them. AI is also helpful in flagging human moderator decisions that disagree strongly with AI predictions, which can then be used to catch moderator errors or improve the AI, depending on which is ultimately deemed correct.
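As a rough sketch of that last idea - routing strong human/AI disagreement to a second look - assuming a hypothetical scoring model and an arbitrary threshold:

```python
# Toy disagreement audit: when the model and a human moderator strongly
# disagree, route the item for a second review. The names and threshold
# are illustrative assumptions, not any company's real system.
AUDIT_THRESHOLD = 0.8

def needs_second_review(model_violation_score: float, human_approved: bool) -> bool:
    """model_violation_score: model's estimated probability of a policy violation.
    human_approved: True if the human moderator let the content stand."""
    human_label = 0.0 if human_approved else 1.0
    return abs(model_violation_score - human_label) >= AUDIT_THRESHOLD

# Model is 95% sure this violates policy, but a human approved it -> audit it.
print(needs_second_review(0.95, human_approved=True))   # True
# Model and human agree the content is fine -> no audit.
print(needs_second_review(0.05, human_approved=True))   # False
```

Whichever side turns out to be right, each flagged case either catches a moderator error or becomes training data for the model.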
It's easy to forget that Facebook has almost 3 billion users. At that scale, even 99.99% correct content moderation (human or otherwise) still lets a lot of incorrectly moderated content through: if those users generate, say, a billion pieces of content a day, a 0.01% error rate is 100,000 moderation mistakes every single day.
More commonly, it may not be immediately obvious that content breaks policies, or in this case that a group is political. Or maybe the group started as one thing, but then evolved over time to become highly political. Or maybe they chose a name that sounds innocuous, but is actually very offensive given some obscure context or lingo. The problem is impossible to solve perfectly, so we need to instead focus on setting realistic expectations for what can be done.
What is the "it"? A societal problem or a company problem? I don't think AI is designed to fix societal problems. It is designed to make more money for the company, even at a cost to society.

In both cases AI will do whatever task you design or train it to do.
Most social networks operate on a business model/philosophy that the bare minimum of moderation will promote uploading and engagement, capturing the biggest user base while also saving on expenses. Moderation comes later: once you achieve market dominance or enough attention, the bare minimum shifts slightly toward a bit more moderation to appease critical comments, outcries, or changes in expectations.
AI moderation is, and I think always has been, a cost-saving measure. Its effectiveness for the health of the platform has always been secondary.
The problem is the humans - the users. They keep clicking on outrage-bait, and re-posting it. AIs are not going to fix the humans. They might turn the gain down a bit, they might cut off some of the worst stuff, but they're not going to fix the problem. The problem is us.
But IMO you minimize the importance of controlling that gain. HN is full of humans, too, but here the worst/craziest stuff usually gets reduced visibility. On the big "social" platforms, they crank that crap up to 11.
I'd love to see more discussion about what success actually looks like for content moderation on Facebook (or the internet, really).
We don't seem to care that the legal system doesn't get remotely close to catching everyone who commits an ACTUAL crime, so why do we need to be perfect (whatever that even means) at content moderation?
What makes you think we're not already in that decade, and this is what it looks like when AI "fixes" things?
Are you familiar with AI Alignment? When AI does work, it does not always align with human values and motives. And even if it did, it would only align with the values and motives of its creators.

Thus the public criticism.
>we used keyword-based classification to assess whether they contained support for politicians, movements, parties, or ideologies
They don't need AI to improve this. A boring old keyword search would improve things. Sure, classifying groups perfectly is a difficult problem especially at Facebook's scale. But this shows almost a complete lack of effort when groups like "Bernie Sanders for President 2020" and "Liberty lovers for Ted Cruz" are recommended.
> There is really no way out of this. The age of information being dictated from a handful of corporations in New York is over.

At least those corporations' manipulative broadcasts have a vetting process to root them in some semblance of reality. Then you can rummage among the various outlets to see the truth hidden in plain sight.
But the fight is for the uncritical consumer, who isn't going to look beneath the surface. The web overall also provides a vast amount of information that's useful to those who critically sift it, even if it is arguably harmful to those who uncritically swallow it.
Moreover, the vetting process is an extra, added because of mass media's monopoly position - part of "journalistic quality" generally. And that has been declining for a while, partly because of competition from the web but also because of a generally more competitive environment.
Which is to say, we're not going back to the old situation regardless so we may as well appreciate the benefits and drawbacks of each era.
"People being able to directly organize and bypass the established political parties and corporate media entities that currently control the system is bad."
- The established political parties and corporate media entities that currently control the system
Spoiler alert: the people "directly organizing" aren't less manipulated, nor less manipulating. The rules have shifted and that's allowed some new players onto the field, but it's the exact same game. It took very little time for the "game" to catch up with and overtake the web, in the scheme of things.
(though, yes, of course, you're right that entrenched interests will hate this new thing whether or not what I wrote above is true)

Yeah, this is exactly my point.
This, but it's good, actually. The opposite of "corporate media" isn't honest, factual reporting free of agenda and aligned with the viewer's interest. The opposite of "corporate media" is a complete lack of factual investigation, replaced instead by the loud opinions of overconfident charlatans spewing forth bullshit at an unprecedented rate. News orgs print retractions and have a reputation to maintain, but no one is ever going to come back and fact-check a Facebook rant. It's called a news "feed" because it's served from a trough.

You just described corporate media. Ever watched Fox/CNN/MSNBC? Even the NYT and WaPo are going that direction now.
There exists just-as-factual non-corporate media. I think Democracy Now! is a good example. Yeah, it's very biased, but I think it's factual reporting.
> Citizen Browser consists of a paid nationwide panel of Facebook users who automatically send us data from their Facebook feeds.
Is this much different from Cambridge Analytica's "This Is Your Digital Life" (other than being a browser extension rather than an API)?
From the site (https://themarkup.org/citizen-browser/2021/01/05/how-we-buil...):
> To protect the privacy of panelists, we automatically strip potential identifiers from their captured Facebook data. The raw data we collect from them is never seen by a person and is automatically deleted after one month.
So they say. CA's data was governed by a TOS; that didn't prevent them from abusing it.
I love that the whole discourse revolves around the problem of stoopid voters having wrong thoughts and sharing wrong opinions.
I wonder what “political” even means. A group of people who like Ben Shapiro will be classified as political, I guess. But what about an LGBT+ youth group? Is it politics? Is it Facebook that decides what issue is political and controversial and which issue should be considered the de facto norm?
You're falling into the trap of "there are two sexual orientations: straight and political."
You can say that everything is politics and power dynamics. That the existence of a comic book club at your high school is the political act of organizing and normalizing these freaks and nerds who indoctrinate good Christian boys away from football and family values with their superhero propaganda. The problem with reasoning like this is that it creates a world view where everything revolves around this locus of "establishment" and must exist in opposition to it. Where really it's just people who like comics and want to share this thing they like.
The same goes for how LGBT groups are portrayed. There are people engaging in politics on issues like gay marriage and trans rights, but that doesn't make an LGBT youth group, one set up to give people who feel alone and marginalized a place to be themselves and make friends, a political group.
But Ben Sharpie, his content is literally just political outrage bait designed to push a very specific narrative for the purpose of changing the political tide. And since that's all that he and people like him do day in and day out, they assume that that's what our comic book club must be doing as well. Bleh.
You know what we do in our LGBT youth group? Play Super Smash Bros, bake cookies, and watch gay rom-coms.
Our Discord is just a stream of people calling each other and random things gay, a deep fried meme of Lord Farquaad saying 'E', a twitter screencap about how attractive girls are with suspenders, a meme of Ferris from Re:Zero saying "excited gay noises" with a story about how a stranger called her miss in public, and someone suggesting we have a cottagecore theme night.

Name the people you trust at Facebook to make that same distinction you just made in your post.

Name the process you as a consumer can use to hold those at Facebook to account when they inevitably get it wrong.

Point me to the highly transparent records of how and when Facebook employees made that decision.

Obviously none of the above exists, and that is the core issue.

Some shadowy group of "our betters" does the deciding, and somehow these decisions appear on TV personalities' lips with remarkable swiftness.

"Sociology isn't a hard science" is a cover story.

If the reversal of the ban on discussing the lab leak hypothesis gives any indication, Facebook definitively does decide.
I don't want social media enforcing anything either, especially if it comes from the elitists or our government. The latter breaks all sorts of constitutional amendments. They're using a middleman to claim they're not stomping on the Constitution.
Meanwhile, YouTube continues to force a coronavirus section in my recommendations despite me hiding it over and over again. It's just an annoyance but still.
YouTube gets way too little blame for radicalizing a large chunk of angry/confused people, imo. I watched a video one time that I guess was alt-right-adjacent. YouTube immediately started recommending more hardcore, basically racist content intended to induce more anger - leading me right down the rabbit hole if I wanted to go. I had to train it to stop doing that.
Instagram seems to have done the same thing for a bunch of people in the wellness community - pulling some of them down into the Q rabbit hole.
I truly despise the algorithms and blame them for most of the problem.
You can easily opt out of this. Turn off watch history and search history, then you'll have recommendations related to your subscriptions, not what you watch. I've been doing this for over a decade now.
To solve this, I avoid using YouTube for discovering new videos because I don't believe the recommendation system has my interests in mind (e.g. spending my time more enjoyably). I've installed a free browser add-on to hide all YouTube recommendations in the sidebar (Distraction Free YouTube for Firefox).
I don't think so; those banners are pure virtue signalling - an attempt to look good to the market they've determined is the most profitable to align with.
If they were actually trying to change minds, they have so many better tools than an annoying banner that only makes you dig in harder on whatever you already believed.
"Google FUNDED virus research carried out by Wuhan-linked scientist Peter Daszak for over a decade, new report reveals, amid accusations Big Tech has silenced COVID lab leak theory"
If this is true, now you know why Google has pushed so hard to control any information around it.
Meanwhile, I'm co-administering a little Facebook page for a political project and would really like to see it recommended to people. The whole point of being on Facebook is to get more exposure.
I wish people would stop seeing politics as inherently filthy, as opposed to other "hobbies" or volunteerships which are clean. Not all politics are about self-interest; I would argue most activists have the greater good in mind (even though they might disagree vehemently on what that is).