I don't think this post means as much as people are acting like it does.
The indicators of being a spambot they have in their post seem VERY iffy to me. "Not tweeting in the past 120 days", "Location set to a non resolving location", "Small number of followers", "default profile image", "No URL in bio or non-resolving URL in bio", "Not on many lists", "tweets in a different language than the person they're following" - Those all seem like extremely weak signals to me. My profile matches 6 of those, and I'm a human. I would like to see them hand-verify a subset of their results and see if their algorithm matches reality.
Also note that they define "active" differently than Twitter. They define "active" as having tweeted recently. Twitter gives spambot numbers as a percent of monetizable daily active users. I wonder if Twitter's given bot numbers are low because bots don't typically lurk or load ads. I can believe that the total bot count as a percentage of users or as a percentage of recently-tweeting-users is higher than 5%, but that only 5% of daily visitors seeing ads are bots.
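To make the parent's objection concrete, here is a minimal sketch (all field names and thresholds are invented for illustration, not taken from the SparkToro post) of a rule-counting classifier built on weak signals like these, and how a perfectly real lurker trips nearly all of them:

```python
# Hypothetical sketch (field names and thresholds invented) of the kind of
# rule-counting classifier the post describes: tally weak signals and flag
# an account once it crosses some threshold.

WEAK_SIGNALS = {
    "no_tweet_in_120_days":  lambda a: a["days_since_last_tweet"] > 120,
    "unresolvable_location": lambda a: not a["location_resolves"],
    "few_followers":         lambda a: a["followers"] < 25,
    "default_profile_image": lambda a: a["default_avatar"],
    "no_valid_bio_url":      lambda a: not a["bio_url_resolves"],
    "on_few_lists":          lambda a: a["list_memberships"] < 2,
}

def spam_score(account: dict) -> int:
    """Count how many weak signals an account trips."""
    return sum(check(account) for check in WEAK_SIGNALS.values())

# A perfectly real lurker (like the commenter above) trips all of them:
lurker = {
    "days_since_last_tweet": 400, "location_resolves": False,
    "followers": 3, "default_avatar": True,
    "bio_url_resolves": False, "list_memberships": 0,
}
print(spam_score(lurker), "of", len(WEAK_SIGNALS))  # 6 of 6 -> flagged, yet human
```

Any single threshold here is defensible on its own; it's the counting of correlated weak signals that lets a human lurker score exactly like a bot.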
This is a terrible metric. Real people use the location field for all sorts of non-location purposes, as well as more freeform descriptions of their location that wouldn't resolve mechanically.
Just out of interest, imagine you were in a hot desert. There is a tortoise in front of you. You reach down and you flip the tortoise over on its back. The tortoise lies on its back, its belly baking in the hot sun, beating its legs, trying to turn itself over, but it can't, not without your help. But you're not helping. Why?
My main account literally has “fake account” in the description as sort of a joke, because I really use Twitter exclusively for browsing, and maybe liking/RT’ing stuff.
But hey, maybe with this kind of analysis, and random computer-generated, un-appealable bans in the future, "real accounts" will just mean "very elaborate bot".
Mine as well. I deliberately have not set a profile image, and have not attracted many followers. I probably should not bother with Twitter but I am around and am a real human.
You can see Parag's thread here (https://twitter.com/paraga/status/1526237578843672576).
Noteworthy highlights:
* Twitter estimates its <5% number from human analysis of multi-thousand-user random samplings of mDAU
* Twitter allows that number to remain so high to avoid introducing friction like captcha into real users' experiences
* Twitter uses all sorts of internal private data in its analysis
* Parag says you cannot get a reliable indication of bot/not bot without this internal private data
Having just finished building a Twitter analysis tool, I agree with Parag that the Twitter API doesn't provide sufficient clarity to make decisions about spam. This article's analysis doesn't hold up - just because you can name several features you're going to use to generate a spam confidence score about an account does not mean that spam confidence score will have any precision.
Musk responded to Parag's thread with a poop emoji. Not going to lie, if I worked at Twitter I would be a little nervous about my career at this point, for several reasons, including what appears to be the potential for a very hostile work culture in the near term. Musk is being openly antagonistic towards Twitter leadership and denigrating the people that work there. Although it does seem more and more like the deal is not going to close. I don't think Musk wants it anymore and is seizing on anything to get out of it.
> Musk is being openly antagonistic towards twitter leadership and denigrating the people that work there.
Seeing how badly Twitter has been managed (for a laugh, check out their "R&D" expenditure), how much of a loss-making enterprise it has been, and how it is always at risk of a takeover, is it that surprising?
If it had been Elliott Management (the previously rumored takeover threat for Twitter), a group far less prone to public displays than Musk... would things have been any different? The only difference is that Musk is being open about what he has been doing, which I see as a public good, frankly.
Elliott's track record shows it is far more vicious with layoffs and cuts.
----
https://fortune.com/2013/10/25/why-is-twitter-spending-so-mu...
https://www.rndtoday.co.uk/latest-news/is-twitters-rd-provid...
https://www.axios.com/2021/11/30/jack-dorsey-twitter-departu...
https://www.forbes.com/sites/kevindowd/2022/02/27/wall-stree...
> * Twitter allows that number to remain so high to avoid introducing friction like captcha into real users' experiences
This doesn't pass the smell test in my opinion. Given that everyone who tries to create an account without a phone number has to go through the friction of getting their account locked immediately, they clearly don't care about this sort of friction. Not to mention the friction of just trying to view a tweet which has been discussed at length on HN before.
The difference is between facing that friction every time you post a tweet versus going through it once. That's what he's talking about. You are comparing different things.
I have a Twitter account with > 10k followers that is a few years old. Created without a phone number and, of course, immediately locked. Somehow I managed to get it unlocked, and it's still going a few years later without a phone number. Though I'm always in fear of the ban hammer.
> Given that everyone who tries to create an account without a phone number has to go through the friction
Yes
> Not to mention the friction of just trying to view a tweet which has been discussed at length on HN before.
Yes
Now imagine filling out 3 captchas every time you open the Twitter app on your phone, one every time you tweet, and one for every person you follow.
Most users of Twitter use either an app on their phone or the cookies in their browser, and only suffer the friction of forgetting their password (likely password1, btw) because they have to log in so infrequently.
Also, bot networks are so kind as to label themselves with the same hashtag; I don't understand why Twitter doesn't analyze trending topics to detect bots. There are always dormant accounts with thousands of followed/followers who start the propaganda.
But we were talking about bots. You and Parag are shifting the dialogue to spam
> The hard challenge is that many accounts which look fake superficially – are actually real people. And some of the spam accounts which are actually the most dangerous – and cause the most harm to our users – can look totally legitimate on the surface.
He's not talking about detecting bots. I.e. fake and automated accounts. He's talking about twitter users/bots that cause what they perceive to be harmful content. Which is a very different thing, and was the whole point of Musk's intended involvement in the first place.
>* Twitter uses all sorts of internal private data in its analysis
>* Parag says you cannot get a reliable indication of bot/not bot without this internal private data
"I have secret information so trust me" is an excellent reason to reject an assertion every time, whether it is made by an individual, a corporation, or a government. It doesn't mean it isn't true, but it means that absolutely nobody should put any credence in the assertion at all.
> Parag says you cannot get a reliable indication of bot/not bot without this internal private data
That's convenient isn't it?
This report made headlines because it aligns with everyone's experience with Twitter: almost everyone on Twitter is either a bot or a corporate managed account.
That doesn't align with my experience. The vast majority of what I see in my feed is real people. I do know, however, that if I look at the replies to any viral tweet that a large percentage of them will be through fake accounts.
"But you don't know which ones we count as mDAUs" and "accounts that look like spam are actually real" are not as good a defense as he thinks. The product is still affected by spam and fraud even if it's excluded from advertising metrics, and accounts that look like spam are not good for the product either even if they happen to be real people for whatever reason.
Absolutely not. No social media company should take a set of user's private data and "open the data" (especially just because some blowhard is trying to find any reason to back out of a deal). Even without the "open" bit, they shouldn't be providing that data to a third party.
"We undercount active users whose accounts are protected, accounts that view tweets but don’t send any, and accounts that log in and engage in other ways beyond tweeting (like favoriting or adding profiles to lists)."
Emphasis mine. If I understand correctly, not having a public tweet is a marker for being a spam account. Isn't that kind of a lot of people? I know from other forums that there's a large ratio of lurkers to active posters.
Given that you need an account to customise your timeline, and, these days, pretty much for just reading a tweet, there may be loads of real reader accounts that never post and never bother customizing their profile.
I remember when I found out I was a bot. It was during a harrowing judgement conflict with an image-based captcha about traffic lights, and caused a complete shutdown of my higher order processes. To this day, I still don't know the threshold of how much traffic light actually needs to be in a square to be considered for selection, so I just stopped logging in to everything.
If you have not tweeted, you are excluded from this analysis.
A spam account is gonna spam, right? But some real users of Twitter may only tweet once a year. This study just doesn't include you. It isn't saying you are spam, just not including you in the count.
There are probably many 'fake' or bot accounts that don't tweet; they'd be used to prop up the 'likes' or views of other accounts, either customers paying for exposure, or other bots.
The population of accounts that tweet is going to contain a larger proportion of spammers than the population of active Twitter accounts. Presenting the former as the latter is disingenuous.
Yeah, but it's just clickbait then. An honest title would be "We sampled users that act like bots on Twitter and found out that lots of them were bots".
Yea, I have an account that is literally just for reading 3 peoples' tweets (they're former generals that frequently comment on Russia's invasion of Ukraine). I have their timelines bookmarked, and just read the threads they post.
So much of this thread treats this as 'proof' that supports whatever Elon Musk is trying to do, but seems not to realize this is a totally independent actor making a dubious set of assumptions to come to a number. This analysis might be interesting because it is timely, but it has no more bearing on the Musk Twitter acquisition than my running such an analysis would.
1) They talk about "active" accounts (meaning have tweeted in the last 9 weeks), and do a bunch of filtering against that. That seems like a huge bias - lurkers exist, and in my experience are usually the majority of users...this step removes them or ignores them entirely. Frankly, until recently, my twitter account would have been one of the ones they would have discarded as inactive. This one thing alone makes me question all of the rest of their results.
2) By the same token, the rate or frequency with which a user sends tweets has no relation to whether a user is monetizable. If they're seeing ads, they're monetizable...lurkers are just as monetizable as high-volume posters.
You seem to be arguing against something that the article doesn't claim. The article isn't equating inactivity with fake/spam; it's saying that, of the accounts that actively send tweets, ~20% are fake/spam.
Sure that's a different question from what proportion of all users are fake/spam, but this is still a perfectly valid question to ask, and the fact that they're only considering active users is in the title so I really don't get your complaint.
If you want an analysis that attempts to answer a different question go find or write one that addresses the question you want answered...
The article clearly states (emphasis mine):
> This represents the largest set of accounts on Twitter we could acquire, but it includes analysis of many older accounts that haven’t sent tweets in the last 90 days and thus, likely don’t fit Twitter’s definition of mDAUs (monetizable Daily Active Users).
From the linked Twitter earnings report:
> We define monetizable daily active usage or users (mDAU) as Twitter users who logged in or were otherwise authenticated and accessed Twitter on any given day through Twitter.com or Twitter applications that are able to show ads.
EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.
When you are in a context where:
- Twitter determines the active status of an account using logins
- people are wondering what % of active users, as defined by Twitter's own metrics, are fake
but you then use your own definition of active, write only a one-liner on the difference, with no reflection on the impact it might have and no warning that you are answering a different question...
Then my conclusion is you want people to make this mistake.
> EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.
Made me laugh because you had to add it and made more effort than the author of the article to prevent the confusion :D.
Interesting. This could be a bracketing error, because I read
> it includes analysis of many older accounts that haven’t sent tweets in the last 90 days and thus, likely don’t fit
> Twitter’s definition of mDAUs (monetizable Daily Active Users)
As implying that they think accounts that haven't tweeted in the past 90 days don't fit Twitter's mDAU definition. Given the placement of the qualifying phrase, I think that's a reasonable parsing of the sentence, but I see your point that they could be trying to imply their set doesn't fit the definition. If so, that sentence is very badly constructed.
The article is just clickbait. The title is obviously clickbait (based on your edit you've realized that "active account" !== "accounts that tweet"). Then they try to define active account:
> “Spam or Fake Twitter accounts are those that do not regularly have a human being personally composing the content of their tweets, consuming the activity on their timeline, or engaging in the Twitter ecosystem.”
Ok, but "consuming the activity on their timeline" is essentially unknowable outside of Twitter, since you can't see what tweets people are viewing. It turns out they're trying to infer this through some other signals like follower count, etc. But you can imagine why that might be sketchy.
Then they constrain the analysis:
> A more fair assessment of Mr. Musk’s Twitter following would only include accounts that have tweeted in the past 90 days
Let's be real, if you look at a list of Elon tweet replies, they might as well all be spam. Just search @elonmusk and sort by latest. Then compare that to the sorted tweet replies under an actual tweet. IDK how many millions of dollars and man-hours went into the AI that sorted this list, but it seems to just be putting the blue checks at the top and shrugging at the rest. I doubt this three man team is doing any better at spam detection.
For manipulation / spam purposes I don't really care about accounts that don't actively post/like/retweet/follow. The mDAU isn't useful at all for determining if the activity on Twitter is done largely by bots.
I do wonder how "fake" is calculated. Is @tweetsfrommydog fake? It's a real person making tweets that are funny and provide value to the platform, but it's not a real person tweeting their personal thoughts as an individual. Are corporate accounts or parody accounts fake?
It is valid criticism because the context of this article is that Elon Musk wants to know whether Twitter's own claims of ~5% fake/spam accounts is accurate. We do really want an analysis that investigates that precise question and not a related one.
Lurkers are also the most important people. They consume the content. They are the meat of the business, the ones that respond to advertising and political messaging. If I were Twitter I would champion all the lurker accounts, all the eyeballs to which Twitter serves content. Nobody ever faulted the Nielsen ratings scheme for "lurker" viewers who only watched but didn't themselves create television shows.
Definitely agree. I joined Twitter four months ago. I haven't tweeted yet, but I'm reading it daily on the app and occasionally liking tweets.
I've been so surprised at how effective the advertising has been on me. I've never experienced this level of engagement with online marketing.
Ads for TV shows, movies, live shows, musicians and comedians have been particularly effective.
I've found myself following a lot of show writers I've never heard of, and I even signed up for some new streaming services because of it. Google and Facebook ads never felt like they impacted me, though I know how important and dominant they are to business marketers. I've never clicked on a banner ad and my eyes glaze over sponsored links. Twitter's level of engagement with their marketing content is new to me, and I'm impressed.
And I thought it was common knowledge that lurkers always vastly outnumber people who post content on any platform. If lurkers outnumber posters by at least 3:1, then 20% goes to 5% and twitter’s “<5%” figure is accurate.
Lurkers are probably anywhere between 8-12:1. People actually posting stuff on the internet are in the vast, vast minority, which creates a sort of echo chamber.
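A quick back-of-envelope (assumed ratios, not measurements) of how the dilution argument in these two comments works out:

```python
# Back-of-envelope for the parent comments' point (assumed numbers, not
# measurements): if ~20% of *tweeting* accounts are bots but lurkers are
# overwhelmingly human, the bot share of *all* accounts shrinks fast.

def bot_share_of_all_users(bot_share_of_posters, lurkers_per_poster,
                           bot_share_of_lurkers=0.0):
    posters = 1.0
    lurkers = lurkers_per_poster
    bots = bot_share_of_posters * posters + bot_share_of_lurkers * lurkers
    return bots / (posters + lurkers)

for ratio in (3, 8, 12):
    print(f"lurkers {ratio}:1 -> {bot_share_of_all_users(0.20, ratio):.1%}")
# lurkers 3:1 -> 5.0%, 8:1 -> 2.2%, 12:1 -> 1.5%
```

Under these assumptions, 20% of tweeting accounts being bots is entirely compatible with Twitter's "<5% of mDAU" claim, which is the whole point of the denominator argument.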
I am technically "logged into" twitter so I can click through and read the postage stamp-sized charts linked to through various articles and blogs, or watch a video about a riot in some far flung part of the planet. Once a year I tweet at airlines when they lose my luggage or whatever but otherwise don't tweet. Twitter isn't a good social media service, it just happens to be the image/video sharing platform of choice for journalists to promote themselves.
I created an account 5 years ago, followed one or two people, got bored and never logged in again.
Presumably their intention is to exclude abandoned accounts, like mine - is there any way they, viewing Twitter externally, could tell lurker accounts like yours and abandoned accounts like mine apart?
As a third party? Probably not. Which is why it's going to be very hard to disprove Twitter's assertion unless Twitter chooses to share their data.
That's part of why I find articles like this frustrating: I don't think they have the data to actually answer the question they're attempting to answer. Knowing that, what's the purpose of the article?
They could maybe use like activity in addition to just tweets? Inherently though this system is going to be less accurate than the dataset that Twitter has access to. If a large chunk of users only engage in Twitter through DMs then an external organization isn’t going to have insight into that.
I would imagine Twitter would have access to analytics that third parties don't have, which would allow them to pretty easily work out which accounts are logged in and used for browsing and which are actually abandoned.
If an account is in lurk mode, then its not a spammer so I'm okay with it being left out of that equation.
Where I might agree with you is that a lurk-mode account could become collateral damage by being considered fake. Lurkers don't retweet, though. An account with a million followers isn't seen by everyone. Having a portion of that million like/retweet amplifies even further, with their network now possibly seeing something from someone they are not following directly.
I'd be willing to accept that the number of lurkers that get lumped in with fake accounts when deciding the percentage of actual eyeballs on posts is not harmful. Those numbers are made up stats anyways. Like the old days of TV/Radio stations that covered large cities with millions of citizens. They would claim they have an audience in the millions even though a small fraction were actually watching/listening.
Except the question isn't about the pure number of spam/bot accounts, it's about the ratio of spam/bots to "authentic" users. If you leave out the lurkers, that ratio gets skewed to mistakenly inflate the bot count.
Edit: I guess it's true that lurkers won't be bots, unless they are clicking on ads or trying to simulate engagement to help certain twitter accounts seem popular.
That means that 20% of the posts that I see, as a lurker, are generated by bots. The bots are having a huge influence on conversations, and that's important to know.
> That means that 20% of the posts that I see, as a lurker, are generated by bots
I don't see how you can arrive at this conclusion. It depends on who you are following, with some additions by the algorithm (unless you use the chronological feed) and (speculating here) the algo pushes content from real humans.
No, since you choose who you follow, you're most likely filtering for interesting stuff. I'd wager that most of the spam bots are pretty obvious to spot, and makes up very little of a user's feed.
I don't know how many original tweets are made by bots but 20% of the replies to anyone with a 5 figure follower count seems to fall on the low side of what I would guess.
I was offered $300 for my Twitter account, I suppose partially on the basis that I haven't tweeted much. I use it daily to weekly, but don't tweet often - one tweet in the last 2 years or so.
Well, I've been actively trying to create a new Twitter account for a little under a month and Twitter thinks I'm a bot. I've made 1 tweet and followed 5 people.
Even paid for Twitter Blue...still thinks I'm not real. Support is unreachable.
My current plan is to wait til Elon completes the takeover and then build an entire site dedicated to getting Elon's attention to unlock my account...because that's the only way to contact somebody apparently.
Edit add: I find it horrible that we have companies that you cannot contact; in fact, they seem to go out of their way to make it hard to contact them.
Even things you pay money for, like airline tickets. They want you to email them, make the phone number hard to find. So you do, they don't respond and then you have to search and call them, wait an hour or more on hold. The agents are nice but the entire process is terrible.
Earlier I had to do that for a damaged luggage claim. I went through the automated phone assistant to get to damaged luggage claims and it gave the option to use text messages. So I gave it a try - nope. They can't resolve the issue through text; it has to be on the phone. So I had to call back, re-enter all the info through the automated system, and then ignore its pleadings to use the text system.
Probably forced to, since they do not have access to login information. Especially since if you do not post but do log in, you are almost certainly not a spammer ^^ (though you could still be a crawling bot).
But they probably should expand more on this and reflect on how much inaccuracy it adds. With a quick search you can find that less than 50% of US users tweet even five times a month (https://www.pewresearch.org/fact-tank/2022/03/16/5-facts-abo...). Or the study which reported that the top 25% of users produce 97% of the content, with the median user of the bottom 75% posting 0 tweets a month (https://www.pewresearch.org/internet/2021/11/15/2-comparing-...). Those studies were done using surveys, I believe, so they should include only real users and no spam/bots.
So, with some rough and admittedly invalid maths: if you assume that the 25% least active users might not even post every two months (an exponential decrease in activity?), then you need to add back a quarter of the 80% they found as active.
Not to say I believe the 5% number from Twitter; and I was going to use the price of a thousand followers as an example, but seeing that it appears to be around $30 now (https://socialboss.org/buy-twitter-followers/ ?) when I remembered it at more like $5, the Twitter team might have done some good work ;).
But one can say that 20% of the content on the platform was distributed by bots.
Meaning that all the lurkers have to consider whether they are really interested in content that was pushed by some bot farms.
Technically, every user of this platform has to take a step back and evaluate whether anything they have seen is content pushed by bots.
20% is huge and I am curious whether there will ever be some comparable "official" numbers to compare it to.
No - you can say that 20% of the accounts actively posting are spam/bots.
It's possible they are posting MUCH more or less than 20% of the content.
If these are skewed toward the high end of producers - the 80/20 rule would say that as much as 80% of the content could come from them. Still - it's possible this content isn't interacted with much outside of other bots. You can't draw many conclusions from such a limited data point.
There was this suggestion to conduct a sting operation of displaying captcha to a sample of users to determine the % of the bots.
Probably picking the sample is still challenging, but at least you can somewhat tell whether the accounts in the sample are genuine.
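A sketch of how such a captcha sampling estimate could be corrected for the captcha itself being imperfect. All rates below are made-up assumptions, and the correction shown is the standard Rogan-Gladen prevalence adjustment, not anything proposed in the thread:

```python
# Even if the captcha is an imperfect bot detector, you can back out the true
# bot prevalence from the observed failure rate if you know (or estimate) the
# captcha's sensitivity and specificity. All rates here are assumptions.

def estimated_bot_share(observed_fail_rate, sensitivity, specificity):
    # Rogan-Gladen correction: p = (observed - FPR) / (TPR - FPR)
    fpr = 1 - specificity
    return (observed_fail_rate - fpr) / (sensitivity - fpr)

# Suppose 12% of sampled sessions fail the captcha, the captcha catches 90%
# of bots (sensitivity) and trips up 5% of humans (specificity 95%):
print(f"{estimated_bot_share(0.12, 0.90, 0.95):.1%}")  # ~8.2% bots
```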
The method in this article is so flawed that Larry Ellison - founder of a famous "law firm" (Oracle) - would count as an inactive account, since he hasn't tweeted since 2012 [0], and that person is apparently looking into investing in Twitter [1]. How can he be investing a billion in Twitter when he doesn't use Twitter at all?
[0] https://twitter.com/larryellison?lang=en
[1] https://www.grid.news/story/politics/2022/05/16/larry-elliso...
They point out (inside the article) that their definition of active accounts is a flaw in their methodology. However, I think it's fair to say that while TWTR has better internal insight into an "active user", this is the best approximation one can do from the outside.
I do wonder, given perfect knowledge, how the bot accounts would shake out. What percentage produce content (presumably propaganda, automatic tweets using it as an RSS-like announcement service, and spam) vs just follow people (boosting follower counts, selling likes)?
>They talk about "active" accounts (meaning have tweeted in the last 9 weeks), and do a bunch of filtering against that. That seems like a huge bias - lurkers exist, and in my experience are usually the majority of users...this step removes them or ignores them entirely.
All true. However, do you really believe that a bot is more likely to be active than a real user? If so, fair play to you. If not, then we would expect inactive users to be bots in an even greater proportion than what we see among active users.
We can argue about what the article did and didn't imply, but what's interesting to me about the issue you raise is that among lurkers there is probably a much lower rate of fake/spam activity, since there are fewer reasons for a bot to log in and not tweet. Couple that with the fact that lurkers are generally the vast majority of users on any platform, and that alone could explain the discrepancy between Twitter's 5% number and SparkToro's 20%.
Services that sell followers and spammers "aging" accounts generally would look like lurkers. Twitter could probably get an accurate estimate with the amount of analytics they have for internal use only, but of course they might be incentivized to not try very hard.
> lurkers exist, and in my experience are usually the majority of users...this step removes them or ignores them entirely.
I've spent many, many hours lurking on twitter, don't have an account at all, and mostly access it through nitter instances. Are they "biased" for not including me?
edit: should inactive users be counted as active users?
Yeah, and I fully expect that these numbers went up recently with Twitter requiring login to view threads.
The fact that they add a .42% is a red flag in itself, especially when they admit in their own post that they agree that their analysis is deserving of critique. Very misleading stuff.
Their analysis using purchased bots seems a bit more reasonable.
“Passive” accounts may actually be more likely to be bots, as many services sell fake followers. It's just harder to detect with public information rather than their IP addresses etc.
Similarly I don’t think there is any way to separate active vs abandoned passive accounts as a 3rd party.
> They talk about "active" accounts (meaning have tweeted in the last 9 weeks),
This is not their definition, that's what Twitter considers an active account in their revenue reports.
> has no relation
It has some relation, no? I wouldn't be surprised if there is a strong correlation between how frequently a user sends tweets and how monetizable that user is.
Their TL;DR abstract refers to this as a 'conservative' methodology that is 'rigorous' and 'likely undercounts'.
Their definition:
> “Spam or Fake Twitter accounts are those that do not regularly have a human being personally composing the content of their tweets, consuming the activity on their timeline, or engaging in the Twitter ecosystem.”
They note the following to differentiate fake and spam:
> Many “fake” accounts under this definition are neither nefarious nor problematic. ... By contrast, most “spam” accounts are an unwanted nuisance.
Some general data analytics notes from their post:
* They then lump together fake and spam in their analysis - and this really matters! Somewhere like the NYT is both 'fake', meaning it isn't a real person, and A HIGHLY VALUABLE ACCOUNT for Twitter to have.
* They use a sample of 44,058 accounts (of ~1.047B)
* They look at a number of classifying variables (17), spam accounts met 10+ of those 17 criteria. They don't list all 17.
* The criteria were developed from a "machine learning process" that is undescribed, and was developed from a sample of 35,000 'known' fake Twitter followers bought from 3 vendors and 50,000 claimed non-spam accounts. They appear (imply?) to have used 50% for training and 50% as real data, but don't specify explicitly.
* They say their model is about 65% accurate and unlikely to produce false positives ("almost never includes false positives") - however, they don't list any specificity, sensitivity, etc. that would be useful for evaluating that claim (see the sketch below for why that matters).
* The analysis does no statistical tests, no confidence intervals, minimal information about how the model was tested or validated.
* Critically: they note, but do not describe or quantify, that a lot of the criteria are highly correlated
* Then later in the article they suddenly seem to switch to a 10-point quality scale, away from their 17-point scale, with a threshold of 3 or below as low quality?
* My personal twitter account meets most of the metrics where they have listed a quantifiable threshold. And their fake followers tool lists it as pretty f'ing suspicious - i.e., low quality.
I'm not saying they're wrong, but I am saying good luck getting this from a blog post to any sort of respectable science publication. As they note at the end, they aren't even calculating the same metric - Twitter uses monetizable daily active users. Remember the NYTimes? Absolutely a monetizable account, even if it isn't a real person.
anyone who thinks this is proof of Elon's 4D chess based on this article is, to me, frankly delusional.
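To illustrate why the missing specificity/sensitivity matters (the sketch referenced in the list above): two hypothetical classifiers with identical accuracy on a balanced training set can behave completely differently once applied at a realistic spam base rate. All numbers here are illustrative assumptions, not figures from the article:

```python
# Why "65% accurate" plus "almost never false positives" needs specificity /
# sensitivity to mean anything. All numbers below are illustrative.

def precision_at_base_rate(sensitivity, specificity, base_rate):
    tp = sensitivity * base_rate          # true positives per account
    fp = (1 - specificity) * (1 - base_rate)  # false positives per account
    return tp / (tp + fp)

# Two classifiers, both ~65% accurate on a balanced 50/50 training set:
#   A: sensitivity 0.30, specificity 1.00  (very conservative)
#   B: sensitivity 0.90, specificity 0.40  (trigger-happy)
for name, sens, spec in [("A", 0.30, 1.00), ("B", 0.90, 0.40)]:
    acc_balanced = 0.5 * sens + 0.5 * spec
    prec_at_5pct = precision_at_base_rate(sens, spec, 0.05)
    print(name, f"balanced accuracy={acc_balanced:.0%}",
          f"precision at 5% spam base rate={prec_at_5pct:.0%}")
# A: accuracy 65%, precision 100%   B: accuracy 65%, precision ~7%
```

Without the confusion-matrix numbers, "65% accurate" is compatible with either extreme, which is exactly the gap the bullet points above are complaining about.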
Turning my cynicism switch on a bit: the author is a very good content marketer. A hot topic in our corner of the world - which is the author's target audience - is Elon Musk buying Twitter. Musk tweeted that the percentage of bots is the main issue of the deal. He disputed Twitter's number of 5%.
I believe the author's writing prompt was just: a headline about fake Twitter accounts showing a number significantly higher than 5%. That's it. Whatever the methodology, that was the author's goal.
The article achieved this goal. Otherwise it is completely irrelevant. Even for the person who wrote it.
My account was active until recently (deleted when Twitter accepted Musk's offer; I don't need to be a participant in a right wing cesspool). I have 0 tweets. I don't like things, because I don't want my name attached to someone else's content.
This study is a great example of how you can use the data you have available to talk yourself into your conclusions. The implicit point of the study is to refute the "5% of Twitter accounts are spam" stat from Twitter's 10-Q that was the basis of his putting the twitter acquisition "on hold".
Except - the baseline that they choose is entirely NOT comparable to that of Twitter's baseline. The study says:
> Followerwonk selected a random sample from only those accounts that had public tweets published to their profile in the last 90 days, a clear indication of “activity.” Further, Followerwonk regularly updates its profile database (every 30 days) to remove any protected or deleted accounts. We believe this sample is both large enough in size to be statistically significant, and curated to most closely resemble what Twitter might consider a monetizable Daily Active User (mDAU).
Except that we know what Twitter defines as a monetizable DAU:
> We define monetizable daily active usage or users (mDAU) as Twitter users who logged in and accessed Twitter on any given day through Twitter.com or Twitter applications that are able to show ads.
Nothing about posting, nothing about engagement at all - simply: were you able to see an ad?
So there isn't any reason to claim that this "might" represent what Twitter uses as an mDAU - we know, in fact, that is not how they measure it. A more honest statement would have been:
"We selected a random sample of (etc. etc.). We believe that this sample is large enough to be significantly significant, however, it can not be compared to Twitter's mDAU set, as it does not count passive consumers of Twitter content. Instead, this data can be used to suggest that a significant amount of the total posted content on Twitter is delivered by bots"
My guess is the number of consumers of content is greater than the posters of content by several orders of magnitude, though some of that would be mitigated by the longer time horizon.
The whole thing is BS. Twitter could have easily got an accurate number if they wanted to. And Elon could have forced them to do that as part of the deal or done a decent job on his own before the offer.
Both sides are bullshitting to negotiate a better price.
"We have performed an internal review of a sample of accounts and estimate that the average of false or spam accounts during the third quarter of 2020 represented fewer than 5% of our mDAU during the quarter. The false or spam accounts for a period represents the average of false or spam accounts in the samples during each monthly analysis period during the quarter"
"Are you challenging Twitter’s earnings report, saying that <5% of mDAUs are fake/spam?
We are not disputing Twitter’s claim. There’s no way to know what criteria Twitter uses to identify a “monetizable daily active user” (mDAU) nor how they classify “fake/spam” accounts. We believe our methodology (detailed above) to be the best system available to public researchers. But, internally, Twitter likely has unknowable processes that we cannot replicate with only their public data."
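For what it's worth, pure sampling error is not the weak point of a sample-based estimate like the one Twitter describes; with an assumed (not disclosed) sample size in the thousands, the confidence interval around a 5% proportion is already tight. A rough sketch:

```python
# Rough sanity check (the sample sizes are assumptions - Twitter hasn't
# published them beyond "samples"): how tight a <5% estimate from a simple
# random sample of mDAU could be, ignoring classification error by the
# human reviewers themselves.

import math

def ci_95(p_hat, n):
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 1.96 * se, p_hat + 1.96 * se

for n in (1_000, 9_000, 100_000):
    lo, hi = ci_95(0.05, n)
    print(f"n={n:>7}: 95% CI roughly {lo:.1%} - {hi:.1%}")
# n=1000: ~3.6%-6.4%, n=9000: ~4.5%-5.5%, n=100000: ~4.9%-5.1%
```

The real uncertainty is in how the reviewers decide what counts as "false or spam" in the first place, which no confidence interval captures.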
You only need to see who the author of this post is to know that the methodology is crap, the numbers are likely made up (19.42% is WAY too specific), and the post is just a grab for media attention on the coattails of some other internet meme garbage.
This guy (Rand Fishkin) has been selling SEO as a religion for the better part of this century, and is in no small part responsible for all the search-result-garbage style websites everyone is complaining about elsewhere on HN today and every other day.
He's a third-rate market-bro hack that's been taking advantage of web professionals who get thrown into SEO/Marketing jobs and have no idea what they're doing by relentlessly shoving half-assed corporate strategies through moz.com and now his new sparktoro.com, and calling himself the great SEO redeemer.
Wanna question his methodology? There is none. Wanna question his science? Totally devoid.
Unfortunately, this sort of thing is absolutely standard when third parties discuss Twitter spam. There is an astounding amount of academic research doing the same thing with methodologies just as silly.
I signed up with a vpn and got banned for life after a single nonsense tweet about not liking the feed and following 4-5 famous people. I don’t think bot detection techniques are very robust
> This methodology likely undercounts spam and fake accounts, but almost never includes false positives (i.e. claiming an account is fake when it isn’t).
In other words, their model performs well on their training set, they don't acknowledge that it may be overfitted or mislabeled, and they hand-wave mistakes.
they're robust at generating false positives! i got insta-locked for, i think, signing up using my own email domain, to make an account that shares pictures i took of geese. it took about 4 months to wait for them to unlock it
meanwhile, saudi bot armies apparently basically run rampant across the platform
It gets even worse than that. I have a test account with about 10,000 followers (mostly real people, you can tell), created from a home IP, and a few days after Musk's tweet comparing Gates to the pregnant-man emoji, I posted the exact same tweet! With the exact same punctuation, images, icons, everything.
Less than 2 days later I was banned for "spewing hatred based on sex, gender or religion". No amount of replies pointing out that my tweet was the same as one from another, much much more popular account helped. Banned for life.
I'm 6 months into my account and they still haven't asked for a phone number. I mean, I have one ready if they ask for it. Maybe I've been whitelisted as a non-bad-faith actor due to my timezone and other factors.
They define active as having tweeted in the last 90 days.
I'm active on twitter, check it every other day, follow @ElonMusk but I don't tweet. Perhaps I'm a unique case, or their assumptions are a bit off.
Some metrics they consider suspicious (from the article):
1) Accounts that didn't tweet recently
2) Accounts with low number of tweets
3) Accounts with a low number of followers
4) Accounts that didn't set up their own profile image
Lurking != bot, and these data-points would all hit high for lurkers. I'm somewhat suspicious of their results, especially given the results from this pew study suggesting the majority of twitter users don't tweet very much.
I am the same; in fact my account has never posted a thing, never liked a thing, I just follow people I want to see updates from, that is all.
I think it's connected to a more general phenomenon though. Pewdiepie has 111 million subscribers on YT and gets like 3-5 million views per video. Like 95% of his subscribers don't watch his videos.
> You reach down and you flip the tortoise over on its back... But you're not helping. Why?
The real question is: how strong am I, to flip a hundred-fifty-pound tortoise without injury?
Meanwhile, several accounts of mine that were predominantly run with bots would have passed as human with ease when they were active.
Are you sure you aren't a bot?
Bots tweet and they usually have some sort of generic profile picture, so their methodology wouldn't even account for real bots. Bad.
Regardless, I do think that there are a lot of bots in TW and they are definitely more than 5% of total users.
"WTF, my account was closed because I didn't tweet in four months."
> He's talking about twitter users/bots that cause what they perceive to be harmful content.
Outside of that, bots cause the same problems that real people do, making Twitter a place people don't want to spend time on or view ads on.
The advertisers need both bad groups removed.
Twitter CEO: “Let’s talk about spam, with the benefit of data, context” - https://news.ycombinator.com/item?id=31399913 - May 2022 (13 comments)
Riiiight, and Craig Wright claims to have proof of being Satoshi Nakamoto but won't show anybody.
I don't know how good of a CEO Parag is, but he's not a very good bullshitter.
> there may be loads of real reader accounts that never post and never bother customizing their profile
I'd almost certainly be marked as a bot.
> EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.
The fact that you had to do this proves the point. Nobody defines "active" the way they have here. The claim is nonsense.
Nope. ~20% of accounts that tweet are fake. A lurker (aka read-only) is by any reasonable meaning an active account.
> Ads for TV shows, movies, live shows, musicians and comedians have been particularly effective.
Those are indeed incredibly valuable. Engaged audience = your real audience.
> is there any way they, viewing Twitter externally, could tell lurker accounts like yours and abandoned accounts like mine apart?
No. Which is why the only reasonable thing to say as an external party is "we don't know."
Where I might agree with you is a lurk mode account could become collateral damage in being considered fake. Lurkers don't retweet though. An account with a million followers isn't seen by everyone. Having a portion of that million like/retweet amplifies even further with their network now possibly seeing something from someone they are not following directly.
I'd be willing to accept that the number of lurkers that get lumped in with fake accounts when deciding the percentage of actual eyeballs on posts is not harmful. Those numbers are made up stats anyways. Like the old days of TV/Radio stations that covered large cities with millions of citizens. They would claim they have an audience in the millions even though a small fraction were actually watching/listening.
Edit: I guess it's true that lurkers won't be bots, unless they are clicking on ads or trying to simulate engagement to help certain twitter accounts seem popular.
I don't see how you can arrive at this conclusion. It depends on who you are following, with some additions by the algorithm (unless you use the chronological feed) and (speculating here) the algo pushes content from real humans.
Deleted Comment
Even paid for Twitter Blue...still thinks I'm not real. Support is unreachable.
My current plan is to wait til Elon completes the takeover and then build an entire site dedicated to getting Elon's attention to unlock my account...because that's the only way to contact somebody apparently.
Edit add: I find it horrible that we have companies that you can not contact, in fact they seem to be going out of their way to make hard to contact them.
Even things you pay money for, like airline tickets. They want you to email them, make the phone number hard to find. So you do, they don't respond and then you have to search and call them, wait an hour or more on hold. The agents are nice but the entire process is terrible.
Earlier I had to do that for a damaged luggage claim. Went through the automated phone assistant to get to damaged luggage claims and it gave the option to use text messages. So I give it a try, nope. They can't resolve the issue through text, has to be on the phone. So I had to call back, re-enter all the info through the automated system and then ignore it's pleadings to use the text system.
But they probably should expand more on this and reflect on how much inaccuracy it adds. With a quick search you can find that less 50% of US users tweet five times a month (https://www.pewresearch.org/fact-tank/2022/03/16/5-facts-abo...). Or the study which, reported that the top 25% of user produce 97% of the content, the median user of the bottom 75% as posting 0 tweet a month (https://www.pewresearch.org/internet/2021/11/15/2-comparing-...). Those studies were done using survey I believe so should include only active users and no spam/bot.
So with random invalid maths, if you make the assumption that the 25% less active users might not even post every two month (exponential decrease of activity ?) then you need to add back a quarter of the 80% they found as active.
Not to say I believe the 5% number from twitter; and I was going to use the price for a thousands follower as an example, but seeing it appears to be at 30$ now (https://socialboss.org/buy-twitter-followers/ ?) when I remembered it at like 5$ then the twitter team might have done some good work ;).
20% is huge and I am curios if there will ever be some comparable "official" numbers to that.
It's possible they are posting MUCH more or less than 20% of the content.
If these are skewed toward the high end of producers - the 80/20 rule would say that as much as 80% of the content could come from them. Still - it's possible this content isn't interacted with much outside of other bots. You can't draw many conclusions from such a limited data point.
Probably picking the sample is still challenging but at least can somewhat tell if the accounts in the sample are genuine.
The method in this article is so flawed that Larry Ellison, founder of a famous law firm, would count as an inactive account since haven't tweeted since 2012[0] and that person apparently looks into investing in Twitter[1]. How can be investing a billion in Twitter when he doesn't use Twitter at all?
[0]https://twitter.com/larryellison?lang=en
[1]https://www.grid.news/story/politics/2022/05/16/larry-elliso...
I do wonder about, given perfect knowledge, how the bot accounts would shake up. What percentage produce content (presumably propaganda, automatic tweets using it as an RSS like announcement service, and spam) vs follow people (boost follow accounts, sell likes)?
All true. However, do you really believe that a bot is more likely to be active than a real user? If so, fair play to you. If not, then we would expect inactive users to be bots in an even greater proportion than what we see among active users.
I've spent many, many hours lurking on twitter, don't have an account at all, and mostly access it through nitter instances. Are they "biased" for not including me?
edit: should inactive users be counted as active users?
The fact that they add a .42% is a red flag in itself, especially when they admit in their own post that they agree that their analysis is deserving of critique. Very misleading stuff.
Their analysis using purchased bots seems a bit more reasonable.
Similarly I don’t think there is any way to separate active vs abandoned passive accounts as a 3rd party.
This is not their definition, that's what Twitter considers an active account in their revenue reports.
> has no relation
It has some relation, no? I wouldn't be surprised if there is a strong correlation between how frequently a user sends tweets and how monetizable that user is.
Their definition:
> “Spam or Fake Twitter accounts are those that do not regularly have a human being personally composing the content of their tweets, consuming the activity on their timeline, or engaging in the Twitter ecosystem.”
They note the following to differentiate fake and spam:
> Many “fake” accounts under this definition are neither nefarious nor problematic. ... By contrast, most “spam” accounts are an unwanted nuisance.
Some general data analytics notes from their post:
* They lump together fake and spam in their analysis - and this really matters! An account like the NYT's is both 'fake', meaning it isn't a real person, and A HIGHLY VALUABLE ACCOUNT for Twitter to have.
* They use a sample of 44,058 accounts (of ~1.047B)
* They look at a number of classifying variables (17); spam accounts met 10+ of those 17 criteria. They don't list all 17.
* The criteria were developed from an undescribed "machine learning process", trained on a sample of 35,000 'known' fake Twitter followers bought from 3 vendors and 50,000 claimed non-spam accounts. They appear (imply?) to have used a 50% training / 50% held-out split, but don't say so explicitly.
* They say their model is about 65% accurate and unlikely to produce false positives ("almost never includes false positives") - however, they don't list any specificity, sensitivity, etc. that would be useful for evaluating that claim (see the sketch after this list).
* The analysis includes no statistical tests, no confidence intervals, and minimal information about how the model was tested or validated.
* Critically: they note, but do not describe or quantify, that a lot of the criteria are highly correlated
* Then, later in the article, they suddenly seem to switch from their 17-point scale to a 10-point quality scale, with a threshold of 3 or below as low quality?
* My personal twitter account meets most of the metrics where they have listed a quantifiable threshold. And their fake followers tool lists it as pretty f'ing suspicious - i.e., low quality.
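To make the accuracy-vs-false-positives point concrete, here is a toy sketch with made-up confusion matrices (none of these numbers come from the post): two classifiers can both be "65% accurate" while one floods genuine accounts with false positives and the other flags almost nothing.

```python
# Toy sketch: why "65% accurate" alone can't support "almost never includes
# false positives". All numbers below are made up for illustration.
def report(name, tp, fp, fn, tn):
    total = tp + fp + fn + tn
    acc = (tp + tn) / total
    sens = tp / (tp + fn)      # sensitivity/recall: spam correctly flagged
    spec = tn / (tn + fp)      # specificity: genuine accounts correctly passed
    prec = tp / (tp + fp)      # precision: flagged accounts that really are spam
    print(f"{name}: acc={acc:.2f} sens={sens:.2f} spec={spec:.2f} prec={prec:.2f}")

# 1,000 hypothetical test accounts: 500 spam, 500 genuine (a balanced split,
# roughly in the spirit of bought bot followers plus claimed non-spam accounts).
report("A", tp=400, fp=250, fn=100, tn=250)   # 65% accurate, 250 genuine accounts falsely flagged
report("B", tp=150, fp=0,   fn=350, tn=500)   # 65% accurate, zero false positives, misses 70% of spam
```

Without specificity and sensitivity (or precision and recall), "about 65% accurate" is consistent with either extreme.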
I'm not saying they're wrong, but I am saying good luck getting this from a blog post to any sort of respectable science publication. As they note at the end, they aren't even calculating the same metric - Twitter uses monetizable daily active users - remember the NYTimes? Absolutely a monetizable account, even if it isn't a real person.
Anyone who thinks this article is proof of Elon's 4D chess is, to me, frankly delusional.
I believe the author's writing prompt was just: a headline about fake Twitter accounts showing a number significantly higher than 5%. That’s it. Whatever the methodology, that was the author’s goal.
The article achieved this goal. Everything else is completely irrelevant, even to the person who wrote it.
It's almost like Inception isn't it? A PR stunt within a PR stunt within a PR stunt.
Except that the baseline they chose is entirely NOT comparable to Twitter's. The study says:
> Followerwonk selected a random sample from only those accounts that had public tweets published to their profile in the last 90 days, a clear indication of “activity.” Further, Followerwonk regularly updates its profile database (every 30 days) to remove any protected or deleted accounts. We believe this sample is both large enough in size to be statistically significant, and curated to most closely resemble what Twitter might consider a monetizable Daily Active User (mDAU).
Except that we know what Twitter defines as a monetizable DAU:
> We define monetizable daily active usage or users (mDAU) as Twitter users who logged in and accessed Twitter on any given day through Twitter.com or Twitter applications that are able to show ads.
Nothing about posting, nothing about engagement at all - simply: were you able to see an ad?
So there isn't any reason to claim that this "might" represent what Twitter uses as an mDAU - we know, in fact, that is not how they measure it. A more honest statement would have been:
"We selected a random sample of (etc. etc.). We believe that this sample is large enough to be significantly significant, however, it can not be compared to Twitter's mDAU set, as it does not count passive consumers of Twitter content. Instead, this data can be used to suggest that a significant amount of the total posted content on Twitter is delivered by bots"
My guess is the number of consumers of content is greater than the posters of content by several orders of magnitude, though some of that would be mitigated by the longer time horizon.
Both sides are bullshitting to negotiate a better price.
https://d18rn0p25nwr6d.cloudfront.net/CIK-0001418091/cb1d93d...
"We have performed an internal review of a sample of accounts and estimate that the average of false or spam accounts during the third quarter of 2020 represented fewer than 5% of our mDAU during the quarter. The false or spam accounts for a period represents the average of false or spam accounts in the samples during each monthly analysis period during the quarter"
This is not a new stat or new information.
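For context on how much precision a sample-based estimate like that can claim, here is a minimal sketch of a normal-approximation 95% confidence interval for a proportion. The 9,000-account sample size is purely hypothetical; the filing quoted above does not state it.

```python
# Minimal sketch: 95% confidence interval (normal approximation) for a
# spam-share estimate from a random sample. The sample size is hypothetical.
from math import sqrt

p_hat = 0.05      # estimated share of false/spam accounts
n = 9_000         # hypothetical per-quarter sample size
se = sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"95% CI: {low:.2%} to {high:.2%}")   # roughly 4.55% to 5.45%
```

So the sampling error itself is small; the real dispute is over what counts as an mDAU and how accounts get labeled.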
"Are you challenging Twitter’s earnings report, saying that <5% of mDAUs are fake/spam?
This guy (Rand Fishkin) has been selling SEO as a religion for the better part of this century, and is in no small part responsible for all the search-result-garbage style websites everyone is complaining about elsewhere on HN today and every other day.
He's a third-rate market-bro hack that's been taking advantage of web professionals who get thrown into SEO/Marketing jobs and have no idea what they're doing by relentlessly shoving half-assed corporate strategies through moz.com and now his new sparktoro.com, and calling himself the great SEO redeemer.
Wanna question his methodology? There is none. Wanna question his science? Totally devoid.
https://blog.plan99.net/fake-science-part-ii-bots-that-are-n...
> This methodology likely undercounts spam and fake accounts, but almost never includes false positives (i.e. claiming an account is fake when it isn’t).
In other words, their model performs well on their training set; they don’t acknowledge that it may be overfitted or that the labels may be wrong, and they hand-wave away mistakes.
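For what the missing step would look like, here is a minimal sketch using synthetic data and scikit-learn (none of this is the article's actual model or data): report the false-positive rate on a held-out split rather than on the accounts the model was trained on.

```python
# Sketch: evaluate false positives on a held-out split, not the training set.
# Synthetic stand-in data; scikit-learn assumed to be installed.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
# Hypothetical stand-in for the 17 account features; labels 1 = spam, 0 = genuine.
X = rng.normal(size=(5000, 17))
y = (X[:, :3].sum(axis=1) + rng.normal(scale=2.0, size=5000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()
print(f"held-out false positive rate: {fp / (fp + tn):.2%}")
```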
meanwhile, saudi bot armies apparently basically run rampant across the platform
Less than 2 days later I am banned for "spewing hatred based on sex, gender or religion". No amount of replies pointing out that my tweet was the same as one from a much, much more popular account helped. Banned for life.
Somewhat alarming, if accurate.
I'm active on Twitter, check it every other day, follow @ElonMusk, but I don't tweet. Perhaps I'm a unique case, or their assumptions are a bit off.
Some metrics they consider suspicious (from the article):
1) Accounts that didn't tweet recently
2) Accounts with low number of tweets
3) Accounts with a low number of followers
4) Accounts that didn't set up their own profile image
Lurking != bot, and these data-points would all hit high for lurkers. I'm somewhat suspicious of their results, especially given the results from this pew study suggesting the majority of twitter users don't tweet very much.
25% of Twitter Users Produce 97% of All Tweets: https://www.pewresearch.org/internet/2021/11/15/2-comparing-...
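To make the lurker point concrete, here is a toy scoring sketch. The thresholds are hypothetical (the article doesn't publish its full criteria), but a real human who only reads their timeline trips every one of these heuristics.

```python
# Toy sketch: how a pure lurker scores against heuristics like the four above.
# Thresholds are made up for illustration, not the article's actual criteria.
def suspicion_flags(account):
    return {
        "no recent tweet":       account["days_since_last_tweet"] > 90,
        "few total tweets":      account["tweet_count"] < 10,
        "few followers":         account["follower_count"] < 5,
        "default profile image": not account["has_custom_avatar"],
    }

lurker = {  # a real human who only reads their timeline
    "days_since_last_tweet": 9999,
    "tweet_count": 0,
    "follower_count": 2,
    "has_custom_avatar": False,
}
flags = suspicion_flags(lurker)
print(sum(flags.values()), "of", len(flags), "heuristics tripped")  # 4 of 4
```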
I treat it like RSS, not social media.
If it's 1-1 then 20% of tweets being from bots isn't great - but it could even be more than that.
I think YouTube's UI "encourages" this behaviour even more, as it has a highly algorithmic homepage rather than a feed of people you follow.