Nearly 20% of active Twitter accounts likely to be fake or spam

Parag apparently lost his patience with superficial and misleading claims about Twitter spam (like this analysis) and posted about it today.

You can see it here (https://twitter.com/paraga/status/1526237578843672576).

Noteworthy highlights:

* Twitter estimates its <5% number from human analysis of multi-thousand-account random samples of mDAU

* Twitter allows that number to remain so high to avoid introducing friction like captcha into real users' experiences

* Twitter uses all sorts of internal private data in its analysis

* Parag says you cannot get a reliable indication of bot/not bot without this internal private data
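The "multi-thousand user random samplings" point can be made concrete with the standard margin-of-error formula for a sampled proportion. This is an illustrative sketch only: the thread doesn't give Twitter's actual sample size, so the `n` values below are hypothetical, and `margin_of_error` is a helper written for this example using the normal (Wald) approximation.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation (Wald) confidence interval for a proportion.

    p_hat: observed share of sampled accounts labeled spam/fake
    n:     number of accounts reviewed in the simple random sample
    z:     critical value (1.96 is roughly 95% confidence)
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical sample sizes; the thread only says "multi-thousand".
for n in (1_000, 5_000, 10_000):
    moe = margin_of_error(0.05, n)
    print(f"n={n:>6}: 5% +/- {moe * 100:.2f} points")
```

Under these assumptions, a couple of thousand reviewed accounts pins a ~5% rate down to within roughly one percentage point. Note that this covers sampling error only, not mistakes or disagreement among the human reviewers doing the labeling.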

Having just finished building a Twitter analysis tool, I agree with Parag that the Twitter API doesn't provide sufficient clarity to make decisions about spam. This article's analysis doesn't hold up - just because you can name several features you're going to use to generate a spam confidence score about an account does not mean that spam confidence score will have any precision.

wonderwonder · 4 years ago

Musk responded to Parag's thread with a poop emoji. Not going to lie, if I worked at twitter I would be a little nervous about my career at this point, for several reasons, including what appears to be the potential for a very hostile work culture in the near term. Musk is being openly antagonistic towards twitter leadership and denigrating the people that work there. Although it does seem more and more like the deal is not going to close. I don't think Musk wants it anymore and is seizing on anything to get out of it.

ryzvonusef · 4 years ago

> Musk is being openly antagonistic towards twitter leadership and denigrating the people that work there.

Seeing how badly twitter has been managed (for a laugh, check out their "R&D" expenditure), how much of a loss-making enterprise it has been, and how it is always at risk of a takeover, is it that surprising?

If it had been Elliott Management (the previously rumored takeover threat for twitter), a group far less prone to public display than Musk... would things have been any different? The only difference is that Musk is being open about what he has been doing, which I see as a public good, frankly.

Elliott's track record shows it is far more vicious in layoffs and cuts.

----

https://fortune.com/2013/10/25/why-is-twitter-spending-so-mu...

https://www.rndtoday.co.uk/latest-news/is-twitters-rd-provid...

https://www.axios.com/2021/11/30/jack-dorsey-twitter-departu...

https://www.forbes.com/sites/kevindowd/2022/02/27/wall-stree...

diob · 4 years ago

I don't think he wanted it in the first place. This is all performative bs for him to make another quick buck / get pr.


thewebcount · 4 years ago

> * Twitter allows that number to remain so high to avoid introducing friction like captcha into real users' experiences

This doesn't pass the smell test in my opinion. Given that everyone who tries to create an account without a phone number has to go through the friction of getting their account locked immediately, they clearly don't care about this sort of friction. Not to mention the friction of just trying to view a tweet which has been discussed at length on HN before.

thecleaner · 4 years ago

The difference is between going through the friction every time you post a tweet and going through it once. That's what he's talking about. You are comparing different things.

throwmeawayy · 4 years ago

I have a Twitter account with > 10k followers that is a few years old. It was created without a phone number and, of course, immediately locked. Somehow, I managed to get it unlocked, and it's still going a few years later without a phone number. Though I'm always in fear of the ban hammer.

shapefrog · 4 years ago

> Given that everyone who tries to create an account without a phone number has to go through the friction

Yes

> Not to mention the friction of just trying to view a tweet which has been discussed at length on HN before.

Yes

Now imagine filling out 3 captchas every time you open the twitter app on your phone, 1 for every time you tweet and 1 for every person you follow.

Most twitter users use either an app on their phone or the cookies in their browser, and the only friction they suffer is forgetting their password (likely password1 btw) because they have to log in so infrequently.

ASalazarMX · 4 years ago

Also, bot networks are so kind as to label themselves with the same hashtag; I don't understand why Twitter doesn't analyze trending topics to detect bots. There are always dormant accounts with thousands of follows/followers that start the propaganda.

ramblerman · 4 years ago

But we were talking about bots. You and Parag are shifting the dialogue to spam

> The hard challenge is that many accounts which look fake superficially – are actually real people. And some of the spam accounts which are actually the most dangerous – and cause the most harm to our users – can look totally legitimate on the surface.

He's not talking about detecting bots, i.e. fake and automated accounts. He's talking about twitter users/bots that post what they perceive to be harmful content. Which is a very different thing, and was the whole point of Musk's intended involvement in the first place.

8note · 4 years ago

And? Only advertisers see the harm from bots - being charged for bot impressions.

Outside of that, bots cause the same problems that real people do, making twitter a place people don't want to spend time on/view ads on.

The advertisers need both bad groups removed

StanislavPetrov · 4 years ago

>* Twitter uses all sorts of internal private data in its analysis

>* Parag says you cannot get a reliable indication of bot/not bot without this internal private data

"I have secret information so trust me" is an excellent reason to reject an assertion every time, whether it is made by an individual, a corporation, or a government. It doesn't mean it isn't true, but it means that absolutely nobody should put any credence in the assertion at all.

seydor · 4 years ago

It makes sense. The bots are just loud and tend to all follow the most famous people, so their numbers look larger when people look there

bobsmooth · 4 years ago

I think Elon's response reflects what a lot of us are thinking about the "<5%" number.

alaricus · 4 years ago

> Parag says you cannot get a reliable indication of bot/not bot without this internal private data

That's convenient isn't it?

This report made headlines because it aligns with everyone's experience with Twitter: almost everyone on Twitter is either a bot or a corporate managed account.

ryan_lane · 4 years ago

That doesn't align with my experience. The vast majority of what I see in my feed is real people. I do know, however, that if I look at the replies to any viral tweet, a large percentage of them will be from fake accounts.

dang · 4 years ago

Related thread:

Twitter CEO: “Let’s talk about spam, with the benefit of data, context” - https://news.ycombinator.com/item?id=31399913 - May 2022 (13 comments)

c7DJTLrn · 4 years ago

>Parag says you cannot get a reliable indication of bot/not bot without this internal private data

Riiiight, and Craig Wright claims to have proof of being Satoshi Nakamoto but won't show anybody.

I don't know how good of a CEO Parag is, but he's not a very good bullshitter.

modeless · 4 years ago

"But you don't know which ones we count as mDAUs" and "accounts that look like spam are actually real" are not as good a defense as he thinks. The product is still affected by spam and fraud even if it's excluded from advertising metrics, and accounts that look like spam are not good for the product either even if they happen to be real people for whatever reason.

christkv · 4 years ago

If they believe the claim to be false they can open the data they used to calculate the 5% so it can be verified by a third party.

berkut · 4 years ago

So you want them to publicly list / make download-able any phone numbers of the people in that 5%, and their full names and email addresses?

ahahahahah · 4 years ago

Absolutely not. No social media company should take a set of users' private data and "open the data" (especially just because some blowhard is trying to find any reason to back out of a deal). Even without the "open" bit, they shouldn't be providing that data to a third party.

Oh, do I have notes on their methodology.

1) They talk about "active" accounts (meaning have tweeted in the last 9 weeks), and do a bunch of filtering against that. That seems like a huge bias - lurkers exist, and in my experience are usually the majority of users...this step removes them or ignores them entirely. Frankly, until recently, my twitter account would have been one of the ones they would have discarded as inactive. This one thing alone makes me question all of the rest of their results.

2) By the same token, the rate or frequency with which a user sends tweets has no relation to whether a user is monetizable. If they're seeing ads, they're monetizable...lurkers are just as monetizable as high-volume posters.

infogulch · 4 years ago

You seem to be arguing against something that the article doesn't claim. The article isn't equating inactivity and fake/spam, but that: of the accounts that actively send tweets ~20% are fake/spam.

Sure that's a different question from what proportion of all users are fake/spam, but this is still a perfectly valid question to ask, and the fact that they're only considering active users is in the title so I really don't get your complaint.

If you want an analysis that attempts to answer a different question go find or write one that addresses the question you want answered...

The article clearly states (emphasis mine):

> This represents the largest set of accounts on Twitter we could acquire, but it includes analysis of many older accounts that haven’t sent tweets in the last 90 days and thus, likely don’t fit Twitter’s definition of mDAUs (monetizable Daily Active Users).

From the linked Twitter earnings report:

> We define monetizable daily active usage or users (mDAU) as Twitter users who logged in or were otherwise authenticated and accessed Twitter on any given day through Twitter.com or Twitter applications that are able to show ads.

EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.

_moof · 4 years ago

> EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.

The fact that you had to do this proves the point. Nobody defines "active" the way they have here. The claim is nonsense.

nend · 4 years ago

The point is that their definition of active is inaccurate. You can be an active user and not tweet.

darkwater · 4 years ago

> of the accounts that are active ~20% are fake/spam.

Nope. ~20% of accounts that tweet are fake. A lurker (aka read-only) is by all meanings an active account.

Timshel · 4 years ago

When you are in a context where:

* Twitter determines the active status of an account using logins
* People are wondering about the % of active users as defined by Twitter's metrics

and you then use your own definition of active, write only a one-liner on the difference with no reflection on the impact it might have, and give no warning that you are answering a different question, my conclusion is that you want people to make this mistake.

> EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.

That made me laugh, because you had to add it, and you made more effort than the author of the article to prevent the confusion :D.

g-clef · 4 years ago

Interesting. This could be a bracketing error, because I read

> it includes analysis of many older accounts that haven’t sent tweets in the last 90 days and thus, likely don’t fit

> Twitter’s definition of mDAUs (monetizable Daily Active Users)

As implying that they think accounts that haven't tweeted in the past 90 days don't fit Twitter's mDAU definition. Given the placement of the qualifying phrase, I think that's a reasonable parsing of the sentence, but I see your point that they could be trying to imply their set doesn't fit the definition. If so, that sentence is very badly constructed.

wonnage · 4 years ago

The article is just clickbait. The title is obviously clickbait (based on your edit you've realized that "active account" !== "accounts that tweet"). Then they try to define active account:

> “Spam or Fake Twitter accounts are those that do not regularly have a human being personally composing the content of their tweets, consuming the activity on their timeline, or engaging in the Twitter ecosystem.”

Ok, but "consuming the activity on their timeline" is essentially unknowable outside of Twitter, since you can't see what tweets people are viewing. It turns out they're trying to infer this through some other signals like follower count, etc. But you can imagine why that might be sketchy.

Then they constrain the analysis:

> A more fair assessment of Mr. Musk’s Twitter following would only include accounts that have tweeted in the past 90 days

Let's be real, if you look at a list of Elon tweet replies, they might as well all be spam. Just search @elonmusk and sort by latest. Then compare that to the sorted tweet replies under an actual tweet. IDK how many millions of dollars and man-hours went into the AI that sorted this list, but it seems to just be putting the blue checks at the top and shrugging at the rest. I doubt this three man team is doing any better at spam detection.

spullara · 4 years ago

For manipulation / spam purposes I don't really care about accounts that don't actively post/like/retweet/follow. The mDAU isn't useful at all for determining if the activity on Twitter is done largely by bots.

rezistik · 4 years ago

I do wonder how "fake" is calculated. Is @tweetsfrommydog fake? It's a real person making tweets that are funny and provide value to the platform, but it's not a real person tweeting their personal thoughts as an individual. Are corporate accounts or parody accounts fake?

stdbrouw · 4 years ago

It is valid criticism because the context of this article is that Elon Musk wants to know whether Twitter's own claim of ~5% fake/spam accounts is accurate. We really do want an analysis that investigates that precise question and not a related one.

sandworm101 · 4 years ago

Lurkers are also the most important people. They consume the content. They are the meat of the business, the ones that respond to advertising and political messaging. If I were twitter I would champion all the lurker accounts, all the eyeballs to which twitter serves content. Nobody ever faulted the Nielsen ratings scheme for "lurker" viewers who only watched but didn't themselves create television shows.

JauntTrooper · 4 years ago

Definitely agree. I joined Twitter four months ago. I haven't tweeted yet, but I'm reading it daily on the app and occasionally liking tweets.

I've been so surprised at how effective the advertising has been on me. I've never experienced this level of engagement with online marketing. Ads for TV shows, movies, live shows, musicians and comedians have been particularly effective.

I've found myself following a lot of show writers I've never heard of, and I even signed up for some new streaming services because of it. Google and Facebook ads never felt like they impacted me, though I know how important and dominant they are to business marketers. I've never clicked on a banner ad and my eyes glaze over sponsored links. Twitter's level of engagement with their marketing content is new to me, and I'm impressed.

r00fus · 4 years ago

Furthermore, there are the non-tweeting active users (ones who like only) and the ones who RT a lot but don't create organic tweets.

Those are indeed incredibly valuable. Engaged audience = your real audience.

candiddevmike · 4 years ago

Unlike passive media consumption though, Twitter needs users to submit content (tweets, replies) to give lurkers something to do.

6gvONxR4sf7o · 4 years ago

And I thought it was common knowledge that lurkers always vastly outnumber people who post content on any platform. If lurkers outnumber posters by at least 3:1, then 20% goes to 5% and twitter’s “<5%” figure is accurate.
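The arithmetic in this comment generalizes to any lurker ratio. A minimal sketch follows; `overall_bot_share` is a hypothetical helper, and it assumes posters and lurkers are counted on the same base (e.g. daily users) that the headline percentage is reported against, which is exactly the part outsiders can't observe.

```python
def overall_bot_share(bot_share_among_posters: float,
                      lurkers_per_poster: float,
                      bot_share_among_lurkers: float = 0.0) -> float:
    """Blend the bot rate among posting accounts with the rate among lurkers.

    Assumes lurkers_per_poster is the ratio of non-posting to posting accounts
    in the same population that the headline percentage is reported against.
    """
    posters = 1.0
    lurkers = lurkers_per_poster * posters
    bots = bot_share_among_posters * posters + bot_share_among_lurkers * lurkers
    return bots / (posters + lurkers)


print(overall_bot_share(0.20, 3))        # lurkers assumed bot-free
print(overall_bot_share(0.20, 3, 0.05))  # lurkers assumed to be 5% bots
```

With 3 lurkers per poster and bot-free lurkers this gives 5%, matching the comment; at ratios of 8:1 to 12:1 it drops to roughly 1.5% to 2.2%. If lurkers are not bot-free (bought followers can look like lurkers) and, say, 5% of them are bots, the 3:1 case rises to 8.75%.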

hadlock · 4 years ago

Lurkers probably outnumber posters anywhere between 8:1 and 12:1. People actually posting stuff on the internet are in the vast, vast minority, which creates a sort of echo chamber.

I am technically "logged into" twitter so I can click through and read the postage stamp-sized charts linked to through various articles and blogs, or watch a video about a riot in some far flung part of the planet. Once a year I tweet at airlines when they lose my luggage or whatever but otherwise don't tweet. Twitter isn't a good social media service, it just happens to be the image/video sharing platform of choice for journalists to promote themselves.

michaelt · 4 years ago

> That seems like a huge bias - lurkers exist

I created an account 5 years ago, followed one or two people, got bored and never logged in again.

Presumably their intention is to exclude abandoned accounts, like mine - is there any way they, viewing Twitter externally, could tell lurker accounts like yours and abandoned accounts like mine apart?

g-clef · 4 years ago

As a third party? Probably not. Which is why it's going to be very hard to disprove Twitter's assertion unless Twitter chooses to share their data.

That's part of why I find articles like this frustrating: I don't think they have the data to actually answer the question they're attempting to answer. Knowing that, what's the purpose of the article?

dado3212 · 4 years ago

They could maybe use like activity in addition to just tweets? Inherently though this system is going to be less accurate than the dataset that Twitter has access to. If a large chunk of users only engage in Twitter through DMs then an external organization isn’t going to have insight into that.

dan1234 · 4 years ago

I would imagine Twitter would have access to analytics that third parties don't have, which would allow them to pretty easily work out which accounts are logged in and used for browsing and which are actually abandoned.

onion2k · 4 years ago

is there any way they, viewing Twitter externally, could tell lurker accounts like yours and abandoned accounts like mine apart?

No. Which is why the only reasonable thing to say as an external party is "we don't know."

dylan604 · 4 years ago

If an account is in lurk mode, then it's not a spammer, so I'm okay with it being left out of that equation.

Where I might agree with you is a lurk mode account could become collateral damage in being considered fake. Lurkers don't retweet though. An account with a million followers isn't seen by everyone. Having a portion of that million like/retweet amplifies even further with their network now possibly seeing something from someone they are not following directly.

I'd be willing to accept that the number of lurkers that get lumped in with fake accounts when deciding the percentage of actual eyeballs on posts is not harmful. Those numbers are made up stats anyways. Like the old days of TV/Radio stations that covered large cities with millions of citizens. They would claim they have an audience in the millions even though a small fraction were actually watching/listening.

g-clef · 4 years ago

Except the question isn't about the pure number of spam/bot accounts, it's about the ratio of spam/bots to "authentic" users. If you leave out the lurkers, that ratio gets skewed to mistakenly inflate the bot count.

fullshark · 4 years ago

And yet I find 20% more believable than under 5%.

Edit: I guess it's true that lurkers won't be bots, unless they are clicking on ads or trying to simulate engagement to help certain twitter accounts seem popular.

rightbyte · 4 years ago

All those fake followers you can buy could just as well be "inactive" lurkers though.

daenz · 4 years ago

That means that 20% of the posts that I see, as a lurker, are generated by bots. The bots are having a huge influence on conversations, and that's important to know.

raydev · 4 years ago

> That means that 20% of the posts that I see, as a lurker, are generated by bots

I don't see how you can arrive at this conclusion. It depends on who you are following, with some additions by the algorithm (unless you use the chronological feed) and (speculating here) the algo pushes content from real humans.

matsemann · 4 years ago

No, since you choose who you follow, you're most likely filtering for interesting stuff. I'd wager that most of the spam bots are pretty obvious to spot and make up very little of a user's feed.

brewdad · 4 years ago

I don't know how many original tweets are made by bots but 20% of the replies to anyone with a 5 figure follower count seems to fall on the low side of what I would guess.

Saint_Genet · 4 years ago

"Doesn't have a URL in profile" is sort of a weird metric. Not everyone is there to self-promote.

dralley · 4 years ago

I have a twitter account, but I have never tweeted or retweeted anything.

wil421 · 4 years ago

Same with my account. I only login from time to time when I am forced to sign in to view something.

prpl · 4 years ago

I was offered $300 for my twitter account, I suppose partially on the basis that I haven’t tweeted much. I use it daily to weekly but don’t tweet often: one tweet in the last 2 years or so.


brightball · 4 years ago

Well, I've been actively trying to create a new Twitter account for a little under a month and Twitter thinks I'm a bot. I've made 1 tweet and followed 5 people.

Even paid for Twitter Blue...still thinks I'm not real. Support is unreachable.

My current plan is to wait til Elon completes the takeover and then build an entire site dedicated to getting Elon's attention to unlock my account...because that's the only way to contact somebody apparently.

themaninthedark · 4 years ago

Have you tried tweeting at them :P

Edit to add: I find it horrible that we have companies you cannot contact; in fact, they seem to be going out of their way to make it hard to contact them.

Even things you pay money for, like airline tickets. They want you to email them, make the phone number hard to find. So you do, they don't respond and then you have to search and call them, wait an hour or more on hold. The agents are nice but the entire process is terrible.

Earlier I had to do that for a damaged luggage claim. I went through the automated phone assistant to get to damaged luggage claims, and it gave the option to use text messages. So I gave it a try: nope. They can't resolve the issue through text; it has to be on the phone. So I had to call back, re-enter all the info through the automated system, and then ignore its pleadings to use the text system.

Timshel · 4 years ago

Probably forced to, since they do not have access to login information. Especially since if you log in but do not post, you are certainly not a spammer ^^, though you could still be a crawling bot.

But they probably should expand more on this and reflect on how much inaccuracy it adds. With a quick search you can find that fewer than 50% of US users tweet five times a month (https://www.pewresearch.org/fact-tank/2022/03/16/5-facts-abo...). Or the study which reported that the top 25% of users produce 97% of the content, with the median user in the bottom 75% posting 0 tweets a month (https://www.pewresearch.org/internet/2021/11/15/2-comparing-...). Those studies were done using surveys, I believe, so they should include only active users and no spam/bots.

So, with random invalid maths: if you make the assumption that the 25% least active users might not even post every two months (exponential decrease of activity?), then you need to add back a quarter of the 80% they found as active.

Not that I believe Twitter's 5% number; I was going to use the price of a thousand followers as an example, but seeing that it appears to be $30 now (https://socialboss.org/buy-twitter-followers/ ?) when I remembered it at around $5, the twitter team might have done some good work ;).

karxxm · 4 years ago

But one can say that 20% of the content on the platform was distributed by bots. Meaning that all the lurkers have to consider whether they are really interested in content that was pushed by bot farms. Technically, every user of this platform has to take a step back and evaluate whether anything they have seen is content pushed by bots.

20% is huge, and I am curious whether there will ever be some comparable "official" numbers.

onlyrealcuzzo · 4 years ago

No - you can say that 20% of the accounts actively posting are spam/bots.

It's possible they are posting MUCH more or less than 20% of the content.

If these are skewed toward the high end of producers - the 80/20 rule would say that as much as 80% of the content could come from them. Still - it's possible this content isn't interacted with much outside of other bots. You can't draw many conclusions from such a limited data point.
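The gap between account share and content share is easy to make concrete. A minimal sketch, where `bot_content_share` is a hypothetical helper and the posting rates are made up rather than measured: it weights each group's account share by how much each account posts.

```python
def bot_content_share(bot_account_share: float,
                      bot_posts_per_account: float,
                      human_posts_per_account: float) -> float:
    """Fraction of all tweets written by bot accounts.

    bot_account_share:   fraction of posting accounts that are bots (e.g. 0.20)
    *_posts_per_account: average tweets per account over the same period
    """
    bot_volume = bot_account_share * bot_posts_per_account
    human_volume = (1 - bot_account_share) * human_posts_per_account
    return bot_volume / (bot_volume + human_volume)


# Hypothetical posting rates, relative to a human account posting 1 unit.
for bot_rate in (0.25, 1.0, 16.0):
    share = bot_content_share(0.20, bot_rate, 1.0)
    print(f"bots post {bot_rate:>5}x as much -> {share:.1%} of tweets")
```

If bots are 20% of posting accounts and post at the same rate as humans, they write 20% of the tweets; if each bot posts 16 times as much, they write 80% (3.2 of every 4 units of volume), which is the 80/20 extreme the comment mentions. None of this says how much of that content anyone actually sees.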

ElCapitanMarkla · 4 years ago

100% this. I haven’t tweeted in nearly 3 years, and even that was a retweet. But I’m still logged in and consuming crap from Twitter all the time

icecap12 · 4 years ago

Same, last tweet from me was in December and I check Twitter daily. My last self-composed tweet is well over 2 years ago.

loceng · 4 years ago

If it's the 80/20 rule, then there are 4x as many accounts (the other 80%) that are lurking, which brings down the % of fake/spam accounts.

mrtksn · 4 years ago

There was this suggestion to conduct a sting operation of displaying captcha to a sample of users to determine the % of the bots.

Picking the sample is probably still challenging, but at least you could somewhat tell whether the accounts in the sample are genuine.

The method in this article is so flawed that Larry Ellison, co-founder of Oracle, would count as an inactive account since he hasn't tweeted since 2012[0], and he is apparently looking into investing in Twitter[1]. How can he be investing a billion in Twitter when he doesn't use Twitter at all?

[0]https://twitter.com/larryellison?lang=en

[1]https://www.grid.news/story/politics/2022/05/16/larry-elliso...

HWR_14 · 4 years ago

They point out that their definition of active accounts is a flaw in their methodology (inside the article). However, I think it's fair to say that while TWTR has better internal insight into an "active user", this is the best approximation one can make from the outside.

I do wonder, given perfect knowledge, how the bot accounts would shake out. What percentage produce content (presumably propaganda, automatic tweets using it as an RSS-like announcement service, and spam) vs. follow people (boosting follower counts, selling likes)?

hammock · 4 years ago

>They talk about "active" accounts (meaning have tweeted in the last 9 weeks), and do a bunch of filtering against that. That seems like a huge bias - lurkers exist, and in my experience are usually the majority of users...this step removes them or ignores them entirely.

All true. However, do you really believe that a bot is more likely to be active than a real user? If so, fair play to you. If not, then we would expect inactive users to be bots in an even greater proportion than what we see among active users.

kokanee · 4 years ago

We can argue about what the article did and didn't imply, but what's interesting to me about the issue you raise is that among lurkers there is probably a much lower rate of fake/spam activity, since there are fewer reasons for a bot to log in and not tweet. Couple that with the fact that lurkers are generally the vast majority of users on any platform, and that alone could explain the discrepancy between Twitter's 5% number and SparkToro's 20%.

pseudo0 · 4 years ago

Services that sell followers and spammers "aging" accounts generally would look like lurkers. Twitter could probably get an accurate estimate with the amount of analytics they have for internal use only, but of course they might be incentivized to not try very hard.

PartiallyTyped · 4 years ago

Perhaps they are attempting to argue that the value comes from the users that generate content more so than the eyes attached to the account?

pessimizer · 4 years ago

> lurkers exist, and in my experience are usually the majority of users...this step removes them or ignores them entirely.

I've spent many, many hours lurking on twitter, don't have an account at all, and mostly access it through nitter instances. Are they "biased" for not including me?

edit: should inactive users be counted as active users?

chaps · 4 years ago

Yeah, and I fully expect that these numbers went up recently with Twitter requiring login to view threads.

The fact that they add a .42% is a red flag in itself, especially when they admit in their own post that they agree that their analysis is deserving of critique. Very misleading stuff.

Their analysis using purchased bots seems a bit more reasonable.

Retric · 4 years ago

“Passive” accounts may actually be more likely to be bots, as many services sell fake followers. It’s just harder to detect with public information rather than their IP addresses etc.

Similarly I don’t think there is any way to separate active vs abandoned passive accounts as a 3rd party.

soheil · 4 years ago

> They talk about "active" accounts (meaning have tweeted in the last 9 weeks),

This is not their definition, that's what Twitter considers an active account in their revenue reports.

> has no relation

It has some relation, no? I wouldn't be surprised if there is a strong correlation between how frequently a user sends tweets and how monetizable that user is.

avs733 · 4 years ago

Their TL;DAbstract refers to this as a 'conservative' methodology that is 'rigorous' and 'likely undercounts'.

Their definition:

They note the following to differentiate fake and spam:

> Many “fake” accounts under this definition are neither nefarious nor problematic. ... By contrast, most “spam” accounts are an unwanted nuisance.

Some general data analytics notes from their post:

* They then lump together fake and spam in their analysis, and this really matters! Somewhere like the NYT is both 'fake', meaning it isn't a real person, and A HIGHLY VALUABLE ACCOUNT for twitter to have.

* They use a sample of 44,058 accounts (of ~1.047B)

* They look at a number of classifying variables (17), spam accounts met 10+ of those 17 criteria. They don't list all 17.

* The criteria were developed from a "machine learning process" that is undescribed, trained on a sample of 35,000 'known' fake twitter followers bought from 3 vendors and 50,000 claimed non-spam accounts. They appear (imply?) to have used 50% of the data for training and 50% for testing, but don't specify this explicitly.

* They say their model is about 65% accurate, and unlikely to produce false positives ("almost never includes false positives") - however they don't list any specificity, sensitivity, etc. that would be useful to evaluating that claim.

* The analysis does no statistical tests, no confidence intervals, minimal information about how the model was tested or validated.

* Critically: they note, but do not describe or quantify, that a lot of the criteria are highly correlated

* Then later in the article they suddenly seem to switch away from their 17-point scale to a 10-point quality scale, with a threshold of 3 or below as low quality?

* My personal twitter account meets most of the metrics where they have listed a quantifiable threshold. And their fake followers tool lists it as pretty f'ing suspicious - i.e., low quality.
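The complaint about accuracy without sensitivity or specificity can be illustrated with a small sketch. `Confusion` and `rogan_gladen` are hypothetical helpers written for this example, and the validation counts are invented, not SparkToro's. The correction shown is the Rogan-Gladen estimator from epidemiology, swapped in purely as an illustration of why those two numbers matter; nothing in the post says they used it.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # spam accounts flagged as spam
    fp: int  # genuine accounts flagged as spam
    tn: int  # genuine accounts passed as genuine
    fn: int  # spam accounts passed as genuine

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    @property
    def sensitivity(self) -> float:  # recall: share of real spam that gets caught
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:  # share of genuine accounts correctly left alone
        return self.tn / (self.tn + self.fp)

    @property
    def precision(self) -> float:  # share of flagged accounts that really are spam
        return self.tp / (self.tp + self.fp)


def rogan_gladen(apparent_rate: float, sensitivity: float, specificity: float) -> float:
    """Correct an observed flag rate for classifier error (Rogan-Gladen estimator)."""
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("classifier is no better than chance; rate is not identifiable")
    corrected = (apparent_rate + specificity - 1) / denom
    return min(max(corrected, 0.0), 1.0)  # clip to a valid proportion


metrics = Confusion(tp=650, fp=10, tn=990, fn=350)  # invented validation counts
print(f"accuracy={metrics.accuracy:.2f} sensitivity={metrics.sensitivity:.2f} "
      f"specificity={metrics.specificity:.2f} precision={metrics.precision:.3f}")
print(f"corrected rate: {rogan_gladen(0.15, metrics.sensitivity, metrics.specificity):.4f}")
```

With these invented counts, accuracy comes out at 0.82 while sensitivity is 0.65, so it matters which of the two "65% accurate" refers to. A classifier that almost never false-positives but misses a third of spam undercounts: an observed 15% flag rate corrects to about 21.9%. Without published sensitivity and specificity, no such correction (or confidence interval on it) is possible.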

I'm not saying they're wrong, but I am saying good luck getting this from a blog post into any sort of respectable science publication. As they note at the end, they aren't even calculating the same metric: twitter uses monetizable daily active users. Remember the NYT? Absolutely a monetizable account, even if it isn't a real person.

Anyone who thinks this article is proof of Elon's 4D chess is, to me, frankly delusional.

soneca · 4 years ago

Turning my cynicism switch on a bit. The author is a very good content marketer. A hot topic in our corner of the world (which is the author’s target audience) is Elon Musk buying Twitter. Musk tweeted that the percentage of bots is the main issue in the deal. He disputed Twitter’s number of 5%.

I believe the author’s writing prompt was just: a headline about fake Twitter accounts showing a number significantly higher than 5%. That’s it. Whatever the methodology, that was the author’s goal.

The article achieved this goal. Otherwise is completely irrelevant. Even for the person who wrote it.

Bubble_Pop_22 · 4 years ago

> The article achieved this goal. Otherwise is completely irrelevant. Even for the person who wrote it.

It's almost like Inception isn't it? A PR stunt within a PR stunt within a PR stunt.

grumple · 4 years ago

My account was active until recently (deleted when Twitter accepted Musk's offer, I don't need to be a participant in a right wing cesspool). I have 0 tweets. I don't like things, because I don't want my name attached to someone else.


We are not disputing Twitter’s claim. There’s no way to know what criteria Twitter uses to identify a “monetizable daily active user” (mDAU) nor how they classify “fake/spam” accounts. We believe our methodology (detailed above) to be the best system available to public researchers. But, internally, Twitter likely has unknowable processes that we cannot replicate with only their public data.