Readit News
542458 · 4 years ago
I don't think this post means as much as people are acting like it does.

The indicators of being a spambot they have in their post seem VERY iffy to me. "Not tweeting in the past 120 days", "Location set to a non resolving location", "Small number of followers", "default profile image", "No URL in bio or non-resolving URL in bio", "Not on many lists", "tweets in a different language than the person they're following" - Those all seem like extremely weak signals to me. My profile matches 6 of those, and I'm a human. I would like to see them hand-verify a subset of their results and see if their algorithm matches reality.
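One way to do the hand-verification suggested above is to label a random sample of accounts by hand and compare the heuristic's flags against those labels. This is only a minimal sketch: the `Confusion` and `score_heuristic` names are hypothetical, and it assumes you already have one flag and one hand label per sampled account.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Confusion:
    """Counts from comparing a bot heuristic against hand-verified labels."""
    true_pos: int = 0   # flagged as bot, hand-verified as bot
    false_pos: int = 0  # flagged as bot, hand-verified as human
    false_neg: int = 0  # not flagged, hand-verified as bot
    true_neg: int = 0   # not flagged, hand-verified as human

    @property
    def precision(self) -> float:
        """Fraction of flagged accounts that really are bots."""
        flagged = self.true_pos + self.false_pos
        return self.true_pos / flagged if flagged else float("nan")

    @property
    def recall(self) -> float:
        """Fraction of real bots that the heuristic caught."""
        bots = self.true_pos + self.false_neg
        return self.true_pos / bots if bots else float("nan")


def score_heuristic(predicted: List[bool], verified: List[bool]) -> Confusion:
    """Tally heuristic flags (True = bot) against hand labels (True = bot)."""
    if len(predicted) != len(verified):
        raise ValueError("predicted and verified must be the same length")
    c = Confusion()
    for flagged, is_bot in zip(predicted, verified):
        if flagged and is_bot:
            c.true_pos += 1
        elif flagged:
            c.false_pos += 1
        elif is_bot:
            c.false_neg += 1
        else:
            c.true_neg += 1
    return c
```

As a hypothetical example, if 4 of 10 sampled accounts are flagged and hand inspection finds only 2 of those 4 are bots (plus 1 unflagged bot), precision is 0.5 and recall is about 0.67. Low precision is exactly the "my profile matches and I'm a human" failure.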

Also note that they define "active" differently than Twitter. They define "active" as having tweeted recently. Twitter gives spambot numbers as a percent of monetizable daily active users. I wonder if Twitter's given bot numbers are low because bots don't typically lurk or load ads. I can believe that the total bot count as a percentage of users or as a percentage of recently-tweeting-users is higher than 5%, but that only 5% of daily visitors seeing ads are bots.

mintplant · 4 years ago
> Location set to a non resolving location

This is a terrible metric. Real people use the location field for all sorts of non-location purposes, as well as more freeform descriptions of their location that wouldn't resolve mechanically.

remram · 4 years ago
And I don't see a reason why a bot/fake account wouldn't set a real random location.
alasdair_ · 4 years ago
> My profile matches 5 of those, and I'm a human.

Just out of interest, imagine you were in a hot desert. There is a tortoise in front of you. You reach down and you flip the tortoise over on its back. The tortoise lays on its back, its belly baking in the hot sun, beating its legs, trying to turn itself over, but it can't, not without your help. But you're not helping. Why?

kurthr · 4 years ago
... I'll tell you about my mother.
8note · 4 years ago
The tortoise is fine. You think it's never been on its back before? There's a thing called resonance that lets it get back on its feet.

The real question is how strong am I to flip a hundred-fifty-pound tortoise without injury?

StanislavPetrov · 4 years ago
Because that tortoise is a politician.
kingcharles · 4 years ago
Does it please you to think that I am not helping. Why?
vidarh · 4 years ago
My main account until recently would have qualified as a bot by those criteria until recently.

Meanwhile, several accounts of mine that were predominantly run with bots would have passed as human with ease when they were active.

the_only_law · 4 years ago
My main account literally has “fake account” in the description as sort of a joke, because I really use Twitter exclusively for browsing, and maybe liking/RT’ing stuff.
Smoosh · 4 years ago
> ... until recently ... until recently.

Are you sure you aren't a bot?

duxup · 4 years ago
I guess I'm a spambot too.

But hey, maybe with these kinds of analyses, and rando computer-generated / unappealable bans in the future, the "real accounts" will just mean "very elaborate bot".

bryanrasmussen · 4 years ago
I think I match all of them and I too am a trustworthy human, fellow human.
dylan604 · 4 years ago
Exactly what a bot would say.
orwin · 4 years ago
I think I match all of these. Also, a simple user account with a URL in its bio is definitely more sketchy in my eyes than one without.
guerrilla · 4 years ago
That seems like most accounts. I know many real people with exactly those properties.
moralestapia · 4 years ago
Also,

Bots tweet and they usually have some sort of generic profile picture, so their methodology wouldn't even account for real bots. Bad.

Regardless, I do think that there are a lot of bots in TW and they are definitely more than 5% of total users.

UncleMeat · 4 years ago
Yup. Imagine the disdain there'd be on this forum if Twitter used these signals for policy enforcement and somebody was hit by a false positive.

"WTF, my account was closed because I didn't tweet in four months."

mongol · 4 years ago
Mine as well. I deliberately have not set a profile image, and have not attracted many followers. I probably should not bother with Twitter but I am around and am a real human.
status200 · 4 years ago
They did mention that no one feature was a clear indication of being a spam account, but rather a combination of them.
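To illustrate why a combination of weak signals can still misfire, here is a minimal sketch of an equal-weight score over the seven indicators quoted at the top of the thread. The signal names, the equal weights, and the 0.5 threshold are all assumptions for illustration; the article's actual weighting is not described in this thread.

```python
from typing import Dict, Optional

# Hypothetical names for the seven indicators quoted at the top of the thread.
SIGNALS = (
    "no_recent_tweets",
    "unresolvable_location",
    "few_followers",
    "default_profile_image",
    "no_or_unresolvable_bio_url",
    "on_few_lists",
    "language_mismatch",
)


def spam_score(profile: Dict[str, bool],
               weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted fraction of signals present, from 0.0 (none) to 1.0 (all)."""
    if weights is None:
        weights = {name: 1.0 for name in SIGNALS}  # assumed equal weights
    total = sum(weights[name] for name in SIGNALS)
    present = sum(weights[name] for name in SIGNALS if profile.get(name, False))
    return present / total


def is_flagged(profile: Dict[str, bool], threshold: float = 0.5,
               weights: Optional[Dict[str, float]] = None) -> bool:
    """Flag an account when its score reaches an assumed threshold."""
    return spam_score(profile, weights) >= threshold
```

Under these assumptions, a lurker matching 6 of the 7 signals scores about 0.86 and is flagged, while an account that tweets actively and trips only 2 signals scores about 0.29 and passes. Combining features does not by itself fix signals that point the wrong way.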
brandall10 · 4 years ago
I match these as well, though I'm not an active user.
nathias · 4 years ago
The opposite is also true, it misses a lot of real bots.
jeffbee · 4 years ago
Seriously. Having a URL in your bio is suspicious IMHO.
j4pe · 4 years ago
Parag apparently lost his patience with superficial and misleading claims about Twitter spam (like this analysis) and posted about it today.

You can see it here (https://twitter.com/paraga/status/1526237578843672576).

Noteworthy highlights:

* Twitter estimates its <5% number from human analysis of multi-thousand user random samplings of mDAU

* Twitter allows that number to remain so high to avoid introducing friction like captcha into real users' experiences

* Twitter uses all sorts of internal private data in its analysis

* Parag says you cannot get a reliable indication of bot/not bot without this internal private data
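For a sense of how much sampling error a "multi-thousand user" random sample leaves, a standard Wilson score interval can be computed as below. Parag's thread does not say which interval method Twitter uses, so Wilson is a swapped-in textbook choice, and the counts in the example are hypothetical. This only captures sampling error; it says nothing about whether the human labels themselves are right.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    # Centre is shifted toward 0.5 relative to p_hat, which helps for small p.
    centre = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return centre - half, centre + half
```

For instance, a hypothetical 120 spam accounts found in a sample of 3,000 gives roughly 3.4% to 4.8% at z = 1.96, so a sample of that size is plenty for the headline number if the labeling is sound.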

Having just finished building a Twitter analysis tool, I agree with Parag that the Twitter API doesn't provide sufficient clarity to make decisions about spam. This article's analysis doesn't hold up - just because you can name several features you're going to use to generate a spam confidence score about an account does not mean that spam confidence score will have any precision.

wonderwonder · 4 years ago
Musk responded to Parag's thread with a poop emoji. Not going to lie, if I worked at twitter I would be a little nervous about my career at this point. For several reasons including what appears to be the potential for a very hostile work culture in the near term. Musk is being openly antagonistic towards twitter leadership and denigrating the people that work there. Although it does seem more and more like the deal is not going to close. I don't think Musk wants it anymore and is seizing on anything to get out of it.
ryzvonusef · 4 years ago
> Musk is being openly antagonistic towards twitter leadership and denigrating the people that work there.

Seeing how badly twitter has been managed (for a laugh, check out their "R&D" expenditure), how much of a loss-making enterprise it has been, and how it has always been at risk of a takeover, is it that surprising?

If it had been Elliott Management (the previous rumored takeover threat for twitter), a group far less prone to public display than Musk... would things have been any different? The only difference is that Musk is being open about what he has been doing, which I see as a public good, frankly.

Elliott's track record shows it is far more vicious with layoffs and cuts.

----

https://fortune.com/2013/10/25/why-is-twitter-spending-so-mu...

https://www.rndtoday.co.uk/latest-news/is-twitters-rd-provid...

https://www.axios.com/2021/11/30/jack-dorsey-twitter-departu...

https://www.forbes.com/sites/kevindowd/2022/02/27/wall-stree...

diob · 4 years ago
I don't think he wanted it in the first place. This is all performative bs for him to make another quick buck / get pr.

thewebcount · 4 years ago
> * Twitter allows that number to remain so high to avoid introducing friction like captcha into real users' experiences

This doesn't pass the smell test in my opinion. Given that everyone who tries to create an account without a phone number has to go through the friction of getting their account locked immediately, they clearly don't care about this sort of friction. Not to mention the friction of just trying to view a tweet which has been discussed at length on HN before.

thecleaner · 4 years ago
The difference is between going through friction every time you post a tweet and going through it once. That's what he's talking about. You are comparing different things.
throwmeawayy · 4 years ago
I have a Twitter account with > 10k followers that is a few years old. Created without a phone number, and of course, immediately locked. Somehow, I managed to get it unlocked and it's still going a few years later without a phone number. Though, I'm always in fear of the ban hammer.
shapefrog · 4 years ago
> Given that everyone who tries to create an account without a phone number has to go through the friction

Yes

> Not to mention the friction of just trying to view a tweet which has been discussed at length on HN before.

Yes

Now imagine filling out 3 captchas every time you open the twitter app on your phone, 1 for every time you tweet and 1 for every person you follow.

Most users of twitter use either an app on their phone or the cookies in their browser, and the only friction they suffer is forgetting their password (likely password1 btw) because they have to log in so infrequently.

ASalazarMX · 4 years ago
Also, bot networks are so kind as to label themselves with the same hashtag; I don't understand why Twitter doesn't analyze trending topics to detect bots. There are always dormant accounts with thousands of followed/followers who start the propaganda.
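A rough version of the hashtag idea, assuming you had post timestamps and follower counts for the accounts involved, might look like this. Everything here is hypothetical: the `Post` and `Account` records, the 90-day dormancy window, the 1,000-follow threshold, and the choice to look only at the first 50 accounts to use a tag. It is not how Twitter actually detects bots.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Post:
    account: str        # hypothetical handle of the posting account
    hashtag: str
    posted_at: datetime


@dataclass
class Account:
    followers: int
    following: int


def dormant_early_adopters(posts, accounts, hashtag, first_n=50,
                           dormant_for=timedelta(days=90), min_follows=1000):
    """Return handles among the first `first_n` users of `hashtag` that were
    silent for at least `dormant_for` before their first use of it and have at
    least `min_follows` followers or followed accounts."""
    # Group every visible post by account so we can look at prior activity.
    by_account = {}
    for p in posts:
        by_account.setdefault(p.account, []).append(p)

    # Earliest distinct accounts to use the hashtag, in chronological order.
    tagged = sorted((p for p in posts if p.hashtag == hashtag),
                    key=lambda p: p.posted_at)
    early = []
    for p in tagged:
        if p.account not in early:
            early.append(p.account)
        if len(early) == first_n:
            break

    flagged = []
    for handle in early:
        history = by_account[handle]
        first_use = min(q.posted_at for q in history if q.hashtag == hashtag)
        before = [q.posted_at for q in history if q.posted_at < first_use]
        # No earlier posts at all also counts as dormant.
        dormant = not before or first_use - max(before) >= dormant_for
        acct = accounts[handle]
        if dormant and max(acct.followers, acct.following) >= min_follows:
            flagged.append(handle)
    return flagged
```

The obvious weakness is the same one raised elsewhere in the thread: real people also go quiet for months and then jump on a trending topic, so this would need hand-checking like any other heuristic.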
ramblerman · 4 years ago
But we were talking about bots. You and Parag are shifting the dialogue to spam

> The hard challenge is that many accounts which look fake superficially – are actually real people. And some of the spam accounts which are actually the most dangerous – and cause the most harm to our users – can look totally legitimate on the surface.

He's not talking about detecting bots, i.e. fake and automated accounts. He's talking about twitter users/bots that post what they perceive to be harmful content, which is a very different thing, and was the whole point of Musk's intended involvement in the first place.

8note · 4 years ago
And? Only advertisers see the harm from bots - being charged for bot impressions.

Outside of that, bots cause the same problems that real people do, making twitter a place people don't want to spend time on/view ads on.

The advertisers need both bad groups removed

StanislavPetrov · 4 years ago
>* Twitter uses all sorts of internal private data in its analysis

>* Parag says you cannot get a reliable indication of bot/not bot without this internal private data

"I have secret information so trust me" is an excellent reason to reject an assertion every time, whether it is made by an individual, a corporation, or a government. It doesn't mean it isn't true, but it means that absolutely nobody should put any credence in the assertion at all.

seydor · 4 years ago
It makes sense. The bots are just loud and tend to all follow the most famous people, so their numbers look larger when people look there
bobsmooth · 4 years ago
I think Elon's response reflects what a lot of us are thinking about the "<5%" number.
alaricus · 4 years ago
> Parag says you cannot get a reliable indication of bot/not bot without this internal private data

That's convenient isn't it?

This report made headlines because it aligns with everyone's experience with Twitter: almost everyone on Twitter is either a bot or a corporate managed account.

ryan_lane · 4 years ago
That doesn't align with my experience. The vast majority of what I see in my feed is real people. I do know, however, that if I look at the replies to any viral tweet, a large percentage of them will be from fake accounts.
dang · 4 years ago
Related thread:

Twitter CEO: “Let’s talk about spam, with the benefit of data, context” - https://news.ycombinator.com/item?id=31399913 - May 2022 (13 comments)

c7DJTLrn · 4 years ago
>Parag says you cannot get a reliable indication of bot/not bot without this internal private data

Riiiight, and Craig Wright claims to have proof of being Satoshi Nakamoto but won't show anybody.

I don't know how good of a CEO Parag is, but he's not a very good bullshitter.

modeless · 4 years ago
"But you don't know which ones we count as mDAUs" and "accounts that look like spam are actually real" are not as good a defense as he thinks. The product is still affected by spam and fraud even if it's excluded from advertising metrics, and accounts that look like spam are not good for the product either even if they happen to be real people for whatever reason.
christkv · 4 years ago
If they believe the claim to be false they can open the data they used to calculate the 5% so it can be verified by a third party.
berkut · 4 years ago
So you want them to publicly list / make download-able any phone numbers of the people in that 5%, and their full names and email addresses?
ahahahahah · 4 years ago
Absolutely not. No social media company should take a set of users' private data and "open the data" (especially just because some blowhard is trying to find any reason to back out of a deal). Even without the "open" bit, they shouldn't be providing that data to a third party.
morsch · 4 years ago
"We undercount active users whose accounts are protected, accounts that view tweets but don’t send any, and accounts that log in and engage in other ways beyond tweeting (like favoriting or adding profiles to lists)."

My markup. If I understand correctly, not having a public tweet is a marker for being a spam account. Isn't that kind of a lot of people? I know from other forums that there's a large ratio between lurkers and active posters.

Given that you need an account to customise your timeline, and, these days, pretty much for just reading a tweet, there may be loads of real reader accounts that never post and never bother customizing their profile.

hotpotamus · 4 years ago
Yeah, I never knew I was a bot. I thought I just had a passing interest in a few people on Twitter and not much of interest to share of my own.
tinsmith · 4 years ago
I remember when I found out I was a bot. It was during a harrowing judgement conflict with an image-based captcha about traffic lights, and caused a complete shutdown of my higher order processes. To this day, I still don't know the threshold of how much traffic light actually needs to be in a square to be considered for selection, so I just stopped logging in to everything.
dylan604 · 4 years ago
That's because you took the blue pill.
kurthr · 4 years ago
Yep, me too.
vorpalhex · 4 years ago
If you have not tweeted, you are excluded from this analysis.

A spam account is gonna spam right? But some real users of twitter may only tweet once a year. This study just doesn't include you. It isn't saying you are spam, just not including you in the count.

nickff · 4 years ago
There are probably many 'fake' or bot accounts that don't tweet; they'd be used to prop up the 'likes' or views of other accounts, either customers paying for exposure, or other bots.
mintplant · 4 years ago
The population of accounts that tweet is going to contain a larger proportion of spammers than the population of active Twitter accounts. Presenting the former as the latter is disingenuous.
ahahahahah · 4 years ago
Yeah, but it's just clickbait then. An honest title would be "We sampled users that act like bots on Twitter and found out that lots of them were bots".
zachrip · 4 years ago
Twitter is as hostile to unauthenticated users nowadays as reddit is. Really sad.
curiousllama · 4 years ago
Yea, I have an account that is literally just for reading 3 people's tweets (they're former generals that frequently comment on Russia's invasion of Ukraine). I have their timelines bookmarked, and just read the threads they post.

I'd almost certainly be marked as a bot.

londons_explore · 4 years ago
I don't think accounts like yours are marked as bots... They're just not counted.
burner556 · 4 years ago
Can you share the accounts please? Sounds interesting.
matthewdgreen · 4 years ago
There's even a name for it: https://en.m.wikipedia.org/wiki/1%25_rule
avs733 · 4 years ago
so much of this thread is 'proof' that supports whatever elon musk is trying to do, but seems to not realize this is a totally independent actor making a dubious set of assumptions to come to a number. This analysis might be interesting because it is timely, but it has no more bearing on the musk twitter acquisition than me running such an analysis does.
g-clef · 4 years ago
Oh, do I have notes on their methodology.

1) They talk about "active" accounts (meaning they have tweeted in the last 90 days), and do a bunch of filtering against that. That seems like a huge bias - lurkers exist, and in my experience are usually the majority of users...this step removes them or ignores them entirely. Frankly, until recently, my twitter account would have been one of the ones they would have discarded as inactive. This one thing alone makes me question all of the rest of their results.

2) By the same token, the rate or frequency with which a user sends tweets has no relation to whether a user is monetizable. If they're seeing ads, they're monetizable...lurkers are just as monetizable as high-volume posters.

infogulch · 4 years ago
You seem to be arguing against something that the article doesn't claim. The article isn't equating inactivity and fake/spam, but that: of the accounts that actively send tweets ~20% are fake/spam.

Sure that's a different question from what proportion of all users are fake/spam, but this is still a perfectly valid question to ask, and the fact that they're only considering active users is in the title so I really don't get your complaint.

If you want an analysis that attempts to answer a different question go find or write one that addresses the question you want answered...

The article clearly states (emphasis mine):

> This represents the largest set of accounts on Twitter we could acquire, but it includes analysis of many older accounts that haven’t sent tweets in the last 90 days and thus, likely don’t fit Twitter’s definition of mDAUs (monetizable Daily Active Users).

From the linked Twitter earnings report:

> We define monetizable daily active usage or users (mDAU) as Twitter users who logged in or were otherwise authenticated and accessed Twitter on any given day through Twitter.com or Twitter applications that are able to show ads.

EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.

_moof · 4 years ago
> EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.

The fact that you had to do this proves the point. Nobody defines "active" the way they have here. The claim is nonsense.

nend · 4 years ago
The point is that their definition of active is inaccurate. You can be an active user and not tweet.
darkwater · 4 years ago
> of the accounts that are active ~20% are fake/spam.

Nope. ~20% of accounts that tweet are fake. A lurker (aka read-only) is by any meaning an active account.

Timshel · 4 years ago
When you are in a context where:
- Twitter determines the active status of an account using logins
- People are wondering what % of active users, as defined by Twitter's metrics, are spam

...and you then use your own definition of active, write only a one-liner on the difference with no reflection on the impact it might have, and give no warning that you are answering a different question, then my conclusion is that you want people to make this mistake.

> EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.

Made me laugh because you had to add it and made more effort than the author of the article to prevent the confusion :D.

g-clef · 4 years ago
Interesting. This could be a bracketing error, because I read

> it includes analysis of many older accounts that haven’t sent tweets in the last 90 days and thus, likely don’t fit

> Twitter’s definition of mDAUs (monetizable Daily Active Users)

As implying that they think accounts that haven't tweeted in the past 90 days don't fit Twitter's mDAU definition. Given the placement of the qualifying phrase, I think that's a reasonable parsing of the sentence, but I see your point that they could be trying to imply their set doesn't fit the definition. If so, that sentence is very badly constructed.

wonnage · 4 years ago
The article is just clickbait. The title is obviously clickbait (based on your edit you've realized that "active account" !== "accounts that tweet"). Then they try to define active account:

> “Spam or Fake Twitter accounts are those that do not regularly have a human being personally composing the content of their tweets, consuming the activity on their timeline, or engaging in the Twitter ecosystem.”

Ok, but "consuming the activity on their timeline" is essentially unknowable outside of Twitter, since you can't see what tweets people are viewing. It turns out they're trying to infer this through some other signals like follower count, etc. But you can imagine why that might be sketchy.

Then they constrain the analysis:

> A more fair assessment of Mr. Musk’s Twitter following would only include accounts that have tweeted in the past 90 days

Let's be real, if you look at a list of Elon tweet replies, they might as well all be spam. Just search @elonmusk and sort by latest. Then compare that to the sorted tweet replies under an actual tweet. IDK how many millions of dollars and man-hours went into the AI that sorted this list, but it seems to just be putting the blue checks at the top and shrugging at the rest. I doubt this three man team is doing any better at spam detection.

spullara · 4 years ago
For manipulation / spam purposes I don't really care about accounts that don't actively post/like/retweet/follow. The mDAU isn't useful at all for determining if the activity on Twitter is done largely by bots.
rezistik · 4 years ago
I do wonder how "fake" is calculated. Is @tweetsfrommydog fake? It's a real person making tweets that are funny and provide value to the platform, but it's not an individual tweeting their personal thoughts. Are corporate accounts or parody accounts fake?
stdbrouw · 4 years ago
It is valid criticism because the context of this article is that Elon Musk wants to know whether Twitter's own claim of ~5% fake/spam accounts is accurate. We do really want an analysis that investigates that precise question and not a related one.
sandworm101 · 4 years ago
Lurkers are also the most important people. They consume the content. They are the meat of the business, the ones that respond to advertising and political messaging. If I were twitter I would champion all the lurker accounts, all the eyeballs to which twitter serves content. Nobody ever faulted the Nielsen ratings scheme for "lurker" viewers who only watched but didn't themselves create television shows.
JauntTrooper · 4 years ago
Definitely agree. I joined Twitter four months ago. I haven't tweeted yet, but I'm reading it daily on the app and occasionally liking tweets.

I've been so surprised at how effective the advertising has been on me. I've never experienced this level of engagement with online marketing. Ads for TV shows, movies, live shows, musicians and comedians have been particularly effective.

I've found myself following a lot of show writers I've never heard of, and I even signed up for some new streaming services because of it. Google and Facebook ads never felt like they impacted me, though I know how important and dominant they are to business marketers. I've never clicked on a banner ad and my eyes glaze over sponsored links. Twitter's level of engagement with their marketing content is new to me, and I'm impressed.

r00fus · 4 years ago
Furthermore, there are the non-tweeting active users (ones who like only) and the ones who RT a lot but don't create organic tweets.

Those are indeed incredibly valuable. Engaged audience = your real audience.

candiddevmike · 4 years ago
Unlike passive media consumption though, Twitter needs users to submit content (tweets, replies) to give lurkers something to do.
6gvONxR4sf7o · 4 years ago
And I thought it was common knowledge that lurkers always vastly outnumber people who post content on any platform. If lurkers outnumber posters by at least 3:1, then 20% goes to 5% and twitter’s “<5%” figure is accurate.
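The dilution arithmetic above can be checked in a couple of lines. The key assumption, which other comments in the thread dispute, is that every non-posting account is human; if some lurkers are bots (for example, purchased followers), the diluted share is an underestimate. The function names are hypothetical.

```python
def overall_bot_share(poster_bot_share: float, lurkers_per_poster: float) -> float:
    """Bot share across posters plus lurkers, ASSUMING every lurker is human."""
    if lurkers_per_poster < 0:
        raise ValueError("lurkers_per_poster must be non-negative")
    # Per poster there are (1 + ratio) accounts in total, of which
    # poster_bot_share are bots.
    return poster_bot_share / (1 + lurkers_per_poster)


def breakeven_ratio(poster_bot_share: float, target: float) -> float:
    """Lurkers-per-poster ratio at which the overall share equals `target`."""
    if target <= 0:
        raise ValueError("target must be positive")
    return poster_bot_share / target - 1
```

`overall_bot_share(0.20, 3)` gives 0.05, and `breakeven_ratio(0.20, 0.05)` gives 3.0, matching the 3:1 figure. With a ratio of 8:1 or 12:1 instead, the same 20% would shrink to roughly 2.2% or 1.5%.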
hadlock · 4 years ago
Lurkers are probably anywhere between 8-12:1. People actually posting stuff on the internet are in the vast, vast minority, which creates a sort of echo chamber.

I am technically "logged into" twitter so I can click through and read the postage stamp-sized charts linked to through various articles and blogs, or watch a video about a riot in some far flung part of the planet. Once a year I tweet at airlines when they lose my luggage or whatever but otherwise don't tweet. Twitter isn't a good social media service, it just happens to be the image/video sharing platform of choice for journalists to promote themselves.

michaelt · 4 years ago
> That seems like a huge bias - lurkers exist

I created an account 5 years ago, followed one or two people, got bored and never logged in again.

Presumably their intention is to exclude abandoned accounts, like mine - is there any way they, viewing Twitter externally, could tell lurker accounts like yours and abandoned accounts like mine apart?

g-clef · 4 years ago
As a third party? Probably not. Which is why it's going to be very hard to disprove Twitter's assertion unless Twitter chooses to share their data.

That's part of why I find articles like this frustrating: I don't think they have the data to actually answer the question they're attempting to answer. Knowing that, what's the purpose of the article?

dado3212 · 4 years ago
They could maybe use like activity in addition to just tweets? Inherently though this system is going to be less accurate than the dataset that Twitter has access to. If a large chunk of users only engage in Twitter through DMs then an external organization isn’t going to have insight into that.
dan1234 · 4 years ago
I would imagine Twitter would have access to analytics that third parties don't have, which would allow them to pretty easily work out which accounts are logged in and used for browsing and which are actually abandoned.
onion2k · 4 years ago
> is there any way they, viewing Twitter externally, could tell lurker accounts like yours and abandoned accounts like mine apart?

No. Which is why the only reasonable thing to say as an external party is "we don't know."

dylan604 · 4 years ago
If an account is in lurk mode, then it's not a spammer, so I'm okay with it being left out of that equation.

Where I might agree with you is a lurk mode account could become collateral damage in being considered fake. Lurkers don't retweet though. An account with a million followers isn't seen by everyone. Having a portion of that million like/retweet amplifies even further with their network now possibly seeing something from someone they are not following directly.

I'd be willing to accept that the number of lurkers that get lumped in with fake accounts when deciding the percentage of actual eyeballs on posts is not harmful. Those numbers are made up stats anyways. Like the old days of TV/Radio stations that covered large cities with millions of citizens. They would claim they have an audience in the millions even though a small fraction were actually watching/listening.

g-clef · 4 years ago
Except the question isn't about the pure number of spam/bot accounts, it's about the ratio of spam/bots to "authentic" users. If you leave out the lurkers, that ratio gets skewed to mistakenly inflate the bot count.
fullshark · 4 years ago
And yet I find 20% more believable than under 5%

Edit: I guess it's true that lurkers won't be bots, unless they are clicking on ads or trying to simulate engagement to help certain twitter accounts seem popular.

rightbyte · 4 years ago
All those fake followers you can buy could just as well be "inactive" lurkers though.
daenz · 4 years ago
That means that 20% of the posts that I see, as a lurker, are generated by bots. The bots are having a huge influence on conversations, and that's important to know.
raydev · 4 years ago
> That means that 20% of the posts that I see, as a lurker, are generated by bots

I don't see how you can arrive at this conclusion. It depends on who you are following, with some additions by the algorithm (unless you use the chronological feed) and (speculating here) the algo pushes content from real humans.

matsemann · 4 years ago
No, since you choose who you follow, you're most likely filtering for interesting stuff. I'd wager that most of the spam bots are pretty obvious to spot, and make up very little of a user's feed.
brewdad · 4 years ago
I don't know how many original tweets are made by bots but 20% of the replies to anyone with a 5 figure follower count seems to fall on the low side of what I would guess.
Saint_Genet · 4 years ago
"Doesn't have a URL in profile" is sort of a weird metric. Not everyone is there to self-promote.
dralley · 4 years ago
I have a twitter account, but I have never tweeted or retweeted anything.
wil421 · 4 years ago
Same with my account. I only login from time to time when I am forced to sign in to view something.
prpl · 4 years ago
I was offered $300 for my twitter account, I suppose partially on the basis that I haven't tweeted much. I use it daily to weekly but rarely tweet: one tweet in the last 2 years or so.

brightball · 4 years ago
Well, I've been actively trying to create a new Twitter account for a little under a month and Twitter thinks I'm a bot. I've made 1 tweet and followed 5 people.

Even paid for Twitter Blue...still thinks I'm not real. Support is unreachable.

My current plan is to wait til Elon completes the takeover and then build an entire site dedicated to getting Elon's attention to unlock my account...because that's the only way to contact somebody apparently.

themaninthedark · 4 years ago
Have you tried tweeting at them :P

Edit add: I find it horrible that we have companies that you cannot contact; in fact, they seem to be going out of their way to make it hard to contact them.

Even things you pay money for, like airline tickets. They want you to email them, make the phone number hard to find. So you do, they don't respond and then you have to search and call them, wait an hour or more on hold. The agents are nice but the entire process is terrible.

Earlier I had to do that for a damaged luggage claim. I went through the automated phone assistant to get to damaged luggage claims and it gave the option to use text messages. So I gave it a try; nope. They can't resolve the issue through text, it has to be on the phone. So I had to call back, re-enter all the info through the automated system and then ignore its pleadings to use the text system.

Timshel · 4 years ago
They were probably forced to, since they do not have access to login information. Especially since if you log in but do not post, you are certainly not a spammer ^^ (though you could still be a crawling bot).

But they probably should expand more on this and reflect on how much inaccuracy it adds. With a quick search you can find that fewer than 50% of US users tweet five times a month (https://www.pewresearch.org/fact-tank/2022/03/16/5-facts-abo...). Or the study which reported that the top 25% of users produce 97% of the content, with the median user in the bottom 75% posting 0 tweets a month (https://www.pewresearch.org/internet/2021/11/15/2-comparing-...). Those studies were done using surveys, I believe, so they should include only active users and no spam/bots.

So, with some admittedly invalid maths: if you assume that the least active 25% of users might not even post every two months (exponential decrease of activity?), then you need to add back a quarter of the 80% they found to be active.

Not that I believe Twitter's 5% number. I was going to use the price of a thousand followers as an example, but since it appears to be $30 now (https://socialboss.org/buy-twitter-followers/ ?), when I remembered it being around $5, the twitter team might have done some good work ;).

karxxm · 4 years ago
But one can say that 20% of the content on the platform was distributed by bots. That means all the lurkers have to consider whether they are really interested in content that was pushed by bot farms. Technically, every user of this platform has to take a step back and evaluate whether anything they have seen was pushed by bots.

20% is huge, and I am curious whether there will ever be comparable "official" numbers.

onlyrealcuzzo · 4 years ago
No - you can say that 20% of the accounts actively posting are spam/bots.

It's possible they are posting MUCH more or less than 20% of the content.

If these are skewed toward the high end of producers - the 80/20 rule would say that as much as 80% of the content could come from them. Still - it's possible this content isn't interacted with much outside of other bots. You can't draw many conclusions from such a limited data point.
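A minimal sketch of the point above: a group's share of content is its share of accounts weighted by how much each account posts. The posting rates here are hypothetical illustrations, not measured values; only the 20% account share comes from the thread.

```python
def content_share(bot_account_frac: float, bot_rate: float, human_rate: float) -> float:
    """Fraction of all posts made by bots.

    bot_account_frac: fraction of posting accounts that are bots.
    bot_rate / human_rate: hypothetical average posts per account for each group.
    """
    bot_posts = bot_account_frac * bot_rate
    human_posts = (1.0 - bot_account_frac) * human_rate
    return bot_posts / (bot_posts + human_posts)


# Same 20% of accounts, three hypothetical posting-rate scenarios.
for bot_rate, human_rate in [(1, 1), (16, 1), (1, 4)]:
    share = content_share(0.20, bot_rate, human_rate)
    print(f"bots {bot_rate}x vs humans {human_rate}x -> {share:.0%} of content")
```

With equal rates the content share stays at 20%; if bots post 16x as much as humans it reaches 80% (the 80/20 case); if humans post 4x as much it drops to roughly 6%.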

ElCapitanMarkla · 4 years ago
100% this. I haven’t tweeted in nearly 3 years, and even that was a retweet. But I’m still logged in and consuming crap from Twitter all the time
icecap12 · 4 years ago
Same, last tweet from me was in December and I check Twitter daily. My last self-composed tweet is well over 2 years ago.
loceng · 4 years ago
If it's the 80/20 rule, then there are 4x as many lurkers as the other 80.58%, which brings down the % of fake/spam accounts.
mrtksn · 4 years ago
There was a suggestion to conduct a sting operation: display a captcha to a sample of users to determine the % of bots.

Picking the sample is probably still challenging, but at least it could somewhat tell whether the accounts in the sample are genuine.

The method in this article is so flawed that Larry Ellison, founder of a famous law firm, would count as an inactive account, since he hasn't tweeted since 2012[0], and he is apparently looking into investing in Twitter[1]. How can he be investing a billion in Twitter when he doesn't use Twitter at all?

[0]https://twitter.com/larryellison?lang=en

[1]https://www.grid.news/story/politics/2022/05/16/larry-elliso...

HWR_14 · 4 years ago
They point out inside the article that their definition of active accounts is a flaw in their methodology. However, I think it's fair to say that while TWTR has better internal insight into what counts as an "active user", this is the best approximation one can make from the outside.

I do wonder how, given perfect knowledge, the bot accounts would shake out. What percentage produce content (presumably propaganda, automatic tweets using Twitter as an RSS-like announcement service, and spam) vs. follow people (boost follower counts, sell likes)?

hammock · 4 years ago
>They talk about "active" accounts (meaning have tweeted in the last 9 weeks), and do a bunch of filtering against that. That seems like a huge bias - lurkers exist, and in my experience are usually the majority of users...this step removes them or ignores them entirely.

All true. However, do you really believe that a bot is more likely to be active than a real user? If so, fair play to you. If not, then we would expect inactive users to be bots in an even greater proportion than what we see among active users.

kokanee · 4 years ago
We can argue about what the article did and didn't imply, but what's interesting to me about the issue you raise is that among lurkers there is probably a much lower rate of fake/spam activity, since there are fewer reasons for a bot to log in and not tweet. Couple that with the fact that lurkers are generally the vast majority of users on any platform, and that alone could explain the discrepancy between Twitter's 5% number and SparkToro's 20%.
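To make that arithmetic concrete: if posters and lurkers have different bot rates, the platform-wide rate is a weighted average of the two. The 19.42% poster rate is the article's figure as quoted in this thread; the 20/80 poster/lurker split and the 1% lurker bot rate are hypothetical assumptions, chosen only to show how a blend can land near 5%.

```python
def blended_bot_rate(poster_frac: float, poster_bot_rate: float, lurker_bot_rate: float) -> float:
    """Overall fraction of bot accounts across posters and lurkers.

    poster_frac: hypothetical fraction of all accounts that post.
    poster_bot_rate / lurker_bot_rate: fraction of bots within each group.
    """
    lurker_frac = 1.0 - poster_frac
    return poster_frac * poster_bot_rate + lurker_frac * lurker_bot_rate


# Hypothetical: 20% of accounts post (19.42% of them bots), 80% lurk (1% bots).
overall = blended_bot_rate(0.20, 0.1942, 0.01)
print(f"overall bot rate: {overall:.1%}")
```

The result is very sensitive to the lurker bot rate: if follower-selling bots mostly look like lurkers, raising that parameter pushes the blend back up.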
pseudo0 · 4 years ago
Services that sell followers and spammers "aging" accounts generally would look like lurkers. Twitter could probably get an accurate estimate with the amount of analytics they have for internal use only, but of course they might be incentivized to not try very hard.
PartiallyTyped · 4 years ago
Perhaps they are attempting to argue that the value comes from the users that generate content more so than the eyes attached to the account?
pessimizer · 4 years ago
> lurkers exist, and in my experience are usually the majority of users...this step removes them or ignores them entirely.

I've spent many, many hours lurking on twitter, don't have an account at all, and mostly access it through nitter instances. Are they "biased" for not including me?

edit: should inactive users be counted as active users?

chaps · 4 years ago
Yeah, and I fully expect that these numbers went up recently with Twitter requiring login to view threads.

The fact that they add a .42% is a red flag in itself, especially when they admit in their own post that their analysis deserves critique. Very misleading stuff.

Their analysis using purchased bots seems a bit more reasonable.

Retric · 4 years ago
“Passive” accounts may actually be more likely to be bots, as many services sell fake followers. That's just harder to detect with public information than with, say, their IP addresses.

Similarly I don’t think there is any way to separate active vs abandoned passive accounts as a 3rd party.

soheil · 4 years ago
> They talk about "active" accounts (meaning have tweeted in the last 9 weeks),

This is not their definition, that's what Twitter considers an active account in their revenue reports.

> has no relation

It has some relation, no? I wouldn't be surprised if there is a strong correlation between how frequently a user sends tweets and how monetizable that user is.

avs733 · 4 years ago
Their TL;DAbstract refers to this as a 'conservative' methodology that is 'rigorous' and 'likely undercounts'.

Their definition:

> “Spam or Fake Twitter accounts are those that do not regularly have a human being personally composing the content of their tweets, consuming the activity on their timeline, or engaging in the Twitter ecosystem.”

They note the following to differentiate fake and spam:

> Many “fake” accounts under this definition are neither nefarious nor problematic. ... By contrast, most “spam” accounts are an unwanted nuisance.

Some general data analytics notes from their post:

* They then lump together fake and spam in their analysis, and this really matters! An account like the NYT's is both 'fake' (it isn't a real person) and A HIGHLY VALUABLE ACCOUNT for Twitter to have.

* They use a sample of 44,058 accounts (of ~1.047B)

* They look at a number of classifying variables (17); spam accounts met 10+ of those 17 criteria. They don't list all 17.

* The criteria were developed from an undescribed "machine learning process" using a sample of 35,000 'known' fake Twitter followers bought from 3 vendors and 50,000 claimed non-spam accounts. They appear (imply?) to have used 50% training, 50% real data, but don't specify this explicitly.

* They say their model is about 65% accurate and unlikely to produce false positives ("almost never includes false positives"); however, they don't give any specificity, sensitivity, etc. that would be useful for evaluating that claim.

* The analysis does no statistical tests, no confidence intervals, minimal information about how the model was tested or validated.

* Critically: they note, but do not describe or quantify, that a lot of the criteria are highly correlated

* Then, later in the article, they suddenly seem to switch away from their 17-point scale to a 10-point quality scale, with a threshold of 3 or below as low quality?

* My personal twitter account meets most of the metrics where they have listed a quantifiable threshold. And their fake followers tool lists it as pretty f'ing suspicious - i.e., low quality.
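The missing numbers are cheap to compute once you have a labeled validation set, which is why their absence stands out. A sketch under stated assumptions: the confusion-matrix counts are hypothetical (not SparkToro's), picked so that "catches 65% of fakes" and "few false positives" can both hold while overall accuracy is a different number. The interval uses the normal (Wald) approximation and assumes the 19.42% headline is a simple proportion of the 44,058-account sample; it covers sampling error only, not misclassification.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # fake accounts flagged as fake
    fp: int  # real accounts flagged as fake
    tn: int  # real accounts passed as real
    fn: int  # fake accounts passed as real

    def sensitivity(self) -> float:
        """Recall: share of fake accounts the model catches."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of real accounts the model leaves alone."""
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        """Share of flagged accounts that really are fake."""
        return self.tp / (self.tp + self.fp)

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total


def proportion_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wald interval for a sample proportion (sampling error only)."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical validation counts: catches 65 of 100 fakes, flags 2 of 100 real accounts.
cm = Confusion(tp=65, fp=2, tn=98, fn=35)
print(f"sensitivity={cm.sensitivity():.2f} specificity={cm.specificity():.2f} "
      f"precision={cm.precision():.2f} accuracy={cm.accuracy():.3f}")

low, high = proportion_ci(0.1942, 44_058)
print(f"95% CI for the headline rate: {low:.2%} to {high:.2%}")
```

With a sample this large the sampling interval is well under half a percentage point wide on each side, so the real uncertainty lives in the classifier's error rates, which is exactly what the post doesn't report.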

I'm not saying they're wrong, but I am saying good luck getting this from a blog post into any sort of respectable science publication. As they note at the end, they aren't even calculating the same metric: Twitter uses monetizable daily active users. Remember the NYTimes? Absolutely a monetizable account, even if it isn't a real person.

Anyone who thinks this article is proof of Elon's 4D chess is, to me, frankly delusional.

soneca · 4 years ago
Turning my cynicism switch on a bit. The author is a very good content marketer. A hot topic in our corner of the world, which is the author’s target audience, is Elon Musk buying Twitter. Musk tweeted that the percentage of bots is the main issue of the deal. He disputed Twitter’s number of 5%.

I believe the author’s writing prompt was just: a headline about fake Twitter accounts showing a number significantly higher than 5%. That’s it. Whatever the methodology, that was the author’s goal.

The article achieved this goal. Everything else is completely irrelevant, even to the person who wrote it.

Bubble_Pop_22 · 4 years ago
> The article achieved this goal. Everything else is completely irrelevant, even to the person who wrote it.

It's almost like Inception isn't it? A PR stunt within a PR stunt within a PR stunt.

grumple · 4 years ago
My account was active until recently (deleted when Twitter accepted Musk's offer, I don't need to be a participant in a right wing cesspool). I have 0 tweets. I don't like things, because I don't want my name attached to someone else.


mattzito · 4 years ago
This study is a great example of how you can use the data you have available to talk yourself into your conclusions. The implicit point of the study is to refute the "5% of Twitter accounts are spam" stat from Twitter's 10-Q, which was the basis for Musk putting the Twitter acquisition "on hold".

Except the baseline they chose is entirely NOT comparable to Twitter's baseline. The study says:

> Followerwonk selected a random sample from only those accounts that had public tweets published to their profile in the last 90 days, a clear indication of “activity.” Further, Followerwonk regularly updates its profile database (every 30 days) to remove any protected or deleted accounts. We believe this sample is both large enough in size to be statistically significant, and curated to most closely resemble what Twitter might consider a monetizable Daily Active User (mDAU).

Except that we know what Twitter defines as a monetizable DAU:

> We define monetizable daily active usage or users (mDAU) as Twitter users who logged in and accessed Twitter on any given day through Twitter.com or Twitter applications that are able to show ads.

Nothing about posting, nothing about engagement at all - simply: were you able to see an ad?

So there isn't any reason to claim that this "might" represent what Twitter uses as an mDAU - we know, in fact, that is not how they measure it. A more honest statement would have been:

"We selected a random sample of (etc. etc.). We believe that this sample is large enough to be statistically significant; however, it cannot be compared to Twitter's mDAU set, as it does not count passive consumers of Twitter content. Instead, this data can be used to suggest that a significant amount of the total posted content on Twitter is delivered by bots"

My guess is that content consumers outnumber content posters by several orders of magnitude, though some of that would be mitigated by the longer time horizon.

gitfan86 · 4 years ago
The whole thing is BS. Twitter could easily have gotten an accurate number if they wanted to. And Elon could have forced them to do that as part of the deal, or done a decent job on his own before making the offer.

Both sides are bullshitting to negotiate a better price.

mattzito · 4 years ago
The only one bullshitting is Elon Musk - that 5% number has been in Twitter's 10-Q for literally years. Here's the 10-Q from Q3 2020:

https://d18rn0p25nwr6d.cloudfront.net/CIK-0001418091/cb1d93d...

"We have performed an internal review of a sample of accounts and estimate that the average of false or spam accounts during the third quarter of 2020 represented fewer than 5% of our mDAU during the quarter. The false or spam accounts for a period represents the average of false or spam accounts in the samples during each monthly analysis period during the quarter"

This is not a new stat or new information.

andreyk · 4 years ago
To be fair, they do admit this:

"Are you challenging Twitter’s earnings report, saying that <5% of mDAUs are fake/spam?

    We are not disputing Twitter’s claim. There’s no way to know what criteria Twitter uses to identify a “monetizable daily active user” (mDAU) nor how they classify “fake/spam” accounts. We believe our methodology (detailed above) to be the best system available to public researchers. But, internally, Twitter likely has unknowable processes that we cannot replicate with only their public data."

mgas · 4 years ago
You only need to see who the author of this post is to know that the methodology is crap, the numbers are likely made up (19.42% is WAY too specific), and the post is just a grab for media attention on the coattails of some other internet meme garbage.

This guy (Rand Fishkin) has been selling SEO as a religion for the better part of this century, and is in no small part responsible for all the search-result-garbage style websites everyone is complaining about elsewhere on HN today and every other day.

He's a third-rate market-bro hack that's been taking advantage of web professionals who get thrown into SEO/Marketing jobs and have no idea what they're doing by relentlessly shoving half-assed corporate strategies through moz.com and now his new sparktoro.com, and calling himself the great SEO redeemer.

Wanna question his methodology? There is none. Wanna question his science? Totally devoid.

mike_hearn · 4 years ago
Unfortunately, this sort of thing is absolutely standard when third parties discuss Twitter spam. There is an astounding amount of academic research doing the same thing with methodologies just as silly.

https://blog.plan99.net/fake-science-part-ii-bots-that-are-n...

nonethewiser · 4 years ago
19.42 is a number rounded to 2dp. They rounded to "nearly 20" for the headline. I'm not sure how they could express things differently in this regard.
seydor · 4 years ago
well if his science is SEO he seems to be good at it
gentleman11 · 4 years ago
I signed up with a VPN and got banned for life after a single nonsense tweet about not liking the feed and following 4-5 famous people. I don’t think their bot detection techniques are very robust.

> This methodology likely undercounts spam and fake accounts, but almost never includes false positives (i.e. claiming an account is fake when it isn’t).

In other words, their model performs well on their training set; they don’t acknowledge that it may be overfitted or mislabeled, and they hand-wave away mistakes.

fivre · 4 years ago
they're robust at generating false positives! i got insta-locked for, i think, signing up using my own email domain, to make an account that shares pictures i took of geese. it took about 4 months to wait for them to unlock it

meanwhile, saudi bot armies apparently basically run rampant across the platform

joering2 · 4 years ago
It gets even worse than that. I have a test account with about 10,000 followers (mostly real people, you can tell), created from my home IP, and a few days after Musk's tweet comparing Gates to the pregnant-man emoji, I posted the exact same tweet! Exactly the same punctuation, images, icons, everything.

Less than 2 days later I was banned for "spewing hatred based on sex, gender or religion". No amount of replies pointing out that my tweet was identical to one from a much, much more popular account helped. Banned for life.

DerekBickerton · 4 years ago
I'm 6 months into my account and they still haven't asked for a phone number. I mean, I have one ready if they ask for it. Maybe I've been whitelisted as a non-bad-faith actor due to my timezone and other factors.
gone35 · 4 years ago
"70.23% of @ElonMusk followers are unlikely to be authentic, active users who see his tweets."

Somewhat alarming, if accurate.

ramblerman · 4 years ago
They define active as having tweeted in the last 90 days.

I'm active on twitter, check it every other day, follow @ElonMusk but I don't tweet. Perhaps I'm a unique case, or their assumptions are a bit off.

Some metrics they consider suspicious (from the article):

1) Accounts that didn't tweet recently

2) Accounts with low number of tweets

3) Accounts with a low number of followers

4) Accounts that didn't set up their own profile image

Lurking != bot, and these data points would all hit high for lurkers. I'm somewhat suspicious of their results, especially given this Pew study suggesting that the majority of Twitter users don't tweet very much.

25% of Twitter Users Produce 97% of All Tweets: https://www.pewresearch.org/internet/2021/11/15/2-comparing-...

phpisthebest · 4 years ago
I am the same; in fact my account has never posted anything and never liked anything. I just follow people I want to see updates from, that is all.

I treat it like RSS not Social Media

jacquesm · 4 years ago
With Twitter's changes asking people to log in to see most content, the number of lurkers probably shot up.
loceng · 4 years ago
I'm now wondering what % of overall tweets are from that 1/5th of bots.

If it's 1-1 then 20% of tweets being from bots isn't great - but it could even be more than that.

_joel · 4 years ago
Same here I never tweet but follow a few people.
Inu · 4 years ago
I think it's connected to a more general phenomenon, though. PewDiePie has 111 million subscribers on YT and gets like 3-5 million views per video. Like 95% of his subscribers don't watch his videos.
madeofpalk · 4 years ago
I've subscribed to a bunch of people on YouTube whose videos I no longer watch.

I think Youtube's UI "encourages" this behaviour even more as it has a highly algorithmic homepage rather than a feed of people you follow.

schleck8 · 4 years ago
Sounds accurate if you go through the comments under his tweets, a ton of url spam and scam 'trader' bots
jjoonathan · 4 years ago
The ratio of fake Elon Musk accounts to real Elon Musk accounts is probably even funnier.