- anything that allows file upload -> porn / warez / movies / any form of copyright violation you care to come up with.
- anything that allows anonymous file upload -> childporn + all of the above.
- anything that allows communications -> spam, harassment, bots
- anything that measures something -> destruction of that something (for instance, google, the links between pages)
- any platform where the creator did not think long and hard about how it might be abused -> all of the abuse that wasn't dealt with beforehand.
- anything that isn't secured -> all of the above.
Going through a risk-analysis exercise and detecting the abuse potential of whatever you are trying to build prior to launching it can go a long way towards ensuring that doesn't happen. React very swiftly to any 'off label' uses of what you've built and shut down any form of abuse categorically, and you might even keep it alive. React too slowly and, before you know it, your real users are drowned out by the trash.
It's sad, but that's the state of affairs on the web as we have it today.
FWIW I made a website for our yearly project conference that allowed anybody to create an account and post material. But 1) all postings had to be moderated until the account was verified (either manually by me or by a code sent only to conference attendees) 2) I was pretty active in monitoring new posts and deleting posts and accounts that were clearly spam.
And of course the normal stuff, links all prefixed with "nofollow", only whitelisted markdown content / HTML, &c.
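For anyone building something similar, a minimal sketch of that whitelist-plus-nofollow step in Python, using the bleach library (the tag whitelist is an illustrative guess, not necessarily what the parent actually used):

    import bleach
    from bleach.callbacks import nofollow

    # Only these tags survive sanitization; everything else is stripped.
    ALLOWED_TAGS = ["p", "a", "em", "strong", "ul", "ol", "li", "code", "blockquote"]

    def clean_post(html: str) -> str:
        # Strip anything outside the whitelist, then force rel="nofollow"
        # onto every link so spammers get no SEO benefit from posting.
        sanitized = bleach.clean(html, tags=ALLOWED_TAGS, strip=True)
        return bleach.linkify(sanitized, callbacks=[nofollow])

With nofollow on every link, even spam that slips past moderation is worthless to SEO spammers.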
The first year we had a handful of spammers try to post stuff, but it didn't take much time at all to filter it out. This year we didn't have any spam accounts at all.
So my suspicion is that getting ahead of the "spam problem" by doing heavy manual moderation early on is an investment. If he'd just spent 5 minutes a day looking at the accounts and deleting rubbish, he would never have been swarmed by spammers, and the moderation load would have remained relatively low, until he got popular enough that he could afford to actually spend some real effort automating / crowdsourcing the spam-fighting capabilities.
The key here is manual moderation. At NodeBB we also have had our fair share of spam companies trying to build scripts to post things, and the only foolproof solution is manual moderation via a post queue for new users.
The downside, of course, is that it takes effort to maintain, and it is a barrier to entry for new accounts.
100% this. As soon as an IP / service appears for the first time it is scanned and hammered in search of low-hanging fruit. Once the initial wave of exploit bots has found out they don't get far, your 'thing' is marked as 'too much faff', malicious traffic drops off and you are left alone (usually, y.m.m.v. etc etc etc).
My experience is different. I manage a wiki. It was a wonderful success in terms of getting people to contribute, however the wiki itself (moinmoin) has lousy authorisation and authentication tools, so I wrote my own. A few versions later, changes to moinmoin broke those tools, but by that time most of the content had settled down, so I just reverted, locking down the system by making the user database read only (literally: "chmod -R a-w"). That worked well enough.
However, every so often you would have to create a new account. To do that the chmod had to be undone, but only for a minute or two. In that minute or two, typically 2 or 3 spam accounts were created, and maybe 20 wiki pages spammed or defaced.
In short: defending the site had no effect whatsoever for us. The bots were always probing, every few seconds, and it never stopped even years after the site was totally locked down.
PS: Spam-wise the bots were just an annoyance. But every page hit runs moinmoin's Python code, and it's not the fastest thing. We were running on low-end VPSes that took a dim view of anybody using too much CPU. Our VM regularly got shut down because of those bloody bots.
Interesting, thanks for sharing. I'll be launching a service that allows users to upload content and I've been thinking about how to handle anti-spam. I really appreciate you sharing your experience.
Manual moderation in this way could open you up to other forms of liability that might be easier to defend against without it.
I don't think it'd be an issue for a hardware forum, although mass-approving comments and finding that one is defamatory about person or company X could get some lawyer's drool glands working, for example...
Also, Section 230 appears to be under threat by the current administration and how things are interpreted - and I just read that Biden has been talking about removing it completely...
So this situation is in a state of flux at the moment, I believe.
/not a lawyer / doctor, yada yada
It's sad and also not always true. I think most people were pleasantly surprised by how the "get into a stranger's car" and "go stay with a stranger for a few days" models worked out for Uber and Airbnb.
Obviously there have been some issues and obviously they have processes in place to minimize and respond to abuse of their platforms, but overall both platforms rely on "trust by default" which is interesting and leads me to think that your comment might lean too far towards pessimism.
Airbnb and Uber are a bit different though, because being online gives you a sense of immunity and relieves a lot of the inhibition you'd have in real life. To some degree, we all have a different sense of morality online than in real life (most people would never steal even a piece of fruit in a supermarket but see no problem pirating movies and software) and some push it very far.
> I think most people were pleasantly surprised by how the "get into a stranger's car" and "go stay with a stranger for a few days" models worked out for Uber and Airbnb.
Most Americans for the former, maybe. "Get into a stranger's car" is literally an everyday routine in other places in the world, and has been for a long time.
> It's sad, but that's the state of affairs on the web as we have it today.
Hasn't this been the meta since forever?
I imagine you'd know all about that with Camarades. It was my first exposure to web streaming as a teen way back in the day. Initially, it didn't strike me as a porn platform, but it didn't take long before it felt that way.
related, SFW: https://virtuallyfun.com/wordpress/wp-content/uploads/2010/0... and https://de.wikipedia.org/wiki/ASCII-Art#/media/Datei:Schreib...
At one point there was a movie shot at SAIL with rather open-minded approaches to computer-human interaction. In the early 1970's (as in the twenties?), "hangups" were for inhibited squares. (If they'd been running Unix instead of WAITS, they'd probably have invoked nohup(1) for the shoot.)
Going back yet another century prior, we find the Victorian equivalent of OnlyFans: https://news.ycombinator.com/item?id=23791112
Yes, that is one of the reasons. Also, helping a number of other platforms to combat all kinds of forms of abuse. Unfortunately, as you correctly surmise, I have some relevant experience which some people find useful to tap into.
Camarades went 'downhill' about 6 months in when it went mainstream and more and more people that had other ideas of where we should go with it joined. Then they brought their buddies and it was 'game over', we had to decide what to do about it. This roughly coincided with the ad market collapse and drove the decision to put the porn behind a paywall to finance the rest of it. Which worked well for more than a decade.
There was an article on HN about Disney internet apps. Disney management consider "safe for children" to be a critical part of their brand. Their stance was very clear: no communication between users under any circumstances, ever.
Yes, and I'm always looking at HN entries which show the next file-upload service etc. and thinking 'phew, holy shit, they are fearless' - because that's why I'm not building something like this.
I would create a company to have legal separation for my private assets, I would use a ton of mechanisms upfront to make sure I do my best to not support child porn and the like, and when I have analyzed how much work doing it the proper way is, I will just stop thinking about it :)
Are there recommended services where I can point to, say, an S3 object and validate that it doesn't fall into some/all of the above categories? Seems like a good business, and also something I'd pay for to remove this risk.
Yes, look at Amazon Rekognition. I use it (very successfully) for a website that used to encounter similar issues with file upload abuse.
In my case, image uploads aren't mandatory for users, but they are very helpful in identifying the spammers (for some reason, the spammers almost always try to upload images that get filtered, so it makes it easier to spot them). That, combined with an IP check (getipintel.net) has almost completely eliminated the spam issues on my site.
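For those wondering what the Rekognition check looks like in practice, a minimal boto3 sketch (bucket, key and confidence threshold are made-up values for illustration):

    import boto3

    rekognition = boto3.client("rekognition", region_name="us-east-1")

    def moderation_labels(bucket: str, key: str, min_confidence: float = 80.0):
        # Returns labels such as "Explicit Nudity" for an image stored in S3.
        response = rekognition.detect_moderation_labels(
            Image={"S3Object": {"Bucket": bucket, "Name": key}},
            MinConfidence=min_confidence,
        )
        return [label["Name"] for label in response["ModerationLabels"]]

    # e.g. reject an upload whenever moderation_labels(...) comes back non-empty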
Was going to say generally the same thing. Back in the day it was anonymous FTP, if you enabled it in your server you didn't have to wait long and suddenly all the things would start showing up.
It reminded me a little bit of the guy that started a "buy a gift card with bitcoin" site, not realizing he was effectively creating a way to convert bitcoin into actual cash while avoiding an exchange. It was wildly successful, until he realized that it was really, really hard to get large quantities of bitcoin rapidly converted into cash.
It is a nightmare.
It is a very coarse filter. Your ratio will go up but there will be plenty of people who feel that since they paid you they now really get to do as they please.
That would depend on the lifetime revenue per user gained spamming your system. There definitely is some sufficiently high signup fee, but it’s different for every spammer. (And for the child pornographers, there may be no fee you could charge to stop them.)
I believe it works for Metafilter, but you also run into problems with credit card abuse - fraudsters who create accounts to test which cards have not yet been reported as failed, etc.
I predicted that about Send the day it was launched[1]. It helped that I had done a very close review of a large competitor in that space, so seeing Send open up gave a lot of bad people a new playground, with predictable results.
[1] https://www.zdnet.com/article/mozilla-suspends-firefox-send-...
This is so true. We built Jumpshare (https://jumpshare.com) for file sharing and visual communication, so you can imagine the abuse we got. We had to spend considerable time and resources fighting this abuse and adding checks and balances, which slowed down our product roadmap.
Does something like email verification resolve this in some small way?
I ask because we are looking at extending our platform to include enterprise file sharing and rendering for niche file types, but don't want to expose ourselves to the extra work you've described.
Put any technical book title into Google with site:github.com in it and click the first PDF that shows up.
For fun, see https://github.com/topics/pornography
I've been maintaining a community forum for more than a decade. We had some abusive users, so we introduced pre-moderation. Meaning, any new user is on probation for a few posts, and all anonymous posts have to be manually approved by an admin to be posted publicly. This has pretty much completely stopped the visible abuse.
However, for about 10 years, there have been bots registering every day; some of them even make realistic accounts with a cool and unique username, email, even a description. Strangely, the email and username never match... And they make posts with HTML links embedded. They even know to actually select the 'HTML' content type, which is an extra select input. Some bots even make a few innocent posts before the link spam - but just too few to get past probation. Obviously, the spam posts never get approved, and the accounts never get out of the pre-moderation queue, and yet they're still trying every day... Not intelligent enough to make more than 2 dumb posts.
Similarly, I have some work projects that have user registration with a human on-boarding process, where another person has to add the user to their group for them to share any data, none of which is ever public. So these bots are tirelessly registering, and staying in limbo forever. Thousands of useless accounts.
It boggles the mind how much energy is wasted, but I guess it must be profitable enough.
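A minimal sketch of the probation gate described at the top of this comment; the three-post threshold is an assumption (the original only says "a few posts"):

    from dataclasses import dataclass

    PROBATION_THRESHOLD = 3  # approved posts needed before posts go live directly

    @dataclass
    class User:
        approved_posts: int = 0
        is_anonymous: bool = False

    def requires_manual_approval(user: User) -> bool:
        # Anonymous posts and posts from users still on probation
        # land in the moderation queue instead of going public.
        return user.is_anonymous or user.approved_posts < PROBATION_THRESHOLD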
It's amazing how some people can't seem to spot a bot/spam post, no matter how obvious.
One of the websites I host is for a retired, elderly academic who pens articles about his former field and invites comment.
Almost every week I'll get an email from him along the lines of "Is this worth following up?" and attached will be either a comment reading something like "I love your article. So much good info and useful to me.." or an email from some g3gergergew@gmail.com address saying "We love your site. It has much good infos and is useful but we are notice your CEO could being better..."
No matter how many times I tell him to look for the telltale signs...
* Random gibberish email from address
* Half a dozen links in the comment
* Broken English
* Generic text which could apply to ANY article or website in the bloody world
...he'll still forward them to me and ask if they're worth getting in contact with. Every time, I'm gobsmacked how he can fall for such inept spamming.
Ironically, there is another side to this too: you have to teach people that if they want a response, they need to include relevant information that relates to the person you are reaching out to, a "serious-looking" e-mail address, a proper greeting, good grammar...
Young professionals especially struggle to understand why any of this is important, and are left wondering why they're not getting a response.
The ineptitude of the spamming is actually a sign of eptitude -- they intentionally include the signs you look for because you're a waste of their time. They have figured out the precise line to walk where bad marks like you avoid them but easy marks like him engage.
Many people never look at email addresses (just like they don't look at full URLs - the programs have made it harder to see them by default).
Many - most - native English speaking people don't really know 'bad' English. People I've known my entire life who've grown up in the US and went to university still have trouble writing more than a few coherent sentences. Folks are bad at writing, and I think tend to not look critically at bad writing.
I think part of the reason he "falls" for it is that the contents of the spam are things he wants to be true. In other cases he'd see the warning signs and know it's spam, but with things he actually wants to be true, he would rather have someone he trusts tell him there's no way.
> This has pretty much completely stopped the visible abuse.
This process works well but it can also turn away a ton of people from ever joining.
I know of one product that only offers support through public forums. New accounts need to be reviewed by a moderator; then you're not allowed to post any threads until you've made a few replies to other threads and been whitelisted by a moderator; and on top of that, you also can't use links until you've met some other criteria.
But in order to get proper support you have to link to large files (videos) that can't be uploaded directly through the forums.
It becomes such a pain in the ass to open a support request. It could easily take over a week just to post your question, and the worst part is that the forum software orders posts by date: your post will be buried on the 8th page before it's even visible, because it takes your original pre-moderated post date as the creation date.
That’s a very good point. I have experienced this myself, but I don’t see another way to stop abuse except tweaking the requirements. It sounds like too many hoops in your example. On our site if the user makes just one relevant comment I instantly promote them. It’s that easy to tell. I wish this deterrent wasn’t justified by walls of spam as the alternative.
There's one website I run where we were getting some spam through the "contact us" form. No problem, I'll just install captcha. I looked at the docs and realized it was more effort than I was willing to spend to kill a few spam messages a week. So, I just added an extra input field where the user has to type in "123". No spam since.
I thought it was kind of funny how easy it was to defeat the spammers, but on the other hand, there are so many easy targets that it's not worth their effort to overcome something so elementary just for our site. Conversely, there are much higher-value targets where captcha is worth the effort to implement, because there it is worth it for the abusers to try to defeat it.
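The whole countermeasure fits in a few lines; a Flask-flavoured sketch (the route and field names are invented for illustration):

    from flask import Flask, request, abort

    app = Flask(__name__)

    @app.route("/contact", methods=["POST"])
    def contact():
        # Dumb bots either skip the unfamiliar field or stuff it with junk;
        # a human reading the label just types 123.
        if request.form.get("type_123_here", "").strip() != "123":
            abort(400)
        # ... deliver the message as usual ...
        return "Thanks!"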
> They even know to actually select the 'HTML' content type, which is an extra select input.
It's not surprising because spammers write scripts for all sorts of platforms (WordPress, vBulletin, etc.). There's probably no custom code written to attack your site.
They detect your platform and use a script from their pre-existing library to post their spam.
For realistic usernames, they can just reuse names they've seen on other platforms.
Same goes for emails. Any existing email list could be a source of genuine-sounding names. Just throw a couple of numbers at the end of the name and you've got a unique name that a human already came up with.
Captchas are the absolute spawn of the devil. Every time I see one on a website, I want to punch the face of the person who invented them, until my fists hurt. Unless I really NEED to use that site, I just go elsewhere, once I get a Captcha shoved in my face.
"Take a competitor product, remove all features you don’t need, and make it crazy fast."
Seems to me there are hundreds of lifestyle businesses just waiting to happen by following this formula.
So many good ideas out there could be made so much better by reducing them to their essentials, but making them elegant and "crazy fast".
I think you may have just re-discovered Disruptive Innovation (sometimes also called Disruption Theory): incumbents over-serve their customers by adding lots of features, complexity, and cost. Upstarts can attack them by focusing on only a few core features and/or low price. The incumbents can't respond without annoying their existing customers, who have grown accustomed to all the features the incumbent provides.
https://hbr.org/2015/12/what-is-disruptive-innovation
YouTube seems to be playing the boiling-frog experiment with people. I started YouTubing in 2006 and there were no ads. Then, eventually, they added monetization and you could place a single ad at the beginning of your videos. Now I see videos with ten two-ad breaks interspersed throughout. You have to constantly click to bypass them. It's getting incredibly annoying, and it just seems greedy, especially coming from Google.
That, combined with generally treating their content creators like they are completely disposable. I hope someday someone disrupts YouTube. It seems to me, besides the "network effect", the main difficulty here is unfortunately the cost of bandwidth. I could host a reddit clone from my home machine or some cheap VPS if I wanted and scale up to several thousand users, but video content at 5 megabits per second... How do you get bandwidth cheap enough to host that? Are there hosting providers that will just serve files over HTTP for super cheap?
One place I worked wanted multiple people to "own" a story, but JIRA doesn't work like that, so they implemented a totally new custom "owner" field that did allow it and then told everyone not to use the native owner field. Now you had to track everything two ways.
How interesting, my experience is completely different - I'm a UX Designer working in a squad and I/we find JIRA super easy to use, allows you to customise ticket types pretty much to our heart's content (including removing stuff you don't need), ditto customising the board and other features, allows us to track changes, comment on tickets, and works reliably. Almost nothing else allows us to do the same. Admittedly Jira has been overhauled recently and is much better than it was, plus some new features have been added. Oh, and there's an app which allows me to do most of what I need to do on my phone super quickly, partially thanks to the notifications.
I got so annoyed when someone I shared JIRA admin credentials with changed the front page to a Kanban board. All I wanted was bloody issue tracking! KISS.
Wow. Traded all usability for endless features. Our setup barely works, and creating a story is such a massive pain - sometimes things just won't work. So I refuse to use it for stories beyond one mega-story.
That sums up so perfectly what I did with a B2B SaaS product of mine. I've never found words that describe what I was aiming for as perfectly as this quote does.
I would love to take a look at your B2B project for inspiration, if that's possible :)
Yep. The late 90's and early 2000's were littered with people trying to make "light" copies of MS Word. The problem is that journalists need the word-count feature, and teachers need the WordArt feature. Remove either, and you lose a demographic.
That having been said, there are a lot of products out there that made their product intending it to be free, and then when they hit 1m users they started thinking "hmmm, if I could get a dollar out of every user, I could buy a house". They try to stuff a monetization model in sideways and damage their product in the process. Taking a moderately successful product that's crippled by attempting to shoehorn in monetization and redesigning it to have reasonable monetization from the beginning might be a better strategy.
Yup. If you build a communications platform, it will be used for spam. If you build a hosting platform, it will be used for porn. If you build a linking platform, it will be used to spam links for porn.
Anything else requires a constant uphill battle of content filtering and deletion. You could call it censorship, but it's a necessary reality.
"I made a great, fast, free porn hosting site. But then these nerds showed up and started uploading their Git repos onto it."
Which, I expect, will be used for [unlawful] extreme porn in one direction and non-porn copyright infringement in the other direction.
Mind you, all roads ever built get used for crime. There's a point at which mitigations for abuse of their services become unreasonable to expect of a company.
I suspect that this is one reason you're seeing virtual conferences that are charging $50 or whatever. Yeah, it's a few dollars to offset costs. But it also keeps out the "riff-raff", so to speak.
Depends on how you structure it. If they're paying to list, then it makes things a lot easier.
Charging nominal fees is going to keep out the most casual users, but I'm not sure it's a bad idea.
I have my own hobby side-project that allows user-generated content. Do you have some tips for minimizing the ways it could be used by spammers/porn?
For example, my project is a PWA which (I assume) makes it harder for spammers to use because there are fewer direct links that could be used for spam.
What about using verified emails? Google's captcha (or similar)?
Have actual (paid) humans review the content that users upload. Porn and spam always take over eventually, because they're consumed by, ahem, highly motivated individuals.
It's all a joke, and it is censorship. On the one hand FB is hiding behind freedom of speech, but on the other hand I cannot send certain kinds of links even in private via Messenger.
We were running a privacy focussed Chat-App-Network dating platform in 2017 which was accelerated by Facebook[1].
i.e. A network between, Messenger<->Viber<->Telegram<->Line App.
By design, no media sharing was allowed (to prevent pornography) and user profile images were pulled from the chat platform itself. But we soon faced a unique challenge: people from certain countries using pictures of their children as profile pictures (often just the children); people with a group photo as their profile image; and people using explicit images as profile pictures.
So we integrated Amazon Rekognition to identify children, group and explicit images. Those using explicit images were banned immediately, and those with a children/group photo (face detection, not facial recognition) were asked to change their profile image (their profile was not shown to anyone until they changed it to a picture of just themselves). We were processing >200,000 profile images per month, as people change their profile images often.
But, as we very well know, Amazon Rekognition (or, for that matter, any such ML solution) is not 100% accurate. We faced issues with people with darker skin colour (Amazon told me they were working to fix the issue; exactly why this type of half-baked tech shouldn't be used for things which can cause harm), and so we had to reduce the confidence levels to such an extent that anything resembling a child would be flagged by the system (false positives are better in this case than false negatives).
[1] https://hitstartup.com/about/#FindDate
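A rough sketch of what those profile-picture checks could look like with boto3; the exactly-one-face rule and the under-18 cutoff are illustrative assumptions (detect_faces does face detection, not facial recognition):

    import boto3

    rekognition = boto3.client("rekognition")

    def profile_photo_issues(image_bytes: bytes) -> list:
        issues = []
        faces = rekognition.detect_faces(
            Image={"Bytes": image_bytes}, Attributes=["ALL"]
        )["FaceDetails"]
        if len(faces) != 1:
            issues.append("group-photo-or-no-face")  # must show exactly one person
        for face in faces:
            # Deliberately conservative: flag anything that might be a child,
            # preferring false positives over false negatives.
            if face["AgeRange"]["Low"] < 18:
                issues.append("possible-minor")
        return issues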
> privacy focused Chat-App-Network... accelerated by Facebook
Why would any company focusing on privacy partner with Facebook or Google? I would guess that some ardent supporters of such a product/company would be put off by such a partnership, no?
> Chat-App-Network
Messenger had >1 billion users, and so its users were 98% of our user base. We enabled them to communicate with users of other chat apps, and vice versa. We didn't even use the 'Name' of the users.
I applied for their bootstrap phase under their FbStart program but they directly selected it for the Acceleration phase.
As for why I applied for Facebook if I care about Privacy?
I was a disabled solopreneur from a village in India, without any kind of network strength, competing with Valley behemoths; any kind of help is not just a force multiplier but life or death (though my product was selected meritoriously by FB). Facebook's privacy issues (Cambridge Analytica) started only several months after I launched the product, so the image of Facebook was not what it is today. But it did bother me, and just after a year of running the platform successfully, I had to close my startup due to my health issues[1]. I did not sell my platform, to safeguard the privacy of the users.
[1] https://abishekmuthian.com/i-was-told-i-would-become-quadrip...
Wow this is incredible for multiple reasons. I'm also working on a dating app and pondered this "security/content" issue, and I am also a victim of spinal damage. I had cervical myelopathy in 2018 and had to have an emergency fusion on C5-C7. I still experience various neurological issues for which they can't trace (they claim it's not related to the spine issues), but causes symptoms very similar to multiple sclerosis (although that's been ruled out). Your condition looks and sounds even more serious than that. I'm sorry you've gone through so much. I hope your health improves and I wish you success!
Thank you for your kind words. I can visualise what you had to go through. I share mutual respect and wishes for your recovery or should I say 'Management of our conditions'.
>I still experience various neurological issues for which they can't trace
The main issue I had (tingling on the face) was successfully resolved after the surgery. Any other discomfort I've had has been largely due to anxiety and post-traumatic stress from the surgery, losing my hard-earned startup, etc.
So targeted efforts at bringing down the anxiety help me a lot [staying in the present, taking in less sensory input (I had dozens of phone calls earlier; now it's zero phone calls, only email)].
Side note: does wearing a heart-rate monitor on the wrist, like a Fitbit/Apple Watch, hurt you after 15-30 mins?[1]
[1] https://abishekmuthian.com/my-experience-with-fitbit-charge-...
Thank you, I think we incurred ~ $200 for ~200,000 images for Rekognition API and I also think Amazon had fairly large free-tier limit for Rekognition at that time, since the service was relatively new.
A market platform I recently worked on allowed users (free sign up) to create multiple wishlists and then send those wishlists to arbitrary email addresses. The user could set a custom title, limited to 100 characters or so.
We soon discovered a problem similar to the OP's - bot accounts (mostly @qq.com addresses) were registering by the hundreds per day to create wishlists and then send those wishlists to other @qq.com addresses. They were setting the titles to arbitrary code blocks.
I found it fascinating, if terribly inefficient. Some colleagues and I speculated on the purpose - perhaps someone experimenting with some kind of laundered botnet control path?
We tried all kinds of measures to prevent it, but ultimately we blocked all @qq.com accounts and eventually disabled the wishlist feature altogether, as it had so little real usage.
We allow users to sign up for a free trial of our product; you have to put in your name & email address. After the trial expires, we send an email that says "Hey So-and-so, your trial ran out, click here to give us money, etc." Some enterprising spammers filled in the name field with spam URLs and the email field with victims' email addresses, in order to spam them. So the victim would get "Hey hxxp://buyfreerolex.com/, your trial ran out..." spam emails, from our email server. Obviously we've fixed it since, but it's absolutely wild the lengths spammers will go to.
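The fix can be as blunt as refusing display names that contain anything URL-shaped; a hedged sketch (the regex and function name are illustrative, not the poster's actual code):

    import re

    URL_PATTERN = re.compile(r"(https?://|www\.)", re.IGNORECASE)

    def is_valid_display_name(name: str) -> bool:
        # A human's display name should never contain a link.
        return not URL_PATTERN.search(name)

    assert is_valid_display_name("Jane Doe")
    assert not is_valid_display_name("http://buyfreerolex.com/")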
Ha, thank you! That explains it... I've had several signups via Tor to my service, none of them confirmed, every few days... I guess they were checking if they can somehow abuse the mails.
We had somebody signing up to a website with Russian email addresses. What they did, though, was set the personal name to something like the following in Russian:
"Click here to see a translation http://whatever.r/?username"
So when we sent out the email to verify the signup the receiver saw some English text they couldn't read and the above instructions in Russian telling them to click on the link.
This was a Magento site so I assume it was a standard bot.
We've had multiple spammer attacks over the years. Our platform allows users to create and publish their own content. Our primary target is education, teachers and students. But naturally it's being abused by spammers. It's been an interesting cat and mouse game to counter them.
- One time, they used our platform to publish links to their streaming websites for the quarter-finals of the 2018 Champions League. Suddenly we ended up being the first result on Google for "arsenal v barcelona". It was fifteen minutes before the game, so you can imagine that we got a lot of traffic. On the one hand it was kind of flattering that the SEO ranking of our domain was so strong. On the other hand, it wasn't great nor beneficial for the platform to be abused like that. As a counter-measure, we decided to block indexing of project pages for 24 hours after they're first made public. The spammers never came back.
- Another time, we got an email from AWS that our SES bounce rate was 15%, and rising fast. Being blocked from sending emails by AWS would have been a disaster. It turned out that our invitation system was being abused. A creator of a project can invite an external person by email. That person receives an email saying "John Doe invites you to collaborate on 'A nice project about the 2018 Champions League'" with a link to the project. Replace "A nice project about the 2018 Champions League" with a Chinese ad and you've got yourself spammers sending thousands of emails a second to a random collection of email addresses. Naturally a lot of these bounce, which caused AWS to warn us. So we had to start verifying the MX validity of invited email addresses and throttle the system to a maximum of 100 emails in a window of 24 hours (see the sketch after this comment).
- We still get a lot of spammers publishing obvious spammy projects. One thing that has helped is the Clearbit Risk API. You send them an email address and it comes back with an assessment of how spammy the address is. We use it for certain domains (protonmail.com, yandex.com,...) and it frequently flags someone as a spammer right after signup. They can still use the platform but can't make stuff public, completely defeating the purpose of them being spammy.
I'm sure they'll keep finding creative ways to get around the limitations we put in. The toughest part is finding a way to counter them without hampering the experience for all the other users.
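For what it's worth, the MX check and the 100-per-24-hours throttle mentioned above are only a few lines each; a sketch using dnspython (helper names are invented, and error handling is simplified):

    import time
    from collections import defaultdict, deque
    import dns.exception
    import dns.resolver  # pip install dnspython

    def has_mx_record(email: str) -> bool:
        # An address whose domain publishes no MX record will bounce anyway.
        domain = email.rsplit("@", 1)[-1]
        try:
            dns.resolver.resolve(domain, "MX")
            return True
        except dns.exception.DNSException:
            return False

    _sent = defaultdict(deque)  # sender id -> timestamps of recent invitations

    def may_send_invite(sender_id: str, limit: int = 100, window: float = 86400.0) -> bool:
        # Sliding 24-hour window: at most `limit` invitations per sender.
        now = time.time()
        q = _sent[sender_id]
        while q and now - q[0] > window:
            q.popleft()
        if len(q) >= limit:
            return False
        q.append(now)
        return True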
I later discovered that Instagram banned all mylink.fyi links from the platform. A customer also confirmed to me via email that Snapchat started blocking links. Heh, I’m banned by Instagram and Snapchat!
[...]
If you’re interested in acquiring the domain name, and/or the app, let’s talk.
That domain name must have a negative value now?
If you're one of those loud folks who dislike instagram/facebook (like me) then this is a nice way to ensure your content and data does not end up on the platform.
Of course, they're only enforcing it themselves, so it's unlikely to be permanent. :(