I've dealt with spammers to various degrees. One of the most effective ways of dealing with them is to "shadowban" them: allow them to use your service, but don't indicate that you've identified them as malicious. For instance, when dealing with chat spammers, allow them to chat, but don't show their chats to other users. Another level would be to allow them to chat, but only show their chat to other shadowbanned users. For the author's use case, perhaps something like: if the IP address that created the shortened link accesses it, they get the real redirect, and if a different IP address accesses it, they get the scam warning page. If the malicious actor doesn't know they've been marked as malicious, they don't know they need to change their behavior.
The second most effective thing is making the malicious actor spend some sort of resource, such as a payment (which the author uses), a time commitment (e.g. new accounts can only create one link a day), or some other source of friction. The idea is that for legitimate users the friction is acceptably low, but for persistent spammers the cost becomes too high.
The third thing I've found effective relies on the fact that lots of spam comes from robots, or perhaps robots farming tasks out to humans. If you can determine how that traffic is coming in and then filter it effectively without indicating failure, the robots can happily spam away and you can happily filter away.
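The IP-conditional shadowban described above could be sketched like this. This is a toy illustration, not any real service's implementation; the slugs, IPs, and table names are all made up.

```python
# Minimal sketch of IP-conditional redirects for flagged links: the IP
# that created a flagged short link sees the real destination, everyone
# else sees the warning page. All names and data here are illustrative.
destinations = {"abc123": "https://malicious.example/landing"}
flagged_creator_ip = {"abc123": "203.0.113.7"}  # slug -> creator IP, if flagged

def resolve(slug: str, visitor_ip: str) -> str:
    creator_ip = flagged_creator_ip.get(slug)
    if creator_ip is not None and visitor_ip != creator_ip:
        # Everyone but the flagged creator gets the scam warning,
        # so the spammer never learns they've been caught.
        return "/scam-warning"
    return destinations[slug]
```

In practice the check could key on account, cookie, or time rather than IP, which is what makes it hard for the spammer to probe.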
IPv6 doesn't really solve this. You'll still ban at least a /64, and you'll switch to a /48 for the particularly nasty ones. There's zero reason to ban a specific IPv6 address.
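Prefix-level banning is straightforward with Python's standard `ipaddress` module; a small sketch (the addresses are documentation examples, and the function name is invented):

```python
# Ban an entire IPv6 prefix (/64 by default, /48 for nastier cases)
# instead of a single address.
import ipaddress

def banned_network(addr: str, prefix: int = 64) -> ipaddress.IPv6Network:
    # strict=False masks off the host bits for us.
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)

block = banned_network("2001:db8:85a3::8a2e:370:7334")
# Any address in the same /64 now matches the ban, e.g.
# ipaddress.ip_address("2001:db8:85a3::1") in block
```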
That's actually a very interesting idea I hadn't seen before. It certainly makes it less obvious that one has been shadowbanned, and it would probably help keep non-bot users happy. I wonder if it'd be worth the investment to implement.
Shadowbanning is extremely hostile to users that have been mis-identified as spammers (which will happen) while spammers will quickly and easily figure out a way to determine if they've been shadowbanned. That approach needs to stop.
I've employed shadowbanning on an online service to deal with some deranged ban-evading individuals. It does help a lot. Granted, some of the more savvy users may figure out what you're doing, but you're often not dealing with the brightest minds. Given that your typical online service will employ maybe one moderator per 100k users, any reduction in workload is welcome.
> Shadowbanning is extremely hostile to users that have been mis-identified as spammers (which will happen)
It should always be a manual action and moderators should continue to see messages of shadowbanned users. You can always lift it in case of a mistake.
If you're going to have a free tier on your service and your service has any sort of interaction going on between users that could be degraded by spammers and the mentally insane, you're going to need shadowbanning. It's either shadowbanning or upping the hurdle to creating an account considerably.
I don't understand why shadowbanning would be so effective. It's trivial for any competent spammer to check their submissions from different IP addresses; they will very quickly discover if they are shadowbanned.
The risk of misidentifying legit users and shadowbanning them outweighs the potential gain.
> I don't understand why shadowbanning would be so effective
Because if done correctly the user never knows they are shadowbanned. It sounds trivial when you know _how_ the shadowban is done. But for instance, instead of an IP check, perhaps it's a time check that only kicks in after 3 days. Or a combination of different checks. So imagine that you are accessing a service that appears to be working correctly: you would basically need to a) determine that the service even does shadowbanning, and b) think of the infinite ways you might be shadowbanned and try to determine whether each is the case.
If you had the time and inclination you could even seed their account with mock stats, i.e. when the shortened link is accessed, correctly log all of the metrics to their account so they have solid metrics indicating it's working, but fail the actual consumer requests.
Logging their metrics correctly is going to take resources. Instead, just set a flag on their account which, if true, means they just see some randomised junk stats.
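That flag could look something like this. A toy sketch: the function and field names are made up, and the junk is seeded per account so it stays stable between page loads rather than visibly re-randomising.

```python
# Shadowbanned accounts see plausible but fabricated click counts,
# seeded per account so the junk numbers don't change on every reload.
import random

def shown_clicks(account_id: str, shadowbanned: bool, real_clicks: int) -> int:
    if not shadowbanned:
        return real_clicks
    rng = random.Random(account_id)  # deterministic per account
    return rng.randint(20, 200)      # plausible-looking junk
```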
A big problem that came up at the domain level was what I'd call a _trustworthy domain with untrustworthy subdomains_, specifically where those subdomains represent user-generated content.
The Public Suffix List (PSL) [1] to the rescue! It can help with this kind of disambiguation.
Paraphrasing, it's a list of domains where subdomains should be treated as separate sites (e.g. for cookie purposes). So `blogger.com` on the list means `*.blogger.com` are separate "sites".
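A toy sketch of the longest-suffix-match lookup such a list enables. Real implementations load the full list from publicsuffix.org and handle its wildcard and exception rules; the tiny hard-coded suffix set here is purely illustrative.

```python
# Compute the "registrable domain" (eTLD+1) against a tiny slice of the
# Public Suffix List: the longest matching suffix wins, then keep one
# extra label to the left of it.
PUBLIC_SUFFIXES = {"com", "co.uk", "blogger.com"}

def registrable_domain(host: str) -> str:
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # candidates go longest to shortest
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            if i == 0:
                return host  # the host itself is a public suffix
            return ".".join(labels[i - 1:])
    return ".".join(labels[-2:])  # fallback: last two labels

# Because blogger.com is on the list, alice.blogger.com and
# bob.blogger.com come out as separate "sites".
```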
What's the benefit of a link shortener, these days?
It made sense back before Twitter had one of their own. And I know that some people use it to get link analytics. I've also occasionally seen it used for printed materials, to get pretty URLs that are easy to hand-type.
People also use it for malicious purposes, such as hiding malware, or disguising referral links, or otherwise trying to obfuscate where a link is going. (Note: I'm not calling referral links malicious, I'm calling disguised referral links malicious.)
Other than printed materials (which need pretty URLs and thus often need a dedicated first-party URL shortener) and analytics, what are people using third-party URL shorteners for today?
I have written my own URL shortener. I do it partly to get URLs that are nice to type in printed materials.
I also use it to hedge my risks from using SaaS. For my org, we host some things that we offer to the public on different services. Sometimes a vendor doesn't work out. We use our shortened URLs in public communications, and I can redirect them to our new service if we need to switch. It was a way to address my discomfort with URLs that break too easily when you host on 3rd party services.
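The hedge described above is essentially a lookup table plus a 302 redirect; a minimal WSGI sketch (the paths and vendor URLs are invented, not from the comment):

```python
# First-party short links as a redirect layer: switching vendors is a
# one-line edit to the table, and the published short URLs never change.
# (A WSGI app; serve it with wsgiref, gunicorn, etc.)
SHORT_LINKS = {
    "/jobs": "https://vendor-a.example/acme/careers",  # swap vendor here
    "/give": "https://vendor-b.example/acme/donate",
}

def app(environ, start_response):
    target = SHORT_LINKS.get(environ.get("PATH_INFO", ""))
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown short link"]
    # 302 rather than 301, so clients don't cache the mapping permanently
    # and a later vendor switch takes effect immediately.
    start_response("302 Found", [("Location", target)])
    return [b""]
```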
URL shorteners never made sense. Twitter was a dumb artificial limit. In 99.9% of cases, they're only used for tracking or obfuscation purposes. And URL shorteners die every day, leaving ArchiveTeam to clean up the mess again.
As someone who runs a small discussion forum: they're a great way for people to spam CSAM, malware, and other stuff I don't want, in a way that gets past filters.
I think a conservative estimate of link shortener usage is that 99% of it comes from bad actors, and if they all died out my life would be a lot easier. But every week it seems some new one pops up and there's a new wave of spam to deal with.
At least thanks to this post I can add a new one to the filters before a wave of spam, so yay?
Sometimes Reddit (and likely others) will try to parse a URL's valid characters as formatting and dead-link them (e.g. some Wikipedia links with special characters).
They are useful for links that need to outlive the infrastructure they are hosted on. Think of them as a layer of abstraction, e.g. links in a paper published in a journal like Nature. The paper might be relevant for 10 years, but the links embedded in it will rot quickly as organisations change CMSes, domain names change, and organisations merge and disappear.
Also in places where changing the URL is expensive: bus shelter adverts, etc.
I think this is important, but it also hits the trust problem: open shorteners are basically training users to be phished, whereas a controlled namespace doesn't have that problem. Ideally you use a domain you control for everything, getting full control of your reputation while still retaining the flexibility to redirect links as needed.
As a user, I'm much more likely to click on the second link. Too many link shorteners come with ads and other annoyances, so I'd rather not touch them. redirect-checker.org if I must.
I use a link shortener for mailto links that include a precomposed to/subject/body. It's handy to have the customer email you and then reply, since your reply won't ever be marked as spam. If you gathered info via a webform and then emailed the customer, it would be somewhat more likely to go to spam.
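For reference, the precomposed part is just percent-encoded query parameters on a mailto: URL; a small sketch (the address and text are made up):

```python
# Build a mailto: link with a prefilled subject and body; a short link
# would redirect to a URL like this.
from urllib.parse import quote

def mailto(to: str, subject: str, body: str) -> str:
    # quote() percent-encodes spaces, '#', '&', etc. so the fields
    # survive inside the URL.
    return f"mailto:{to}?subject={quote(subject)}&body={quote(body)}"

# mailto("help@example.com", "Order #42", "Hi there")
# -> "mailto:help@example.com?subject=Order%20%2342&body=Hi%20there"
```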
Which is most unfortunate. QR/camera apps usually just show the domain anyway, and QR codes can easily fit large URLs. I imagine shorteners are used just so that publishers can choose a lower QR version and include a pretty logo in the middle.
I use them for easy memorization of tools and deployment stuff I use in my day-to-day IT work. It's also nice to be able to track if someone did what they were supposed to do.
This is a valid use case (my company does this), but I would never outsource it, since a link expander isn't difficult to build exactly to the spec you want/need.
No disrespect to the folks at y_gy who are clearly doing their best. But link shorteners, even when used by good faith actors, are problematic because they hide the destination of the link, and of course that's an invitation for bad faith actors to exploit, so the battle will be endless. Shorteners got popular on Twitter back in the days when all the characters in the URL counted against a very short limit. But there's less need to use them these days, and I am very reluctant to click on shortened links and don't think that this is unusual.
> But link shorteners, even when used by good faith actors, are problematic because they hide the destination of the link
In a sense, Google Search is even more evil because they change the destination link on-click. So hovering on a search result link doesn't show you the true destination.
Semi-related: when I worked at Visa, I developed some ideas around making QR codes slightly more resilient to malicious hijacking when used in a payments or commerce context. The idea was for the scanning app to look not just for a QR code but also for adjacent payment acceptance marks (e.g. branded Visa, MC, PayPal, or a merchant's brandmark) and then dynamically resolve URLs only to registered domains associated with those marks. QR codes are not human readable, and URLs are a lot to ask the average person to reliably parse. So instead, have the scanner also see and understand the same contextual cues that the human can see and understand. And for the human, give them the confidence that scanning the QR will take them to a domain they would expect, and not to a Rick Astley video or worse.
I was recently discussing this subject and I have to wonder if some combination of human readable symbols that is also optimized for machine scanning will emerge.
Right now any phone should be able to parse a url if it can read the type, and so what is the point of QR besides the ubiquity?
QR codes provide built-in error correction, so they will stand up to serious wear and tear, partially obscured images, etc., and they won't confuse O with 0 or i with l.
I can really relate to this article! I created T.LY URL Shortener in 2018, and I've encountered all these issues and more! I found out the hard way when my hosting company shut down my servers for malicious content about a week into launching the site. Malicious actors will go to all sorts of lengths to achieve their goals.
Be careful relying on Stripe to prevent these users. Next they will start using stolen credit cards to create accounts, and then you will face disputes. If you get too many, Stripe will prevent you from processing payments.
About a year ago, I launched a service called Link Shield. It's an API that returns risk scores (0-100) on URLs. It uses AI and other services to score whether a URL is malicious. Check it out and let me know if you would be interested in trying it: linkshieldapi.com/
This. And related: I don't want to have to try your system in order to get pricing. I've seen that a couple times, particularly for things that are in beta, where you don't even see pricing until the end of the trial period.
Integrating a new system requires some effort. And there are some systems, like the one in question here, where there's a real cap on how much value they could possibly provide for me, even if they're perfect.
If I can't see whether the pricing falls in that range before I need to sign up, I'm just not going to seriously consider it for most services.
This is really one of the worst patterns in the SaaS market.
I don't want to provide my data to multiple services just to be able to compare their prices and find out which one I'm actually gonna use. At first this will lead to countless automated mails from all those "founders" asking why I haven't started paying yet, and if I'm unlucky my credentials end up on haveibeenpwned.com…
The privacy policy does not address data retention; maybe there is none. I just assume that some data would be collected by an API service.
I would not use something like this to send my customer data off to check a link, but if it were something that could be self-hosted on my VPS, with a script to hook a WordPress chat system into it, then maybe.
But with no pricing showing, I am assuming I can't afford it anyhow.
What worries me the most about things like these is that it seems impossible to make "free for all" products like this anymore if you're not an established player already. You will get blacklisted, and you will receive emails from your host telling you to shut it down...
Established players like bitly and tinyurl didn't have all the resources to deal with the problem when they started out either, and they arguably still don't, yet they get favored by the antivirus vendors and "safe"search blacklists, since they're well-known services. It doesn't seem fair.
Is this really the way it should be? I wonder if they could've explained the situation to the antivirus vendors: The site itself doesn't host malware and doesn't allow the discovery of said malware through its service. It requires a user to receive an exact URL, just like they could've received any other link, and the blocklists should operate on what's hidden behind it instead of the redirect in front. Maybe y.gy could've been hooked into the safesearch API to automatically nuke any URLs blacklisted already by them, or another antivirus vendor.
If there's anything I have learned about IP-based blocking, it's that it's very unreliable, especially in a NATed world.
Great, you "shadowbanned" an IP, but you also impacted many other people and devices behind that public IP along with the bad actor.
IPv6 is supposed to make NAT irrelevant, but adoption is still very low despite IPv4 having been deprecated more than two decades ago.
In this scenario it doesn't matter. Some user might be able to access the malware still, but that's better than not blocking it at all.
And IMHO, NAT won the fight against IPv6 because it’s backward compatible.
https://twitter.com/nearcyan/status/1532076277947330561
The risk of misidentifying legit users and shadowbanning them outweighs the potential gain.
What's the risk?
[1] https://en.wikipedia.org/wiki/Public_Suffix_List
It looks like the repo where the list is maintained [1] is pretty active. YMMV; I'm not a maintainer or anything.
[1] https://github.com/publicsuffix/list
https://wiki.archiveteam.org/index.php?title=URLTeam
A link shortener doesn't solve any of those problems
“q.ly/abc” or “website.com/20240229/my-blog-title-here/1”
But as some have mentioned, QR codes have largely replaced URL shorteners for this purpose anyway.
Also, I guess for the very small number of people without a device that can read QR codes, a shortened URL would help them engage.
e.g. https://mutraction.dev/link/pv
But most public ones don’t let you change the redirect.
(Also note the difference between the length of the "Advantages" and "Disadvantages" sections)