Posted by u/stanislavb 3 years ago
Ask HN: Someone is proxy-mirroring my website, can I do anything?
Hi Hacker News community,

I'm trying to deal with a very interesting (to me) case. Someone is proxy-mirroring all content of my website under a different domain name.

- Original: https://www.saashub.com

- Abuser/Proxy-mirror: https://sukuns.us.to

My ideas of resolution:

1) Block them by IP - That doesn't work as they are rotating the IP from which the request is coming.

2) Block them by User Agent - They are duplicating the user-agent of the person making the request to sukuns.us.to

3) Add some JavaScript to redirect to the original domain-name - They are stripping all JS.

4) Use absolute URLs everywhere - they are rewriting everything www.saashub.com to their domain name.

i.e. I'm out of ideas. Any suggestions would be highly appreciated.

p.s. what is more, Bing is indexing all of SaaSHub's content under sukuns.us.to ¯\_(ツ)_/¯. I've reported a copyright infringement, but I have a feeling that it could take ages to get resolved.

santah · 3 years ago
Same thing happened to me and my service (https://next-episode.net) almost 2 years ago.

I wrote a HN post about it as well: https://news.ycombinator.com/item?id=26105890, but to spare you all the irrelevant details and digging in the comments for updates - here is what worked for me - you can block all their IPs, even though they may have A LOT and can change them on each call:

1) I prepared a fake URL that no legitimate user will ever visit (like website_proxying_mine.com/search?search=proxy_mirroring_hacker_tag)

2) I loaded that URL like 30 thousand times

3) from my logs, I extracted all IPs that searched for "proxy_mirroring_hacker_tag" (which, from memory, was something like 4 or 5k unique IPs)

4) I blocked all of them
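
Steps 3 and 4 above could be scripted roughly as follows. This is only a minimal sketch, assuming an access log in the common/combined format where the client IP is the first field; the sample log lines and the `honeypot_ips` helper are made up for illustration.

```python
import re

# Honeypot marker from the example above; no legitimate visitor requests it.
HONEYPOT_TAG = "proxy_mirroring_hacker_tag"

# Assumption: common/combined log format, client IP is the first field.
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "([^"]*)"')


def honeypot_ips(log_lines, tag=HONEYPOT_TAG):
    """Return the set of client IPs whose request line contains the tag."""
    ips = set()
    for line in log_lines:
        match = LINE_RE.match(line)
        if match and tag in match.group(2):
            ips.add(match.group(1))
    return ips


# Made-up log lines using documentation-only IP ranges.
sample_log = [
    '203.0.113.7 - - [01/Jan/2023:00:00:01 +0000] "GET /search?search=proxy_mirroring_hacker_tag HTTP/1.1" 200 512',
    '198.51.100.9 - - [01/Jan/2023:00:00:02 +0000] "GET /about HTTP/1.1" 200 1024',
    '203.0.113.8 - - [01/Jan/2023:00:00:03 +0000] "GET /search?search=proxy_mirroring_hacker_tag HTTP/1.1" 200 512',
]
print(sorted(honeypot_ips(sample_log)))
```

The resulting set can then be fed into whatever block list the web server or firewall uses.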

After doing the above, the offending domains were showing errors for 2-3 days and then they switched to something else and left me alone.

I still go back and check them every few months or so ...

P.S. My advice is to remove their URL from your post here. This will not help with search engines picking up their domain and ranking it with your content ...

bvinc · 3 years ago
Might I suggest a spin on this: instead of blocking the IPs, consider serving up different content to those IPs.

You could make a page that shames their domain name for stealing content. You could make a redirect page that redirects people to your website. Or you could make a page with absolutely disgusting content. I think it would discourage them from playing the cat and mouse game with you and fixing it by getting new IPs.

hedora · 3 years ago
One possibility: Serve different content, but only if the user agent is a search engine scraper. Wait a bit to poison their search rankings, then block them.
beirut_bootleg · 3 years ago
I've tried this with zip bombs, but I can't tell how well it worked out.
nomel · 3 years ago
> Or you could make a page with absolutely disgusting content.

Not if you value the people who might move to the real domain.

antifa · 3 years ago
If those IPs are VPN services, you might be negatively affecting all VPN users in addition to the proxy.
sprior · 3 years ago
"Or you could make a page with absolutely disgusting content." You've never heard of Rule 34, have you...
marklit · 3 years ago
As soon as you have a few of their IPs, look them up on ipinfo.io/1.2.3.4 and you'll find they probably belong to a handful of hosting firms. You can get each firm's entire IP list on that page and add all of those CIDRs to your block list. Saves you needing to make 30K web requests.
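
Checking a client against a list of CIDRs is straightforward with Python's standard `ipaddress` module. A minimal sketch; the ranges below are placeholder documentation networks, not the real CIDRs of any hosting firm.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation networks), NOT real hosting CIDRs.
BLOCKED_CIDRS = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.0/25")
]


def is_blocked(ip, networks=BLOCKED_CIDRS):
    """Return True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


print(is_blocked("203.0.113.50"), is_blocked("192.0.2.1"))
```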

In most countries in the western world, there are 3-4 major ISPs and this is where 99% of your legit traffic comes from. Regular people don't browse the web proxying via hosting centres as Cloudflare will treat them with suspicion on all the websites they protect.

reincoder · 3 years ago
The site seems to be hosted on OVH cloud. OP should report this to them.

https://www.ovh.com/abuse/

Found the hosting information from here: https://host.io/us.to

rexreed · 3 years ago
For 2) you mean you loaded it from the adversary's proxy site, just to clarify?
santah · 3 years ago
Yes, constructed the honeypot URL using the proxy site and called it (thousands of times) so I can get them to fetch it from my server through their IP so I can log it.

blinding-streak · 3 years ago
Side note: great idea for a website. This could be really helpful. You got a new user here.
mhlakhani · 3 years ago
I have to agree, my SO has been looking for something like this for a long time. Signing up today!
focusedone · 3 years ago
Wow, hadn't seen this before. Awesome site!
santah · 3 years ago
Thanks!
NullPrefix · 3 years ago
>4) I blocked all of them

Don't block them. Show dicks instead

otikik · 3 years ago
Once you have their IP addresses you can make them serve anything you want. Set your imagination free.

For starters: copyright-infringing material.

layer8 · 3 years ago
Unless you hold the necessary rights to the copyrighted material, that would make you a copyright infringer yourself.
chris_wot · 3 years ago
Makes me wonder if you could switch serving content based on the URLs. So they redirect back to your website. Or display images marked as copyrighted.
santah · 3 years ago
I tried but couldn't redirect back to my website as they stripped / rewrote all JS.

stanislavb · 3 years ago
Thanks for the advice. I will give a go to some of these. p.s. I can't remove the URL as the post is not editable anymore. I'm just waking up... in Australia.
DoreenMichele · 3 years ago
The mod can though, if you email him at hn@ycombinator.com.

khiqxj · 3 years ago
8chan, like every forum ever, has dumb moderators who don't know how to do their job / overextend their hand (and the moderation position of web forums seems to attract people with certain mental disorders that make them seek out perceived microinjustices, the definition of which changes from day to day)

there were a bunch of sites mirroring 8chan to steal content

these were useful because they had both a simpler / lighter / better user interface (aside from images being missing), and posts / threads that were deleted would stay on the mirrors. being able to see deleted posts / threads was highly useful as the moderation on such sites tends to be utterly useless and the output of a random number generator. it was hilarious reading "zigforum" instead of "8chan" in all the posts as the mirror replaced certain words to thinly veil their operation. they even had a reply button that didn't seem to work or was just fake.

tl;dr the web is broken and only is good when "abused" by proxy/mirrors

nuccy · 3 years ago
Instead of blocking by IP, just check SERVER_NAME/HTTP_SERVER variables in your backend/web server (or even in JavaScript of the page check window.location.hostname) and in case those include anything but original hostname, redirect to the original website (or serve different content with a warning to the visitor). If you have apache2/nginx this can be easily achieved by creating a default virtualhost (which is not your website), and additionally creating explicitly your website virtualhost. Then the default virtualhost can have a proper redirect while serving any other hostname.

Those variables are populated by the browser, so unless the proxying server is rewriting them, your web server will be able to detect the imposter and serve him/her a redirect. If rewrites are indeed in place, then check in the frontend. Blocking by IP is the last option if nothing else works.
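
The backend check could look something like the sketch below. It only helps if the proxy forwards the visitor's Host header unchanged, which is an assumption and may well not hold here; `redirect_target` is a hypothetical helper, not part of any framework.

```python
CANONICAL_HOST = "www.saashub.com"


def redirect_target(host_header, path, canonical=CANONICAL_HOST):
    """Return the URL to redirect to if the Host header is not ours, else None."""
    # Drop an optional ":port" suffix and normalise case before comparing.
    host = (host_header or "").split(":")[0].strip().lower()
    if host == canonical:
        return None
    return f"https://{canonical}{path}"


print(redirect_target("sukuns.us.to", "/best-crm-software"))
```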

michaelmior · 3 years ago
As the OP mentioned, JS is stripped and URLs are being rewritten, so I doubt either of those approaches will work.
musabg · 3 years ago
1. Create fake url endpoint. And go to that endpoint in the adversary's website, when your server gets request, flag the ip. Do this nonstop with a script.

2. Create fake html elements and put unique strings inside. And you can search that string in search engines for finding similar fake sites on different domains.

3. Create fake html element and put all request details in encrypted format. Visit adversary's website and look for that element and flag that ip OR flag the headers.

4. Buy proxy databases, and when any user requests your webpage, check if its a proxy.

5. Instead of banning them, return fake content (fake titles and fake images etc) if proxy is detected OR the ip is flagged.

6. Don't ban the flagged IPs. She/He's gonna find another one. Make them angry and their users angry so they give up on you.

7. Maybe write some bad words to the user in random places in the HTML when you detect flagged IPs :D So the users will leave the site and this will reduce the SEO score of the adversary. It will be downranked.

8. Enable image hotlinking protection. Increase the cost of proxying for them.

9. Use @document CSS to hide the stuff when the URL is different.

10. Send abuse mail request to the hosting site.

11. Send abuse mail request to the domain provider.

12. Look for the flagged IPs and try to find the proxy provider. If you find, send mail to them too.
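
Idea 3 in the list above could be sketched like this. Note the swap: the sketch signs the request details with HMAC rather than encrypting them, so the payload stays readable but cannot be forged; the key, the field names, and both helpers are hypothetical.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key, keep it server-side


def make_marker(client_ip, timestamp, key=SECRET_KEY):
    """Build a signed marker to embed in a hidden HTML element."""
    payload = json.dumps({"ip": client_ip, "ts": timestamp}, sort_keys=True)
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    sig = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def read_marker(marker, key=SECRET_KEY):
    """Return the request details if the signature checks out, else None."""
    body, sep, sig = marker.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body))


marker = make_marker("203.0.113.7", 1700000000)
print(read_marker(marker))
```

Fetching a page through the mirror and passing the marker found there to `read_marker` would reveal which IP the mirror used for that request.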

Edit: More ideas sparked in my mind when I was in the toilet:

1. Create fake big css files (10MB etc). And repeatedly download that from the adversary's website. This should cost them too much money on proxies.

2. When you detect proxy, return too big fake HTML files (10GB) etc. That could crash their server if they load the HTML into the memory when parsing.

mkoryak · 3 years ago
I like how you think. These are all great ideas!

Reminds me of a time some real estate website hotlinked a ton of images from my website. After I asked them to stop and they ignored me I added an nginx rewrite rule to send them a bunch of pictures of houses that were on fire.

For some reason they stopped using my website as their image host after that.

smaudet · 3 years ago
What is the primary motivator to do this?

I'm curious if they are stealing anything else, e.g. are they selling ads/tracking, do they replace order forms with their own...

spmurrayzzz · 3 years ago
Signal boosting suggestion #1 here. Great idea.

Additionally, if they decide to blackhole the fake/honeypot URL: since you mentioned they pass along the user agent, you could mix some token into a randomized user agent string that your scraper uses, so that you could duck-type the request on your end to signal when to capture the egress IP.

pwdisswordfish9 · 3 years ago
#5 and #6 are key. Don't try to block them directly, just get them delisted. When you've worked out a way to identify which requests belong to the scammer, feed them content that the search engines and their ad partners will penalize them for.
davidrupp · 3 years ago
Bummed that I can upvote this only once. Excellent work.
graderjs · 3 years ago
LOL! Thank you for the laugh. This is great.
egberts1 · 3 years ago
What a sure-fire way to toast them! Kudos!
DoctorOW · 3 years ago
In my search for this I found @document isn't well supported [0]. I'd suggest something like:

    a[href*="sukuns.us.to"] {
      display: none;
    }

Then use SRI to enforce that CSS.

[0]: https://caniuse.com/mdn-css_at-rules_document

ChrisMarshallNY · 3 years ago
How about something like...

    body[href*="<OFFENDING URL>"] {
        background-image: url("http://goatse...");
    }

À la: http://ascii.textfiles.com/archives/1011

JohnAaronNelson · 3 years ago
Seems like it would be fairly easy to use this pseudo selector, and apply it to every element on the page. Making them show up as empty to the user
sublinear · 3 years ago
I know this is just a game that never ends, but if they're already rewriting the HTTP requests what's stopping them from rewriting the page contents in the response?

SRI is for the situation where a CDN has been poisoned, not this.

ignoramous · 3 years ago
If they're rewriting html, I guess sanitizing css won't be beyond them.
blantonl · 3 years ago
Shadow nefarious techniques are the best. Don't give them clear indications that there is a problem.

For example, I had an app developer start stealing API content, so once I determined points to key from them, instead of blocking them I simply randomized the API content details returned to their user's apps.

Hey, API calls look good, the app looks like it is working, no problem right? Well, the users of the app were pissed and the negative reviews rolled in. It was glorious.

kokekolo · 3 years ago
Serious question — is there a way to defend from this "stealing the API" thing? E.g. building an authentication of some sort and then including a key with your app?
LinuxBender · 3 years ago
These are the best ideas, especially SEO poisoning and alternate images. If their point is to steal content and rankings then poisoning the well should discourage this in the future. I suspect their actual goal is to have a low-effort high SEO site to abuse as a watering hole for phishing attacks.

As a side note, their domain is linked in this thread so they are seeing HN in their access logs and probably reading this. It should make for an interesting arms race. Or red/blue team event.

IMSAI8080 · 3 years ago
They said the attacker was passing through the client's user agent. If they get a user agent that is GoogleBot, they could check if the requesting IP is actually a valid Google data centre (there is a published list of IPs). If the IP is not Google directly, they could return a blank page therefore causing Google to index nothing through the mirrored site.
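
A rough sketch of that check. The JSON layout used here (a "prefixes" list with "ipv4Prefix"/"ipv6Prefix" keys) is an assumption about the published list, and the ranges are placeholders, not real Google ranges.

```python
import ipaddress


def load_prefixes(published):
    """Turn a published prefix document into a list of network objects."""
    nets = []
    for entry in published.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            nets.append(ipaddress.ip_network(cidr))
    return nets


def should_serve_blank(user_agent, client_ip, crawler_nets):
    """True when the UA claims to be Googlebot but the IP is not in the list."""
    if "googlebot" not in user_agent.lower():
        return False
    addr = ipaddress.ip_address(client_ip)
    return not any(addr in net for net in crawler_nets)


# Placeholder data (documentation ranges), standing in for the real list.
published = {"prefixes": [{"ipv4Prefix": "192.0.2.0/24"},
                          {"ipv6Prefix": "2001:db8::/32"}]}
nets = load_prefixes(published)
print(should_serve_blank("Mozilla/5.0 (compatible; Googlebot/2.1)", "203.0.113.9", nets))
```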
eloff · 3 years ago
Seems like a good use case for a zip bomb. Return some tiny gzipped content that expands to 1gb.
christophilus · 3 years ago
Yeah. Their proxy is parsing the HTML and stripping it / modifying it, so they're obviously unzipping the responses on their servers. Create the honeypot endpoint, and if you get a request from that endpoint, reply with a zip bomb.

Then, write a little script that repeatedly hits that honeypot URL. I quite like this idea.

spiffytech · 3 years ago
> 5. Instead of banning them, return fake content (fake titles and fake images etc) if proxy is detected OR the ip is flagged.

> 6. Don't ban the flagged ip's. She/He's gonna find another one. Make them angry and their user's angry so they give up on you.

There's a popular blog that no longer gets linked on HN.

The author didn't like the discussions HN had around his writing, so any visitors with HN as the referer are shown goatse, a notorious upsetting image, instead of the blog content.

mschuster91 · 3 years ago
Goatse? I assume you're referring to jwz - that blog shows a testicle in an egg cup if it sees a HN referrer.
GTP · 3 years ago
Out of curiosity, which blog are you talking about?
someweirdperson · 3 years ago
Does anyone not have their referer header suppressed or faked?
aliswe · 3 years ago
Why return big files when you can return small files at excruciatingly slow speeds? modems are hot again!
luch · 3 years ago
that's probably the best advice. Instead of denying the proxy, just make it shitty to use for the end-user.
dspillett · 3 years ago
> Maybe write some bad words to the user on random places in the HTML

> Create fake big css files (10MB etc). And repeatedly download that from the adversary's website. This should cost them too much money on proxies.

Be careful when doing things like this, including the shock image option mentioned in other comments, as then it could become an arsehole race with them trying to DoS your site in retribution. Then again, going through more official channels could also get the same reaction, so…

> When you detect proxy, return too big fake HTML files (10GB) etc. That could crash their server if they load the HTML into the memory when parsing.

Make sure you are set up to always compress outgoing content, so that you can send GBs of mostly single-token content with MBs of bandwidth.

scarmig · 3 years ago
> Create fake big css files (10MB etc). And repeatedly download that from the adversary's website. This should cost them too much money on proxies.

Doesn't that also cost you an equal amount? You'll be serving them an equal amount that they proxy to the end user.

It's not even necessarily a cost for them; you're assuming that the host is owned and paid for by the abuser. If it's simply been hijacked (quite possible), you're just racking up costs for another victim.

MadVikingGod · 3 years ago
I remember years ago there was a way to DDoS a server by opening the connection and sending data REALLY slowly, like 1 byte a second. I wonder if there is a way to do the opposite of that, where every request is handed off to a worker which is slow enough to keep the connection alive. I doubt this can scale well, but just a thought.
macNchz · 3 years ago
The “opposite” thing you’re describing sounds like a tarpit: https://en.m.wikipedia.org/wiki/Tarpit_(networking)
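At the application level, the core of a tarpit-style response is just a generator that drips the body out in tiny chunks. A minimal sketch; `drip` is a hypothetical helper, and the `sleep` parameter exists only so the pauses can be recorded instead of actually waited for in the demo.

```python
import time


def drip(body, chunk_size=1, delay=1.0, sleep=time.sleep):
    """Yield the body in small chunks, pausing before every chunk but the first."""
    for start in range(0, len(body), chunk_size):
        if start:
            sleep(delay)
        yield body[start:start + chunk_size]


# Demo: record the pauses instead of sleeping through them.
pauses = []
chunks = list(drip(b"<html>slow</html>", chunk_size=4, delay=1.0, sleep=pauses.append))
print(chunks, len(pauses))
```

Each slow response still ties up a connection on your side too, so this is probably only sensible for flagged clients.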
zhfliz · 3 years ago
you can have some fun with nginx if you can identify on your backend whether the request is coming from a malicious source, e.g. with X-Accel-Limit-Rate
rich_sasha · 3 years ago
I read once a suggestion to serve gzipped requests which, gzipped, are tiny, but un-gzipped are enormous. Like GBs of 0s.

Not sure how you actually do it and if it serves your purpose but sounded neat.
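
One way to build such a payload, as a minimal sketch: stream zeros through `gzip.GzipFile` so the full uncompressed size is never held in memory. The demo uses 10 MiB rather than gigabytes, and `gzip_zeros` is a hypothetical helper.

```python
import gzip
import io


def gzip_zeros(total_bytes, chunk_size=1 << 20):
    """Return gzip data that decompresses to total_bytes zero bytes."""
    buf = io.BytesIO()
    zeros = bytes(chunk_size)
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
        remaining = total_bytes
        while remaining > 0:
            n = min(chunk_size, remaining)
            gz.write(zeros[:n])
            remaining -= n
    return buf.getvalue()


payload = gzip_zeros(10 * 1024 * 1024)  # 10 MiB for the demo, not GBs
print(len(payload), "bytes compressed")
```

The payload would be sent with a `Content-Encoding: gzip` response header so the client inflates it on receipt.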

e1g · 3 years ago
It's called a "zip bomb" (popularized by Silicon Valley [1]), and there is a good guide (and pre-generated 42kB .zip file to blow up most web clients) at https://www.bamsoftware.com/hacks/zipbomb/

[1] https://www.youtube.com/watch?v=jnDk8BcqoR0

rgrieselhuber · 3 years ago
Any recommendations on proxy database providers?
gary_0 · 3 years ago
http://iplists.firehol.org/ looks free and very comprehensive. It has a whole bunch of sub-lists of IPs that are likely to be sources of abuse, including datacenters and VPNs, and it gets updated frequently. Github: https://github.com/firehol/firehol

RektBoy · 3 years ago
> 1. Create fake big css files (10MB etc). And repeatedly download that from the adversary's website. This should cost them too much money on proxies.

Nope, since anybody doing this who has at least minimal intelligence is using residential botnets as proxies.

tgtweak · 3 years ago
Going defcon3 on proxies

You can also write some obfuscated inline JavaScript that checks the current hostname and compares to the expected one and redirects when not aligned.

aembleton · 3 years ago
They are stripping all JS.
geocrasher · 3 years ago
Passive Aggressive FTW. These are all fantastic ideas.
jwsteigerwalt · 3 years ago
I really like #9, this seems like a simple way to make your site unusable except via the methods you desire.
stanislavb · 3 years ago
Oh, I love these. I will use some of them. Many thanks!
auselen · 3 years ago
Fake 10GB html can be a zip bomb?
habibur · 3 years ago
point no.1 will do. that's the solution.
politelemon · 3 years ago
Add a link rel="canonical" to your pages as well, it should give engines a hint that your domain is the legit one.

https://webmasters.stackexchange.com/questions/56326/canonic...

I noticed that the other domain is hotlinking your images. So you can disable image hotlinking, by only allowing certain domains as the referers. If you block hotlinked images then the other domain will not look as good. Remember to do it for SVGs too.

https://ubiq.co/tech-blog/prevent-image-hotlinking-nginx/
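
The allow-list logic behind hotlink protection is small enough to sketch in a few lines. This is an application-level illustration rather than the nginx configuration from the link; the host set and `allow_image` are hypothetical.

```python
from urllib.parse import urlparse

# Hosts allowed to embed our images (assumed list for illustration).
ALLOWED_REFERER_HOSTS = {"www.saashub.com", "saashub.com"}


def allow_image(referer, allowed=ALLOWED_REFERER_HOSTS, allow_empty=True):
    """Decide whether to serve an image based on the Referer header."""
    if not referer:
        # Many privacy tools strip the Referer, so empty is allowed by default.
        return allow_empty
    host = (urlparse(referer).hostname or "").lower()
    return host in allowed


print(allow_image("https://sukuns.us.to/some-page"))
```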

Finally I also see they are using a CDN called Statically to host some assets off your domain. You can block their scrapers by user agent listed here:

http://statically.io/docs/whitelisting-statically/

stanislavb · 3 years ago
I think they are replacing all mentions of saashub.com with their domain. Also, I'm not using statically.io, that's something they are prepending in front of all images. Automatically.
CGamesPlay · 3 years ago
But Statically isn't forwarding the User-Agent of the visitor, and they publish the list of User-Agents that they use, which you can block.
matt_heimer · 3 years ago
Sometimes the replacement is done with simple pattern matching. Try different forms of encoding your domain to see if you can get through their replacement.
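One such encoding, as a sketch: write the domain as numeric HTML character references, which browsers render normally but a naive string replace will not match. Whether the mirror's rewriter decodes entities first is unknown; `entity_encode` is a hypothetical helper.

```python
import html


def entity_encode(text):
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


encoded = entity_encode("saashub.com")
print(encoded)
print(html.unescape(encoded))
```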
politelemon · 3 years ago
It's adding the CDN for some of the images but not all of them, so you'd have to cover both
halifaxbeard · 3 years ago
Set up Cloudflare on the domain and turn on “bot fight mode”.

If the TLS ciphers the client proposes for negotiation don’t align with the client’s User-Agent, they get a CAPTCHA.

I would suspect that whoever is doing this proxy-mirroring isn’t smart enough to ensure the TLS ciphers align with the User-Agent they’re passing through.

nezirus · 3 years ago
I would agree with the above, as an easier version of TLS fingerprinting. One could also use nginx/haproxy to extract enough TLS info and detect requests coming through the proxy. Magic string: JA3 fingerprint.
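For reference, a JA3 fingerprint is the MD5 of five ClientHello fields joined with commas, with the values inside each field joined by dashes. A minimal sketch of just the hashing step; getting the fields out of nginx/haproxy is not shown, the input values are made up, and the sketch does not filter GREASE values, which as far as I know real implementations drop.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the comma/dash separated JA3 string from ClientHello fields."""
    fields = [str(version)]
    for values in (ciphers, extensions, curves, point_formats):
        fields.append("-".join(str(v) for v in values))
    return ",".join(fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Return the MD5 hex digest of the JA3 string."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Made-up field values for illustration.
print(ja3_string(769, [47, 53], [0, 10], [23], [0]))
print(ja3_hash(769, [47, 53], [0, 10], [23], [0]))
```

A mismatch between the fingerprint and what the claimed User-Agent's browser normally produces is the signal to act on.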
strictnein · 3 years ago
This is the correct first step.

supriyo-biswas · 3 years ago
On the free tier, does bot fight mode do anything other than simply detect bots based on JavaScript detections?
JW_00000 · 3 years ago
What about a slightly alternative approach, where instead of trying to block the abuser, you try to make it clear to end users what the real website is? E.g. in your logo image, include the real domain name "saashub.com". Have some introduction text on your home page "Here at saashub.com, we compare SaaS products ...." When your images are hotlinked, replace them with text like "This is a fraudulent website, find us at saashub.com". Anything that can make it obvious to end users that they're on the wrong website when they visit the abuser's URL.

By the way, I've also reported the abuser as a phishing/fraud website through https://safebrowsing.google.com/safebrowsing/report_phish/?u...

riz_ · 3 years ago
Not sure if this would help since:

> 4) Use absolute URLs everywhere - they are rewriting everything www.saashub.com to their domain name.

lukevp · 3 years ago
Embed the welcome text in an image then!
antifa · 3 years ago
Try some things like sa(zero-width-space)ash<b></b>ub.com
wpietri · 3 years ago
One strategy tip: don't play cat and mouse. As you've demonstrated, if you change one thing, they will figure it out and change one thing. Not only does that not work, but you are training them that it's worth trying to beat your latest change.

Instead, plot a few different changes and throw them in all at once. Preferably in a way where they will have to solve all of the changes at the same time to figure out what happened and get things working again. Also, favor changes that are harder to detect. E.g., pure IP blocks are easier to detect than tarpitting and returning fake/corrupted content. The longer their feedback loops, the more likely it is that they'll just give up and go be a parasite somewhere else.

DamnInteresting · 3 years ago
> pure IP blocks are easier to detect than tarpitting and returning fake/corrupted content

I recently had to employ such a strategy against some extremely aggressive card testers (criminals with lists of stolen credit cards who automate stuffing card info into a donation form to test which cards are still working). Instead of blocking their IPs, I started feeding them randomly generated false responses with a statistically accurate "success" rate. They ran tens of thousands of card tests over many days, and 99% of the data they collected was bogus. It amuses me to know that I polluted their data and wasted so much of their time and effort. Jerks.
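
The idea can be sketched in a few lines: answer flagged clients at random with a plausible success rate instead of the real result. The 3% rate and the response strings are made-up values, not the ones actually used.

```python
import random


def fake_card_result(success_rate=0.03, rng=random):
    """Return a bogus result with roughly the given share of 'approved'."""
    return "approved" if rng.random() < success_rate else "declined"


rng = random.Random(0)  # seeded only to make the demo repeatable
results = [fake_card_result(0.03, rng) for _ in range(10_000)]
print(results.count("approved"), "fake approvals out of", len(results))
```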

wpietri · 3 years ago
This warms my heart and it's a great example of lengthening the feedback loop.
yonixw · 3 years ago
I love it. Also add some randomness; there is nothing more frustrating than a problem that only reproduces sometimes!
wpietri · 3 years ago
Excellent idea!
ycommentator · 3 years ago
My networking knowledge isn't great, so apologies if this is wrong. But if it's not wrong, it could help.

FIND THE IP FOR THE DOMAIN

  PS > ping sukuns.us.to
  Pinging sukuns.us.to [45.86.61.166] with 32 bytes of data:
  Reply from 45.86.61.166: bytes=32 time=319ms TTL=39
  ...

WHOIS LOOKUP TO FIND THE HOST

  https://dnschecker.org/ip-whois-lookup.php?query=45.86.61.166

Apparently it's "Dedipath".

And that WHOIS lookup gives an abuse email address:

  "Abuse contact for '45.86.60.0 - 45.86.61.255' is 'abuse@dedipath.com'"

So you could try emailing that address. They may take the site down, or hopefully more than that...
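
If this has to be repeated for several IPs, pulling the abuse address out of the WHOIS text can be scripted. A minimal sketch, assuming the output contains a line in exactly the format quoted above; other registries format this differently, and `abuse_contact` is a hypothetical helper.

```python
import re

# Matches the "Abuse contact for '<range>' is '<email>'" line quoted above.
ABUSE_RE = re.compile(r"Abuse contact for '([^']+)' is '([^']+)'")


def abuse_contact(whois_text):
    """Return (ip_range, email) from WHOIS text, or None if no such line."""
    match = ABUSE_RE.search(whois_text)
    return (match.group(1), match.group(2)) if match else None


sample = "% Abuse contact for '45.86.60.0 - 45.86.61.255' is 'abuse@dedipath.com'"
print(abuse_contact(sample))
```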

mxuribe · 3 years ago
This is not a bad idea, though I would guess that if these guys change IPs, it will be annoying to spend your time sending emails, etc. But then I thought: why not automate this with some simple scripts? You have already outlined your recipe, so simply automate the steps... But the more I thought about the automation around this, the more I realized you need to be careful not to turn into a "spammer" of sorts, constantly sending emails. Certainly, you would be sending legitimate emails, but if they change their IPs often, that might trigger your automation more often, somewhat turning you into a mild "spammer", right? :-) I'm not suggesting you abandon your approach, but simply remember not to overdo it by sending emails at a large scale. ;-)
ycommentator · 3 years ago
Aha, some more good ideas there! But you're right, there's tradeoffs and dependencies and uncertainties throughout, so it's not easy to even guess in advance what would work or be worthwhile. Plus as you say there could be negative consequences from a kind of arms-race, with the solution becoming a problem in itself.

It's not the same thing, but I'm reminded now of email in the past, when you would usually get an undeliverable message if something went wrong. But later that was almost entirely stopped because of spam. Massive volumes of spam were sent from forged addresses, and much of it led to those replies. So that made things worse by doubling the volume, plus the innocents whose addresses had been forged got deluges of confusing undeliverable messages!

I think you're right in that changing IPs would be easy for them. But, changing hosts would be significantly more work and hassle. So if the abuse reporting worked, that could have much more of an impact...

mmcgaha · 3 years ago
Block all of the prefixes that their AS announces too: https://bgp.tools/as/35913#prefixes
RockRobotRock · 3 years ago
Abuse contacts never work. I've never had any success hounding them about malicious sites they host.
ycommentator · 3 years ago
I have almost no experience of this, and nothing recent, so I don't know. But I'm not surprised at what you say, given the amount of abusive stuff that happens online nowadays.
mobilio · 3 years ago
actually works very well when it's combined with DMCA takedown request.
trinovantes · 3 years ago
They are probably using some public cloud service so simply banning all IPs from cloud ASNs [1] will usually be enough. Downside is you're also banning any users using VPNs

[1] https://github.com/brianhama/bad-asn-list

gary_0 · 3 years ago
Another resource that can be used to check for abusive client IPs is https://github.com/firehol/firehol
stanislavb · 3 years ago
Thanks, that seems like something I could work on if I can't find a better solution. Cheers.