Ask HN: Someone is proxy-mirroring my website, can I do anything?

Same thing happened to me and my service (https://next-episode.net) almost 2 years ago.

I wrote an HN post about it as well: https://news.ycombinator.com/item?id=26105890. To spare you the irrelevant details and digging through the comments for updates, here is what worked for me: you can block all their IPs, even though they may have A LOT and can change them on each call:

1) I prepared a fake URL that no legitimate user will ever visit (like website_proxying_mine.com/search?search=proxy_mirroring_hacker_tag)

2) I loaded that URL like 30 thousand times

3) from my logs, I extracted all IPs that searched for "proxy_mirroring_hacker_tag" (which, from memory, was something like 4 or 5k unique IPs)

4) I blocked all of them
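
Step 3 above could be sketched roughly like this. It is a minimal sketch, assuming the access log is in the common/combined format with the client IP as the first field; the tag string and the sample log lines are made up for illustration.

```python
import re

# Assumed honeypot marker: a string no legitimate visitor would ever request.
HONEYPOT_TAG = "proxy_mirroring_hacker_tag"

# Assumes common/combined log format: IP, two fields, [timestamp], "request".
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)"')


def honeypot_ips(log_lines, tag=HONEYPOT_TAG):
    """Return the set of client IPs whose request line contains the tag."""
    ips = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and tag in match.group("request"):
            ips.add(match.group("ip"))
    return ips


# Made-up sample lines using documentation-only IP ranges.
sample_log = [
    '203.0.113.7 - - [10/Feb/2021:13:55:36 +0000] "GET /search?search=proxy_mirroring_hacker_tag HTTP/1.1" 200 512',
    '198.51.100.23 - - [10/Feb/2021:13:55:37 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '203.0.113.7 - - [10/Feb/2021:13:55:38 +0000] "GET /search?search=proxy_mirroring_hacker_tag HTTP/1.1" 200 512',
    '192.0.2.99 - - [10/Feb/2021:13:55:39 +0000] "GET /search?search=proxy_mirroring_hacker_tag HTTP/1.1" 200 512',
]
print(sorted(honeypot_ips(sample_log)))
```

The resulting set is what would be fed into the firewall or web server block list in step 4.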

After doing the above, the offending domains were showing errors for 2-3 days and then they switched to something else and left me alone.

I still go back and check them every few months or so ...

P.S. My advice is to remove their URL from your post here. Leaving it up will not help you: it only helps search engines pick up their domain and rank it with your content ...

bvinc · 3 years ago

Might I suggest a spin on this: instead of blocking the IPs, consider serving up different content to those IPs.

You could make a page that shames their domain name for stealing content. You could make a redirect page that redirects people to your website. Or you could make a page with absolutely disgusting content. I think it would discourage them from playing the cat and mouse game with you and fixing it by getting new IPs.

hedora · 3 years ago

One possibility: Serve different content, but only if the user agent is a search engine scraper. Wait a bit to poison their search rankings, then block them.

beirut_bootleg · 3 years ago

I've tried this with zip bombs, but I can't tell how well it worked out.

nomel · 3 years ago

> Or you could make a page with absolutely disgusting content.

Not if you value the people who might move to the real domain.

antifa · 3 years ago

If those IPs are VPN services, you might be negatively affecting all VPN users in addition to the proxy.

sprior · 3 years ago

"Or you could make a page with absolutely disgusting content." You've never heard of Rule 34, have you...

marklit · 3 years ago

As soon as you have a few of their IPs, look them up on ipinfo.io/1.2.3.4 and you'll find they probably belong to a handful of hosting firms. You can get each firm's entire IP list on that page and add all of those CIDRs to your block list. Saves you needing to make 30K web requests.

In most countries in the western world, there are 3-4 major ISPs and this is where 99% of your legit traffic comes from. Regular people don't browse the web proxying via hosting centres as Cloudflare will treat them with suspicion on all the websites they protect.
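
Checking a request IP against a list of CIDRs could look something like the sketch below, using Python's standard `ipaddress` module. The ranges here are documentation-only placeholders, not any real hosting firm's ranges; you would substitute the CIDRs you collected.

```python
import ipaddress

# Hypothetical block list: documentation-only ranges standing in for the
# CIDRs you would collect for the hosting firms involved.
BLOCKED_CIDRS = ["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"]
BLOCKED_NETWORKS = [ipaddress.ip_network(cidr) for cidr in BLOCKED_CIDRS]


def is_blocked(ip, networks=BLOCKED_NETWORKS):
    """Return True if the address falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    # Compare versions first so IPv4 addresses are never tested against
    # IPv6 networks (and vice versa).
    return any(addr.version == net.version and addr in net for net in networks)


print(is_blocked("192.0.2.55"))
print(is_blocked("203.0.113.9"))
```

Blocking whole ranges is much coarser than blocking individual IPs, so the VPN caveat raised elsewhere in the thread applies here too.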

reincoder · 3 years ago

The site seems to be hosted on OVH cloud. OP should report this to them.

https://www.ovh.com/abuse/

Found the hosting information from here: https://host.io/us.to

rexreed · 3 years ago

For 2) you mean you loaded it from the adversary's proxy site, just to clarify?

santah · 3 years ago

Yes, constructed the honeypot URL using the proxy site and called it (thousands of times) so I can get them to fetch it from my server through their IP so I can log it.

blinding-streak · 3 years ago

Side note: great idea for a website. This could be really helpful. You got a new user here.

mhlakhani · 3 years ago

I have to agree, my SO has been looking for something like this for a long time. Signing up today!

focusedone · 3 years ago

Wow, hadn't seen this before. Awesome site!

santah · 3 years ago

Thanks!

NullPrefix · 3 years ago

>4) I blocked all of them

Don't block them. Show dicks instead

otikik · 3 years ago

Once you have their IP addresses you can make them serve anything you want. Set your imagination free.

For starters: copyright-infringing material.

layer8 · 3 years ago

Unless you hold the necessary rights to the copyrighted material, that would make you a copyright infringer yourself.

chris_wot · 3 years ago

Makes me wonder if you could switch serving content based on the URLs. So they redirect back to your website. Or display images marked as copyrighted.

santah · 3 years ago

I tried but couldn't redirect back to my website as they stripped / rewrote all JS.

stanislavb · 3 years ago

Thanks for the advice. I will give some of these a go. P.S. I can't remove the URL as the post is not editable anymore. I'm just waking up... in Australia.

DoreenMichele · 3 years ago

The mod can though, if you email him at hn@ycombinator.com.

khiqxj · 3 years ago

8chan like every forum ever has dumb moderators who don't know how to do their job / overextend their hand (and the moderation position of web forums seems to attract people with certain mental disorders that make them seek out perceived microinjustices which the definition thereof changes from day to day)

there were a bunch of sites mirroring 8chan to steal content

these were useful because they had both a simpler / lighter / better user interface (aside from images being missing), and posts / threads that were deleted would stay on the mirrors. being able to see deleted posts / threads was highly useful as the moderation on such sites tends to be utterly useless and the output of a random number generator. it was hilarious reading "zigforum" instead of "8chan" in all the posts as the mirror replaced certain words to thinly veil their operation. they even had a reply button that didn't seem to work or was just fake.

tl;dr the web is broken and only is good when "abused" by proxy/mirrors

nuccy · 3 years ago

Instead of blocking by IP, just check the SERVER_NAME/HTTP_HOST variables in your backend/web server (or even check window.location.hostname in the page's JavaScript), and if they contain anything but the original hostname, redirect to the original website (or serve different content with a warning to the visitor). If you have apache2/nginx this can be easily achieved by creating a default virtualhost (which is not your website), and additionally creating an explicit virtualhost for your website. The default virtualhost can then serve a proper redirect for any other hostname.

Those variables are populated by the browser, so unless the proxying server is rewriting them, your web server will be able to detect the impostor and serve them a redirect. If rewrites are indeed in place, then check in the frontend. Blocking by IP is the last option if nothing else works.
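
A backend-side version of that hostname check might look like the sketch below. It uses Python's standard `http.server` purely for illustration; `CANONICAL_HOST`, `redirect_target`, and `HostCheckHandler` are hypothetical names, and, as noted above, a proxy that rewrites the Host header would defeat the check.

```python
from http.server import BaseHTTPRequestHandler

CANONICAL_HOST = "example.com"  # assumption: replace with your real hostname


def redirect_target(host_header, path, canonical=CANONICAL_HOST):
    """Return a redirect URL if the Host header is not ours, else None."""
    # Drop an optional ":port" suffix and normalise case before comparing.
    host = (host_header or "").split(":")[0].strip().lower()
    if host == canonical:
        return None
    return f"https://{canonical}{path}"


class HostCheckHandler(BaseHTTPRequestHandler):
    """Illustrative handler: redirect any request with a foreign Host."""

    def do_GET(self):
        target = redirect_target(self.headers.get("Host"), self.path)
        if target is not None:
            self.send_response(301)
            self.send_header("Location", target)
            self.end_headers()
            return
        body = b"real content\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, the handler can be passed to `http.server.HTTPServer(("127.0.0.1", 8000), HostCheckHandler)` and started with `serve_forever()`.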

michaelmior · 3 years ago

As the OP mentioned, JS is stripped and URLs are being rewritten, so I doubt either of those approaches will work.

1. Create a fake URL endpoint and visit it through the adversary's website; when your server gets the request, flag the IP. Do this nonstop with a script.

2. Create fake HTML elements and put unique strings inside them. You can then search for those strings in search engines to find similar fake sites on other domains.

3. Create a fake HTML element and put all the request details in it in encrypted form. Visit the adversary's website, look for that element, and flag that IP OR flag the headers.

4. Buy proxy databases, and when any user requests your webpage, check if it's a proxy.

5. Instead of banning them, return fake content (fake titles and fake images etc) if proxy is detected OR the ip is flagged.

6. Don't ban the flagged ip's. She/He's gonna find another one. Make them angry and their user's angry so they give up on you.

7. Maybe write some bad words to the user on random places in the HTML when you detect flagged ip's :D So the user's will leave the site and this will reduce the SEO point of the adversary. Will be downranked.

8. Enable image hotlinking protection. Increase the cost of proxying for them.

9. Use @document CSS to hide the stuff when the URL is different.

10. Send abuse mail request to the hosting site.

11. Send abuse mail request to the domain provider.

12. Look for the flagged IPs and try to find the proxy provider. If you find, send mail to them too.

Edit: More ideas sparked in my mind when I was in the toilet:

1. Create fake big css files (10MB etc). And repeatedly download that from the adversary's website. This should cost them too much money on proxies.

2. When you detect proxy, return too big fake HTML files (10GB) etc. That could crash their server if they load the HTML into the memory when parsing.
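
Idea 3 from the first list (hiding the request details in an HTML element) could be sketched as follows. Note the swap: this sketch signs the details with an HMAC from the standard library rather than encrypting them, so the payload is tamper-evident but still readable by anyone who decodes it. `SECRET_KEY`, `make_watermark`, and `read_watermark` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"change-me"  # assumption: a private key known only to your server


def make_watermark(client_ip, timestamp, key=SECRET_KEY):
    """Pack the request details and an HMAC-SHA256 signature into a token."""
    payload = json.dumps({"ip": client_ip, "ts": timestamp}, sort_keys=True).encode()
    signature = hmac.new(key, payload, hashlib.sha256).digest()  # 32 bytes
    return base64.urlsafe_b64encode(signature + payload).decode()


def read_watermark(token, key=SECRET_KEY):
    """Return the embedded details, or None if the token is not ours."""
    try:
        raw = base64.urlsafe_b64decode(token.encode())
    except ValueError:  # includes binascii.Error for bad padding
        return None
    signature, payload = raw[:32], raw[32:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(payload)


token = make_watermark("203.0.113.7", 1700000000)
print(read_watermark(token))
```

The token would go into some inconspicuous attribute on each served page; fetching the same page through the mirror and decoding the token then shows which IP requested it from your server.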

mkoryak · 3 years ago

I like how you think. These are all great ideas!

Reminds me of a time some real estate website hotlinked a ton of images from my website. After I asked them to stop and they ignored me I added an nginx rewrite rule to send them a bunch of pictures of houses that were on fire.

For some reason they stopped using my website as their image host after that.

smaudet · 3 years ago

What is their primary motivation for doing this?

I'm curious if they are stealing anything else, e.g. are they selling ads/tracking, do they replace order forms with their own...

spmurrayzzz · 3 years ago

Signal boosting suggestion #1 here. Great idea.

Additionally, if they decide to blackhole the fake/honeypot URL, since you mentioned they pass along the user agent, you could mix a token into a randomized user agent string that your scraper uses, so that you could duck-type the request on your end to signal when to capture the egress IP.

pwdisswordfish9 · 3 years ago

#5 and #6 are key. Don't try to block them directly, just get them delisted. When you've worked out a way to identify which requests belong to the scammer, feed them content that the search engines and their ad partners will penalize them for.

davidrupp · 3 years ago

Bummed that I can upvote this only once. Excellent work.

graderjs · 3 years ago

LOL! Thank you for the laugh. This is great.

egberts1 · 3 years ago

What a sure-fire way to toast them! Kudos!

DoctorOW · 3 years ago

In my search for this I found @document isn't well supported [0]. I'd suggest something like:

    a[href*= "sukuns.us.to"] {
     display:none; 
    }

Then use SRI to enforce that CSS.

[0]: https://caniuse.com/mdn-css_at-rules_document

ChrisMarshallNY · 3 years ago

How about something like...

    body[href*= "<OFFENDING URL>"] {
        background-image: url("http://goatse..."); 
    }

Ala: http://ascii.textfiles.com/archives/1011

JohnAaronNelson · 3 years ago

Seems like it would be fairly easy to use this pseudo selector and apply it to every element on the page, making them show up as empty to the user.

sublinear · 3 years ago

I know this is just a game that never ends, but if they're already rewriting the HTTP requests what's stopping them from rewriting the page contents in the response?

SRI is for the situation where a CDN has been poisoned, not this.

ignoramous · 3 years ago

If they're rewriting html, I guess sanitizing css won't be beyond them.

blantonl · 3 years ago

Shadow nefarious techniques are the best. Don't give them clear indications that there is a problem.

For example, I had an app developer start stealing API content, so once I determined which data points to key on to identify them, instead of blocking them I simply randomized the API content details returned to their users' apps.

Hey, API calls look good, the app looks like it is working, no problem right? Well, the users of the app were pissed and the negative reviews rolled in. It was glorious.

kokekolo · 3 years ago

Serious question — is there a way to defend from this "stealing the API" thing? E.g. building an authentication of some sort and then including a key with your app?

LinuxBender · 3 years ago

These are the best ideas, especially SEO poisoning and alternate images. If their point is to steal content and rankings then poisoning the well should discourage this in the future. I suspect their actual goal is to have a low-effort high SEO site to abuse as a watering hole for phishing attacks.

As a side note, their domain is linked in this thread so they are seeing HN in their access logs and probably reading this. It should make for an interesting arms race. Or red/blue team event.

IMSAI8080 · 3 years ago

They said the attacker was passing through the client's user agent. If they get a user agent that is GoogleBot, they could check if the requesting IP is actually a valid Google data centre (there is a published list of IPs). If the IP is not Google directly, they could return a blank page therefore causing Google to index nothing through the mirrored site.
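
A rough sketch of that check, assuming you have already loaded the crawler's published IP ranges into a list. The range below is a documentation-only placeholder, not a real Google range, and the function names are hypothetical.

```python
import ipaddress

# Placeholder range (documentation-only addresses). In practice, load the
# list the search engine publishes for its crawlers.
CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def claims_googlebot(user_agent):
    """Return True if the user agent string claims to be Googlebot."""
    return "googlebot" in (user_agent or "").lower()


def is_spoofed_crawler(user_agent, remote_ip, networks=CRAWLER_NETWORKS):
    """True if the UA claims Googlebot but the IP is outside the known ranges."""
    if not claims_googlebot(user_agent):
        return False
    addr = ipaddress.ip_address(remote_ip)
    in_range = any(addr.version == net.version and addr in net for net in networks)
    return not in_range


ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(is_spoofed_crawler(ua, "203.0.113.50"))
```

When `is_spoofed_crawler` returns True, the request is most likely the crawler arriving via the mirror, which is where the blank page would be served.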

eloff · 3 years ago

Seems like a good use case for a zip bomb. Return some tiny gzipped content that expands to 1gb.

christophilus · 3 years ago

Yeah. Their proxy is parsing the HTML and stripping it / modifying it, so they're obviously unzipping the responses on their servers. Create the honeypot endpoint, and if you get a request for that endpoint, reply with a zip bomb.

Then, write a little script that repeatedly hits that honeypot URL. I quite like this idea.

spiffytech · 3 years ago

> 5. Instead of banning them, return fake content (fake titles and fake images etc) if proxy is detected OR the ip is flagged.

> 6. Don't ban the flagged ip's. She/He's gonna find another one. Make them angry and their user's angry so they give up on you.

There's a popular blog that no longer gets linked on HN.

The author didn't like the discussions HN had around his writing, so any visitors with HN as the referer are shown goatse, a notorious upsetting image, instead of the blog content.

mschuster91 · 3 years ago

Goatse? I assume you're referring to jwz - that blog shows a testicle in an egg cup if it sees a HN referrer.

GTP · 3 years ago

Out of curiosity, which blog are you talking about?

someweirdperson · 3 years ago

Does anyone not have their referer header suppressed or faked?

aliswe · 3 years ago

Why return big files when you can return small files at excruciatingly slow speeds? modems are hot again!

luch · 3 years ago

that's probably the best advice. Instead of denying the proxy, just make it shitty to use for the end-user.

dspillett · 3 years ago

> Maybe write some bad words to the user on random places in the HTML

> Create fake big css files (10MB etc). And repeatedly download that from the adversary's website. This should cost them too much money on proxies.

Be careful when doing things like this, including the shock image option mentioned in other comments, as then it could become an arsehole race with them trying to DoS your site in retribution. Then again, going through more official channels could also get the same reaction, so…

> When you detect proxy, return too big fake HTML files (10GB) etc. That could crash their server if they load the HTML into the memory when parsing.

Make sure you are set up to always compress outgoing content, so that you can send GBs of mostly single-token content with MBs of bandwidth.

scarmig · 3 years ago

> Create fake big css files (10MB etc). And repeatedly download that from the adversary's website. This should cost them too much money on proxies.

Doesn't that also cost you an equal amount? You'll be serving them an equal amount that they proxy to the end user.

It's not even necessarily a cost for them; you're assuming that the host is owned and paid for by the abuser. If it's simply been hijacked (quite possible), you're just racking up costs for another victim.

MadVikingGod · 3 years ago

I remember years ago there was a way to DDoS a server by opening the connection and sending data REALLY slowly, like 1 byte a second. I wonder if there is a way to do the opposite of that, where every request is handed off to a worker which responds slowly enough to keep the connection alive. I doubt this can scale well, but just a thought.

ambicapter · 3 years ago

Slow loris attack https://en.wikipedia.org/wiki/Slowloris_(computer_security)

macNchz · 3 years ago

The “opposite” thing you’re describing sounds like a tarpit: https://en.m.wikipedia.org/wiki/Tarpit_(networking)
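
The application-level version of that idea can be sketched as a generator that drips the response out in small chunks. This is only an illustration; `drip` and its parameters are hypothetical, and each slow response still ties up a worker on your side, which is the scaling concern raised above.

```python
import time


def drip(body, chunk_size=16, delay=1.0, sleep=time.sleep):
    """Yield the body in small chunks, pausing before every chunk but the first."""
    for start in range(0, len(body), chunk_size):
        if start:
            sleep(delay)  # injectable so the pause can be stubbed out in tests
        yield body[start:start + chunk_size]


# Example with the pause shortened to zero so it finishes immediately.
for chunk in drip(b"<html><body>slow page</body></html>", chunk_size=8, delay=0):
    print(chunk)
```

A WSGI application can return such a generator as its response iterable, provided the server flushes each chunk as it is yielded.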

zhfliz · 3 years ago

you can have some fun with nginx if you can identify on your backend whether the request is coming from a malicious source, e.g. with X-Accel-Limit-Rate

rich_sasha · 3 years ago

I read once a suggestion to serve gzipped requests which, gzipped, are tiny, but un-gzipped are enormous. Like GBs of 0s.

Not sure how you actually do it and if it serves your purpose but sounded neat.
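
On the "how you actually do it" part, one way to produce such a payload is to stream zeros through a gzip compressor, so the full uncompressed data never has to sit in memory. This is a minimal sketch using Python's standard `zlib`; `gzip_of_zeros` is a hypothetical helper, and the 10 MB size is just a small demo value.

```python
import zlib


def gzip_of_zeros(total_bytes, chunk_size=1024 * 1024):
    """Return gzip data that decompresses to total_bytes zero bytes."""
    # wbits=31 selects the gzip container (16 + the maximum 15-bit window).
    compressor = zlib.compressobj(level=9, wbits=31)
    chunk = bytes(chunk_size)  # a reusable block of zero bytes
    parts = []
    remaining = total_bytes
    while remaining > 0:
        size = min(chunk_size, remaining)
        parts.append(compressor.compress(chunk[:size]))
        remaining -= size
    parts.append(compressor.flush())
    return b"".join(parts)


payload = gzip_of_zeros(10 * 1024 * 1024)
print(len(payload), "compressed bytes for 10 MiB of zeros")
```

Served with a `Content-Encoding: gzip` header, a client that transparently decompresses the body ends up handling the full expanded size.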

e1g · 3 years ago

It's called a "zip bomb" (popularized by Silicon Valley [1]), and there is a good guide (and pre-generated 42kB .zip file to blow up most web clients) at https://www.bamsoftware.com/hacks/zipbomb/

[1] https://www.youtube.com/watch?v=jnDk8BcqoR0

rgrieselhuber · 3 years ago

Any recommendations on proxy database providers?

gary_0 · 3 years ago

http://iplists.firehol.org/ looks free and very comprehensive. It has a whole bunch of sub-lists of IPs that are likely to be sources of abuse, including datacenters and VPNs, and it gets updated frequently. Github: https://github.com/firehol/firehol

RektBoy · 3 years ago

> 1. Create fake big css files (10MB etc). And repeatedly download that from the adversary's website. This should cost them too much money on proxies.

Nope, since anybody doing this who has at least minimal intelligence is using residential botnets as proxies.

tgtweak · 3 years ago

Going defcon3 on proxies

You can also write some obfuscated inline JavaScript that checks the current hostname and compares to the expected one and redirects when not aligned.

aembleton · 3 years ago

They are stripping all JS.

geocrasher · 3 years ago

Passive Aggressive FTW. These are all fantastic ideas.

jwsteigerwalt · 3 years ago

I really like #9, this seems like a simple way to make your site unusable except via the methods you desire.

stanislavb · 3 years ago

Oh, I love these. I will use some of them. Many thanks!

auselen · 3 years ago

Fake 10GB html can be a zip bomb?

habibur · 3 years ago

point no.1 will do. that's the solution.