A great opportunity right now for CloudFlare to win some goodwill and PR by helping out EasyList for free.
But what about simply enabling a firewall rule and showing a captcha or similar if the origin IP is from India and the request is for that URL, until the situation is under control? I did that recently with the free plan in CloudFlare in a similar situation and it worked perfectly (on a much smaller scale, of course).
The apps behind these requests cannot render the captcha, as the fetch happens in the background.
However, what you can do is match the user agents and return a global/catch-all adblocking rule that blocks all content on every page (by hiding the body element).
The app developers are going to notice the issue very fast (because users will report the problem), and mirroring the lists or adding a cache is immediately going to become their priority.
Bonus: I think some browsers and extensions can execute JavaScript in adblocking rules; https://help.eyeo.com/adblockplus/snippet-filters-tutorial
(which is essentially re-using a gigantic XSS in order to notify the user)
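For illustration, such a catch-all "poisoned" list could be tiny - a minimal sketch in Adblock Plus filter syntax (the comment text is made up, and this is not an actual EasyList rule):

    ! NOTICE: your app is fetching this list far too often.
    ! Please mirror the list or add client-side caching.
    ##body

The last line is a generic element-hiding rule, so every page renders blank until the client stops hammering the real list.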
Generally, I like the idea of filtering on user agents and serving a “block everything” rule. No need for geoblocking. Insert a comment about why this is happening and ask for it to be changed.
However, as we’re living in the real world and the authors of the respective browsers strike me as lazy or uninterested, I also bet all that would change is the user agent.
Imagine, say, you update the list to block all URLs, and it impacts some municipal government worker’s ability to update some emergency alert service and causes hundreds of people to be permanently injured.
True but I bet 99% of CloudFlare's income comes from companies that wish to see EasyList die in a fire. I'm pretty sure this would factor into their strict enforcement of the 'rules'. I mean, this is something between github and CloudFlare right? And github sure hosts a ton of other .txt files and other stuff that's not 'web content'. They don't enforce it so strictly with other sites.
Still, I'm sure the 'community' can figure out how to keep something like this online. I'd be happy to pony up some cash for decent hosting and I'm sure many would be. If that doesn't work out, something like ipfs, a torrent or whatever.
I am following up internally. It looks like there's a combination of this data not being cached and our systems thinking a DDoS was happening (which it sort of was). But I'm getting the full story now.
This doesn’t sound like bullshit to me. Serving a static text file that is primarily used by applications is not in line with their terms of service.
Cloudflare provides a significant service to the free and open web by subsidizing the hosting costs of static content for websites. They give that away for free under what appears to be reasonable terms.
Troy Hunt's Pwned Passwords project is served from CloudFlare's cache. I don't know the scale of Pwned Passwords' bandwidth usage, but CloudFlare could definitely make a similar arrangement here too.
This is a bit different though. You are basically taking away a main revenue stream from websites, your main clients. That sounds like bad optics for them.
Sounds like they'd probably be in for at least $500/mo on this which doesn't seem like a lot if you're serving the amount of data EasyList is doing, but is a lot if your previous hosting costs were "free".
I’m not sure a captcha would help though. These aren’t intentional attack requests, they’re “legitimate” requests by a clueless developer’s app that happened to get popular.
They just need to serve either an empty response or an intentionally broken rule to break the misbehaving browser and force its developers to fix it.
> EasyList is hosted on Github and proxied with CloudFlare. Unfortunately, CloudFlare does not allow non-enterprise users use that much traffic, and now all requests to the EasyList file are getting throttled.
> EasyList tried to reach out to CloudFlare support, but the latter said they could not help. Moreover, serving EasyList actually may violate the CloudFlare ToS.
Seeing the comments from Cloudflare here, looks like the HN machine has yet again worked its magic to get appropriate attention!
They are already serving access denied replies, so I assume they can identify the browsers via user agent or similar?
If so, returning a bogus file that blocks everything and adding a comment in that list asking the developers to use caching or mirroring the file should be fine.
I wonder if those browsers honor the list when fetching the update though. Would be awesome if you could just add easylist and lock out further requests right on the device.
Everyone in the world is impacted if the site goes down under load. Changing that to everyone in a particular country (perhaps with a given user agent if the free plan allows expressions) would still be an improvement even if other work is needed.
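On the free plan, that can be a single firewall rule expression along these lines - a sketch only; the path and user-agent substring are hypothetical, but ip.geoip.country, http.request.uri.path and http.user_agent are real fields in Cloudflare's rule language:

    (ip.geoip.country eq "IN")
    and (http.request.uri.path contains "easylist.txt")
    and (http.user_agent contains "ExampleBrowser")

paired with a Block or Managed Challenge action.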
If I recall correctly there was some image on Wikipedia that was getting billions of downloads a day or something, all from India, because some smartphone had made it a default "hello" image and hotlinked it.
Unfortunately, I can't find a reference to it anymore.
Not that you'd do it, but the temptation there is always to repoint your real application to a different URL and change the original image to something subtly NSFW.
I read that poisoning your own lunch to catch a workplace fridge thief could be considered assault.
EDIT: here's what I read: https://law.stackexchange.com/questions/966/can-one-be-liabl...
I was debugging a similar issue where a small marketplace run by a friend was being scraped, and the listings were being used to make a competing marketplace look more active than it actually was.
The thing is, they didn't host the scraped images themselves, they just hotlinked everything.
So through a little nginx config, we turned their entire homepage into an ad for my friend's platform :)
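The nginx side of that trick can be as small as a referer check - a rough sketch with made-up paths and domains (valid_referers and $invalid_referer come from the stock referer module):

    # Hypothetical: requests hotlinked from anywhere but our own site
    # get a promo banner instead of the scraped listing image.
    location /listing-images/ {
        valid_referers none blocked friends-marketplace.example;
        if ($invalid_referer) {
            rewrite ^ /static/join-our-marketplace.png last;
        }
    }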
In case anyone is inspired to do related things, I made a mistake once (troubling and embarrassing), which I'll mention in case it helps someone else avoid my mistake...
In earlier days of the Web, someone appeared to have hotlinked a photo from a page of mine, as their avatar/signature in some Web forum for another country, and it was eating up way too much bandwidth for my little site.
I handled this in an annoyed and ill-informed way, but which I thought was good-natured, and years later realized it was potentially harmful. I'd changed the URL to serve a new version of the image, to which I'd overlaid text with progressive political slogans relevant to their country. (Thinking I was making a statement to the person about the political issues, and that it would be just a small joke for them, before they changed their avatar/signature to stop hotlinking my bandwidth.) Years later, once I had a bit more understanding of the world, I realized that was very ignorant and cavalier of me, and might've caused serious government or social trouble for the person.
Sensitized by my earlier mistake, I could imagine ways that a subtly NSFW image could cause problems, especially in the workplace, and in some other cultures/countries.
A startup I used to work for had a horror story from before I started, where a small .png file had been accidentally hotlinked from a third party server. The png showed up on a significant % of users' custom homepages (think myspace, etc). At some point the person operating the server decided that instead of emailing someone or blocking the requests, they'd serve goatse up to a bunch of teenagers and housemoms. Mildly hilarious depending on your perspective, I guess?
https://www.ex-parrot.com/pete/upside-down-ternet.html
This once happened with a South Korean news website that shamelessly stole and hotlinked a JavaScript file from a third-party website. The domain owner responded by replacing the file, and the offending website showed a warning message and was tilted [1] for a while.
[1] https://twitter.com/dohoons/status/880347968800411648
I worked on an ad-blocker a few months ago. I made the decision to have the filter-list files hosted on our own domain and CDN (similar to what Adguard does with their filters.adtidy.org).
This was done for 2 reasons:
1- Avoid scenarios like this, where you ship code (an extension in this case) that is hard to update and then make that code depend on external resources outside of your control.
2- Avoid leaking our users' IP addresses to every random hosting provider.
So the solution was simple: run a cron job once a day and host the files ourselves. Pretty happy with that decision now.
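Concretely, the mirroring can be as boring as a daily crontab entry (hypothetical local paths; the upstream URL is EasyList's published one):

    # Refresh the self-hosted copy once a day, at an odd minute so the
    # fetch doesn't line up with everyone else's on-the-hour jobs.
    17 3 * * * curl -fsSL https://easylist.to/easylist/easylist.txt -o /var/www/filters/easylist.txt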
Except neither of those would help in this case. They're already using their own domain name, and it's unclear how they would even build their own CDN at that scale of bandwidth - AdGuard said they're still pushing 100 TB of "access denied" pages a month in their similar case. That is a LOT of bandwidth just for access-denied messages.
Their point isn't that EasyList could have done anything differently; their point is that they are glad they didn't rely on others' infrastructure for their own ad blocker, because that makes them resilient against the fallout from this and similar incidents.
"Access denied" can be little more than an HTTP 403 status code, not a page you have to serve. Even at a few hundred bytes per bare response, 100 TB per month still suggests something on the order of hundreds of billions of requests per month. Is that possible?
Since they added "Access denied" for misbehaving browsers, can they instead serve them some sort of bad response that will "surface" the issue to the users? Depending on what would work better and cost less... (1) a small list that blocks major legitimate sites - whoops, the browser is unusable, and now users complain to the developer to fix the issue, or abandon it; (2) "hang" the request if the browser loads the list synchronously - blocking the UI thread is a hallmark of a bad developer, so some of them probably do; or (3) stream /dev/zero - might be expensive, so maybe serve a compressed zip bomb instead, if the HTTP spec allows it and/or browsers will process it?
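On option (3): highly repetitive data compresses absurdly well, which is what makes a "compressed bomb" cheap to serve. A rough Python illustration (whether a given client will actually inflate the whole thing is another matter):

    import gzip

    # 10 MiB of zeros gzips down to roughly 10 KiB, about a 1000:1 ratio,
    # so a tiny response on the wire can still be expensive to decompress.
    payload = b"\x00" * (10 * 1024 * 1024)
    print(len(gzip.compress(payload)))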
2.8 Limitation on Serving Non-HTML Content
...Use of the Services for serving video or a disproportionate percentage of pictures, audio files, or other non-HTML content is prohibited, unless purchased separately...
A huge text/plain artifact, requested often, would seem to fall into that category of "disproportionate percentage" compared to text/html served.
This limitation apparently doesn't apply to R2 / Workers [0].
Maybe EasyList could host the lists there? That's what we do [1] (and the dashboards show 400 TB+ per month [2], though that figure is likely inflated by traffic between Workers and the Cloudflare Cache).
[0] https://news.ycombinator.com/item?id=20791660
[1] https://news.ycombinator.com/item?id=30034547
[2] https://nitter.net/rethinkdns/status/1546232186554417152
Cloudflare can decide whom they want to do business with. But a plain text file is in my opinion sort of HTML. At least it is not "non-html" content. A .pdf file would be non-HTML content.
What's also important to note is that the client here is the one being abused, not the one abusing the service. That should be taken into consideration when deciding whether someone is breaking the ToS.
My best guess is that CloudFlare wrote this to prevent folks from serving big binary files like photo, music, or video and this txt file case was an unintended condition that happens to work to CloudFlare's advantage.
text/plain, though, is decidedly not text/html, and I would expect CloudFlare to potentially do some on-the-fly optimizations that are aware of the structure of an HTML file and that save terabytes a day at their scale.
But anyway, just rename .txt to .html and you're done.
From a legal perspective I can understand such a wording, but I wonder why an engineer would simply tell a (non-paying) customer that he violates the ToS without thinking about it.
I mean, one could simply wrap the content in an HTML body and change the extension, but that would actually increase the data load for no good reason. So it is complete nonsense to complain about txt files being served.
Sounds like they're just using the wrong service. R2 is designed for object storage and has zero egress fees. That'd be the way to go. Not sure why the support engineer didn't mention it. The standard Cloudflare web caching probably doesn't work well for this use case for whatever reason. The price is only $0.015/GB/month, so the ~MB(?) of list would be served in perpetuity for less than a dollar.
https://developers.cloudflare.com/r2/platform/pricing/
They're probably still getting many millions of requests a month, so probably more than a dollar, but even 20 million requests a month would only cost $3.60 (the first 10 million are free, then 10 million @ $0.36/million).
I assume you probably know this, but just wanted to share that there are some pricing scales with R2; they're just pretty generous for a lot of things.
Actually, you're right. How would this work? Is Cloudflare really willing to foot the bill of 20 TB of bandwidth per day for a small text file that costs $0 to store?
Imagine you're trying to block a DDoS attack. If the client is downloading HTML then they likely also have JS enabled giving you a ton of options for running code on their computer to help you decide if the traffic is legitimate.
If they're downloading text you can still use the headers, and some tricks around redirects, but overall you have far less data on which to decide.
So robots.txt is not supported by Cloudflare to cache/proxy it? That would be a weird regulation. And I bet everyone violates the Cloudflare ToS then.
Cloudflare caches robots.txt by default when proxied (the only .txt file they automatically cache); for all other content, the following from their ToS probably applies:
> Use of the Services for serving video or a disproportionate percentage of pictures, audio files, or other non-HTML content is prohibited, unless purchased separately as part of a Paid Service or expressly allowed under our Supplemental Terms for a specific Service.
We will never know the reasoning of the support agent who replied to the EasyList maintainers, but I can imagine that it is indeed disproportionate for EasyList.
I really hope that Cloudflare sees that they are making the wrong decision here and actually helps the EasyList maintainers.
text/html is not text/plain but that doesn't matter: it's not a technical limitation that caused Cloudflare to draw this line.
It's Cloudflare deciding to protect "web content" and not videos or .iso images or other things that are not commonly served while you browse a contemporary website and read HTML.
It's all 1s and 0s too.
Even more wtf - the file extension determines the file content?
Google built an entire browser and used Manifest V3 as an excuse to cripple ad blockers.
> Companies are also paying influencers, twitch streamers, and YouTubers to promote their products in a way that conventional ad blockers can't prevent.
Which I'm okay with in the same sense that I'm okay with newspaper/magazine ads or billboards or TV/radio commercials: they're annoying, but easy to ignore compared to online ads chewing up CPU time and battery life while actively violating one's privacy.
> in a way that conventional ad blockers can't prevent
Yet. One day someone will create an ad blocker with machine learning that "sees" the ads and deletes them in real time. Should work on all content types, even on augmented reality.
The very interesting thing is that none of Google's ads have ever made it through this new version of Ublock for me.
This issue caused CF to irreversibly ban them though, so it's not "just a bandwidth issue" anymore.
> Based on the URL that are being requested at Cloudflare, it violates our ToS as well. All the requests are txt file extension which isn't a web content
> you cannot use Cloudflare to cache or proxy the request to these text files
> This issue caused CF to irreversibly ban them though
Do you have a source for that? The article only mentions them being throttled + the screenshot with the support engineer saying they seem to be breaking the ToS and asking them politely to move back into compliance.
Rate-limit by GeoIP for the affected areas, dropping requests once they exceed 20% of active traffic - i.e. the service outages get co-located only with the problem users' areas.
Also, when doing auto-updates: always add a random delay offset of 1 to 180 minutes to distribute the traffic load. Even in an office with 16 hosts or more this is recommended practice, to prevent cheap routers from hitting limits. Another interesting trend is magnet/torrent links being used for cryptographically signed commercial package distribution.
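Client-side, that jitter is a couple of lines - a sketch of what a well-behaved list updater might do (the function names are made up):

    import random
    import time

    def update_filter_list_with_jitter(fetch_and_store):
        # Spread updates over 1..180 minutes so installs don't all hit
        # the list server at the same moment.
        time.sleep(random.uniform(1, 180) * 60)
        fetch_and_store()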
Free API keys are sometimes a necessary evil... as sometimes service abuse is not accidental.
That would only work if they had an API; AFAICT, they're just hosting a file.
At this point, they might be better off coordinating with the other major adblocker providers and just outright move the file elsewhere. Breaking other people's garbage code is better than breaking yourself trying to fix it. Especially on a budget of $0.00.
If the defective code for these browsers is in public repos, it might also be more effective for someone to just fork the code, fix the issue (i.e. only download this file once a month, instead of on every startup), and at least give the maintainers a chance to merge the fix back in.
https://127.0.0.1/file.csv
to
https://127.0.0.1/file.csv?apikey=abc123
This could allow client-specific quotas, and easy adoption by maintained projects in minutes. Thus, defective and out-of-maintenance projects would need to be manually updated or would get a 404.
=)
Moving the file elsewhere won't fix it. They are serving terabytes of traffic on "Access denied"; it won't go away if that changes to "Not found" instead, and the developers already seem entirely ready to ignore their adblocker simply not working.
BitTorrent - switch to that in the long term. I'm not saying every end user should be a seeder, but there is a big BitTorrent community out there and everyone could help a little bit.
Other options:
- A kind of mirror network (it only needs to ensure that integrity can be checked, maybe with a public key - see the sketch after this list)
- And while doing that, why not also support compression? Only devs need to read it, and they can easily run a decompression command; every bit saved would help.
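For the integrity check, a mirror network only needs the maintainers to publish a public key and detached signatures for each release. A sketch assuming an ed25519 key and the pyca/cryptography package (as far as I know EasyList doesn't ship signatures today, so this is hypothetical):

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    def mirror_copy_is_authentic(pubkey: bytes, signature: bytes, payload: bytes) -> bool:
        # True only if this mirrored copy is byte-for-byte what the maintainers signed.
        try:
            Ed25519PublicKey.from_public_bytes(pubkey).verify(signature, payload)
            return True
        except InvalidSignature:
            return False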
S3 buckets in IAD with <5GB blobs can double-up as bit-torrent seeders.
I'd imagine some tech like IPFS/Filecoin/Sia might come in handy too, but I'm unsure how healthy most of these web3 projects are right now.
There's also fosstorrents.com, which helps seed projects.
Sure, it gained traction around blockchain and crypto DeFi, but the storage technology is, ELI5, a massive distributed store.
I could hazard a guess that in terms of philosophy it's closer to BitTorrent than blockchain.