EasyList is in trouble and so are many ad blockers

A great opportunity right now for CloudFlare to win some goodwill and PR by helping out EasyList for free right now.

But what about simply enable a firewall and show captcha or similar if the origin IP is from India and requesting that URL until the situation is under control? I did that with the free plan recently in CloudFlare in a similar situation and it worked perfectly (of course on a much smaller scale).

rvnx · 3 years ago

These apps behind cannot render the captcha, as the fetch is happening in the background.

However what you can do is match the user-agents, and return a global/catch-all adblocking rule that blocks all the content of all the pages (by blocking the body element).

The app developers are going to notice the issue very fast (because users are reporting the problem), and mirroring the lists or adding a cache is immediately going to be their priority.

Bonus: I think some browsers and extensions can execute JavaScript in adblocking rules; https://help.eyeo.com/adblockplus/snippet-filters-tutorial

(which is essentially re-using a gigantic XSS in order to notify the user)

hrbf · 3 years ago

Generally, I like the idea with the user agents filtering and “block everything” rule. No need for geoblocking. Insert a comment about why this is happening and ask for it to be changed.

However, as we’re living in the real world and the authors of the respective browsers strike me as lazy or uninterested, I also bet all that would change is the user agent.

charcircuit · 3 years ago

I think the idea was to block users without technically consuming bandwidth. A captcha is equivalent to blocking.

Deleted Comment

jannyfer · 3 years ago

Blocking all page content to knowingly cause unintended behavior… I wonder if this can be considered criminal.

I read that poisoning your own lunch to catch a workplace fridge thief could be considered assault.

EDIT: here’s what I read. https://law.stackexchange.com/questions/966/can-one-be-liabl...

Imagine, say, you update the list to block all URLs, and it impacts some municipal government worker’s ability to update some emergency alert service and causes hundreds of people to be permanently injured.

GekkePrutser · 3 years ago

True but I bet 99% of CloudFlare's income comes from companies that wish to see EasyList die in a fire. I'm pretty sure this would factor into their strict enforcement of the 'rules'. I mean, this is something between github and CloudFlare right? And github sure hosts a ton of other .txt files and other stuff that's not 'web content'. They don't enforce it so strictly with other sites.

Still, I'm sure the 'community' can figure out how to keep something like this online. I'd be happy to pony up some cash for decent hosting and I'm sure many would be. If that doesn't work out, something like ipfs, a torrent or whatever.

winstonprivacy · 3 years ago

Correct. And let's not forget that the company which owns them would also like to see EasyList die in a fire.

jgrahamc · 3 years ago

Looks like it's fast to download now.

jgrahamc · 3 years ago

I am following up internally. Looks like there's a combination of this data not being cached, our systems thinking a DDoS was happening (which it sort of was). But getting the full story now.

Yeri · 3 years ago

Seems like MP says it’s fixed: https://twitter.com/tuinslak/status/1583016022491435009?s=61...

Deleted Comment

anigbrowl · 3 years ago

I can't understand their argument that a text file 'isn't a web content'; seems like a bullshit excuse.

r3trohack3r · 3 years ago

This doesn’t sound like bullshit to me. Serving a static text file that is primarily used by applications is not in line with their terms of service.

Cloudflare provides a significant service to the free and open web by subsidizing the hosting costs of static content for websites. They give that away for free under what appears to be reasonable terms.

cvwright · 3 years ago

Does not inspire confidence in Cloudflare, that’s for sure.

jacooper · 3 years ago

Maybe if they created a web page for easylist and then hosted that + the lists directly on CF pages, maybe that would considered as web content?

capitalsigma · 3 years ago

It probably means that their DDoS protection needs to use JS to get some trust signals

Deleted Comment

noja · 3 years ago

Web content is for consumption by people.

unknownaccount · 3 years ago

Just tack on a .html file extension and add a <html> tag at top and bottom…problem solved

thayne · 3 years ago

So does this mean any site with a security.txt file is violating cloudflares ToS?

tatpacc · 3 years ago

Pwned Passwords project by Troy Hunt is served by CloudFlare cache. I don't know scale of bandwidth usage by Pwned Passwords. But CloudFlare can definitely make the similar arrangement here too.

chemmail · 3 years ago

This is a bit different though. You are basically taking away a main revenue stream from websites, your main clients. That sounds like bad optics for them.

corobo · 3 years ago

Wouldn't their R2 service tick all the boxes for this one?

https://developers.cloudflare.com/r2/platform/pricing/

22c · 3 years ago

Sounds like they'd probably be in for at least $500/mo on this which doesn't seem like a lot if you're serving the amount of data EasyList is doing, but is a lot if your previous hosting costs were "free".

bluehatbrit · 3 years ago

Most requests will be in the background or in Cron jobs. Captcha wouldn't be possible in those situations as it would never be seen by anyone.

Nextgrid · 3 years ago

I’m not sure a captcha would help though. These aren’t intentional attack requests, they’re “legitimate” requests by a clueless developer’s app that happened to get popular.

They just need to serve either an empty response or an intentionally broken rule to break the misbehaving browser and force its developers to fix it.

AnonC · 3 years ago

> EasyList is hosted on Github and proxied with CloudFlare. Unfortunately, CloudFlare does not allow non-enterprise users use that much traffic, and now all requests to the EasyList file are getting throttled.

> EasyList tried to reach out to CloudFlare support, but the latter said they could not help. Moreover, serving EasyList actually may violate the CloudFlare ToS.

Seeing the comments from Cloudflare here, looks like the HN machine has yet again worked its magic to get appropriate attention!

bergenty · 3 years ago

A captcha for all 600 million internet users seems like overkill. Maybe a smaller subnet range.

metalliqaz · 3 years ago

that would break everyone in India not using one of those broken browsers

iforgotpassword · 3 years ago

They are already serving access denied replies, so I assume they can identify the browsers via user agent or similar?

If so, returning a bogus file that blocks everything and adding a comment in that list asking the developers to use caching or mirroring the file should be fine.

I wonder if those browsers honor the list when fetching the update though. Would be awesome if you could just add easylist and lock out further requests right on the device.

acdha · 3 years ago

Everyone in the world is impacted if the site goes down under load. Changing that to everyone in a particular country (perhaps with a given user agent if the free plan allows expressions) would still be an improvement even if other work is needed.

I'm confused about the ToS comment by Cloudflare. The txt is on a website so it is a web content?

So robots.txt is not supported by Cloudflare to cache/proxy it? That would be a weird regulation. And I bet everyone violates the Cloudflare ToS then.

tyingq · 3 years ago

It's from this tos page: https://www.cloudflare.com/terms/

2.8 Limitation on Serving Non-HTML Content

...Use of the Services for serving video or a disproportionate percentage of pictures, audio files, or other non-HTML content is prohibited, unless purchased separately...

A huge text/plain artifact, requested often, would seem to fall into that category of "disproportionate percentage" compared to text/html served.

ignoramous · 3 years ago

This limitation apparently doesn't apply to R2 / Workers [0].

May be EasyList could host them there? That's what we do [1] (and the dashboards show 400TB+ per mo [2], likely rigged by the traffic between Workers and Cloudflare Cache).

[0] https://news.ycombinator.com/item?id=20791660

[1] https://news.ycombinator.com/item?id=30034547

[2] https://nitter.net/rethinkdns/status/1546232186554417152

tomschwiha · 3 years ago

Cloudflare can decide whom they want to do business with. But a plain text file is in my opinion sort of HTML. At least it is not "non-html" content. A .pdf file would be non-HTML content.

What else is important to note that the client is being abused and not the client abusing the service. That should be taken into consideration, when deciding if someone is breaking the ToS.

spatley · 3 years ago

My best guess is that CloudFlare wrote this to prevent folks from serving big binary files like photo, music, or video and this txt file case was an unintended condition that happens to work to CloudFlare's advantage.

text/plain though is decidedly not text/html and I would expect CloudFlare to potentially do some on-the- fly optimizations that are aware of the structure of an html file that save terabytes a day at their scale.

Slix · 3 years ago

This doesn't sound right to me. Cloudflare also protects web APIs. This text file is an extremely simple web API, but it is still a web API.

bornfreddy · 3 years ago

Sounds like it is meant to deal with multimedia mostly?

But anyway, just rename .txt to .html and you're done.

tomrod · 3 years ago

If I can read it in Lynx, it is web content.

arendtio · 3 years ago

From a legal perspective I can understand such a wording, but I wonder why an engineer simply tells a (non-paying) customer that he violates the ToS, without thinking about it.

I mean, one could simply wrap the content in a HTML body and change the extension, but that would actually increase the data load for no good reason. So it is complete non-sense to complain about txt files being served.

layer8 · 3 years ago

The solution seems simple, just wrap it in a trivial HTML envelope. Enclose it in <pre> tags if needed.

jakear · 3 years ago

Sounds like they’re just using the wrong service. R2 is designed for object storage, and has 0 egress fees. That’d be the way to go. Not sure why the support engineer didn’t mention it. The standard cloudflare web caching probably doesn’t work well for this use case for whatever reason. The price is only 0.015/GB/mo, so the ~MB(?) of list would be served in perpetuity for less than a dollar.

vel0city · 3 years ago

They're probably still getting many millions of requests a month so probably more than a dollar but even 20 million requests a month would only cost $3.60 (10 million free at first then 10 million @ $0.36/million)

I assume you probably know this but just wanting to share there are some pricing scales with R2 they're just pretty generous for a lot of things.

stavros · 3 years ago

Actually, you're right. How would this work? Is Cloudflare really willing to foot the bill of 20 TB of bandwidth per day for a small text file that costs $0 to store?

kenmacd · 3 years ago

Imagine you're trying to block a DDoS attack. If the client is downloading HTML then they likely also have JS enabled giving you a ton of options for running code on their computer to help you decide if the traffic is legitimate.

If they're downloading text you can still use the headers, and some tricks around redirects, but overall you have far less data on which to decide.

tomudding · 3 years ago

Cloudflare caches robots.txt by default when proxied (the only .txt-file that they automatically cache), for all other content the following from their ToS probably applies:

> Use of the Services for serving video or a disproportionate percentage of pictures, audio files, or other non-HTML content is prohibited, unless purchased separately as part of a Paid Service or expressly allowed under our Supplemental Terms for a specific Service.

We will never know the reasoning of the support agent who replied to the EasyList maintainers, but I can imagine that it is indeed disproportionate for EasyList.

I really hope that Cloudflare actually sees that they are making a wrong decision here and actually help the EasyList maintainers.

tyingq · 3 years ago

The TOS isn't that you can't serve plain text, it's that it shouldn't be disproportionate in volume to the cached html served.

thaumasiotes · 3 years ago

What's the difficulty supposed to be? Serve the same thing with a different MIME type and you're in compliance.

webstrand · 3 years ago

I guess they just need to serve it with a minimal html shell

andiareso · 3 years ago

Yeah... That just doesn't seem right. All web content is text...

naikrovek · 3 years ago

text/html is not text/plain but that doesn't matter: it's not a technical limitation that caused cloudflare to draw this line.

it's cloudflare deciding to protect "web content" and not videos or .iso images or other things that normally are not commonly served while you browse a contemporary website and read HTML.

jonny_eh · 3 years ago

> All web content is text...

It's all 1s and 0s too

RunSet · 3 years ago

> The txt is on a website so it is a web content?

Even more wtf- the file extension determines the file content?

JimWestergren · 3 years ago

bombcar · 3 years ago

If I recall correctly there was some image on wikipedia that was getting billions of downloads a day or something, all from India, because some smart phone had made it a default "hello" image and hot linked it.

Unfortunately, I can't find a reference to it anymore.

alphabet9000 · 3 years ago

https://www.vice.com/en/article/qjpmyx/why-is-this-flower-on...

robin_reala · 3 years ago

Not that you’d do it, but the temptation there is always to repoint your real application to a different URL and change the original image to something subtly NSFW.

Raed667 · 3 years ago

I was debugging a similar issue where a small marketplace run by a friend was being scrapped and the listings were being used to make a competing marketplace look more active than it actually was.

The thing is, they didn't host the scrapped images themselves, they just hot-linked everything.

So through a little nginx config, we turned their entire homepage to an ad for my friend's platform :)

neilv · 3 years ago

In case anyone is inspired to do related things, I made a mistake once (troubling and embarrassing), which I'll mention in case it helps someone else avoid my mistake...

In earlier days of the Web, someone appeared to have hotlinked a photo from a page of mine, as their avatar/signature in some Web forum for another country, and it was eating up way too much bandwidth for my little site.

I handled this in an annoyed and ill-informed way, but which I thought was good-natured, and years later realized it was potentially harmful. I'd changed the URL to serve a new version of the image, to which I'd overlaid text with progressive political slogans relevant to their country. (Thinking I was making a statement to the person about the political issues, and that it would be just a small joke for them, before they changed their avatar/signature to stop hotlinking my bandwidth.) Years later, once I had a bit more understanding of the world, I realized that was very ignorant and cavalier of me, and might've caused serious government or social trouble for the person.

Sensitized by my earlier mistake, I could imagine ways that a subtly NSFW image could cause problems, especially in the workplace, and in some other cultures/countries.

kevingadd · 3 years ago

A startup I used to work for had a horror story from before I started, where a small .png file had been accidentally hotlinked from a third party server. The png showed up on a significant % of users' custom homepages (think myspace, etc). At some point the person operating the server decided that instead of emailing someone or blocking the requests, they'd serve goatse up to a bunch of teenagers and housemoms. Mildly hilarious depending on your perspective, I guess?

lifthrasiir · 3 years ago

This once happened in a particular South Korean news website where it shamelessly stole and hot-linked to a JavaScript file in a third-party website. The domain owner responded it by replacing the file, and the website in question had a warning message and tilted [1] for a while.

[1] https://twitter.com/dohoons/status/880347968800411648

o_m · 3 years ago

Or something less malicious, like "Donate to Wikipedia", or some other organization.

timbit42 · 3 years ago

Been there. Done that. Someone had used our image in their phpBB signature. The hits slowed quite quickly.

bawolff · 3 years ago

Wikipedia is anyone can edit. It wouldn't be the first time someone did something like that.

magic_hamster · 3 years ago

...or a nice steak.

Dead Comment

e12e · 3 years ago

One could take some inspiration and simply rotate the image(s) - like in the case of wifi leeches:

https://www.ex-parrot.com/pete/upside-down-ternet.html

porbelm · 3 years ago

Oh if something /ever/ begged to be goatse'd

taoufix · 3 years ago

You do realize it will be displayed to children?

radicality · 3 years ago

The discussion on wikimedia phabricator: https://phabricator.wikimedia.org/T273741

I worked on an ad-blocker a few months ago. I made the decision to have the filter-list files hosted on our own domain and CDN (similar to what Adguard does with their filters.adtidy.org).

This was done for 2 reasons:

1- Avoid scenarios like this where you ship code (extension in this case) that is hard to update. Then make that code depend on external resources outside of your control.

2- Leak our users' IP addresses to each random hosting provider.

So the solution was simple: Run a CRON once a day then host the files ourselves. Pretty happy with that decision now.

chrismeller · 3 years ago

Except neither of those would help in this case. They’re already using their own domain name, and it’s unclear how they would even build their own CDN since they’re using that scale of bandwidth - AdGuard said they’re still pushing 100tb of access denied pages a month for their similar case. That is a LOT of bandwidth just for access denied messages.

lolinder · 3 years ago

Their point isn't that EasyList could have done anything differently, their point is that they are glad that they didn't decide to rely on others' infrastructure for their own ad blocker, because that makes them resilient against the fallout from this and similar.

denton-scratch · 3 years ago

Access Denied is an HTTP status code, not a page that you serve. 100Tb per month of status codes suggests something like 30 trillion requests per month. Is that possible?

warbeforepeace · 3 years ago

Make it a web request so they can get a user agent. block the impacting user agents.

sershe · 3 years ago

Since they added "Access denied" for misbehaving browsers, can they instead serve them some sort of bad response that will "surface" issue to the users? Depending on what would work better and cost less... (1) a small list that would block major legitimate sites. Whoops, the browser is unusable, now users complain to the developer to fix the issue, or abandon it. (2) "hang" the request if the browser loads the list synchronously; blocking UI thread is a hallmark of a bad developer, so they might (3) stream /dev/zero. Might be expensive; maybe serve a compressed zip-bomb if HTTP spec allows and/or browsers will process it?

Too much work. Just blackhole all requests originating from India in the firewall, as a start.

d3nj4l · 3 years ago

Right, block roughly 10% of the world. Great idea!

rc_mob · 3 years ago

Phew. Is just a bandwidth issue. This goofy title made me think advertisers found a way around ad blockers.

shitlord · 3 years ago

In a way, the advertisers did find a way around ad blockers.

Google built an entire browser and used Manifest V3 as an excuse to cripple ad blockers.

Companies are also paying influencers, twitch streamers, and YouTubers to promote their products in a way that conventional ad blockers can't prevent.

NaturalPhallacy · 3 years ago

In case anyone reading hasn't heard of it: SponsorBlock for YouTube - Skip Sponsorships - https://chrome.google.com/webstore/detail/sponsorblock-for-y...

yellowapple · 3 years ago

> Companies are also paying influencers, twitch streamers, and YouTubers to promote their products in a way that conventional ad blockers can't prevent.

Which I'm okay with in the same sense that I'm okay with newspaper/magazine ads or billboards or TV/radio commercials: they're annoying, but easy to ignore compared to online ads chewing up CPU time and battery life while actively violating one's privacy.

matheusmoreira · 3 years ago

> in a way that conventional ad blockers can't prevent

Yet. One day someone will create an ad blocker with machine learning that "sees" the ads and deletes them in real time. Should work on all content types, even on augmented reality.

chlorion · 3 years ago

I am on Chromium and using the manifest V3 version of Ublock right now and I have noticed no difference between it and Firefox with regular Ublock.

The very interesting thing is that none of Google's ads have ever made it through this new version of Ublock for me.

laundermaf · 3 years ago

This issue caused CF to irreversibly ban them though, so it's not "just a bandwidth issue" anymore.

> Based on the URL that are being requested at Cloudflare, it violates our ToS as well. All the requests are txt file extension which isn't a web content

> you cannot use Cloudflare to cache or proxy the request to these text files

jsty · 3 years ago

> This issue caused CF to irreversibly ban them though

Do you have a source for that? The article only mentions them being throttled + the screenshot with the support engineer saying they seem to be breaking the ToS and asking them politely to move back into compliance.

nicce · 3 years ago

Well, CF is just one service provider. There are bigger issues if they have already such a monopoly that their decisions kill projects worlwide.

ziml77 · 3 years ago

Where did you get that they were irreversibly banned? Or banned at all for that matter?

It is a bandwidth issue for a volunteer-run project.

Joel_Mckay · 3 years ago

Rate-limit the GeoIP list for the affected areas to drop if more than 20% of active traffic. i.e. the service outages get co-located only with the problem users areas.

Also, when doing auto-updates: always add a chaotic delay offset 1 to 180 minutes to distribute the traffic loads. Even in an office with 16 hosts or more this is recommended practice to prevent cheap routers hitting limits. Another interesting trend, is magnet/torrent being used for cryptographic-signed commercial package file distribution.

Free API keys are sometimes a necessary evil... as sometimes service abuse is not accidental.

codalan · 3 years ago

That would only work if they had an API; AFAICT, they're just hosting a file.

At this point, they might be better off coordinating with the other major adblocker providers and just outright move the file elsewhere. Breaking other people's garbage code is better than breaking yourself trying to fix it. Especially on a budget of $0.00.

If the defective code for the browsers are in public repos, it might also be more effective for someone to just fork the code, fix the issue (i.e. only download this file once a month, instead of every startup), and at least give the maintainers a chance to merge the fix back in.

It is very common to see API keys in urls for access to what are essentially flat files. Thus, fairly trivial to change from:

https://127.0.0.1/file.csv

https://127.0.0.1/file.csv?apikey=abc123

This could allow client specific quotas, and easy adoption with maintained projects in minutes. Thus, defective and out-of-maintenance projects would need manually updated or get a 404.

zaarn · 3 years ago

Moving the file elsewhere won't fix it. They are serving terabytes of traffic on Access Denied, it won't go away if that changes to "Not Found" instead, the developers seem already entirely ready to ignore their adblocker just not working.

therealmarv · 3 years ago

Bittorrent, switch in long-term to that. Not saying every end-user should be a seeder but there is big bittorrent community out there and everyone could help a little bit.

Other options:

- A kind of mirror network (it only needs to keep sure that integrity can be checked, maybe with a public key)

- And while doing that why not also support compression (why not? only devs need to read it and they can run easily a decompression command), every bit saved would help.

> Bittorrent, switch in long-term to that.

S3 buckets in IAD with <5GB blobs can double-up as bit-torrent seeders.

I'd imagine, some tech IPFS/Filecoin/Sia might come in handy, too, but unsure of how healthy most of these web3 projects are right now.

There's also fosstorrents.com that help seed projects.

srvmshr · 3 years ago

Is IPFS strictly Web3?

Sure it gained traction around Blockchain, crypto DeFI - but the storage technology is ELI5 a massive distributed storage.

I could hazard a guess that in terms of philosophy its closer to BitTorrent than Blockchain

Denvercoder9 · 3 years ago

Bittorrent won't work, as this list is commonly downloaded by heavily sandboxed browser extensions that can't access torrents.

pmlnr · 3 years ago

Oh, but yes every user should be a seeder. Why not?

Mobile connections are usually metered. I'd still seed it 24/7 on my home connection though.

swinglock · 3 years ago

Wouldn't need all users to, and many are mobile or on metered connections.