Posted by u/2cpu1container 2 years ago
Show HN: Ladder, open source alternative to 12ft.io and 1ft.io (github.com/kubero-dev/lad...)
Hey there

I made an open-source alternative to these services. Although they worked very well, I wasn't confident about what they actually do, so I made my own and open-sourced it.

It is written in Golang and is fully customizable.

ktpsns · 2 years ago
I get the feeling that features like this should be part of a browser extension, the same way AdBlock-style extensions exist. Is the reason it isn't just the author's "personal preference", or is there some technical reason?
sva_ · 2 years ago
> these features should be part of a browser extension

You mean like Bypass Paywall Clean?

https://gitlab.com/magnolia1234/bypass-paywalls-chrome-clean

Beijinger · 2 years ago
Does not work so well anymore. Better use a bookmarklet

javascript:location.href='https://archive.is/?run=1&url='+encodeURIComponent(documen...

NelsonMinar · 2 years ago
This works quite well and probably covers 90% of my needs. For the other 10% I still use archive.today or 12ft (RIP).

It's a shame Google won't let this addon be in the store.

johnmaguire · 2 years ago
Is there a Firefox version?
bilekas · 2 years ago
I don't know for sure, but I would imagine more severe actions are taken against circumventing paid material (content behind a paywall) than against free content supplemented by advertisements.

Edit: The Digital Millennium Copyright Act (DMCA) prohibits circumventing an effective technological means of control that restricts access to a copyrighted work. I guess that would apply here.

mckirk · 2 years ago
Given how liberally the DMCA is applied, you definitely don't want to be on the wrong side of that.

I remember a guy who wrote a WoW bot and got sued under the DMCA, with the argument that his bot circumvented the anti-cheat, and the anti-cheat could be seen as a 'mechanism protecting copyrighted material' because it safeguarded access to the game servers, which dynamically generated parts of the game world (such as sounds) that were under copyright... Wild stuff.

nerdbert · 2 years ago
Isn't anything that can be circumvented ineffective?

Or, looking at it the other way, if you put a small sticker that says "do not do X" and even one person follows that, isn't that therefore an "effective" method?

Aaargh20318 · 2 years ago
> The Digital Millennium Copyright Act (DMCA) prohibits circumventing an effective technological means of control that restricts access to a copyrighted work. I guess that would apply here.

It doesn't if you're not in the US.

nottheengineer · 2 years ago
Good old section 1201. The EFF has been fighting it for a while, but hasn't had much success unfortunately.
overtomanu · 2 years ago
Below is an extension for this purpose that I know of; I think there can be many more if we search for them.

Chrome and Firefox extension for removing paywalls: https://github.com/iamadamdev/bypass-paywalls-chrome

user764743 · 2 years ago
This extension is asking for a lot of permissions it shouldn't need to ask for.

If you want an alternative that only requests permissions for sites with paywalls, this one is better: https://gitlab.com/magnolia1234/bypass-paywalls-firefox-clea...

nfriedly · 2 years ago
The Docker image is, on the upside, fairly easy to get running. But on the downside, I'm zero for two actually using it.

I tried a Bloomberg article which gave me a "suspicious activity from your IP, please fill out this captcha" page, only the captcha was broken and didn't load.

Then I tried a WSJ article which loaded basically the same couple of paragraphs that I could get for free, but did not load any of the rest of the content.

fyzix · 2 years ago
I'm very new to this kind of service, but do you have to write your own rulesets for each site you want to bypass? The repo doesn't seem to include much...
2cpu1container · 2 years ago
Yes, the one I provide is still pretty empty. I plan to build one that can be used as a starting point or as a default.
KoftaBob · 2 years ago
Create a browser bookmark and set this as the URL of the bookmark:

javascript:window.location.href="https://archive.is/latest/"+location.href

It will usually open up the archived version of the article without the paywall.

SigmundurM · 2 years ago
You mention 13ft as another open source inspiration. How is Ladder improving on what 13ft does?
2cpu1container · 2 years ago
I did try 13ft, but it misses several points.

Ladder applies custom rules to inject code. It basically modifies the origin website to remove the paywall, and it rewrites (most of) the links and assets in the origin's HTML to avoid CORS errors by routing them through the local proxy.

Ladder uses Go's fiber/fasthttp, which is significantly faster than Python (biased opinion).

Plus several small features like basic auth ...
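
To make the rewriting idea concrete, here is a rough sketch using only Go's standard library instead of fiber (illustrative only, not Ladder's actual code; the path-based URL scheme and naive string rewriting are assumptions):

    package main

    import (
        "io"
        "log"
        "net/http"
        "strings"
    )

    func main() {
        handler := func(w http.ResponseWriter, r *http.Request) {
            // The target URL is passed in the path, e.g. /https://example.com/article
            target := strings.TrimPrefix(r.URL.Path, "/")
            if !strings.HasPrefix(target, "http") {
                http.Error(w, "usage: /<url>", http.StatusBadRequest)
                return
            }

            req, err := http.NewRequest(http.MethodGet, target, nil)
            if err != nil {
                http.Error(w, err.Error(), http.StatusBadRequest)
                return
            }
            // Pretend to be Googlebot so the origin serves the full article.
            req.Header.Set("User-Agent",
                "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")

            resp, err := http.DefaultClient.Do(req)
            if err != nil {
                http.Error(w, err.Error(), http.StatusBadGateway)
                return
            }
            defer resp.Body.Close()

            body, err := io.ReadAll(resp.Body)
            if err != nil {
                http.Error(w, err.Error(), http.StatusBadGateway)
                return
            }

            // Naive rewrite: point absolute links and assets back at this proxy so
            // follow-up requests stay same-origin and avoid CORS errors.
            page := string(body)
            page = strings.ReplaceAll(page, `href="https://`, `href="/https://`)
            page = strings.ReplaceAll(page, `src="https://`, `src="/https://`)

            w.Header().Set("Content-Type", "text/html; charset=utf-8")
            io.WriteString(w, page)
        }

        // Plain handler (no ServeMux) so the embedded URL path is not cleaned.
        log.Fatal(http.ListenAndServe(":8080", http.HandlerFunc(handler)))
    }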

withinboredom · 2 years ago
> Ladder uses Go's fiber/fasthttp, which is significantly faster than Python

I have a feeling that this performance difference is practically imperceptible to regular humans. It's like optimizing CPU performance when the bottleneck is the database.

oh_sigh · 2 years ago
If the paywall is implemented in client code, then usually just disabling javascript for the site is enough to let you view it. If it is implemented server side, then there usually isn't a way around it without an account.
pacifika · 2 years ago
Open source makes it easy for the cat in the cat-and-mouse game, right?
lucideer · 2 years ago
There's no real cat & mouse game here (yet*) - sites don't do anything to mitigate this. Sites deliberately make their content available to robots to gain SEO traction: they're left with the choice of allowing this kind of bypass or hurting their own SEO.

* I say "yet" because there could conceivably be ways to mitigate this, but afaik most would involve individual deals/contracts between every search engine & every subscription website - Google's monopoly simplifies this somewhat, but there's not much of an incentive from Google's perpsective to facilitate this at any scale.

tiagod · 2 years ago
Google publishes IP ranges for Googlebot. You can also reverse-lookup the request IP address - the resolved domain should in turn resolve back to the original address.
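
A rough sketch of that double lookup in Go (illustrative; the accepted suffixes follow Google's documented googlebot.com/google.com reverse-DNS names):

    package main

    import (
        "fmt"
        "net"
        "strings"
    )

    // isGooglebot accepts an IP only if its reverse (PTR) name is under
    // googlebot.com or google.com AND that name resolves back to the same IP.
    func isGooglebot(ip string) bool {
        names, err := net.LookupAddr(ip) // reverse lookup: IP -> hostnames
        if err != nil {
            return false
        }
        for _, name := range names {
            host := strings.TrimSuffix(name, ".")
            if !strings.HasSuffix(host, ".googlebot.com") && !strings.HasSuffix(host, ".google.com") {
                continue
            }
            addrs, err := net.LookupHost(host) // forward lookup: hostname -> IPs
            if err != nil {
                continue
            }
            for _, addr := range addrs {
                if addr == ip {
                    return true // the round trip matches, so it is really Google
                }
            }
        }
        return false
    }

    func main() {
        fmt.Println(isGooglebot("66.249.66.1")) // a published Googlebot address
    }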
gumby · 2 years ago
The README says "The author does not endorse or encourage any unethical or illegal activity."

Is it actually illegal anywhere to bypass a paywall?

qingcharles · 2 years ago
Certainly in Illinois it would be a crime to violate the TOS of a website. Misdemeanor for first offense, felony for second, IIRC.
quickthrower2 · 2 years ago
Can't be that simple. What if the TOS has ridiculous shit in it? Stuff about lifelong servitude to the webmaster's pet goldfish, for example?
2cpu1container · 2 years ago
Not sure about the paywalls, but it might be used for "drive-by attacks" or phishing.
janejeon · 2 years ago
Really dumb question: how do services like this work? As in, how do they bypass these paywalls?

The obvious thing is to spoof Googlebot, but site owners can check that the request isn't coming from a Google-published IP and see that it's fake, right?

Fnoord · 2 years ago
Some possible clues (see the sketch after the list):

> https://github.com/kubero-dev/ladder#environment-variables

> USER_AGENT - user agent to emulate (default: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html))

> X_FORWARDED_FOR - IP forwarder address (default: 66.249.66.1)

> RULESET - URL to a ruleset file (e.g. https://raw.githubusercontent.com/kubero-dev/ladder/main/rul... or /path/to/my/rules.yaml)
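
In practice those values just end up as headers on the outgoing request. A minimal sketch of how a proxy might apply them (illustrative; the env-var handling and defaults here are assumptions, not Ladder's actual code):

    package main

    import (
        "fmt"
        "io"
        "net/http"
        "os"
    )

    // fetchAsGooglebot requests a page with the spoofed headers the README describes.
    func fetchAsGooglebot(url string) (string, error) {
        req, err := http.NewRequest(http.MethodGet, url, nil)
        if err != nil {
            return "", err
        }
        // Defaults mirror the README's example values; override via env vars.
        ua := os.Getenv("USER_AGENT")
        if ua == "" {
            ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
        }
        fwd := os.Getenv("X_FORWARDED_FOR")
        if fwd == "" {
            fwd = "66.249.66.1" // a Googlebot IP, in case the origin trusts the header
        }
        req.Header.Set("User-Agent", ua)
        req.Header.Set("X-Forwarded-For", fwd)

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return "", err
        }
        defer resp.Body.Close()

        body, err := io.ReadAll(resp.Body)
        return string(body), err
    }

    func main() {
        page, err := fetchAsGooglebot("https://example.com/article")
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        fmt.Println(len(page), "bytes fetched")
    }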

janejeon · 2 years ago
Oh wow... I'm surprised that's enough. When I was researching scraping protection bypass, you had to do some real crazy stuff with the browser instance + using residential IPs at a minimum...
ComputerGuru · 2 years ago
I don't know of any off-the-shelf product that respects X-Forwarded-For unless the current request IP originates from a whitelisted (or LAN) address.
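
For context, that trust check usually looks something like the sketch below (illustrative; the whitelisted ranges here are an assumption):

    package main

    import (
        "fmt"
        "net"
        "net/http"
        "strings"
    )

    // Assumption: typical loopback/LAN ranges as the trusted-proxy whitelist.
    var trustedProxies = []string{"127.0.0.0/8", "10.0.0.0/8", "192.168.0.0/16"}

    // clientIP honors X-Forwarded-For only when the direct peer is a trusted proxy.
    func clientIP(r *http.Request) string {
        peer, _, err := net.SplitHostPort(r.RemoteAddr)
        if err != nil {
            return r.RemoteAddr
        }
        for _, cidr := range trustedProxies {
            _, block, _ := net.ParseCIDR(cidr)
            if block.Contains(net.ParseIP(peer)) {
                if fwd := r.Header.Get("X-Forwarded-For"); fwd != "" {
                    // Take the first (client) hop from the header chain.
                    return strings.TrimSpace(strings.Split(fwd, ",")[0])
                }
            }
        }
        // Untrusted peer: ignore the header and use the socket address instead.
        return peer
    }

    func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "client ip:", clientIP(r))
        })
        http.ListenAndServe(":8080", nil)
    }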
narinxas · 2 years ago
> site owners can check that the request isn't coming from a Google-published IP and see that it's a fake, right?

Just because they can doesn't mean they will... Also, most "site owners" are (by this point) completely different people from the "site operators" (who I take to be the engineers who can indeed check these IP things).

calflegal · 2 years ago
Related: if this is how they work, why doesn't Google offer a private service that lets publishers have their content indexed while still keeping it protected?
matsemann · 2 years ago
It used to be against the guidelines to serve Google different content from what users would see. Not sure if that's still the case, but I don't think it's in Google's interest to return a result the user can't actually access.