Readit News
francocanzani commented on Unlock Articles with Paywallskip   paywallskip.com/... · Posted by u/francocanzani
foobarqux · a year ago
Doesn’t seem to work on wsj, which is the only site archive.is doesn’t work on consistently anymore. So why not just use archive.is?
francocanzani · a year ago
Will have a look. It worked earlier today.
francocanzani commented on Unlock Articles with Paywallskip   paywallskip.com/... · Posted by u/francocanzani
bj-rn · a year ago
francocanzani · a year ago
Will have a look. Thanks
francocanzani commented on Unlock Articles with Paywallskip   paywallskip.com/... · Posted by u/francocanzani
nkrisc · a year ago
Do you also support skipping the paywall for services like AWS as well? That would be useful.
francocanzani · a year ago
That would take the fun out of AWS
francocanzani commented on Unlock Articles with Paywallskip   paywallskip.com/... · Posted by u/francocanzani
Ancapistani · a year ago
Is this open or shared source?

francocanzani · a year ago
I'm hesitant about this because most open source apps of this kind get banned from GitHub. I will probably make a dummy account for opening issues and always keep a copy both locally and on GitLab. I should be able to clean up the code a bit and share it soon.
francocanzani commented on Unlock Articles with Paywallskip   paywallskip.com/... · Posted by u/francocanzani
frankacter · a year ago
Are you considering a chrome extension to automate the process from the client perspective?
francocanzani · a year ago
Yes, it is being developed and will launch this week. Just click and go, nothing fancy.
francocanzani commented on Unlock Articles with Paywallskip   paywallskip.com/... · Posted by u/francocanzani
rrr_oh_man · a year ago
What will you do when the lawyers come for you?
francocanzani · a year ago
The legal landscape surrounding this issue remains ambiguous. I've documented my analysis in the legal section of my website. Typically, the consequence is domain takedowns, which is why I proactively purchased 10 domains as a precautionary measure.

https://www.paywallskip.com/posts/legal

francocanzani commented on Unlock Articles with Paywallskip   paywallskip.com/... · Posted by u/francocanzani
rendall · a year ago
Does paywallskip scrape archive.is and archive.org?
francocanzani · a year ago
It falls back to archives, yes. Basically, it tries different User-Agent headers and different Referer headers, then tries disabling JavaScript once the page has loaded; the last resort is to fetch from web archives (Wayback Machine, archive.is, Google cache).

Then the HTML is validated and parsed.
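
The validation step is not described further in the thread. As a rough sketch of what such a check could look like, assuming it means "confirm the page contains real article text and not just a subscribe stub", the snippet below counts visible words with Python's standard `html.parser`. The names `looks_like_article` and `min_words` and the 200-word threshold are illustrative assumptions, not the project's actual code.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the page title and visible text, skipping script/style bodies."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def looks_like_article(html, min_words=200):
    """Return True if the HTML holds at least min_words visible words.

    The threshold is an arbitrary assumption for illustration.
    """
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    word_count = sum(len(chunk.split()) for chunk in parser.chunks)
    return word_count >= min_words
```

A result that fails this check could then be treated like a failed fetch, so the next method in the chain gets a turn.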

francocanzani commented on Unlock Articles with Paywallskip   paywallskip.com/... · Posted by u/francocanzani
francocanzani · a year ago
Hi! Our project addresses the limitations of existing paywall bypass tools by implementing a dynamic, community-driven approach. Key features include:

Real-time Adaptive Blacklist:

- Constantly updated database of paywalled sites and effective bypass methods
- User-driven reporting system for quick adaptation to paywall changes
- Significantly faster response to new paywalls compared to static solutions

Multi-Method Bypass Arsenal:

- Unlike single-method solutions (e.g., 12ft.io's cache access), we employ various techniques
- Methods include: User-Agent spoofing, Referer header manipulation, JS disabling post-load, and web archive fallbacks (Wayback Machine, archive.is, Google cache)
- Our blacklist determines the most effective method per site, improving success rates
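
To make the "blacklist determines the most effective method per site" idea concrete, here is a minimal sketch of per-site method ordering with fallback. It only shows the selection logic: the method names, `DEFAULT_ORDER`, `pick_order`, `fetch_with_fallback`, and the shape of the site table are all hypothetical, and the methods themselves are passed in as plain callables rather than implemented.

```python
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urlparse

# A method takes a URL and returns HTML, or None if it did not work.
Method = Callable[[str], Optional[str]]

# Assumed default order, mirroring the order the methods are listed in the post.
DEFAULT_ORDER = ["user_agent", "referer", "js_disabled", "archive"]


def pick_order(url: str, site_table: Dict[str, List[str]]) -> List[str]:
    """Put the methods recorded for this host first, then the remaining defaults."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    preferred = site_table.get(host, [])
    rest = [name for name in DEFAULT_ORDER if name not in preferred]
    return list(preferred) + rest


def fetch_with_fallback(
    url: str,
    methods: Dict[str, Method],
    site_table: Dict[str, List[str]],
) -> Optional[Tuple[str, str]]:
    """Try each method in order; return (method_name, html) for the first success."""
    for name in pick_order(url, site_table):
        method = methods.get(name)
        if method is None:
            continue  # method not registered in this sketch
        html = method(url)
        if html:
            return name, html
    return None
```

In this reading, a user report that a method stopped working for a site would amount to editing that site's entry in `site_table`, which changes the order without touching the methods.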

Site-Specific Solutions:

- Tracking individual websites allows for custom bypass methods when general approaches fail
- Parsed and validated HTML output ensures content integrity

We believe this approach offers a more robust and adaptable solution to paywall bypassing. We're eager to hear the community's thoughts and potential improvements.

u/francocanzani

Karma: 27 · Cake day: September 2, 2024