Readit News
jpyles commented on Scraperr – A Self Hosted Webscraper   github.com/jaypyles/Scrap... · Posted by u/jpyles
michaeljx · 3 months ago
I was in a similar boat with my scrapers. Started with Selenium 5-6 years ago and only discovered Playwright 2 years ago. Spent a month or so swapping the two, which was well worth it: cleaner API, async support.
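The async support mentioned above is what lets a single Playwright browser drive several pages at once. Here is a minimal sketch of that pattern. It assumes Python 3.9+ and Playwright's async API (`pip install playwright`, then `playwright install chromium`). The helper names `scrape_all`, `scrape_with_playwright` and `fetch_title`, and the concurrency limit, are hypothetical and not taken from Scraperr.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def scrape_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> dict[str, str]:
    """Run `fetch` on every URL concurrently, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> tuple[str, str]:
        async with semaphore:  # wait for a free slot before opening another page
            return url, await fetch(url)

    results = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(results)


async def scrape_with_playwright(urls: list[str]) -> dict[str, str]:
    """Fetch page titles using one shared headless Chromium instance (hypothetical helper)."""
    # Imported lazily so scrape_all can be used and tested without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)

        async def fetch_title(url: str) -> str:
            page = await browser.new_page()
            try:
                await page.goto(url)
                return await page.title()
            finally:
                await page.close()

        try:
            return await scrape_all(urls, fetch_title, max_concurrency=3)
        finally:
            await browser.close()
```

Calling `asyncio.run(scrape_with_playwright(["https://example.com"]))` would return a URL-to-title dict. The semaphore caps how many pages are open at once, so a long URL list doesn't open hundreds of tabs in one browser.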
jpyles · 3 months ago
Luckily, I have some experience with Playwright, so swapping shouldn't take me too long.

Currently working on a PR to swap over.

jpyles commented on Scraperr – A Self Hosted Webscraper   github.com/jaypyles/Scrap... · Posted by u/jpyles
_QrE · 3 months ago
Is there a reason for using Selenium over something like Playwright? I haven't had many positive experiences with Selenium, and I've found Playwright easier to use and more flexible.

Also, for stuff like this:

`modified_value = original_value.replace("HeadlessChrome", "Chrome")`

There are quite a few ways to tell that a browser is a bot, and I don't think replacing a few values like this does much. Not asking you to reveal any tricks; just saying that with something like Playwright, you can, e.g., run scripts in the browser to adjust your fingerprint more easily.
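To make the contrast concrete: the quoted one-liner only edits the User-Agent string, while Playwright can also inject JavaScript into every page before the site's own scripts run. This is a rough sketch assuming Playwright's sync API. The names `patch_user_agent`, `HIDE_WEBDRIVER_JS` and `fetch_with_patches` are illustrative. Hiding `navigator.webdriver` is just one example signal, not a complete anti-detection setup.

```python
def patch_user_agent(original_value: str) -> str:
    """The one-liner quoted above: swap the 'HeadlessChrome' token for 'Chrome'."""
    return original_value.replace("HeadlessChrome", "Chrome")


# Runs in every new document before the page's own scripts. Under automation,
# navigator.webdriver typically reports true; this getter makes it report undefined.
HIDE_WEBDRIVER_JS = """
Object.defineProperty(Navigator.prototype, 'webdriver', { get: () => undefined });
"""


def fetch_with_patches(url: str) -> str:
    """Load `url` in headless Chromium with both patches applied and return the HTML."""
    # Imported lazily so patch_user_agent can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            # Read the browser's real UA first, then reuse it minus the headless token.
            probe = browser.new_page()
            original_ua = probe.evaluate("() => navigator.userAgent")
            probe.close()

            context = browser.new_context(user_agent=patch_user_agent(original_ua))
            context.add_init_script(script=HIDE_WEBDRIVER_JS)
            page = context.new_page()
            page.goto(url)
            return page.content()
        finally:
            browser.close()
```

Even with both patches, other traits such as canvas or WebGL output can still give a headless browser away, which is the comment's point.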

jpyles · 3 months ago
With custom headers, you can actually get a lot of sites with bot protection to load (even big sites like YouTube, where I've had success).
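The thread doesn't say which headers Scraperr sends, so the values below are assumptions. This sketch shows the general idea with only the standard library: send browser-like headers instead of the client's defaults (urllib otherwise identifies itself as `Python-urllib/3.x`). Unlike Selenium or Playwright, urllib does not run JavaScript, so this only helps where header checks are the main gate. `BROWSER_HEADERS`, `build_request` and `fetch_html` are hypothetical names.

```python
import urllib.request
from typing import Optional

# Hypothetical browser-like header set; not the values Scraperr actually uses.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str, extra_headers: Optional[dict[str, str]] = None) -> urllib.request.Request:
    """Merge the defaults with per-request overrides (later keys win)."""
    headers = {**BROWSER_HEADERS, **(extra_headers or {})}
    return urllib.request.Request(url, headers=headers)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch `url` with the browser-like headers and decode the body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

In Playwright, a similar dict could be passed as `browser.new_context(extra_http_headers=...)`, so the headers ride along with a real browser instead.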
jpyles · 3 months ago
I am quite aware, but I actually built most of the scraping logic a long time ago, before I even knew that Playwright was a thing.

I am looking to refactor a lot of this, and switching over to Playwright is a high priority, along with using something like Camoufox for scraping instead of just Chromium.

Most of my work on this over the past month has been simple, nice-to-have additions.

u/jpyles

Karma: 84 · Cake day: May 11, 2025