Readit News
vkuprin commented on Show HN: I built a tool that watches webpages and exposes changes as RSS   sitespy.app... · Posted by u/vkuprin
briaoeuidhtns · 13 hours ago
have you adhered to the license? https://github.com/dgtlmoon/changedetection.io/blob/master/C... . if so, where can I get a copy of the source?
vkuprin · 10 hours ago
Yes — the project is Apache 2.0 licensed (https://github.com/dgtlmoon/changedetection.io/tree/master?t...), which permits forking and commercial use. There's also a COMMERCIAL_LICENCE.md in the repo for hosting/resale cases, and I've reached out to the maintainer directly about it. Attribution is here: https://docs.sitespy.app/docs/acknowledgements
vkuprin commented on Show HN: I built a tool that watches webpages and exposes changes as RSS   sitespy.app... · Posted by u/vkuprin
BloodAndCode · 16 hours ago
that direction makes a lot of sense. selectors always end up turning into maintenance work sooner or later.

the idea of a small model just identifying “the thing that looks like a price / status / headline” feels much closer to semantic detection than DOM-path tracking

curious though — would you run that model on every check, or only when the selector fails? seems like a nice hybrid approach to keep things cheap.

vkuprin · 14 hours ago
Yes! Exactly this direction — the hybrid fetch is already live: plain HTTP first, Chromium if the content looks off. LLM semantic targeting is the next step, but only triggered when a selector breaks, not on every check — too expensive otherwise.
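A minimal sketch of that kind of hybrid fallback (illustrative only, not Site Spy's actual code; the heuristic and the injected `chromium_fetch` callable are assumptions):

```python
import urllib.request

def looks_rendered(html: str) -> bool:
    """Heuristic: does the plain-HTTP response look like real content,
    or like a JS-app shell that needs a browser to render?"""
    lowered = html.lower()
    if len(html) < 512:                # suspiciously small response
        return False
    if "<noscript" in lowered and "enable javascript" in lowered:
        return False                   # classic SPA shell
    return "<body" in lowered

def fetch(url: str, chromium_fetch=None) -> str:
    """Plain HTTP first; escalate to a headless browser only if the
    content looks off (and a browser fetcher was provided)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    if looks_rendered(html) or chromium_fetch is None:
        return html
    return chromium_fetch(url)         # e.g. a Playwright/Chromium callable
```

The point of injecting `chromium_fetch` is that the cheap path stays dependency-free and the expensive path only spins up when the heuristic trips.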
vkuprin commented on Show HN: I built a tool that watches webpages and exposes changes as RSS   sitespy.app... · Posted by u/vkuprin
rozumem · 20 hours ago
Curious how you're thinking about getting around anti-bot protection. I scrape a lot and I've noticed many highly trafficked sites investing in anti-bot measures recently, with the rise of AI browsers and such. Still, cool idea, congrats on the launch.
vkuprin · 16 hours ago
I'm planning to add proxy rotation across different regions to help with geo-restricted content and rate limiting. Anti-bot is an arms race though — some sites just can't be monitored without solving a captcha, which isn't something I'm trying to do. Focused on making the common cases work well rather than promising to bypass everything.
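One common shape for that kind of rotation (a sketch under assumptions, not Site Spy's implementation; the proxy URLs are placeholders) is a simple round-robin over regional proxies:

```python
from itertools import cycle
import urllib.request

class ProxyRotator:
    """Round-robin across regional proxies; each fetch goes out
    through the next proxy in the rotation."""
    def __init__(self, proxies):
        self._cycle = cycle(proxies)

    def next_proxy(self) -> str:
        return next(self._cycle)

    def fetch(self, url: str) -> bytes:
        proxy = self.next_proxy()
        opener = urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        )
        with opener.open(url, timeout=30) as resp:
            return resp.read()
```

Round-robin spreads load and helps with per-IP rate limits and geo-restrictions, but as noted above it does nothing against captcha walls.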
vkuprin commented on Show HN: I built a tool that watches webpages and exposes changes as RSS   sitespy.app... · Posted by u/vkuprin
BloodAndCode · 16 hours ago
that makes sense. the error flagging is a nice touch — silent failures are usually the worst part of these tools. i've also run into the “random class name” problem on a lot of modern frontends. have you experimented with more semantic selectors (text anchors, attributes, etc) as a fallback, or do you try to keep it simple on purpose?
vkuprin · 16 hours ago
Yeah, semantic anchors are definitely the right direction — [data-testid], aria-label, or text proximity tend to survive rebuilds much better than class paths. The picker leans towards CSS right now but that's something I want to improve.
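One common shape for that fallback is an ordered candidate list, tried most-stable-first; a sketch (the `select` callable and the example selectors are hypothetical, standing in for whatever CSS engine the picker uses):

```python
def pick_selector(select, candidates):
    """Return the first selector that still matches the page.
    `select` is any CSS-select function, e.g. BeautifulSoup's soup.select."""
    for sel in candidates:
        if select(sel):
            return sel
    return None

# Prefer stable semantic hooks over generated class paths:
CANDIDATES = [
    '[data-testid="price"]',       # survives rebuilds best
    '[aria-label="Price"]',
    "span.price",                  # semantic-ish class name
    "div.sc-1x2y3z4 > span",       # brittle generated path, last resort
]
```

The ordering encodes the claim in the comment: `data-testid` and `aria-label` anchors outlive hashed class paths, so the brittle path is only a last resort.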

The harder problem is auth-gated content — Instagram feeds, dashboards, paywalled pages. Browser Steps handles it today (you can script login flows), but honestly I think the real fix is AI-assisted interaction. A small cheap model that can find what you care about without needing a brittle selector at all. That's where I want to take this — less "maintain a CSS path", more "here's what I'm interested in, figure it out..."

vkuprin commented on Show HN: I built a tool that watches webpages and exposes changes as RSS   sitespy.app... · Posted by u/vkuprin
BloodAndCode · 16 hours ago
i've actually been looking for something like this. full-page change monitors get noisy really fast, especially on sites with lots of small UI changes.

being able to watch a specific element sounds way more useful in practice (price blocks, availability text, etc).

curious how fragile the element tracking is though. if the site slightly changes the DOM structure or class names, does the watcher usually survive or do you end up reselecting the element pretty often?

vkuprin · 16 hours ago
The selector is stored as a CSS path and matched against the fetched HTML on each check — so as long as the element's structure and nesting stay roughly the same, minor layout changes don't usually break it.

The fragile cases are sites that generate class names on every build (React/webpack/vite apps often do this) — those selectors will just stop working.

For semantic elements like price tags, availability text, or content blocks, selectors tend to be stable enough that it's not a real problem day-to-day. And if a filter stops matching entirely, the watch flags it with an error message rather than silently giving you empty diffs.
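The flag-instead-of-silent-failure behaviour described above can be sketched like this (illustrative only; `select` stands in for whatever CSS-matching library runs the check):

```python
def check_watch(html: str, selector: str, select):
    """One check cycle: match the stored CSS path against the fetched HTML.
    If it stops matching, surface a visible error on the watch instead of
    emitting an empty diff."""
    nodes = select(html, selector)
    if not nodes:
        return {"status": "error",
                "message": f"selector no longer matches: {selector!r}"}
    return {"status": "ok", "content": nodes[0]}
```

The key design choice is that a non-matching selector is treated as a distinct error state, never as "the content changed to nothing".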

vkuprin commented on Show HN: I built a tool that watches webpages and exposes changes as RSS   sitespy.app... · Posted by u/vkuprin
Kotlopou · 17 hours ago
I use RSS to get updates from all the stuff I read online at once, and thought this would be nice for those websites that don't already have an RSS feed, but... Perhaps I'm stupid, but I can't actually find the RSS output? And searching for RSS on https://docs.sitespy.app/docs returns no hits.
vkuprin · 16 hours ago
Not stupid at all — the docs were missing an RSS page, which is on me. I've just added one: https://docs.sitespy.app/docs/dashboard/rss. RSS feeds are available per watch, per tag, or across all watches from the dashboard. Thanks for flagging it, this is exactly the kind of feedback that helps
vkuprin commented on Show HN: I built a tool that watches webpages and exposes changes as RSS   sitespy.app... · Posted by u/vkuprin
shaunpud · 18 hours ago
Does your project use changedetection.io behind the scenes? When I look at the _All Watches Feed_, the contents of the RSS file include:

    <?xml version='1.0' encoding='UTF-8'?>
    <rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0"><channel><title></title><link>https://changedetection.io</link><description>Feed description</description><docs>http://www.rssboard.org/rss-specification</docs><generator>python-feedgen</generator><lastBuildDate>Thu, 12 Mar 2026 10:10:10 +0000</lastBuildDate></channel></rss>
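The giveaway fields in that skeleton (the changedetection.io channel link, the python-feedgen generator tag) can be confirmed with a quick stdlib parse; the `FEED` string below is a trimmed copy of the feed above:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the empty feed skeleton quoted in the comment.
FEED = """<?xml version='1.0' encoding='UTF-8'?>
<rss version="2.0"><channel><title></title>\
<link>https://changedetection.io</link>\
<description>Feed description</description>\
<generator>python-feedgen</generator></channel></rss>"""

# ET refuses str input that carries an encoding declaration, so parse bytes.
channel = ET.fromstring(FEED.encode("utf-8")).find("channel")
link = channel.findtext("link")            # leftover upstream default
generator = channel.findtext("generator")  # library changedetection.io uses
```

Both values are defaults, which is what made the upstream lineage visible before the link was fixed.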

vkuprin · 17 hours ago
Yes — it's a fork of changedetection.io. I went into more detail here: https://news.ycombinator.com/item?id=47349141. The RSS link you spotted was a leftover, already fixed
vkuprin commented on Show HN: I built a tool that watches webpages and exposes changes as RSS   sitespy.app... · Posted by u/vkuprin
msp26 · 17 hours ago
I got Claude to reverse engineer the extension and compare it to changedetection.io, and here's what it came up with. Apologies for clanker slop, but I think it's in poor taste not to attribute the open-source tool the service is built on (one that's also funded by their SaaS plan)

---

Summary: What Is Objectively Provable

- The extension stores its config under the key changedetection_config

- 16 API endpoints in the extension are 1:1 matches with changedetection.io's documented API

- 16 data model field names are exact matches with changedetection.io's Watch model (including obscure ones like time_between_check_use_default, history_n, notification_muted, fetch_backend)

- The authentication mechanism (x-api-key header) is identical

- The default port (5000) matches changedetection.io's default

- Custom endpoints (/auth/, /feature-flags, /email/, /generate_key, /pregate) do NOT exist in changedetection.io — these are proprietary additions

- The watch limit error format is completely different from changedetection.io's, adding billing-specific fields (current_plan, upgrade_required)

- The extension ships with error tracking that sends telemetry (including user emails on login) to the developer's GlitchTip server at 100% sample rate

The extension is provably a client for a modified/extended changedetection.io backend. The open question is only the degree of modification - whether it's a fork, a proxy wrapper, or a plugin system. But the underlying engine is unambiguously changedetection.io.
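Comparisons like the endpoint analysis above reduce to set arithmetic; a sketch with made-up sample lists (the real inputs would come from the extension bundle and changedetection.io's API documentation):

```python
# Hypothetical sample data, not the full 16-endpoint lists from the analysis.
extension_endpoints = {"/watch", "/import", "/systeminfo",
                       "/auth/", "/feature-flags"}
upstream_endpoints = {"/watch", "/import", "/systeminfo", "/tags"}

# Overlap suggests shared lineage; the remainder are proprietary additions.
shared = extension_endpoints & upstream_endpoints
proprietary = extension_endpoints - upstream_endpoints
```

A large `shared` set with obscure exact matches is what makes the fork relationship "objectively provable", while `proprietary` captures the billing/auth layer added on top.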

vkuprin · 17 hours ago
Fair point, and I should have been upfront about this earlier. The backend is a fork of changedetection.io. I've built on top of it — added the browser extension workflow, element picker, billing, auth, notifications, and other things — but the core detection engine comes from their project. That should have been clearly attributed from the start, and I'll add it to the docs and about page.

changedetection.io is a genuinely great project. What I'm trying to build on top of it is the browser-first UX layer and hosted product that makes it easier for non-technical users to get value from it without self-hosting, plus an AI-focused approach.

P.S. I've also added an acknowledgements page to the docs: https://docs.sitespy.app/docs/acknowledgements

vkuprin commented on Show HN: I built a tool that watches webpages and exposes changes as RSS   sitespy.app... · Posted by u/vkuprin
pentagrama · a day ago
That looks nice. I use the free plan of https://visualping.io for some software changelogs, and RSS feeds are a paid feature there. Will check this out.
vkuprin · a day ago
Yeah, RSS is free on all plans — it felt like a core feature, not an upsell
vkuprin commented on Show HN: I built a tool that watches webpages and exposes changes as RSS   sitespy.app... · Posted by u/vkuprin
nicbou · a day ago
Monthly is fine, but not monthly all at once, because I watch multiple pages on one website, and that triggers the rate limiting.

The ideal pipeline for me would be "notice a change in a specific part of a page, use a very small LLM to extract a value or answer a question, update a constant in a file and make a pull request".

I've been thinking about this pipeline for a long time because my work depends on it, but nothing like it seems to exist yet. I'll probably write my own, but I just can't find the time.

vkuprin · a day ago
You can already work around the rate-limit issue today — there's a global minimum recheck interval in Settings that spreads checks out across time. Not per-site throttling yet, but it prevents one domain from getting hit too many times at once.

The pipeline you described — detect a change, extract a value with a small LLM, open a PR — is pretty much exactly what the MCP server is designed for. Connect Site Spy to Claude or Cursor, and when a specific part of a page changes, the agent can handle the extraction and PR automatically. I don't think anyone has wired up that exact flow yet, but all the pieces exist.
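The "spread checks out across time" idea above is essentially staggered scheduling; a toy sketch of per-domain offsets (illustrative only, not the actual Settings implementation):

```python
from urllib.parse import urlparse

def schedule(urls, min_gap_s=60):
    """Assign each URL a start offset (in seconds) so that checks against
    the same domain never fire at the same moment."""
    per_domain = {}
    plan = []
    for url in urls:
        domain = urlparse(url).netloc
        offset = per_domain.get(domain, 0)
        plan.append((url, offset))
        per_domain[domain] = offset + min_gap_s
    return plan
```

Watches on the same site get pushed `min_gap_s` apart, while unrelated domains all start at offset 0, which is why this helps with per-site rate limiting without slowing the whole queue down.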

u/vkuprin

Karma: 123 · Cake day: April 18, 2023
About
Building sitespy.app