churros_train (u/churros_train)

churros_train commented on Ask HN: What are you working on (September 2024)? · Posted by u/david927

I read about things I don't understand all the time, and the best way I know to retain information is to actively take notes directly on whatever I am reading.

Using a highlighter or annotation type tool, if you will.

So I decided to build an annotation tool for all public webpages! Playground demos of how it will work:
- https://www.contextdive.com/snapshot?snapshottedId=47692b19-...
- https://www.contextdive.com/snapshot?snapshottedId=3557f52f-...

^These are previously snapshotted pages; you can highlight anywhere and leave a comment by right-clicking to open the context menu.

PS: I don't have persistence of comments working yet since it's a playground, but I would love to hear feedback if anyone would like to use it.

churros_train commented on State of S3 – Your Laptop is no Laptop anymore – a personal Rant blog.jeujeus.de/blog/hard... · Posted by u/tosh

churros_train · 2 years ago

For anyone reading this: I thought S3 referred to AWS S3 and was really confused initially. Here, S3 instead refers to a sleep state - https://answers.microsoft.com/en-us/windows/forum/all/how-to....

churros_train commented on Show HN: I'm making an AI scraper called FetchFox fetchfoxai.com/... · Posted by u/marcell

marcell · 2 years ago

Try it out and let me know if you like it :). If there's a bug I'll fix it!

More specifically, FetchFox is targeting a specific niche of scraping. It focuses on small scale scraping, like dozens or a few hundred pages. This is partly because, as a Chrome extension, it can only scrape what the user's internet connection can support. You can't scrape thousands or millions of pages on a residential connection.

But a separate reason is that I think LLMs open up a new market and use case for scraping. FetchFox lets anyone scrape without coding knowledge. Imagine you're doing a research project, and want data from 100 websites. FetchFox makes that easy, whereas with traditional scraping you would have needed coding knowledge to scrape those sites.

As an example, I used FetchFox to research political bias in the media. I was able to get data from hundreds of articles without writing a line of code: https://ortutay.substack.com/p/analyzing-media-bias-with-ai . I think this tool could be used by many non-technical people in the same way.

churros_train · 2 years ago

Ah, that's really interesting! How do you evaluate large-scale cloud scraping services, since their operations are entirely hidden from you?

Personally, I am looking into options in this area. Are you planning to offer a cloud-based version of this at some point? If not, could you tell me which existing ones are good?

churros_train commented on Show HN: I'm making an AI scraper called FetchFox fetchfoxai.com/... · Posted by u/marcell

trog · 2 years ago

Would love something like this that allows users to trivially turn sites like Facebook/Twitter into RSS feeds. I'm sure this kinda thing is a useful stepping stone to doing that.

churros_train · 2 years ago

My impression is that Facebook and Twitter have really strong anti-scraping measures. Is that wrong? And are there any reliable scraping services that can actually scrape those large companies' sites at a reasonable cost?

churros_train commented on Show HN: I'm making an AI scraper called FetchFox fetchfoxai.com/... · Posted by u/marcell

churros_train · 2 years ago

I am really curious: how do people actually evaluate scrapers? There are so many options, and I get dizzy just trying to read about them...

I am also wondering how the OP thinks about comparing themselves and standing out in a marketplace of what seems like a bazillion options.