Readit News
Posted by u/gabrielsroka 6 years ago
Show HN: Export HN Favorites to a CSV File
I wrote a short JavaScript snippet to export HN Favorites to a CSV file.

It runs in your browser like a browser extension. It scrapes the HTML and navigates from page to page.

Setup and usage instructions are in the file.

Check out https://gabrielsroka.github.io/getHNFavorites.js or, to view the source code, see getHNFavorites.js at https://github.com/gabrielsroka/gabrielsroka.github.io

sbr464 · 6 years ago
gabrielsroka · 6 years ago
Ok, now that's really smart!

Would a client using your API paginate to fetch, say, 50 pages? When I tried it using ?limit=50, I got a 504 error.

Thanks!

(Edit, never mind, I see you explain it in the readme.)

sbr464 · 6 years ago
I made it pretty quickly. That limit would be too large: that's 50*30 items. You may need to provide an offset and make a few requests.
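
A rough sketch of splitting one large request into several smaller ones, as suggested above. The helper name `page_batches` and the idea that the offset and limit are counted in pages are assumptions for illustration; the actual parameter names and units of the API are described in its readme, not here.

```python
def page_batches(total_pages, batch_size):
    """Yield (offset, limit) pairs that together cover total_pages pages.

    Hypothetical helper: it only does the arithmetic and makes no requests.
    """
    if batch_size <= 0:
        raise ValueError('batch_size must be positive')
    for offset in range(0, total_pages, batch_size):
        # The last batch may be smaller than batch_size.
        yield offset, min(batch_size, total_pages - offset)


# 50 pages fetched 10 at a time instead of in one request.
print(list(page_batches(50, 10)))
# → [(0, 10), (10, 10), (20, 10), (30, 10), (40, 10)]
```

Each pair would then go into its own request, which keeps every response small enough to avoid the timeout.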
wildduck · 6 years ago
Interesting that it is using x-ray, seems like x-ray is still using PhantomJS as the plugin, is PhantomJS deprecated? Would it be using Puppeteer instead?
jaytaylor · 6 years ago
This is cool, I love HN metadata, too :)

Plug for a related golang tool I wrote and use, which exports favorites and upvotes as structured JSON:

https://github.com/jaytaylor/hn-utils

Just

    go get github.com/jaytaylor/hn-utils/...

simonw · 6 years ago
It's a shame favorites aren't exposed in the official HN API: https://github.com/HackerNews/API - this is a smart workaround.
dang · 6 years ago
Our plan is for the next version of HN's API to simply serve a JSON version of every page. I'm hoping to get to that this year.
simonw · 6 years ago
That would be amazing!

I've been having some fun with the API recently building this tool: https://github.com/dogsheep/hacker-news-to-sqlite

gitgud · 6 years ago
Wow, that would make it really easy to implement an alternative HN client.

Related Question: Is this the source code for HN? https://github.com/wting/hackernews

bhl · 6 years ago
Are there plans for an export tool, e.g. a user downloading all their comments and upvoted submissions? I tend to use the submission upvote button more than the favorite one, and an export tool wouldn't require a user API key for non-private info.
death-by-ppt · 6 years ago
Hi Dan,

That's great news! Is there a way to be notified (eg, via email) when this comes out?

Thanks.

amjd · 6 years ago
That's great! Is there any plan of exposing authenticated content through the API too? Mainly talking about upvoted stories.
gabrielsroka · 6 years ago
Thanks Simon. I'd originally written this script to export my DVD ratings from Netflix, since there's no API for that either. It was easy to adapt it to HN.

I wanted to show people that it's possible (and easy) to get to your own data!

dvfjsdhgfv · 6 years ago
This is smart. I'm adding this to my HN favorites.
gabrielsroka · 6 years ago
Thanks dvfjsdhgfv. If there's sufficient interest, I can easily turn it into a Chrome extension.

(Edit: haha, I see what you just did there. A little recursive humor.)

catchmeifyoucan · 6 years ago
This is great! My biggest problem was I couldn’t search through my upvoted items to find the article I liked again. I used Google Custom Search and cleaned the data into flat URLs.

https://www.heyraviteja.com/post/projects/deep-search-hn/

catchmeifyoucan · 6 years ago
oops - I didn’t realize that favorites != upvoted.
zerop · 6 years ago
Can it be done with the "Scrape Similar" Chrome plugin?
gabrielsroka · 6 years ago
Thanks for the tip. I gave the "Scraper" extension a try, and 1) I got an error, 2) it only seems to scrape 1 page -- it doesn't paginate (or, did I miss something?).

I used the jQuery selector `a.storylink`.

rtcoms · 6 years ago
Is there any way to find the most favorited items on HN?
app4soft · 6 years ago
Could someone convert it to a Python script?
gabrielsroka · 6 years ago
Part of the advantage of running JavaScript in your browser is that you might already be authenticated and it can use your session. But, fetching your HN favorites doesn't require authentication.

  #!/usr/bin/env python3
  import requests
  from bs4 import BeautifulSoup

  # Pages 1 through 16; the last page number is hardcoded for this user.
  for p in range(1, 17):
      r = requests.get(f'https://news.ycombinator.com/favorites?id=app4soft&p={p}')
      s = BeautifulSoup(r.text, 'html.parser')
      # Each story title is an <a class="storylink"> element.
      print([{'title': a.text, 'url': a['href']} for a in s.select('a.storylink')])
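
The snippet above prints a list of dicts per page. To get a CSV file, as in the original post, the same dicts could be passed to the standard `csv` module. This is only a sketch: the function name `stories_to_csv` and the sample rows are made up for illustration and are not taken from getHNFavorites.js.

```python
import csv
import io


def stories_to_csv(stories, out):
    """Write dicts with 'title' and 'url' keys to the file-like object out."""
    writer = csv.DictWriter(out, fieldnames=['title', 'url'])
    writer.writeheader()
    # The csv module quotes titles that contain commas or double quotes.
    writer.writerows(stories)


# Hypothetical sample rows standing in for scraped favorites.
sample = [
    {'title': 'A plain title', 'url': 'https://example.com/1'},
    {'title': 'A title with a comma, and "quotes"', 'url': 'https://example.com/2'},
]

buffer = io.StringIO()
stories_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead of a string buffer, pass a file opened with `open('favorites.csv', 'w', newline='')`.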

app4soft · 6 years ago
Thanks!

One more question: what is the best way to stop it when it reaches the last page?

> for p in range(1, 17):

Actually, p=17 [0] is empty (p=16 is the maximum for now).

Maybe the script should scrape pages from `1` to `infinity` UNTIL it detects the following message on the page [0]:

> app4soft hasn't added any favorite submissions yet.

[0] https://news.ycombinator.com/favorites?id=app4soft&p=17
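
One possible way to do this, sketched with the standard library only: keep requesting pages and stop at the first one that has no story links, rather than matching the message text. The names `parse_stories`, `fetch_page`, `all_favorites` and the `max_pages` safety cap are assumptions for illustration. The `storylink` class comes from the selector used earlier in the thread and may not match HN's current markup, and this has not been checked against the live site.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class StoryLinkParser(HTMLParser):
    """Collect {'title', 'url'} for every <a class="storylink"> element."""

    def __init__(self):
        super().__init__()
        self.stories = []
        self._current = None  # story being read, or None outside a link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get('class') or '').split()
        if tag == 'a' and 'storylink' in classes:
            self._current = {'title': '', 'url': attrs.get('href') or ''}

    def handle_data(self, data):
        if self._current is not None:
            self._current['title'] += data

    def handle_endtag(self, tag):
        if tag == 'a' and self._current is not None:
            self.stories.append(self._current)
            self._current = None


def parse_stories(html):
    """Return the story links found in one page of HTML."""
    parser = StoryLinkParser()
    parser.feed(html)
    parser.close()
    return parser.stories


def fetch_page(user, page):
    """Download one favorites page (assumes the URL format quoted above)."""
    url = f'https://news.ycombinator.com/favorites?id={user}&p={page}'
    with urlopen(url) as response:
        return response.read().decode('utf-8')


def all_favorites(user, fetch=fetch_page, max_pages=1000):
    """Fetch pages 1, 2, 3, ... and stop at the first page with no stories."""
    stories = []
    for page in range(1, max_pages + 1):  # max_pages is a safety cap
        found = parse_stories(fetch(user, page))
        if not found:  # empty page: we are past the last one
            break
        stories.extend(found)
    return stories
```

Calling `all_favorites('app4soft')` would then request pages until the empty one, so the `17` no longer needs to be hardcoded. The `fetch` parameter exists so the loop can be tried with canned HTML before hitting the site.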

death-by-ppt · 6 years ago
Can someone convert it to Bel?
amjd · 6 years ago
I had written a Python script to get saved (upvoted) links as JSON / CSV a few years ago: https://github.com/amjd/HN-Saved-Links-Export

I'm not sure if it still works as it too relied on HTML scraping. Perhaps I should update it to support favorites too.

Edit: Whoa, it's been 4 years already. I believe HN didn't have a favorites feature at the time. That's why I used upvoting as my bookmarking system and created a script to export that data.

gabrielsroka · 6 years ago
@amjd, thanks for sharing. I upgraded it from Python 2 to Python 3, but when I ran it, I got a 404 error on the `saved` endpoint. Does it work for you?

Edit: I see from the other PR it's called `upvoted`.

Edit 2: I changed it to `upvoted` and now I get a 200 OK, but the code crashed right afterwards on `tree.cssselect()`.