JasonPunyon commented on Show HN: Sumble – knowledge graph for GTM data – query tech stack, key projects   sumble.com... · Posted by u/antgoldbloom
csomar · 8 months ago
Simple queries. As in typing the name of a person in a company list of 200. Keeps spinning forever.
JasonPunyon · 8 months ago
Thanks for taking it for a spin! I'm working on why this is slow now.
JasonPunyon commented on Stack Overflow and OpenAI are partnering   stackoverflow.co/company/... · Posted by u/onatm
JasonPunyon · 2 years ago
If anyone wants their data back in a way they can use it, it's right here https://seqlite.puny.engineering

And I'd be remiss if I didn't point out that their trade dress is MIT licensed. https://stackoverflow.design

Have fun.

JasonPunyon commented on Farey Numbers and Linked Lists   jasonpunyon.com/blog/2024... · Posted by u/JasonPunyon
rawling · 2 years ago
Pretty neat article!

(Don't have much to say, just didn't want "typo" to be my only feedback)

JasonPunyon · 2 years ago
Thanks so much, that’s really kind.
JasonPunyon commented on Farey Numbers and Linked Lists   jasonpunyon.com/blog/2024... · Posted by u/JasonPunyon
rawling · 2 years ago
> Inserting between b and be gives us bee. Inserting between be and e gives us bee.

Typo?

JasonPunyon · 2 years ago
Good catch! Fix going up now.
JasonPunyon commented on SEqlite – Minimal Stack Exchange Data Dump in SQLite Format   seqlite.puny.engineering/... · Posted by u/JasonPunyon
wolfgang42 · 2 years ago
I wrote a blog post a while back about reading these dumps: https://search.feep.dev/blog/post/2021-09-04-stackexchange

Presumably they have a script that does something similar to that process, and then writes the resulting data into a predefined table structure.

JasonPunyon · 2 years ago
Nice post!

Yep, my process is similar. It goes...

  - decompress (users|posts)
  - split into batches of 10,000
  - xsltproc each batch into SQL statements
  - pipe the batches of statements into sqlite in parallel, using flocks for coordination

On my M1 Max it takes about 40 minutes for the whole network. Then I compress each database with brotli, which takes about 5 hours.
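The coordination step above can be sketched in a few lines of shell. This is a minimal self-contained demo, not the author's actual script: the table, file names, and batch contents are made up for illustration. The key idea is that each batch runs in its own background subshell, and `flock` on a shared lock file serializes the `sqlite3` writers so parallel batches never collide on the database.

```shell
#!/bin/sh
set -e
db=demo.db
rm -f "$db" lock.file batch_*.sql

# Hypothetical stand-in for the real schema
sqlite3 "$db" 'CREATE TABLE posts(id INTEGER PRIMARY KEY, title TEXT);'

# Each batch file stands in for the output of an xsltproc run
for i in 1 2 3 4; do
  (
    echo "INSERT INTO posts(id, title) VALUES($i, 'post $i');" > "batch_$i.sql"
    # flock serializes writers: only one sqlite3 holds the db at a time
    flock lock.file sqlite3 "$db" < "batch_$i.sql"
  ) &
done
wait

sqlite3 "$db" 'SELECT COUNT(*) FROM posts;'   # 4
```

In the real pipeline the `echo` would be replaced by an `xsltproc` invocation turning a 10,000-row XML batch into INSERT statements; the flock pattern stays the same.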

JasonPunyon commented on SEqlite – Minimal Stack Exchange Data Dump in SQLite Format   seqlite.puny.engineering/... · Posted by u/JasonPunyon
panqueca · 2 years ago
I like the idea. I think SQLite is a very powerful way of storing and sharing data. But this site only covers the Stack Exchange network.

Do you know if there is a place to host other SQLite dumps, I mean from other websites? I recently dumped the whole Hacker News API and got to thinking about it.

JasonPunyon · 2 years ago
This site is on a Cloudflare R2 bucket because (and only because) they have free egress. While not datacenter-sized, some of these files are large. Just opening up your credit card at 10 cents a gigabyte would be a bad time anywhere else.

u/JasonPunyon

Karma: 831 · Cake day: October 26, 2009

About: jasonpunyon.com · cv.jasonpunyon.com · https://hachyderm.io/@JasonPunyon