Readit News
lukeigel commented on Show HN: CanIUse for WebCodecs – Hardware support from 71M tests   webcodecsfundamentals.org... · Posted by u/sb2702
lukeigel · 18 days ago
Doing God's work! I saw it yesterday on Twitter and I've been relying on it all morning.

lukeigel commented on Show HN: Jmail – Google Suite for Epstein files   jmail.world... · Posted by u/lukeigel
chirau · 2 months ago
I clicked on attachments, then images. There are only 9 images in this whole release?
lukeigel · 2 months ago
Sorry about that. The Attachments tab shows the literal attachments found in the .eml files from the YAHOO dataset. So far we have released very few .eml files that have attachments, though. Expect more and more over time.

Releasing YAHOO responsibly has been time-consuming, and we're relying on Drop Site to tackle the redactions. See this post for context.

https://news.ycombinator.com/item?id=46347272

lukeigel commented on Show HN: Jmail – Google Suite for Epstein files   jmail.world... · Posted by u/lukeigel
vldszn · 2 months ago
Wow, that’s impressive! Are you planning to open-source the project? =)
lukeigel · 2 months ago
Yes, we'd love to. Letting the dust settle first.
lukeigel commented on Show HN: Jmail – Google Suite for Epstein files   jmail.world... · Posted by u/lukeigel
nsomaru · 2 months ago
Hey, I’d be interested in your thoughts on this, or the key ideas/research results you relied on:
lukeigel · 2 months ago
Yes! We used our friends at Reducto (https://reducto.ai/) for all document extraction and parsing (one of the best companies I've ever referred to YC ;) )

We did an initial parsing pass of all four DOJ document batches on Friday. This takes a raw PDF and returns chunks containing typed blocks, each with a type (Title, Text, Figure, etc.), bounding boxes, content, and confidence scores. For PDFs that were just scans of photographs (which was like 90% of new content in Friday's release), it gave in-depth descriptions of those! You can type search terms like "door" at https://www.jmail.world/photos to see what I mean.
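
To make the shape of that output concrete, here is a minimal Python sketch of chunks holding typed blocks and a keyword search over them. It is not Reducto's actual response format or API: the class names, the bounding-box layout, the 0 to 1 confidence range, and the search_blocks helper are all assumptions made for illustration, and the sample data is invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    # Field names follow the description in the comment; the real format may differ.
    type: str                                 # e.g. "Title", "Text", "Figure"
    bbox: Tuple[float, float, float, float]   # assumed (left, top, width, height)
    content: str                              # extracted text, or a description of a photo
    confidence: float                         # assumed to lie in [0, 1]


@dataclass
class Chunk:
    blocks: List[Block]


def search_blocks(chunks, term, min_confidence=0.5):
    """Return blocks whose content mentions `term`, ignoring case."""
    needle = term.lower()
    return [
        block
        for chunk in chunks
        for block in chunk.blocks
        if block.confidence >= min_confidence and needle in block.content.lower()
    ]


# Invented sample data, only to exercise the search.
chunks = [
    Chunk(blocks=[
        Block("Figure", (0.1, 0.1, 0.8, 0.6), "A wooden door at the end of a hallway", 0.93),
        Block("Text", (0.1, 0.75, 0.8, 0.1), "Page 3 of 12", 0.99),
    ]),
    Chunk(blocks=[
        Block("Figure", (0.0, 0.0, 1.0, 1.0), "A door, partly open", 0.31),
    ]),
]
hits = search_blocks(chunks, "door")
```

Filtering on confidence is a design choice of this sketch, not something the comment describes; dropping the threshold to 0 returns every matching block.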

For apps like Jmail and JFlights we use their structured extraction endpoint instead—you define a schema (e.g. {from, to, subject, date, body} for emails or {departure_airport, arrival_airport, passengers[], date} for flights) and it pulls those fields directly into JSON.
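
As a rough illustration of what "define a schema" could look like, the sketch below writes the two field lists from the comment in JSON Schema style and checks an extracted record against them. The schema layout, the missing_or_mistyped helper, and the sample record are assumptions; the actual request format of the extraction endpoint is not shown in this thread.

```python
import json

# Hypothetical schemas in JSON Schema style, built from the field lists above.
EMAIL_SCHEMA = {
    "type": "object",
    "properties": {
        "from": {"type": "string"},
        "to": {"type": "string"},
        "subject": {"type": "string"},
        "date": {"type": "string"},
        "body": {"type": "string"},
    },
    "required": ["from", "to", "subject", "date", "body"],
}

FLIGHT_SCHEMA = {
    "type": "object",
    "properties": {
        "departure_airport": {"type": "string"},
        "arrival_airport": {"type": "string"},
        "passengers": {"type": "array", "items": {"type": "string"}},
        "date": {"type": "string"},
    },
    "required": ["departure_airport", "arrival_airport", "passengers", "date"],
}

# Only the two JSON types used by the schemas above are mapped.
_PY_TYPES = {"string": str, "array": list}


def missing_or_mistyped(record, schema):
    """List required fields that are absent or have the wrong top-level type."""
    problems = []
    for field in schema["required"]:
        expected = _PY_TYPES[schema["properties"][field]["type"]]
        if not isinstance(record.get(field), expected):
            problems.append(field)
    return problems


# Placeholder record, not real extracted data.
flight = json.loads(
    '{"departure_airport": "XXX", "arrival_airport": "YYY",'
    ' "passengers": ["Passenger One"], "date": "2000-01-01"}'
)
problems = missing_or_mistyped(flight, FLIGHT_SCHEMA)
```

A check like this is useful on the consuming side regardless of which extraction service fills the fields, since a scanned document can easily yield a record with a missing date or an empty passenger list.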

The JFlights example served as the best ad for Reducto and how doc parsing technology can speed up hours of journalistic investigations like this.

See for yourself. Given this document

https://www.jmail.world/drive/HOUSE_OVERSIGHT_002031

It inferred and enriched multiple flight cards on JFlights (https://www.jmail.world/flights). I was really shook when I first saw this.

lukeigel commented on Show HN: Jmail – Google Suite for Epstein files   jmail.world... · Posted by u/lukeigel
mikeyouse · 2 months ago
Ah I was going to ask about the Yahoo emails.. are those distinct from the cloned Gmail messages or are they in the same inbox on your site?

Has anyone written a parser for the text messages? A messages-like UI to be able to read through all the texts would be super interesting too. The format DOJ released them in is impossible to follow.

lukeigel · 2 months ago
Another person made an oddly beautiful ASCII UI for the text messages. All of them seem to be from HOUSE_OVERSIGHT (we have those plus DOJ and YAHOO, but no dedicated text UI from us).

https://michelcrypt4d4mus.github.io/epstein_text_messages/

He also shouted us out last month, which was very kind of him.

lukeigel commented on Show HN: Jmail – Google Suite for Epstein files   jmail.world... · Posted by u/lukeigel
cobertos · 2 months ago
Why and how is the data from DDoSecrets redacted?

Do you have a page about each dataset you're sourcing and the background on them like you provide here?

The "EFTA00000468" saga has me distrusting the authenticity of most of these datasets.

lukeigel · 2 months ago
Re: the DOJ emails prefixed with "EFTA", I have no idea how over-redacted they are. They definitely seem dubious though.

Re: the DDoSecrets emails though (YAHOO dataset), I have more to share.

Drop Site News agreed to give us access to the Yahoo dataset discovered by DDoSecrets, but on the condition that we help redact it. It's a completely unfiltered dataset. It's literally just .eml files for jeeprojects@yahoo.com. It includes many attached documents. There is no illegal imagery, but it has photos of Epstein's extended family (nephews, nieces, etc) and headshots of many models that Epstein's executive assistant would send to him. I was quite shocked that this thing existed.

We built some internal redaction tools that the Drop Site team is now using to comb through all of this. We've released 5 batches of the Yahoo mail now, with the 1k+ Amazon receipts being the most recent.

A few thoughts on how we do redaction are here: https://www.jmail.world/about.

Unlike the DOJ, we've tried to minimize the ambiguity about what was redacted.

For example: all redacted images are replaced with a Gemini-generated description of that photograph.

Another example: we are aggressively redacting email addresses and phone numbers of normal people to avoid spamming them. Perhaps others would leave it all in, but Riley and I don't want to be responsible for these people's lives getting disrupted by this entire saga. For example, we redacted this guy's email but not his name: https://www.jmail.world/thread/4accfb5f3ed84656e9762740081a4...
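
For readers wondering what this kind of contact redaction can look like in code, here is a minimal regex-based sketch. It is not the internal tooling mentioned above: the patterns, the placeholder strings, and the keep_emails allowlist are assumptions, and the phone pattern only covers common North American formats.

```python
import re

# Deliberately simple email pattern; it does not try to cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Loose North American phone pattern; real data would need broader coverage.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)


def redact(text, keep_emails=frozenset()):
    """Mask emails and phone numbers, leaving names and allowlisted addresses alone."""
    def _email(match):
        address = match.group(0)
        return address if address.lower() in keep_emails else "[REDACTED EMAIL]"

    text = EMAIL_RE.sub(_email, text)
    return PHONE_RE.sub("[REDACTED PHONE]", text)


# Invented sample line; the name stays, the contact details do not.
sample = "From: Pat Example <pat@example.com>, cell (212) 555-0147"
redacted = redact(sample)
```

Using explicit placeholders rather than deleting the text keeps it obvious to a reader that something was removed and what kind of thing it was, which matches the stated goal of minimizing ambiguity.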

Riley and I were not expecting this type of scope when we first dropped Jmail. Jmail is an interesting side project for us, and this new dataset requires full-time attention. Thankfully we have help, though. We're happy to take on this responsibility given how helpful, thoughtful and careful both the Drop Site and DDoSecrets teams have been here.

lukeigel commented on Show HN: Jmail – Google Suite for Epstein files   jmail.world... · Posted by u/lukeigel
hn-acct · 2 months ago
Why does this site work better on mobile than the real Google suite? Funny stuff.
lukeigel · 2 months ago
Thank you! I tried to optimize scrolling through the photos the most. Avoided any virtualization library, used Cloudflare for image transformations and for caching, etc. The masonry layout and polish you see here is all https://news.ycombinator.com/user?id=walz though

Specifically at https://www.jmail.world/photos
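
As a sketch of the image-resizing side of this, the snippet below builds URLs in Cloudflare's URL-based transformation format (/cdn-cgi/image/<options>/<source>) and a srcset string from them. Which Cloudflare product and which options the site actually uses is not stated in the comment, so the zone name, path, widths, and quality value here are assumptions.

```python
def transformed_url(zone, source_path, width, quality=75):
    """Build a resized-image URL using the /cdn-cgi/image/ URL format."""
    options = f"width={width},quality={quality},format=auto"
    return f"https://{zone}/cdn-cgi/image/{options}/{source_path.lstrip('/')}"


def srcset(zone, source_path, widths=(320, 640, 1280)):
    """Join several widths into an HTML srcset value so the browser picks one."""
    return ", ".join(
        f"{transformed_url(zone, source_path, w)} {w}w" for w in widths
    )


# Hypothetical zone and image path.
thumb = transformed_url("example.com", "/photos/a.jpg", 320)
candidates = srcset("example.com", "/photos/a.jpg")
```

Serving small, cached variants per viewport is one plausible reason a long photo grid can scroll smoothly without a virtualization library, though the comment does not spell out the exact setup.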

lukeigel commented on Show HN: Jmail – Google Suite for Epstein files   jmail.world... · Posted by u/lukeigel
grepfru_it · 2 months ago
I’ve seen that room before. I found it during an image search. Either this image you are sharing is not really Epstein's, or the image I found predated the catch of Epstein's misdeeds. Or it’s AI/manipulation all the way down. Not sure what to think anymore.
lukeigel · 2 months ago
Tons of this week's DOJ drop is old photos. These photos of his Upper East Side mansion and his island estate are well known.

We found that Volume 2 and Volume 4 had the most never-before-seen stuff.

https://www.justice.gov/epstein/doj-disclosures/data-set-2-f...

https://www.justice.gov/epstein/doj-disclosures/data-set-4-f...

Also, this morning they quietly released volumes 5-7. Will have to find out how much of this is new.

u/lukeigel

Karma: 600 · Cake day: February 5, 2019