Show HN: Rarbg on IPFS

Zopieux · 3 years ago

The sqlite db contains a imdb column but it seems like the author forgot to include it in the fts4 index, meaning one cannot search for "tt9916362" even though it's right there in the database. It's a shame because this curated mapping is the most useful aspect of RARBG.

FYI, this dump has 826'201 magnets (torrents) with an associated imdb ID and 2'017'490 without, including lots of porn but also random music and software.

Category breakdown (careful though, most items aren't categorized or use a category code I couldn't interpret):

      XXX ( 1):     2,255
      XXX ( 4):       607
   Movies (14):     3,206
   Movies (17):   117,440
       TV (18):   198,314
    Music (23):    11,621
    Music (24):   471,161
    Music (25): 1,339,739

Avlin67 · 3 years ago

> including lots of porn

but not all

away_rar10000 · 3 years ago

that's right, it's missing all of the vr180 stuff, and seems like it might have been the only place to find it.

parakovsky · 3 years ago

so what's to change in index file?

rhqq2 · 3 years ago

Related: I just dropped a mega DB archive dump for everything I have with regard to RARBG. My hope is others will find this useful.

https://github.com/sleaze/rarbg-db-dumps

https://news.ycombinator.com/item?id=36187767

thyrox · 3 years ago

This is just amazing work!

I have a question for you as even playing with this or creating oss project using this sounds like inviting trouble. So if any programmer wants to create an open source project using this data (just for kicks) then apart from using a VPN and throwaway email do you need to be careful about anything else? Any tips?

thunkymonkey · 3 years ago

Never link discussion of it with your other alts. ;)

dark-star · 3 years ago

How does this compare to the ipfs dump that someone claimed has over 2 million entries? Your DB dump has only around 1.6 million... so does ipfs have duplicates? or is there something substantial missing in your dump?

Dead Comment

mmastrac · 3 years ago

I was curious how this works and then I saw the sqlite requests in the network tab. It's amazing to see what we have access to these days -- SQLite over HTTP over IPFS to provide a giant, censorship-resistant database!

remram · 3 years ago

IPFS is distributed but not anonymized, so you can send abuse requests to people's ISPs to take content down (like with torrents).

The HTTP part is also not censorship-resistant. ipfs.io has a DMCA process, and they could also be asked to reveal the IP of users.

bscphil · 3 years ago

> you can send abuse requests to people's ISPs

In practice it's very rare for me to see a direct ipfs protocol link, almost all the traffic goes through HTTP gateways (which frequently cache the content as well). Hard to imagine that they don't become a target for "hosting" pirated content if / when IPFS becomes more than a negligible platform for piracy. (A significant amount of Library Genesis traffic is already using IPFS via these same gateways.)

As you mention, there's a DMCA process for some of the gateways, but that might not be enough to ward off attention.

tylersmith · 3 years ago

IPFS can be accessed through Tor immediately through a gateway, and with a little work through the ipfs client directly.

activiation · 3 years ago

Use Tor?

qersist3nce · 3 years ago

Can you explain bit more? Isn't the dump of last RARBG magnet links on the order of MBs? We can just download it and grep in plain text. I don't get what is the role of SQLite or IPFS here.

phiresky · 3 years ago

This blog post explains how it works: https://phiresky.github.io/blog/2021/hosting-sqlite-database...

Disclaimer: I wrote that article and was somewhat involved in the other sqlite over ipfs project this is forked from.

Yes, for MB size files just downloading the whole thing will be faster and much easier - even if running in the browser. I'd say the boundary is somewhere around 10MB compressed (~20-200MB uncompressed). Looks like the sqlite dump used here is ~400MB in size, ~180MB compressed. Loading that in the browser would probably work but it wouldn't be too great.

runeks · 3 years ago

> Isn't the dump of last RARBG magnet links on the order of MBs?

As I understand it, the purpose of using SQLite is indexing and querying, ie. being able to efficiently search through the data via a website.

romnon · 3 years ago

this, anyone above 20 can operate.

TekMol · 3 years ago

It's censorship resistant if enough nodes mirror the sqlite file I guess.

Is there any incentive for nodes to do so?

Is it possible to see a statistic about how many nodes mirror it?

lyu07282 · 3 years ago

torrents work fine without having a monetary incentive, this misconception of the blockchain crowd really has to die its killing real decentral solutions

jazzyjackson · 3 years ago

it's content addressed, if an ISP was so keen it would be simple to blackhole requests for a particular file (of course, one need only to change 1 bit to get a new content hash, but you have to redistribute the new file from scratch

dietr1ch · 3 years ago

I don't know about how IPFS is implemented, but you could use content-addressed blocks underneath every file too. This way flipping a bit means that only one underlying block changes, and that the new file would share N-1 blocks with the previous one, making redistribution only require sharing a single block.

lordofgibbons · 3 years ago

A note of caution for those unfamiliar with how IPFS works.

It's very similar to BitTorrent with how content distribution happens. Your local node will broadcast which content it has available (downloaded).

If you access a piece of content you automatically become a host for it so you still need to use a VPN if you live in a country where you can get sued.

malikNF · 3 years ago

I think your post needs to clarify this is true only if you use the ipfs client to access things hosted on ipfs.

Accessing ipfs.io/ipfs/ doesn’t do anything you mentioned. Its just a gateway.

You could take the link on the main submission. And replace ipfs.io with any (most, because some are offline) of the links here https://ipfs.github.io/public-gateway-checker/

And it will still work.

lordofgibbons · 3 years ago

That's true, and I should have clarified that.

However, is using a proxy like ipfs.io really using ipfs?

If everyone did that, there's no point to using this protocol. The strength of the network comes from the fact that the content gets replicated/distributed when accessed. That doesn't happen when accessed through a proxy.

RobotToaster · 3 years ago

Tangential, but does anyone know what's happening with the JS IPFS implementation? when I tried it a while ago it seemed broken.

Grimburger · 3 years ago

Using Brave it makes this very clear, the above link won't open by default without the user choosing to run a local node or use a public gateway instead. It explains the implications of both options.

Screenshot: https://i.imgur.com/ZP6AgPp.png

lgats · 3 years ago

Most browsers don't support the ipfs protocol and instead only use the gateway.

evilllkint · 3 years ago

I love Brave, so useful.

TekMol · 3 years ago

That might be how some client(s) work, but I don't think that is how IPFS works.

If you access a piece of content over ipfs.io for example, I would think you just make https requests like to every other website.

jacooper · 3 years ago

Because its using a gateway. I think if you are accessing it natively through IPFS, it will default to adding you as a peer.

gkbrk · 3 years ago

If you access a piece of content over ipfs.io, and you don't have your browser set up to actually do those requests over a local IPFS daemon, you are not using IPFS. You are just using a normal centralized website.

cramjabsyn · 3 years ago

It depends if you run a node or are accessing via a proxy/gateway

robbintt · 3 years ago

Is it recommended to run my own proxy then, and is there any boilerplate project out there? I could also use OpenVPN, but seems like I just want to proxy ipfs, not my whole connection.

jacooper · 3 years ago

Also AFAIK if no one acceses the content for a long time, it can get lost, like torrents.

willsoon · 3 years ago

But now, I dont know why, theres a lot of IPFS pinning service I got HUGO hosting in GH and deploying in FTPS from fleek.com. It has nothing on it but it works like a charm.

willsoon · 3 years ago

No, you are actually accessing the web. But you have to download using torrent, so to some extent it is true what you note.

EmilStenstrom · 3 years ago

Relevant context: Rarbg just shut down (see message on https://rarbg.to/)

hannofcart · 3 years ago

Here's a fun way to spend the next 15 minutes on the site. Find search terms that satisfy both the above constraints:

1. Has at least 20 results

2. None of the first 20 results is porn content

I even tried with Math and Chess. No dice.

progbits · 3 years ago

"Quantum" works.

That was my second attempt, I tried "physics" first but oh boy was I naive.

no_time · 3 years ago

Easy. Pick a scenegroup that only releases SFW content. "CODEX" "FLT" and such.

primax · 3 years ago

Got it on the first try with Star Wars

Retr0id · 3 years ago

This works by doing range requests on an sqlite DB.

Very cool, but does ipfs support verification of partial reads like this? (or would I need to download the whole DB and check the hash?)

I can think of some ways it could work using merkle trees or similar, but I have no idea what ipfs does under the hood, if anything.

ianopolous · 3 years ago

IPFS doesn't currently. But there are some in the community working on exactly this using Blake3 and BAO.

whyrusleeping · 3 years ago

The incremental requests are verified if running through your own node as a side effect of loading the data

rch · 3 years ago

Is there an IPFS dedicated to training data? A mirror of input datasets and fully open models resident on HuggingFace could endeavor to cut out onerous license agreements when possible.