raffraffraff · a year ago
So is there a server program to partner this with? Something that acts as a torrent file builder, tracker and simple file server for the torrent? I can imagine that in a large org you could store a gigantic quantity of public data on a server that creates a torrent whenever the data changes, serves the .torrent file over HTTP and also acts as a tracker. You could wrap the FUSE client in some basic logic that detects newer torrents on the server and reloads/remounts.
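
Something like this Python sketch could do the remount logic; the server URL, poll interval, and mountpoint are all made up:

    import subprocess, time, urllib.request

    TORRENT_URL = "http://example.org/data.torrent"  # hypothetical server
    MOUNTPOINT = "/mnt/data"

    def fetch_torrent():
        with urllib.request.urlopen(TORRENT_URL) as resp:
            return resp.read()

    current = fetch_torrent()
    subprocess.run(["btfs", TORRENT_URL, MOUNTPOINT], check=True)

    while True:
        time.sleep(300)  # poll every five minutes
        latest = fetch_torrent()
        if latest != current:  # server published a new .torrent
            subprocess.run(["fusermount", "-u", MOUNTPOINT], check=True)
            subprocess.run(["btfs", TORRENT_URL, MOUNTPOINT], check=True)
            current = latest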

Many moons ago I created a Linux distribution for a bank. It was based on Ubuntu NetBoot with minimal packages for their branch desktop. As the branches were serverless, the distro was self-seeding: you could walk into a building with one of them and use it to build hundreds of clones in a pretty short time. All you needed was wake-on-LAN and PXE configured on the switches. The new clones could also act as seeds. Under the hood it served a custom Ubuntu repo on nginx and ran tftp/inetd and wackamole (which used libspread; neither has been maintained for years). Once a machine got built, it pulled a torrent off the "server" and added it to Transmission. Once that completed, the machine could also act as a seed, so it would start up wackamole, inetd, nginx, the tracker etc. At first you could seed 10 machines reliably, but once they were all up, you could wake machines in greater numbers. Across hundreds of bank branches I deployed the image onto 8000 machines in a few weeks (most of the delays were due to change control and a staged rollout plan). Actually the hardest part was getting the first seed downloaded to the branches via their old Linux build, and using one of those machines to act as a seed. That was done in 350+ branches, over inadequate network connections (some were 256kbps ISDN).

mdaniel · a year ago
This may interest you, although as far as I know only AWS's S3 implements it: https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObjec...

I've never actually used it, so I don't know whether AWS puts its own trackers in the resulting .torrent or what

raffraffraff · a year ago
hmmm, pretty cool. Of course you have to pay the egress tax, but still...
elevation · a year ago
> You could wrap the FUSE client in some basic logic that detects newer torrents on the server and reloads/remounts.

Wouldn't creating new torrents for each update to a dataset cause clients to retransfer data that hasn't changed?

raffraffraff · a year ago
From memory, a standard torrent client, when it loads a new torrent file, verifies hashes on existing data and leaves anything that matches in place. That's how it worked on my old distribution. Not sure how this would work with BTFS, but if it's lazy-loading data and handles existing data the same way, it shouldn't be an issue. And I think this would suit infrequently changing data anyway.
nine_k · a year ago
I suppose blocks whose hashes match existing data won't get re-requested.
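
That matches how clients behave; here is a rough Python sketch of the per-piece check (single-file torrent, with piece_length and the 20-byte SHA-1 digests taken from the .torrent's info dict):

    import hashlib

    def verify_pieces(path, piece_length, piece_hashes):
        # piece_hashes: the info dict's 'pieces' field split into
        # 20-byte SHA-1 digests; single-file torrent for brevity
        with open(path, "rb") as f:
            for index, expected in enumerate(piece_hashes):
                piece = f.read(piece_length)
                yield index, hashlib.sha1(piece).digest() == expected

    # only pieces that fail the check need to be re-requested, e.g.:
    # damaged = [i for i, ok in verify_pieces(path, plen, hashes) if not ok]
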
apichat · a year ago
This tool should be upgraded to use BitTorrent v2's new features.

https://blog.libtorrent.org/2020/09/bittorrent-v2/

Especially Merkle hash trees, which enable:

- per-file hash trees
- a new directory structure
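
For illustration, a simplified per-file Merkle root in Python; real v2 also pads the leaf layer to a power of two and keeps a per-piece layer, which is omitted here:

    import hashlib

    BLOCK = 16 * 1024  # v2 hashes each file in 16 KiB leaf blocks

    def merkle_root(path):
        with open(path, "rb") as f:
            layer = []
            while block := f.read(BLOCK):
                layer.append(hashlib.sha256(block).digest())
        if not layer:  # empty file
            return hashlib.sha256(b"").digest()
        while len(layer) > 1:
            if len(layer) % 2:
                layer.append(bytes(32))  # zero-hash padding
            layer = [hashlib.sha256(layer[i] + layer[i + 1]).digest()
                     for i in range(0, len(layer), 2)]
        return layer[0]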

extraduder_ire · a year ago
Ever since I learned of bittorrent V2, I've been hoping for some per-file index to show up online. I'd love to be able to tag most or all of the bulk media files on my drives with how many copies exist in bittorrent swarms at any time. That way, I could more easily pick out big files for deletion knowing I could easily get another identical copy back later.

I tried doing something like this in the past using ipfs, but I wasn't very successful and didn't find any duplicates.

bardan · a year ago
somebody check the test functions...
logrot · a year ago
Especially the corrupt torrent test file.
Kerbonut · a year ago
I dream of having a BTFS that will fix my "damaged" media files, e.g. ones I've media-shifted. If my disc was scratched and portions are missing, or if the codec options I picked suck, it could download the "damaged" portions of my media and fix them seamlessly.
kelchm · a year ago
Not the same as what you are talking about, but your comment reminded me of AccurateRip [1] which I used to make extensive use of back when I was ripping hundreds of CDs every year.

1: http://www.accuraterip.com/

a-french-anon · a year ago
Pretty sure AccuRip is only a collection of checksums to validate your rips. http://cue.tools/wiki/CUETools_Database actually improved on it to provide that healing feature (via some kind of parity, I guess?).

Related, I use and recommend https://github.com/cyanreg/cyanrip on modern UNIXes.

neolithicum · a year ago
Do you have any tricks you can share on how to rip a large library of CDs? It would be nice to semi-automate the ripping process, but I haven't found any tools to help with that. Also, MusicBrainz (the only open audio tagging database I'm aware of?) almost never has tags for my CDs that don't have to be edited afterwards.
tarnith · a year ago
Why not run a filesystem that maintains this? (ZFS exists, storage is cheap)
gosub100 · a year ago
Another use of this is to share media after I've imported it into my library. If I voluntarily scan hashes of all my media, a smart torrent client could offer just those files (a partial torrent, because I always remove the superfluous files), which would help seed a lot of rare media files.
Stephen304 · a year ago
This happens to be one of the pipe dream roadmap milestones for bitmagnet: https://bitmagnet.io/#pipe-dream-features

I used to use magnetico and wanted to make something that would use crawled info hashes to fetch the metadata and retrieve the file listing, then search a folder for any matching files. You'd probably want to pre-hash everything in the folder and cache the hashes.
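
A sketch of that pre-hash-and-cache step (whole-file SHA-1 for brevity; matching v1 torrents would really need piece-aligned hashes, but the caching pattern is the same):

    import hashlib, json, os

    CACHE = "hashes.json"  # hypothetical cache file

    def folder_hashes(root):
        # keyed by (path, size, mtime) so unchanged files are
        # never re-read on later scans
        cache = {}
        if os.path.exists(CACHE):
            with open(CACHE) as f:
                cache = json.load(f)
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                key = f"{path}:{st.st_size}:{st.st_mtime_ns}"
                if key not in cache:
                    h = hashlib.sha1()
                    with open(path, "rb") as f:
                        while block := f.read(1 << 20):
                            h.update(block)
                    cache[key] = h.hexdigest()
        with open(CACHE, "w") as f:
            json.dump(cache, f)
        return cache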

I hope bitmagnet gets that ability, it would be super cool

jonhohle · a year ago
I’ve done a lot of archival of CD-ROM based games, and it’s not clear to me this is possible without a lot of coordination and consistency (there are like 7 programs that use AccurateRip, and those only deal with audio). I have found zero instances where a bin/cue I’ve downloaded online perfectly matches (hashes) to the same disc I’ve ripped locally. I’ve had some instances where different pressings of the same content hash differently.

I’ve written tools to inspect content (say, in an ISO file system), and those will hash to the same value (so different sector data but the same resulting file system). Audio converted to CDDA (16-bit PCM) will hash the same as well.

If audio is transcoded into anything else, there’s no way it would hash the same.

At my last job I did something similar for build artifacts. You need the same compiler, same version, same settings, and the ability to look inside the final artifact and skip all the variable information (e.g. timestamps). That requires a bit of domain-specific information to get right.

pigpang · a year ago
How will you calculate the hash of a broken file in order to look it up?
rakoo · a year ago
You have all the hashes in the .torrent file. All you need is to check against it regularly.

(but then the .torrent file itself has to be stored on a storage that resists bit flipping)

everfree · a year ago
Just hash it before it's broken.
alex_duf · a year ago
If you store the Merkle tree that was used to download it, you'll be able to know exactly which chunk of the file got a bit flip.
01HNNWZ0MV43FF · a year ago
You could do a rolling hash and say that a chunk with a given hash should appear between two other chunks of certain hashes
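
That's essentially content-defined chunking; here is a gear-hash sketch (FastCDC style), with made-up window and chunk-size parameters:

    import hashlib

    # 256 pseudo-random 32-bit values; any fixed table works
    GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big")
            for i in range(256)]
    MASK = (1 << 13) - 1  # boundary ~1 in 8192 bytes => ~8 KiB chunks

    def chunks(data):
        # boundaries depend only on the bytes just before them, so equal
        # content produces equal chunks even after insertions elsewhere
        h, start = 0, 0
        for i, b in enumerate(data):
            h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
            if i - start >= 256 and (h & MASK) == 0:  # 256 B minimum
                yield data[start:i + 1]
                h, start = 0, i + 1
        if start < len(data):
            yield data[start:]
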
selcuka · a year ago
Just use the sector number(s) of the damaged parts.
Fnoord · a year ago
Distribute parity files together with the real deal, like they do on Usenet? Usenet itself is pretty much this anyway. Not sure if the NNTP filesystem implementations work. Also, there's nzbfs [1]

[1] https://github.com/danielfullmer/nzbfs

drlemonpepper · a year ago
storj does this
pyinstallwoes · a year ago
Submitting because I'm surprised this isn't used more... couldn't we build virtual machines/OSes as an overlay on BTFS? Seems like an interesting direction.
Jhsto · a year ago
Just the other week I used Nix on my laptop to derive PXE boot images, uploaded those to IPFS, and netbooted my server in another country over a public IPFS mirror. The initrd gets mounted as read-only overlayfs on boot. My configs are public: https://github.com/jhvst/nix-config

I plan to write documentation of the IPFS process, including the PXE router config, later at https://github.com/majbacka-labs/nixos.fi -- we might also run a small public build server for the Flake configs of people who are interested in trying out this process.

__MatrixMan__ · a year ago
I laughed when I saw that your readme jumps straight into some category theory. FYI others might cry instead.

You're doing some cool things here.

vmfunction · a year ago
>A prominent direction in the Linux distribution scene lately has been the concept of immutable desktop operating systems. Recent examples include Fedora Silverblue (2018) and Vanilla OS (2022). But, on my anecdotal understanding of the timelines concerning Linux distros, both are spiritual successors to CoreOS (2013).

Remember, in the late '90s booting a server off a CD-ROM was the thing.

jquaint · a year ago
This is really cool. Plan to take some inspiration from your config!
infogulch · a year ago
It's not an overlay provider itself, but uber/kraken is a "P2P Docker registry capable of distributing TBs of data in seconds". It uses the bittorrent protocol to deliver docker images to large clusters.

https://github.com/uber/kraken

XorNot · a year ago
The problem with being a docker registry is that you're still having to double-dip: distribute to the registry, then docker pull.

But you shouldn't need to: you should be able to do the same thing with a Docker graph driver, so there is no registry at all. The daemon would perceive every image as "already available", even though in reality it just downloads the parts it needs as it overlay-mounts the image layers.

Which would actually potentially save a ton of bandwidth, since the stuff in an image is usually quite different from the stuff any given application needs (e.g. I usually base off Ubuntu, but if I'm only throwing a Go binary in there, plus maybe wanting debugging tools available, then in most executions the actual data pulled to the local disk would be very small).

phillebaba · a year ago
Kraken is sadly a dead project, with little work being done. For example, support for containerd is non-existent, or just not documented.

I created Spegel to fill the gap, focusing on the P2P registry component without the overhead of running a stateful application. https://github.com/spegel-org/spegel

apichat · a year ago
https://github.com/uber/kraken?tab=readme-ov-file#comparison...

"Kraken was initially built with a BitTorrent driver, however, we ended up implementing our P2P driver based on BitTorrent protocol to allow for tighter integration with storage solutions and more control over performance optimizations.

Kraken's problem space is slightly different than what BitTorrent was designed for. Kraken's goal is to reduce global max download time and communication overhead in a stable environment, while BitTorrent was designed for an unpredictable and adversarial environment, so it needs to preserve more copies of scarce data and defend against malicious or bad behaving peers.

Despite the differences, we re-examine Kraken's protocol from time to time, and if it's feasible, we hope to make it compatible with BitTorrent again."

retzkek · a year ago
CVMFS is a mature entry in that space, heavily used in the physics community to distribute software and container images, allowing simple and efficient sharing of computing resources. https://cernvm.cern.ch/fs/
idle_zealot · a year ago
I'm not sure I see the point. A read-only filesystem that downloads files on-the-fly is neat, but doesn't sound practical in most situations.
crest · a year ago
It can be an essential component, but for on-site replication you need to coordinate your caches to make the most of your available capacity. There are efforts to implement this on top of IPFS, with mutually trusted nodes electing a leader that decides who should pin what, to ensure you keep enough intact copies of everything in the distributed cache. But like so many things IPFS, it started out interesting and died from feature creep and "visions" instead of working code.
pyinstallwoes · a year ago
Imagine that any computation is addressed by a hash; then everything becomes memoized, with no distinction between data and code. As a consequence you get durability, caching, security to an extent, and verifiability through peers (which could be trusted directly, or a few degrees away from peers you trust).
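
A toy sketch of that idea in Python: results are stored under the hash of the code plus its inputs (the on-disk store here is hypothetical):

    import hashlib, inspect, json, os, pickle

    STORE = "cas"  # hypothetical on-disk content-addressed store

    def memoized(fn):
        # key = hash of the function's source plus its arguments,
        # so code and data are addressed the same way
        src = inspect.getsource(fn).encode()
        def wrapper(*args, **kwargs):
            key = hashlib.sha256(src + json.dumps(
                [args, kwargs], sort_keys=True, default=repr).encode()
            ).hexdigest()
            path = os.path.join(STORE, key)
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = fn(*args, **kwargs)
            os.makedirs(STORE, exist_ok=True)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
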
cyanydeez · a year ago
If you were a billionaire and you wanted some software update, you could log into your supercomputer and have every shell mount the same torrent, and it should be the fastest upload.
thesuitonym · a year ago
Every once in a while, someone reinvents Plan 9 from Bell Labs.
Maakuth · a year ago
This is the perfect client for accessing Internet Archive content! Each IA item automatically has a torrent that has IA's web seeds. Try Big Buck Bunny:

    btfs https://archive.org/download/BigBuckBunny_124/BigBuckBunny_1... mountpoint

haunter · a year ago
I don’t know the internal workings of IA and its BitTorrent architecture, but if an archive has too many items the torrent file won’t have them all. I see this all the time with ROM packs and magazine archives, for example: 1000+ items, and the torrent will only have the first ~200 or so available.
sumtechguy · a year ago
I think for some reason IA limits the torrent size. I have seen as few as 50 items included from a 1000+ item archive.
rnhmjoj · a year ago
Even better, try this:

    btplay https://archive.org/download/BigBuckBunny_124/BigBuckBunny_124_archive.torrent

Maakuth · a year ago
What is that tool?
dang · a year ago
Related:

BTFS – mount any .torrent file or magnet link as directory - https://news.ycombinator.com/item?id=23576063 - June 2020 (121 comments)

BitTorrent file system - https://news.ycombinator.com/item?id=10826154 - Jan 2016 (33 comments)

sktrdie · a year ago
Or even better, store the data as an SQLite file that is full-text-search indexed. Then you can full-text search the torrent on demand: https://github.com/bittorrent/sqltorrent
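
For example (hypothetical database and schema; the point is that SQLite only reads the pages each query touches, so a lazy-loading mount fetches only those pieces):

    import sqlite3

    # hypothetical SQLite file inside the mounted torrent, with an
    # FTS5 table named 'docs'
    con = sqlite3.connect("/mnt/torrent/dataset.sqlite")
    for (title,) in con.execute(
            "SELECT title FROM docs WHERE docs MATCH ?", ("bittorrent",)):
        print(title)
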
ChrisArchitect · a year ago