Readit News
Posted by u/binaryapparatus 2 years ago
Show HN: Saf – simple, reliable, rsync-based, battle-tested, rounded backup (github.com/dusanx/saf)
I had this backup code working reliably for years, using a local file system, VPS/dedicated server, or remote storage as the backup target, then I finally got time to wrap up the README, iron out a few missing switches, and publish. It should be production ready and reliable, so it could be useful to others. Contributors are welcome.

<https://github.com/dusanx/saf>

BoppreH · 2 years ago
How do you automate checking whether the backup worked correctly, in the face of saf bugs, rsync bugs/misconfiguration, or bit rot?

My solution is to pick a few random files (plus whatever is new), and compute their hashes on both local and remote versions. But it's slow and probabilistic. ZFS also helps, but I feel it's too transparent to rely on (what if the remote storage changes filesystem).
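The random spot-check described above could look roughly like this (a minimal sketch, not the commenter's actual script; it assumes the remote copy is reachable as a mounted path, e.g. via sshfs):

```python
import hashlib
import random
from pathlib import Path

def sha256(path: Path) -> str:
    """Hash a file in chunks so large media files don't blow up memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def spot_check(local: Path, remote: Path, sample: int = 5) -> list[Path]:
    """Compare hashes of a few random files between the two copies.
    Returns the relative paths that disagree or are missing remotely."""
    files = [p.relative_to(local) for p in local.rglob("*") if p.is_file()]
    bad = []
    for rel in random.sample(files, min(sample, len(files))):
        r = remote / rel
        if not r.is_file() or sha256(local / rel) != sha256(r):
            bad.append(rel)
    return bad
```

As the comment says, this is probabilistic: it only bounds how long a corrupted file can go unnoticed, it doesn't rule corruption out.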

binaryapparatus · 2 years ago
Those same questions always bug me, and I have tried everything from very smart to very brute-force solutions. I love ZFS, but then we can question ZFS and OS bugs in the same manner as saf or rsync -- that rabbit hole is deep and quickly becomes expensive, since ZFS may need ECC RAM and other more expensive components.

Lately, in the last few years, I am leaning towards using many cheap backups instead of clever and more expensive ones, with the idea that they can't all break at the same time. Yes, occasional checks are good, but safety in numbers seems like a good strategy.

It is not an accident that saf tag line says "one backup is saf, two are safe, three are safer" ;)

binaryapparatus · 2 years ago
"saf bugs, rsync bugs/misconfiguration"

On top of many cheap backups, I am also trying not to rely on any single piece of technology (I know, it is not ideal that the hardware and OS remain the same on any computer no matter what backup is used). If I use saf as my preferred rsync-based solution, I will also use Borg or duply/duplicity as an additional backup to avoid rsync bugs.

Having two or more rsync-based backups, so they all go through the same rsync pipe, makes much less sense than mixing completely different backup solutions, right?

dspillett · 2 years ago
> But it's slow and probabilistic.

A couple of things I do:

1. Generate a list of files on both sides with their sizes & dates, and compare them, ignoring any that have changed/appeared since before the last backup cycle started. Unless your backups are truly massive in terms of number of files, this is practical to automate and run at least as often as your backup cycle, and it catches many system errors or simple failures of the backups to run at all.

2. Occasionally checksum the whole damn lot in your latest snapshot and the originals. This can take a lot of time (and expense if you are using cloud storage with read-access charges), so you want to do it less often, but it catches bit rot and similar issues. Again, you have to skip files that have been touched since the start of the last backup cycle.

3. If you keep a checksum (or a list of files with checksums) of each snapshot, occasionally pick one and verify it from scratch. As with hashing the latest snapshot, this can be quite resource-intensive for massive backups but is fine for mine. You can also just compare metadata (files, sizes, dates) to a stored list, which will catch some types of filesystem corruption affecting your older snapshots.
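Step 1 above is cheap enough to sketch in a few lines of Python (an illustration, not dspillett's actual scripts; `cutoff` is a hypothetical parameter for the start of the last backup cycle):

```python
import time
from pathlib import Path

def listing(root: Path, cutoff: float) -> dict[str, tuple[int, int]]:
    """Map relative path -> (size, mtime), skipping files touched after
    `cutoff`, since those may legitimately differ between the two sides."""
    out = {}
    for p in root.rglob("*"):
        if p.is_file():
            st = p.stat()
            if st.st_mtime < cutoff:
                out[str(p.relative_to(root))] = (st.st_size, int(st.st_mtime))
    return out

def compare(src: Path, snap: Path, cutoff: float) -> set[str]:
    """Paths whose presence, size, or mtime disagree between the source
    and the latest snapshot."""
    a, b = listing(src, cutoff), listing(snap, cutoff)
    return {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}
```

Mtimes are truncated to whole seconds, matching rsync's default one-second timestamp granularity.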

One of these days I might get around to tidying+documenting+publishing my scripts that run all this…

BoppreH · 2 years ago
That's close to what I do[1]. The size and date comparison is done by rsync, and I keep a text file with all expected file hashes, so if there's any disagreement between copies I know which one to trust.

These hashes are also ordered so that the top files are the ones that haven't been checked for the longest; part of the script is to take the top N files, checksum them, and move them to the bottom of the list. This guarantees every file gets checksummed once every N days.
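The rotation idea can be sketched like this (an assumption-laden illustration, not the linked script; it assumes a `sha256sum`-style text file of "hash, two spaces, relative path" lines):

```python
import hashlib
from pathlib import Path

def rotate_check(hash_file: Path, root: Path, n: int) -> list[str]:
    """Re-hash the n least-recently-checked files (the top of the list)
    and rotate them to the bottom. Returns paths whose current hash no
    longer matches the recorded one."""
    lines = hash_file.read_text().splitlines()
    head, tail = lines[:n], lines[n:]
    mismatches, rechecked = [], []
    for line in head:
        recorded, rel = line.split("  ", 1)
        current = hashlib.sha256((root / rel).read_bytes()).hexdigest()
        if current != recorded:
            mismatches.append(rel)
        rechecked.append(f"{current}  {rel}")
    # Checked entries go to the bottom; the top stays "stalest first".
    hash_file.write_text("\n".join(tail + rechecked) + "\n")
    return mismatches
```

Note this sketch records the *new* hash on mismatch; a real script would want to alert and keep the old line instead, so the trusted copy can be restored.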

I also download a random file in every run, to make sure the connection is not broken.

My use case is personal photos and videos, so I also make sure that my local files are never changed.

And finally, I highly recommend Hetzner Storage Boxes. Not only are they dirt cheap while still giving you ZFS and Samba access, you can actually SSH into the box and run simple commands on the files locally, like sha256sum, without paying for network transfers.

[1] https://github.com/boppreh/cloud_backup_script/

otterpro · 2 years ago
Wow, I like this a lot, as it looks easy to run and it can sync to multiple targets. My local backup consists of JBOD (not RAID, ZFS or BTRFS) so I think this should work nicely. I've been using a shell script for doing something similar for backup, but it lacked a lot of the features.
killingtime74 · 2 years ago
It might be safer to use an rsync library that calls librsync, or at least one that wraps the calls for you. I'm always suspicious of sub-shelling.
gizmo · 2 years ago
How does it deal with interrupted backups?

Can it automatically prune backups older than N days?

I don’t see anything about encryption.

binaryapparatus · 2 years ago
> How does it deal with interrupted backups?

Any new backup is hardlinked against the previous one in a temporary 'in-progress' directory, then renamed to its proper name at the end. If a backup breaks, a new 'saf backup' by default first removes 'in-progress' and then starts things again (linking against the latest good one), but you can run 'saf backup --resume' to try to finish the interrupted one. I prefer a clean retry (which is the default), but --resume works well too.
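The general pattern being described (saf itself drives rsync; this is a pure-Python sketch of the hardlink-then-rename idea, with all names my own) looks something like:

```python
import os
import shutil
from pathlib import Path

def snapshot(source: Path, target: Path, name: str, resume: bool = False) -> Path:
    """Make an incremental snapshot at target/name. Work happens in an
    'in-progress' directory that is renamed only on success, so an
    interrupted run never masquerades as a complete backup."""
    work = target / "in-progress"
    if work.exists() and not resume:
        shutil.rmtree(work)                # default: clean retry
    work.mkdir(parents=True, exist_ok=True)
    snaps = sorted(d for d in target.iterdir() if d.is_dir() and d != work)
    latest = snaps[-1] if snaps else None  # latest good snapshot, if any
    for p in source.rglob("*"):
        rel = p.relative_to(source)
        dst = work / rel
        if p.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        if dst.exists():
            continue                       # --resume: keep finished files
        prev = latest / rel if latest else None
        if (prev and prev.is_file()
                and prev.stat().st_size == p.stat().st_size
                and int(prev.stat().st_mtime) == int(p.stat().st_mtime)):
            os.link(prev, dst)             # unchanged: hardlink, no extra space
        else:
            shutil.copy2(p, dst)           # changed or new: real copy
    work.rename(target / name)             # promote only on success
    return target / name
```

With rsync itself, the equivalent is `--link-dest` pointed at the latest good snapshot, writing into a temporary directory that is renamed afterwards.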

> Can it automatically prune backups older than N days?

Yes, manually via 'saf prune', on top of 'saf backup' doing a prune itself. Prune periods are defined in each .saf.conf, per backup source location, with defaults of 2/30/60/730/3650 days for all/daily/weekly/monthly/yearly backups. All defaults are easy to change per source.
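A retention policy of that 2/30/60/730/3650 shape can be sketched as follows (an illustrative reading of the defaults, not saf's actual prune code):

```python
from datetime import date

def bucket(d: date, age: int):
    """Which retention bucket a snapshot of age `age` days falls into;
    None means it has aged out entirely."""
    if age <= 2:
        return ("all", d)                       # keep everything, 2 days
    if age <= 30:
        return ("daily", d)                     # one per day, 30 days
    if age <= 60:
        return ("weekly", d.isocalendar()[:2])  # one per ISO week, 60 days
    if age <= 730:
        return ("monthly", (d.year, d.month))   # one per month, 2 years
    if age <= 3650:
        return ("yearly", d.year)               # one per year, 10 years
    return None

def keep(snapshots: list[date], today: date) -> set[date]:
    """Newest snapshot in each bucket survives; the rest are pruned."""
    kept, seen = set(), set()
    for d in sorted(snapshots, reverse=True):
        b = bucket(d, (today - d).days)
        if b is not None and b not in seen:
            seen.add(b)
            kept.add(d)
    return kept
```

The per-source defaults mentioned above would simply swap the hard-coded thresholds for values read from that source's .saf.conf.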

> I don’t see anything about encryption.

saf doesn't deal with encryption, only with transport. I prefer to use another specialized tool for encryption if I have a backup target that needs one.

dspillett · 2 years ago
> I don’t see anything about encryption.

Many prefer to deal with encryption separately, encrypting the volumes being backed up to rather than relying on the backup system to manage that.

Of course this adds a consideration to your system: how to back up your encryption keys, and use them to remount the volumes when needed, in a way that does not render the whole thing pointless by accidentally exposing your keys to the wild. Then again, the encryption included in many backup systems has these issues for you to resolve too.

pmontra · 2 years ago
Why not rsnapshot? I've been using it to back up servers to servers for a lot of years.
binaryapparatus · 2 years ago
In my understanding, rsnapshot is the equivalent of 'saf backup', which is only one bit of saf's functionality. saf has a few more commands to be able to see and analyze what's on the backup target side.

rsnapshot uses a centralized rsnapshot.conf; saf has a git-style .saf.conf per backup source location.

Apart from both using rsync, there are more differences than similarities between rsnapshot and saf.

BigBalli · 2 years ago
how is it better/safer than manually using rsync?
sureglymop · 2 years ago
Have a look at restic for a good alternative to this.
paddim8 · 2 years ago
Typical HN, to immediately tell everyone to use something else when someone posts something they spent time and effort making, just because it's not 100% unique.
__fst__ · 2 years ago
These alternative recommendations are exactly what I'm looking for when browsing the comments.
sureglymop · 2 years ago
I didn't mean to detract from the post. I use both rsync and restic and this looks interesting. Just wanted to provide a recommendation for an alternative that does very similar things; I hope that's okay.
gjvc · 2 years ago
"It behooves every man to remember that the work of the critic is of altogether secondary importance, and that, in the end, progress is accomplished by the man who does things." -- Theodore Roosevelt
tetris11 · 2 years ago
Restic is great, but the lack of support for empty passwords, and the developer's response about it, is very grating:

https://github.com/restic/restic/issues/1786

bandyaboot · 2 years ago
He very politely said he thinks it's better to keep the password requirement in place and was deciding to do that. What's grating about that? Personally, I think his concern about users mistakenly not setting a password could be alleviated with an explicit --insecure flag, or similar.
BrandoElFollito · 2 years ago
This is the exact reason why I do not use restic.

This is a backup tool, not a security one. The fact that the author does not understand this is a real problem and a red flag.

aborsy · 2 years ago
It's a good idea to enforce passwords for security. The features of backups done right are incremental backups, snapshots, deduplication, encryption, and compression.
