"Durability" is the term you're looking for. S3 and Glacier famously have 99.999999999% durability. The quote goes "if you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years."
I think the commenter was asking whether they refresh the storage enough for it not to die of natural causes. Data left sitting on most storage media will be gone within 75 years if not refreshed somehow.
This is only relevant if you have billions of objects in S3, at which point the failure rate can be amortized. Otherwise either Amazon is going to go bankrupt or you'll be the one person who suffers total data loss by chance.
AWS does regular integrity checks on all media and repairs any errors it finds. This includes Glacier and Glacier Deep Archive. (It's in the first few bits of the S3 docs.)
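The scrub idea is simple enough to sketch. This is a toy illustration of the concept, not AWS's actual mechanism (the manifest format and names here are made up):

    import hashlib
    import pathlib

    def scrub(root: pathlib.Path, manifest: dict[str, str]) -> list[str]:
        """Recompute checksums and return objects whose bits have rotted."""
        damaged = []
        for rel_path, expected_sha256 in manifest.items():
            digest = hashlib.sha256((root / rel_path).read_bytes()).hexdigest()
            if digest != expected_sha256:
                damaged.append(rel_path)  # a real system re-replicates these from a healthy copy
        return damaged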
Backblaze B2 is untrustworthy. I specified a bad encryption key, and it accepted the data, but it wasn't retrievable.
For the extra $1/TB/mo, Wasabi is the best for hot backups, IMHO.
Also, I theorize that you can avoid the major cost of AWS Glacier Deep Archive by downloading through CloudFront, which has 1TB/mo free.
So:
1. Make your restore request near the end of the month.
2. Glacier-restore ~2TB of data (bulk restore, slow) to the S3 hot tier.
3. Download 1TB while on the tail end of the first month, and another 1TB when CloudFront resets.
The huge egress costs are minimized this way, as are the per-GB-month charges of keeping the restored data in regular S3.
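Back-of-the-envelope numbers for the scheme above (every price here is an assumption from memory; check the current AWS pricing pages before relying on it):

    TB_GB = 1000  # AWS bills in decimal units

    bulk_retrieval_per_gb = 0.0025    # assumed Deep Archive bulk retrieval rate
    s3_egress_per_gb = 0.09           # assumed direct-to-internet egress rate
    cf_free_gb_per_month = 1 * TB_GB  # assumed CloudFront free tier

    data_gb = 2 * TB_GB

    direct = data_gb * (bulk_retrieval_per_gb + s3_egress_per_gb)

    # Split the download across two months so each half stays inside the free tier.
    overflow_gb = max(0, data_gb - 2 * cf_free_gb_per_month)
    month_split = data_gb * bulk_retrieval_per_gb + overflow_gb * s3_egress_per_gb

    print(f"direct S3 egress: ${direct:,.2f}")       # ~$185
    print(f"CloudFront split: ${month_split:,.2f}")  # ~$5

You still pay hot-tier storage on the restored copy while waiting for the month to roll over, but that's small next to the egress you avoid.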
"Another challenge in conventional storage media is their unsuitability for long-term storage, with optical discs, solid-state drives, and hard-disk drives having lifespans of 25 years, 12 years, and 10 years, respectively […] Moreover, the stability of DNA was proved by the successful recovery of ancient DNA under burial conditions. The studies have shown that preservation of DNA does not require additional energy for data storage." [1]
No, DNA is terrible. Microsoft's Project Silica [1] seems to be the contender for indefinite, maintenance-free storage. But there's the whole "it's glass" issue.
Yes, you can chuck your SSD into a freezer. Data retention time increases exponentially at lower temperatures, so keeping it in a regular +4C fridge is enough to extend retention by decades.
Just remember to heat up the disk before writing and after storage.
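The scaling is Arrhenius-like, so you can ballpark it. A sketch with assumed numbers (the 1.1 eV activation energy and the one-year baseline at 40C are my guesses, not anything from a datasheet):

    import math

    K_B = 8.617e-5  # Boltzmann constant, eV/K
    E_A = 1.1       # assumed activation energy for charge loss, eV

    def retention_multiplier(hot_c: float, cold_c: float) -> float:
        """How much longer data survives at cold_c than at hot_c (Arrhenius)."""
        hot_k, cold_k = hot_c + 273.15, cold_c + 273.15
        return math.exp((E_A / K_B) * (1 / cold_k - 1 / hot_k))

    # If a powered-off drive holds data ~1 year at 40C, then in a +4C fridge:
    print(f"~{retention_multiplier(40, 4):.0f}x retention")  # roughly 200x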
There are multiple problems. Storing stems or a rendered mix can be made fully durable with some care to replicate it sufficiently. But what about, for instance, the session files, which are likely tied to custom hardware that eventually becomes scarce?
Object storage is a fine proposition for long term retention but it does nothing for the organizational problem that someone needs to continuously pay the bill and ensure the provider didn't lose anything, and that can easily get lost in M&A, estate liquidation, etc.
The bottom line is, if something is worth saving, you need someone to take on the role of archivist that will balance the technical and economic changes that go with preservation. There is nothing passive about it unless hope is the strategy.
There are high density binary microfilm optical formats on archival grade film stock that should be stable for several hundred years. Although tbh I'm an M-DISC guy.
Object storage in the cloud is likely to succeed there, but then cost and security issues arise.
If data are encrypted, then managing keys is another pain/cost dimension.
At the several decade point, keeping copies at multiple vendors becomes a discussion point, since even Google and Amazon are not likely to be immortal, and that Ukrainian data center might experience physical security challenges.
You're prohibited from duplicating physical media well past the point where it is likely to have degraded.
Thus, for much data, the copyright balance effectively no longer exists. Much work that should enter the public domain is instead simply wasted. Promotion of the arts and sciences is no longer served.
Spins me out that a storage and archive company didn't think to make regular copies (if I read the article correctly and they, rather than the client, were at fault).
Lots more discussion on the source, as referenced in the article: https://news.ycombinator.com/item?id=41504331
Do they just store it and forget it till I try to access, or copy it to new media every month/year to make sure it’s still accessible?
Same question for Backblaze B2 and such.
Does anyone know?
Backblaze makes similar claims.
That’s an odd dig at B2. If you had done the bare minimum to test your backup it wouldn’t have been an issue.
So, it's just simple math that you can use to calculate your answer.
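Concretely, the "simple math" for one user (a sketch; it assumes independent losses and the eleven-nines annual durability quoted upthread):

    annual_loss_prob = 1e-11
    objects, years = 100_000, 30

    # Probability of losing at least one object over the whole period
    p_any_loss = 1 - (1 - annual_loss_prob) ** (objects * years)
    print(f"P(any loss in {years} years): {p_any_loss:.1e}")  # ~3e-5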
I’m trying to figure out an appropriate strategy for backing up things that are important to me.
If they move things around, great, else I’ll need to do it myself.
So yes, they copy it to new media, after it has failed.
Math is crazy.
Glacier might also have a heuristic for media lifetime, but ehhh.
[0] https://www.nature.com/articles/s41598-024-58386-z
[1] https://link.springer.com/article/10.1007/s13534-024-00386-z
[1] https://unlocked.microsoft.com/sealed-in-glass/
https://www.ni.com/en/support/documentation/supplemental/18/...
https://en.wikipedia.org/wiki/Optical_tape -- https://en.wikipedia.org/wiki/Write_Once_Read_Forever
I have an M-Disc writer and some M-Disc DVDs, so I took note of this submission ~2 years ago:
PSA: Verbatim no longer sells real M Discs, now puts regular BD-Rs in M Disc packaging - https://old.reddit.com/r/DataHoarder/comments/yu4j1u/psa_ver... / https://news.ycombinator.com/item?id=33593967
https://youtu.be/-rfEYd4NGQg?si=QoCve6CAPajmmBiX