Posted by u/agomez314 3 years ago
Ask HN: What's the most stable form of digital storage?
I wrote a program that I'm proud of and would like to keep for posterity. What's a good storage medium where I can keep it and load it again in the future? Requirements are: size < 1GB, must keep for at least 3 decades, must be easily transportable (for moves between houses and such) and can sit on a shelf. Bonus points for suggestions on an equally stable storage format that some computer will still be able to understand in the future.
jjav · 3 years ago
If the question is literally about just one program source code, the answer is easy: print it out.

All my oldest preserved code (early 80s) is on paper, the things it occurred to me at the time to print out. No fancy archival paper either, just listings printed out on my dot matrix printer onto fanfold printer paper.

Anything from that era that I didn't print out is gone.

From the late 80s onward I still have all the files that I've cared to save. The general answer is that there is no persistent medium: you need to commit to migrating that data forward to whatever makes sense every so often.

I copied my late 80s 5.25" floppies to 1.44MB floppies in the early 90s. In the mid 90s I copied anything accumulated to CD-Rs. In the 2000s I started moving everything to DVD-Rs.

From the late 2000s until today I have everything (going back to those late 80s files) on a ZFS pool with 4-way mirroring.

Of course, aside from preserving the bits you also need to be able to read them in the future. Avoid all proprietary formats; those will be hopeless. Prefer text above all else: it will always be easily readable. For content where text is impossible, only use open formats that have as many independent open source implementations as possible, to maximize your chances of finding, or being able to port, code that can still read the file 30-40 years from now. But mostly just stick with plain text.

Loic · 3 years ago
But please, do not print on a laser printer. Use an inkjet or dot matrix printer. Laser prints have a bad tendency to "unstick" themselves from the paper, and you end up losing everything.

"The best long term backup strategy is a string of robust middle term solutions." This was for me the most insightful comment I read (as far as I can remember) on Tim Bray's blog[0] many years ago.

[0]: https://www.tbray.org/ongoing/

aforwardslash · 3 years ago
Dot matrix also tends to fade over the years, especially on more acidic paper (such as recycled stock). Saying this as thousands of pages of my programming-related dot matrix printouts slowly fade away :)
orangepurple · 3 years ago
> Laser prints have the bad tendency to "unstick" themselves from the paper, you end up losing everything.

Have NEVER heard of this

bayindirh · 3 years ago
If you're printing with an inkjet and you want to use color, make sure that all cartridges are pigment-based (which is generally true for the HP DesignJet series). Otherwise, since almost all HP black inks are pigment-based anyway, stick to all-black printouts.
mmcgaha · 3 years ago
I have seen the issue that you are talking about but I am not sure what causes it. I have 25-year-old documents printed with a laser printer that have not released from the paper at all, yet I have ten-year-old documents where it happens when I bend the paper. My pet theory is that modern paper is not as good because of the high recycled content, but that would obviously take a significant effort to test. Another possibility is that modern printers are the problem, but this would be even harder to test. I don't print anything to keep anymore so it doesn't really matter anyway.
lstodd · 3 years ago
> Laser prints have the bad tendency to "unstick" themselves from the paper, you end up losing everything.

That's bullshit.

It's the other way around. Inkjet fades and gets washed off by a little bit of moisture, while dot matrix "just" fades.

Don't skimp on toner and paper, don't let it rot, and it'll last centuries.

chrisseaton · 3 years ago
> But please, do not print on a laser printer.

My understanding is that until recently the UK was printing laws on vellum, for maximum archival durability... but they printed onto the vellum using a normal laser printer. So it must be pretty durable? A laser printer uses simple carbon, rather than complex inks.

FpUser · 3 years ago
I have some docs printed in late 80s / early 90s on HP laser. Still pretty much in perfect condition. No "unsticking".
atoav · 3 years ago
Heh, if we are going that route, why not just laser-engrave it into animal skin?
rdlw · 3 years ago
Here's another benefit to printing: once a decade, when you migrate storage boxes or move houses, what are the odds you'll look at your printouts and reminisce? Probably pretty good, since it's easy to look at something.

Meanwhile, personally, I back up old data but hardly ever look at it, since it's mixed in with old programs that probably won't work and a thousand photos I don't want to see. So maybe laser-etched platinum will last longer, but the barrier to reading it will certainly be higher.

jasode · 3 years ago
>If the question is literally about just one program source code, the answer is easy: print it out.

Doesn't seem like it's just one program's source code if OP states: "Requirements are: size < 1GB,"

Depending on font size (say 10pt) and average characters per line, that would mean printing several hundred thousand pages, which is not feasible for the average homeowner to round-trip back into usable digital files.

Instead of a cheap flatbed scanner, you'd need a high-speed auto-fed document scanner, and then you'd have to run batch jobs to OCR several hundred thousand TIFF images back into digital source files.

One could reduce the page count by compressing the source text to a zip and then printing a binary-to-text encoding of it (e.g. uuencode or base64), but now the pages contain random-looking gibberish instead of readable text.
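For illustration, here's a minimal sketch of that round trip in Python (standard library only; the filenames are hypothetical):

  import base64, zlib

  # Compress the source archive and render it as printable base64,
  # wrapped at 76 characters per line for printing.
  with open("myprogram.tar", "rb") as f:          # hypothetical archive
      encoded = base64.b64encode(zlib.compress(f.read(), 9)).decode("ascii")
  with open("printout.txt", "w") as f:
      f.write("\n".join(encoded[i:i+76] for i in range(0, len(encoded), 76)))

  # Recovery after scanning/OCR: strip all whitespace, reverse the pipeline.
  with open("printout.txt") as f:
      data = "".join(f.read().split())
  restored = zlib.decompress(base64.b64decode(data))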

Printing <10 MB source code is more realistic than <1 GB.

(But I'm guessing the author may actually have much less than 1,000,000,000 bytes of original source code if one leaves out 3rd-party dependencies.)

ri0t · 3 years ago
For a fully reproducible and backed up build, you will need to print out the 3rd (and 4th and so on!) party stuff as well :(

Good luck with anything that has a "node_modules" folder ;) Please do not deforest earth for that.

adhesive_wombat · 3 years ago
But do use some kind of archival-grade paper. The £2-per-ream supermarket crap that curls up in hot weather is probably full of acid and will become brittle and fall apart eventually.

And I would check if there are issues with either cheap ink fading or cheap toner flaking off on the multi-decade timescale. Though you will probably appreciate a decent printer anyway to chug through 1GB of text!

seszett · 3 years ago
> The £2 per ream supermarket crap that curls up in hot weather is probably full of acid

I wouldn't be so sure. For a hobby of mine (cyanotype printing, a photographic process that is very sensitive to high pH) I need paper that is not basic, and that is actually quite difficult to find these days. Almost all available paper is acid-free, because it costs the manufacturer very little to add a calcium carbonate buffer to the paper.

That doesn't mean cheap paper is good for archival use (if only because it likely lacks mechanical strength), but paper made in the last two decades or so is rather unlikely to become yellow and brittle in the future, and it should keep quite well if stored in the right conditions.

cookiengineer · 3 years ago
But meanwhile, we've got USB. I'd argue that USB thumb drives/HDDs will continue to work for another decade, and ext4 will probably survive that, too.

Everything I stored on diskettes, CDs, DVDs, and Blu-rays turned out to be only a short-term backup, in my opinion due to the ever-growing need for more space and Sony trying to push their patented technologies onto every market. I had to buy a drive on eBay to restore backups years later, only to realize that the CDs were totally unreadable due to UV degradation.

These days my backup strategy is redundant USB hard drives, with the assumption that USB will continue to be supported longer than the current SATA versions and current disc-based media.

The only things that survived all this time were Zip drives and DVD-RAMs. They are still awesome. But sadly nobody uses them anymore, so access to replacement media and drives is a little limited :(

hulitu · 3 years ago
There are 2 big problems with USB: data retention and connector format.
yencabulator · 3 years ago
> These days my backup strategy is redundant USB hard drives

If you're talking spinning rust, beware: hard drives that haven't been spun up in a long while have a tendency to "stick". I'd suggest spinning up every hard drive at least yearly and scrubbing the contents.

kurupt213 · 3 years ago
Cuneiform. We still have fired clay Babylonian tablets from the early Bronze Age.
TheRealPomax · 3 years ago
So, "fired clay" then =)
Spooky23 · 3 years ago
Great points. I worked with some archivists on a project several years ago (when ODF was a big thing) and was surprised at the amount of controversy.

There are a couple of schools of thought. In general, archivists want to preserve the original document, but at that time they were already losing access to 1980s word-processing formats.

Some folks advocate PDF/A output as a "standard" preservation technique. The people I was working with were making a point-in-time TIFF image of whatever was being preserved and storing it side by side with the original. (I think they transitioned to PDF/A when the spec was revised.) PDF/A is the standard for US courts, so renderers will be available for a hundred years or more.

It’s an interesting problem space because time is not kind to electronic documents. Even stuff like PowerPoint from circa 2000 doesn’t always render cleanly today. When “H.269” is released in 2050, will anyone ship H.264 codecs?

TacticalCoder · 3 years ago
> I copied my late 80s 5.25" floppies to 1.44MB floppies in the early 90s. In the mid 90s I copied anything accumulated to CD-Rs. In the 2000s I started moving everything to DVD-Rs.

Blu-ray discs are expected to last 50 to 100 years at least. That's longer than magnetic tape. Still not paper, but then it's kinda inconvenient to "print" 1 GB of data on paper in a way that's easy to store / re-read.

I have '80s floppies (5.25") that can still be read fine, but I'd say at least a third of them are now failing. Still: after about 35 years, I'd say that's not bad. I expect Blu-ray discs to completely outlive me.

tmaly · 3 years ago
It is interesting to think of ancient Egypt's use of papyrus for paper. Very little of it remains. But as for all of those carvings in rock: you can go to a museum and see stuff 2,000 years old.
orangepurple · 3 years ago
Use a ZFS snapshot on a rotation so you can restore accidentally deleted files (a hedge against user error, ransomware, etc.)
crazypython · 3 years ago
Or microfiche, tiny paper.
bradfa · 3 years ago
The only hard part is probably finding someone who can "print" onto microfilm these days. Might be worth talking to your local librarian, they probably know who still does it and how much it costs.

Film is incredibly durable, will easily last 100 years.

ffhhj · 3 years ago
This. Inkjet printer and acid-free cotton paper.
jiggawatts · 3 years ago
"Hashes + Copies + Distribution"

I used to work in the data protection industry, doing backup software integration. Customers would ask me stupid questions like "what digital tape will last 99 years?"

They have a valid business need, and the question isn't even entirely stupid, but it's Wrong with a capital W.

The entire point of digital information vs analog is the ability to create lossless copies ad infinitum. This frees you from the need to reduce noise, increase fidelity, or rely on "expensive media" such as archival-grade paper, positive transparency slides, or whatever.

You can keep digital data forever using media that last just a few years. All you have to do is embrace its nature, and utilise this benefit.

1. Take a cryptographic hash of the content. This is essential for telling good copies from corrupt copies later, especially given the low-rate bit errors that can accumulate over time. Merkle trees are ideal, as used in BitTorrent. In fact, that is the best approach: create torrent files of your data and keep them as a side-car (see the sketch after this list).

2. Every few years, copy the data to new, fresh media. Verify using the checksums created above. Because of the exponentially increasing storage density of digital media, all of your "old stuff" combined will sit in a corner of your new copy, leaving plenty of space for the "new stuff". This is actually better than accumulating tons of low-density storage such as ancient tape formats. This also ensures that you're keeping your data on media that can be read on "current-gen" gear.

3. Distribute at least three copies to at least three physical locations. This is what S3 and similar blob stores do. Two copies/locations might sound like enough, but temporary failures are expected over a long enough time period, leaving you in the expected scenario of "no redundancy".
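A minimal sketch of the hashing step in Python (standard library only; the directory and manifest names are hypothetical, and a real torrent side-car would add Merkle piece hashes on top of this):

  import hashlib, json
  from pathlib import Path

  ARCHIVE = Path("my-archive")              # hypothetical archive directory
  MANIFEST = Path("manifest.sha256.json")   # side-car, stored with every copy

  def sha256_of(path):
      # Stream the file so large files don't need to fit in RAM.
      h = hashlib.sha256()
      with path.open("rb") as f:
          for chunk in iter(lambda: f.read(1 << 20), b""):
              h.update(chunk)
      return h.hexdigest()

  def write_manifest():
      hashes = {str(p.relative_to(ARCHIVE)): sha256_of(p)
                for p in sorted(ARCHIVE.rglob("*")) if p.is_file()}
      MANIFEST.write_text(json.dumps(hashes, indent=2))

  def verify():
      # Run this after every copy to fresh media (step 2).
      hashes = json.loads(MANIFEST.read_text())
      bad = [rel for rel, want in hashes.items()
             if sha256_of(ARCHIVE / rel) != want]
      for rel in bad:
          print("CORRUPT:", rel)
      return not bad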

... or just pay Amazon to do it and dump everything into an S3 bucket?

michaelt · 3 years ago
S3 eliminates the risk of a disk becoming unreadable, or losing data in a fire. And it's overwhelmingly likely S3 will still exist in an easily readable form in 30 years time.

But it doesn't provide protection against you forgetting to pay AWS, you losing your credentials, your account getting hacked, or your account getting locked by some overzealous automation.

wildmanxx · 3 years ago
> And it's overwhelmingly likely S3 will still exist in an easily readable form in 30 years time.

There is no indication that this statement holds true. Not even remotely.

Businesses fold all the time. How many services still exist today that existed 30 years ago? Not in some archive, but still operational?

In addition to that problem, tech half-life continues to decrease. 30 years in the future is likely more comparable to 60 years in the past. Hello punch-cards.

dale_glass · 3 years ago
For S3 specifically you want to use Glacier. It's made for long term storage and is very, very cheap to store in.

Be warned though that restoration takes special procedures, time, and can be expensive. So Glacier is most definitely a place for storing stuff you hope you'll never need, not just a cheap file repository.

The Glacier fees for retrieving data in minutes are incredibly awful, so take that into account. Count on waiting 12 hours to get your stuff for cheap.

maxwelldone · 3 years ago
Glacier Deep is the cheapest option. It does come with a catch: there's a minimum commitment of 180 days for their infrequent-access tier. Last time I checked, the cost for us-east-1 was roughly like this:

At $0.00099/GB/month, it would cost ~$12/year to store 1TB. Retrieval cost is $0.0025/GB and bandwidth down is $0.09/GB (exorbitant! But you get 100GB/mo free)

So, retrieving 1TB (924GB chargeable) once will run ~$85. I've also excluded their HTTP request pricing, which shouldn't matter much unless you have millions of objects.

For the same amount of data, Backblaze costs ~$60/year to store but only $10 to retrieve (at $0.01/GB).

I suppose an important factor to consider in archival storage is the expected number of retrievals, and whether you can handle the cost.
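For what it's worth, the arithmetic checks out; spelled out in Python (prices as quoted above, which may well have changed since):

  # Glacier Deep Archive envelope math, using the per-GB prices quoted above.
  store_per_gb_month = 0.00099   # $/GB/month
  retrieval_per_gb = 0.0025      # $/GB
  egress_per_gb = 0.09           # $/GB; first 100 GB/month free

  tb = 1024                                    # GB in 1 TB
  storage_per_year = tb * store_per_gb_month * 12                  # ~$12.17
  chargeable = tb - 100                        # 924 GB after the free tier
  one_retrieval = chargeable * (retrieval_per_gb + egress_per_gb)  # ~$85.47

  print(f"storage: ${storage_per_year:.2f}/yr, retrieval: ${one_retrieval:.2f}")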

diarrhea · 3 years ago
Sounds like points 1 and 2 can be elegantly combined using "next-gen" filesystems like zfs or btrfs. The hashing and scrubbing (automatic repair) happen in the background, and the move to new/fresh media happens automatically as you replace failing hard drives. Plus, the two are open and widely adopted standards.

I always thought a, say, zfs pool with 2-disk redundancy is not only redundant (RAID) but also serves as a backup (through snapshots). The 3-2-1 rule is good, but I feel like zfs is powerful enough to change that. A pool with scrubbing, some hardware redundancy, and snapshots could/should no longer require two backups, just a single, offsite one.
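A sketch of that routine in Python, to be run from cron or a systemd timer (the pool and dataset names are hypothetical; the commands are standard OpenZFS):

  import subprocess
  from datetime import date

  POOL = "tank"                # hypothetical pool name
  DATASET = "tank/archive"     # hypothetical dataset

  # Start a scrub: ZFS re-reads every block and repairs any whose checksum
  # no longer matches, using the mirror/RAID-Z redundancy. The scrub itself
  # runs in the background.
  subprocess.run(["zpool", "scrub", POOL], check=True)

  # Take a dated, read-only snapshot as the hedge against user error.
  snap = f"{DATASET}@archive-{date.today().isoformat()}"
  subprocess.run(["zfs", "snapshot", snap], check=True)

  # Print a health summary (details only appear if something is wrong).
  subprocess.run(["zpool", "status", "-x", POOL])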

michaelgrafl · 3 years ago
What if I don't want to backup stuff, but archive and then forget about it?

Edit: Oh, and I want it to keep existing after I'm no longer alive.

Cthulhu_ · 3 years ago
If it's code, as the OP seems to indicate, publish it on GitHub; many services draw copies of source code from GitHub, and GitHub themselves once put all public code into cold storage for posterity: https://archiveprogram.github.com/arctic-vault/

> Each was packaged as a single TAR file.

> For greater data density and integrity, most data was stored QR-encoded, and compressed.

> A human-readable index and guide found on every reel explains how to recover the data

> The 02/02/2020 snapshot, consisting of 21TB of data, was archived to 186 reels of film by our archive partners Piql and then transported to the Arctic Code Vault, where it resides today.

Tao331 · 3 years ago
That's the "pay someone else to do it" option.

It's this way because "archive and then forget about it" isn't really a thing. It turns out an archive that is not maintained is no archive.

imtringued · 3 years ago
Build a pyramid and carve your data into walls deep inside the pyramid.
toomuchtodo · 3 years ago
Any medium that is physically stable for at least a few decades and can be read optically. Acid-free paper with the data machine-encoded, laser-etched metal, etc. Anything traditional either needs to stay online (HDD), needs to be reread and verified over time (tape), or is simply not recommended (SSD).

It costs the Internet Archive $2/GB to store content in perpetuity; maybe create an account, upload your code as an item, donate $5 to them, and call it a day. Digitally sign the uploaded objects so you can prove provenance in the future (if you so desire); you could also sign your git commits with GPG and bundle the git repo up as a zip for upload.
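A sketch of that workflow, assuming the internetarchive Python package (pip install internetarchive, then `ia configure` for credentials); the item identifier and filenames are hypothetical, and a GPG key is assumed to be set up already:

  import subprocess
  from internetarchive import upload

  # Detach-sign the bundle so provenance can be proven later;
  # this produces myprogram.zip.asc next to the zip.
  subprocess.run(["gpg", "--detach-sign", "--armor", "myprogram.zip"],
                 check=True)

  # Upload the archive and its signature as a single item.
  upload(
      "myprogram-source-archive",               # hypothetical identifier
      files=["myprogram.zip", "myprogram.zip.asc"],
      metadata={"title": "myprogram source code", "mediatype": "software"},
  )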

EDIT: @JZL003

The Internet Archive has their own storage system. I would assume the price caps out because they're operating under the Moore's Law-style assumption that the cost of storage will keep decreasing into the future (and most of their other costs are fixed). Of course, don't abuse the privilege. There are real costs behind the upload requests, and donating is cheap and frictionless.

https://help.archive.org/help/archive-org-information/

> What are your fees?

> At this time we have no fees for uploading and preserving materials. We estimate that permanent storage costs us approximately $2.00US per gigabyte. While there are no fees we always appreciate donations to offset these costs.

> How long will you store files?

> As an archive our intention is to store and make materials available in perpetuity.

https://archive.org/web/petabox.php

BrianHenryIE · 3 years ago
I never thought to donate to Internet Archive before. Thanks, done. I use the Wayback Machine too much to not pay for it!
JZL003 · 3 years ago
Where did you get that number, out of curiosity? Google's cloud storage is 2 cents per GB per month and Backblaze B2 is $0.005 per GB per month. I understand "in perpetuity" is more expensive, but why does it cap out as opposed to being a yearly price per GB (maybe if you assume hard drive storage will keep getting cheaper at a similar rate?)

Quick envelope math: if they were using Backblaze pricing, $5 would give a GB 83 years of storage. But it's unclear if Backblaze is actually regionally duplicated.

lostmsu · 3 years ago
The idea is that the cost of storing 1GB will decrease over time, ending up as a convergent infinite sum.
tarboreus · 3 years ago
83 years != perpetuity, hosted != archived.
beagle3 · 3 years ago
This is very relevant, and perhaps deserves a submission of its own:

http://news.bbc.co.uk/2/hi/technology/2534391.stm

"""But the snapshot of in the UK in the mid-1980s was stored on two virtually indestructible interactive video discs which could not be read by today's computers. """

I can't find the back story now, but if they hadn't been able to source a working LaserDisc reader from a member of the public (which IIRC took quite a bit of effort to find), then accessing this data, digitized in the early 1980s, would have cost a fortune.

The inspiration for this project, the 900-year-old Domesday Book, is just as readable today as it was in 1980 (and in 1200 or so). The ability to read data with one's eyes should not be underestimated.

leokennis · 3 years ago
Remark on the side:

This entire page is about 122 kB, is clearly laid out and easy to read.

If I check a similar short-ish news item today (https://www.bbc.com/news/business-61185298) my browser (with ad blocker) needs to load 3.8 MB of data (31 times as much) and I can see less of the actual content.

Instead of Web3, can we maybe go back to Web1?

maxwelldone · 3 years ago
To add on to your observation, reading mode is even better to look at and loads just shy of 16KB.

As an aside, I still don't understand what Web3 aims to solve but I feel Web 2 is good enough if people don't go crazy with js, images, ads and other shenanigans.

cdumler · 3 years ago
There isn't. Sorry, but there just isn't a permanent format. The real problem isn't the storage media but that technological standards evolve. Tape media is excellent at surviving. I have a 9 track digital tape keepsake from when I used to work with it regularly some 20 years ago. I'm absolutely certain that the data on it is still good. I don't have the 300-pound "dishwasher" drive that can read it, the three-phase power to run it, nor a DEC Vax that understands EBCDIC encoding.

The only true solution is a living one, where you make sure you have the ability to get your data from an old format to a new one periodically. More importantly, you should look into the idea of 3-2-1 backups. Anything that you intend to keep indefinitely is subject to random events: fire, flood, tornado, theft, etc. Having multiple archives in separate systems matters more than trying to ensure a single copy will last a long time.

Storing less than a gigabyte is very cheap to do in multiple formats, such as a USB flash drive, an external hard drive, a CD, a Blu-ray disc, etc. You can hedge against data corruption with PAR2 files. Also, consider storing a copy in the cloud, e.g. Backblaze B2, AWS S3, etc. Again, I suggest creating PAR2 files and/or using an archive format that can resist damage.

Just create calendar events to periodically check the integrity of your archives. Having problems reading a CD? Use the hard drive backup to burn a new one. This is also a good time to consider whether one or more of your formats is no longer viable.

Finally, realize that a program runs within an environment, and those get replaced over time. You need to not only back up your program, but probably also the operating system and tools around it.

2000UltraDeluxe · 3 years ago
An addition to your list:

Use mainstream media formats for physical storage. It's trivial to get a USB floppy drive for reading floppies from the early '80s, but getting hold of a new drive to read LS-120 disks from the late '90s/early '00s is pretty much impossible. Blu-ray is probably the best bet for physical media for the next 20 years. I've done some trials with SD cards but they seem less reliable than Blu-ray.

dotancohen · 3 years ago

  > You can hedge against data corruption with PAR2 files.
This is the key phrase from the entire thread. PAR2 lets you create recovery files for when (not if) part of the original data becomes corrupted. The recovery files (by default 5% of the size of the original files) should be stored alongside the original files in each location.
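A sketch of the workflow with the par2cmdline tool, driven from Python (the archive filename is hypothetical; -r sets the redundancy percentage):

  import subprocess

  # Create recovery files with 10% redundancy (the default is 5%).
  subprocess.run(["par2", "create", "-r10", "archive.par2", "myprogram.tar"],
                 check=True)

  # par2 exits non-zero if damage is found, so check the return code
  # on each periodic integrity check:
  result = subprocess.run(["par2", "verify", "archive.par2"])
  if result.returncode != 0:
      # Rebuild the damaged files from the recovery blocks.
      subprocess.run(["par2", "repair", "archive.par2"], check=True)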

usr1106 · 3 years ago
DEC VAXes have never used EBCDIC. At least not natively. Probably they could convert from/into it, but the Unix/Linux program dd on my Raspberry Pi can do that, too.

The encoding is not the problem; the hardware is.

kps · 3 years ago
9-track tape is common enough that you can pay to have it read at reasonable cost.
bryanrasmussen · 3 years ago
If you want to keep it secure for at least three decades you should follow the principle of LOCKSS (https://www.lockss.org/): "Lots Of Copies Keep Stuff Safe".

You might like to read through the site, but if not, I would suggest keeping it safe via storage in multiple formats and locations. If I really wanted to keep something safe and was willing to put effort into it, I would put it on a remote service and on external physical media stored somewhere else safe, and it would get backed up to every new computer I get. This of course puts extra managerial requirements on you (which for me would be difficult because of my ADHD): you need to keep the remote service going, or have a plan for moving the data if you get rid of it, etc.

In my case I have multiple computers, so I would also make sure anything important to preserve was backed up to all of them.

All of which reminds me I should update a bunch of my stuff.

jzer0cool · 3 years ago
Is this something you store on their site? I see it is open source but didn't see any examples of a running site. Has anyone come across a live instance, just to explore some of the feature sets here?
profquail · 3 years ago
M-DISC: https://en.m.wikipedia.org/wiki/M-DISC

They’re special DVD and Blu-ray discs designed for long-term storage. DVD and Blu-ray are so widely used, it seems likely you’d be able to find some equipment in 30 years that could still read them.

cdubzzz · 3 years ago
Do you really think physical discs and the media players for them will last long? I still cling to an old Blu-ray player, but every time I buy some discs it feels like I'm sifting through the ruins of a collapsed building (in some giant bin in the middle of a large retailer hallway). I also feel like I never see a single other person looking at or purchasing discs…
webmaven · 3 years ago
> Do you really think physical disc and media players for them will last long?

Yes.

There are too many use cases for physical and immutable long-term offline storage for this niche to go unfilled, but the niche is too small (at present) to prompt the development of a replacement medium and format. So while I am sure that the materials and read/write hardware will continue to evolve (better data-longevity guarantees, read/write speed, physical durability, etc.), the implementations will remain compatible, or at least the reading ones will.

NateEag · 3 years ago
The research suggests that M-DISC Blu-Rays should be fairly durable if not handled often.

I think the disc players are the weak link. I can definitely imagine them going away nigh-entirely in a decade or three.

xupybd · 3 years ago
I didn't manage to find anywhere to buy these in my country. They could be tricky to get.
wmf · 3 years ago
Microsoft glass storage is probably close to the best but not commercially available: https://www.microsoft.com/en-us/research/project/project-sil...

35 mm film is also interesting but probably costs a fortune: https://www.piql.com/services/long-term-data-storage/

Mindwipe · 3 years ago
This is definitely the correct answer. The glass project is really good, and having spoken to people working on it, they are absolutely the closest to nailing this. It is close, too, at least to a wider rollout.
the_only_law · 3 years ago
Ah damn, that glass thing actually seems really cool. I'll have to look and see if there's much info out there about it right now. I've heard a little bit about very fast lasers (the link mentions femtosecond lasers) and was curious what sorts of non-academic uses they see.