I wrote a program which I'm proud of having done and would like to keep it for posterity. What's a good storage medium where I can keep it and load it again in the future? Requirements are: size < 1GB, must last for at least 3 decades, must be easily transportable (for moves between houses and such), and must be able to sit on a shelf. Bonus points for suggestions on an equally stable storage type that some computer will still be able to understand in the future.
All my oldest preserved code (early 80s) is on paper, the things it occurred to me at the time to print out. No fancy archival paper either, just listings printed out on my dot matrix printer onto fanfold printer paper.
Anything from that era that I didn't print out is gone.
From the late 80s onward I still have all the files that I've cared to save. The general answer is that there is no truly persistent medium; you need to commit to migrating the data forward to whatever makes sense every so often.
I copied my late 80s 5.25" floppies to 1.44MB floppies in the early 90s. In the mid 90s I copied anything accumulated to CD-Rs. In the 2000s I started moving everything to DVD-Rs.
From the late 2000s until today I have everything (going back to those late 80s files) on a ZFS pool with 4-way mirroring.
Of course, aside from preserving the bits you also need to be able to read them in the future. Avoid all proprietary formats; those will be hopeless. Prefer text above all else, as that will always be easily readable. For content where text is impossible, only use open formats which have as many independent open source implementations as possible, to maximize your chances of finding or being able to port code that can still read the file 30-40 years from now. But mostly just stick with plain text.
"The best long term backup strategy is a string of robust middle term solutions." This was for me the most insightful comment I read (as far as I can remember) on Tim Bray's blog[0] many years ago.
[0]: https://www.tbray.org/ongoing/
Have NEVER heard of this
That's bullshit.
It's the other way around. Inkjet fades out and gets washed off by a little bit of moisture, while dot-matrix "just" fades out.
Don't skimp on toner and paper, don't let it rot, and it'll last for centuries.
My understanding is that until recently the UK was printing laws on vellum, for maximum archival durability... but they printed onto the vellum using a normal laser printer. So it must be pretty durable? A laser printer uses simple carbon, rather than complex inks.
Meanwhile, personally, I back up old data but hardly ever look at it, since it's mixed in with old programs that probably won't work and a thousand photos I don't want to see. So maybe laser-etched platinum will last longer, but the barrier to reading it will certainly be higher.
Doesn't seem like it's just one source code file if the OP states: "Requirements are: size < 1GB".
Depending on font size (say 10pt) and average characters per line, that would mean printing several hundred thousand pages of paper, which is not feasible for the average homeowner to round-trip back into usable digital files.
Instead of a cheap flatbed scanner, you now need a high-speed auto-fed document scanner and then run batch jobs to OCR several hundred thousand tif images back to digital source files.
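To put a rough number on the page count, here's a back-of-the-envelope sketch (the ~80 characters per line and ~60 lines per page are my own assumptions, not the OP's):

    # Rough estimate: pages needed to print 1 GB of plain text.
    # Assumptions (mine): ~80 chars/line, ~60 lines/page.
    bytes_total = 1_000_000_000
    chars_per_page = 80 * 60            # ~4,800 characters per page
    print(f"~{bytes_total / chars_per_page:,.0f} pages")   # ~208,333 pages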
One could reduce the paper count by compressing the source text (e.g. to zip) and then printing a binary-to-text encoding of the result (e.g. UUENCODE), but now the pages contain random-looking gibberish instead of readable text (sketched below).
Printing <10 MB source code is more realistic than <1 GB.
(But I'm guessing the author may actually have much less than 1,000,000,000 bytes of original source code if one leaves out 3rd-party dependencies.)
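If someone did go the compress-then-encode route, here's a minimal sketch of the idea, substituting gzip and Base64 for zip and uuencode (my substitution; the input file name is made up):

    import base64
    import gzip
    from pathlib import Path

    # Compress the concatenated source text, then emit a printable Base64
    # encoding of it, wrapped at 76 columns so it fits on printed pages.
    source_text = Path("all_my_source.txt").read_bytes()   # hypothetical input
    printable = base64.b64encode(gzip.compress(source_text, compresslevel=9)).decode("ascii")
    lines = [printable[i:i + 76] for i in range(0, len(printable), 76)]
    Path("printable_listing.txt").write_text("\n".join(lines) + "\n")
    print(f"{len(source_text)} bytes in -> {len(printable)} printable characters out")

Note that Base64 inflates the compressed data by a third, so the paper savings come entirely from how well the source compresses.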
Good luck with anything that has a "node_modules" folder ;) Please do not deforest earth for that.
And I would check if there are issues with either cheap ink fading or cheap toner flaking off on the multi-decade timescale. Though you will probably appreciate a decent printer anyway to chug through 1GB of text!
I wouldn't be so sure. For a hobby of mine (cyanotype printing, a photographic process that is very sensitive to high pH) I need paper that is not basic (alkaline), and that is actually quite difficult to find these days. Almost all available paper is acid-free, because it costs the manufacturer very little to add a calcium carbonate buffer to the paper.
It doesn't mean that cheap paper is good for archival (if only because it likely lacks mechanical strength) but paper made in the last two decades or so is rather unlikely to become yellow and brittle in the future, and it should keep quite long if stored in correct conditions.
Everything I stored on diskettes, CDs, DVDs, and Blu-rays turned out to be only a short-term backup in my opinion, due to the rapidly ever-growing need for more space and Sony trying to push their patented technologies onto every market. I had to buy a drive on eBay to restore backups years later, only to realize that the CDs were totally unreadable due to UV degradation.
These days my backup strategy is redundant USB hard drives, on the assumption that USB will continue to be supported longer than the current SATA versions and current disc-based media.
The only things that survived all this time were ZIP drives and DVD-RAMs. They are still awesome. But sadly nobody uses them anymore, so access to replacement media and drives is a little limited :(
If you're talking spinning rust, beware: hard drives that haven't been spun up in a long while have a tendency to "stick". I'd suggest spinning every hard drive up at least yearly and scrubbing the contents.
There’s a couple of schools of thought. In general archivists want to preserve the original document, but at that time they were already losing access to 1980s word processing formats.
Some folks advocate PDF/A output as a “standard” preservation technique. The people I was working with were making a point in time TIFF image of whatever was being preserved and storing it side by side at the time. (I think they transitioned to PDF/A when the spec was revised) PDF/A is the standard for US Courts, so renderers will be available for a hundred years or more.
It’s an interesting problem space because time is not kind to electronic documents. Even stuff like PowerPoint from circa 2000 doesn’t always render cleanly today. When “H.269” is released in 2050, will anyone ship H.264 codecs?
Blu-ray discs are expected to last at least 50 to 100 years. That's longer than magnetic tape. Still not paper, but then it's kinda inconvenient to "print" 1 GB of data on paper in a way that's easy to store and re-read.
I have 80s floppies (5.25") that can still be read fine, but I'd say at least a third of them are now failing. Still: after about 35 years, that's not bad. I expect Blu-ray discs to completely outlive me.
Film is incredibly durable, will easily last 100 years.
I used to work in the data protection industry, doing backup software integration. Customers would ask me stupid questions like "what digital tape will last 99 years?"
They have a valid business need, and the question isn't even entirely stupid, but it's Wrong with a capital W.
The entire point of digital information vs analog is the ability to create lossless copies ad infinitum. This frees you from having to reduce noise, increase fidelity, or rely on "expensive media" such as archival-grade paper, positive transparency slides, or whatever.
You can keep digital data forever using media that last just a few years. All you have to do is embrace its nature, and utilise this benefit.
1. Take a cryptographic hash of the content. This is essential for verifying good copies vs corrupt copies later, especially for the low rate of bit errors that might accumulate over time. Merkle trees are ideal, as used in BitTorrent. In fact, that is the best approach: create torrent files of your data and keep them as a side-car (a simpler checksum-manifest sketch follows after this list).
2. Every few years, copy the data to new, fresh media. Verify using the checksums created above. Because of the exponentially increasing storage density of digital media, all of your "old stuff" combined will sit in a corner of your new copy, leaving plenty of space for the "new stuff". This is actually better than accumulating tons of low-density storage such as ancient tape formats. This also ensures that you're keeping your data on media that can be read on "current-gen" gear.
3. Distribute at least three copies to at least three physical locations. This is what S3 and similar blob stores do. Two copies/locations might sound enough, but temporary failures are expected over a long enough time period, leaving you in the expected scenario of "no redundancy".
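A minimal sketch of step 1, using plain SHA-256 side-car manifests instead of torrent files / Merkle trees (my simplification; all paths are placeholders):

    import hashlib
    from pathlib import Path

    def sha256_of(path: Path) -> str:
        # Stream the file through SHA-256 so huge archives needn't fit in RAM.
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def write_manifest(root: Path, manifest: Path) -> None:
        # Record "hash  relative/path" for every file as a side-car manifest.
        lines = [f"{sha256_of(p)}  {p.relative_to(root)}"
                 for p in sorted(root.rglob("*")) if p.is_file()]
        manifest.write_text("\n".join(lines) + "\n")

    def verify_manifest(root: Path, manifest: Path) -> bool:
        # Re-hash after every migration (step 2) and compare against the manifest.
        ok = True
        for line in manifest.read_text().splitlines():
            digest, rel = line.split("  ", 1)
            if sha256_of(root / rel) != digest:
                print(f"CORRUPT: {rel}")
                ok = False
        return ok

    # write_manifest(Path("archive"), Path("archive.sha256"))
    # verify_manifest(Path("fresh-copy-on-new-media"), Path("archive.sha256"))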
... or just pay Amazon to do it and dump everything into an S3 bucket?
But it doesn't provide protection against you forgetting to pay AWS, you losing your credentials, your account getting hacked, or your account getting locked by some overzealous automation.
There is no indication that this statement holds true. Not even remotely.
Businesses fold all the time. How many services still exist today that existed 30 years ago? Not in some archive, but still operational?
In addition to that problem, tech half-life continues to decrease. 30 years in the future is likely more comparable to 60 years in the past. Hello punch-cards.
Be warned though that restoration takes special procedures, time, and can be expensive. So Glacier is most definitely a place for storing stuff you hope you'll never need, not just a cheap file repository.
The Glacier fees for retrieving data in minutes are incredibly awful, so take that into account. Count on waiting 12 hours to get your stuff for cheap.
At $0.00099/GB/month, it would cost ~$12/year to store 1TB. Retrieval cost is $0.0025/GB and bandwidth down is $0.09/GB (exorbitant! But you get 100GB/mo free)
So, retrieving 1TB (924GB chargeable) once will run ~$85. I've also excluded their HTTP request pricing, which shouldn't matter much unless you have millions of objects.
For the same amount of data, Backblaze costs ~$60/year to store but only $10 to retrieve (at $0.01/GB).
I suppose an important factor to consider in archival storage is the expected number of retrievals, and whether you can handle the cost.
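Redoing the envelope math with the per-GB figures quoted above (Backblaze's ~$0.005/GB/month is implied by the ~$60/year figure; AWS request fees ignored):

    # Per-GB prices as quoted in this thread; 1 TB treated as 1024 GB.
    TB = 1024
    glacier_store_per_year = 0.00099 * TB * 12               # ~$12
    glacier_restore_once = 0.0025 * TB + 0.09 * (TB - 100)   # ~$2.56 + ~$83.16 = ~$85.7
    backblaze_store_per_year = 0.005 * TB * 12               # ~$61
    backblaze_restore_once = 0.01 * TB                       # ~$10
    print(f"Glacier:   ${glacier_store_per_year:.0f}/yr, ${glacier_restore_once:.0f}/full restore")
    print(f"Backblaze: ${backblaze_store_per_year:.0f}/yr, ${backblaze_restore_once:.0f}/full restore")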
I always thought that a ZFS pool with 2-disk redundancy is not only redundant (RAID) but also serves as a backup (through snapshots). The 3-2-1 rule is good, but I feel like ZFS is powerful enough to change that. A pool with scrubbing, some hardware redundancy, and snapshots could/should no longer require two backups, just a single, offsite one.
Edit: Oh, and I want it to keep existing after I'm no longer alive.
> Each was packaged as a single TAR file.
> For greater data density and integrity, most data was stored QR-encoded, and compressed.
> A human-readable index and guide found on every reel explains how to recover the data
> The 02/02/2020 snapshot, consisting of 21TB of data, was archived to 186 reels of film by our archive partners Piql and then transported to the Arctic Code Vault, where it resides today.
It's this way because "archive and then forget about it" isn't really a thing. It turns out an archive that is not maintained is no archive.
It costs the Internet Archive about $2/GB to store content in perpetuity. Maybe create an account, upload your code as an item, donate $5 to them, and call it a day. Digitally sign the uploaded objects so you can prove provenance in the future (if you so desire); you could also sign your git commits with GPG and bundle the git repo up as a zip for upload.
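If it helps, a minimal sketch of the "zip it and sign it" step (the repo path and file names are made up, and it assumes gpg is already set up with your signing key):

    import shutil
    import subprocess

    # Zip the whole working tree, .git included, so the (GPG-signed) commit
    # history travels with the upload. "myproject" is a placeholder path.
    archive = shutil.make_archive("myproject-archive", "zip", root_dir="myproject")

    # Detached, ASCII-armored signature stored next to the zip; verify later
    # with: gpg --verify myproject-archive.zip.asc
    subprocess.run(["gpg", "--armor", "--detach-sign", archive], check=True)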
EDIT: @JZL003
The Internet Archive has their own storage system. I would assume the cost caps out because they're operating under the Moore's Law assumption that the cost of storage will continue to decrease into the future (and most of their other costs are fixed). Of course, don't abuse the privilege. There are real costs behind the upload requests, and donating is cheap and frictionless.
https://help.archive.org/help/archive-org-information/
> What are your fees?
> At this time we have no fees for uploading and preserving materials. We estimate that permanent storage costs us approximately $2.00US per gigabyte. While there are no fees we always appreciate donations to offset these costs.
> How long will you store files?
> As an archive our intention is to store and make materials accessible in perpetuity.
https://archive.org/web/petabox.php
Quick envelope math: if they were using Backblaze pricing (~$0.005/GB/month), $5 would give a GB about 83 years of storage. But it's unclear whether Backblaze is actually regionally duplicated.
http://news.bbc.co.uk/2/hi/technology/2534391.stm
"""But the snapshot of in the UK in the mid-1980s was stored on two virtually indestructible interactive video discs which could not be read by today's computers. """
I can't find the back story now, but if they weren't able to source a working laser disk reader from a member of the public (which IIRC took quite a bit of effort to find), then accessing this data - digitized in the early 1980s - would have cost a fortune.
The inspiration for this project, the 900-year-old Domesday Book, is just as readable today as it was in 1980 (and in 1200 or so). The ability to read data with one's eyes should not be underestimated.
This entire page is about 122 kB, is clearly laid out and easy to read.
If I check a similar short-ish news item today (https://www.bbc.com/news/business-61185298) my browser (with ad blocker) needs to load 3.8 MB of data (31 times as much) and I can see less of the actual content.
Instead of Web3, can we maybe go back to Web1?
As an aside, I still don't understand what Web3 aims to solve but I feel Web 2 is good enough if people don't go crazy with js, images, ads and other shenanigans.
The only true solution is a living one, where you make sure you have the ability to get your data from an old format to a new one periodically. More importantly, you should look into the idea of 3-2-1 backups. Anything that you intend to keep indefinitely is subject to random events, i.e. fire, flood, tornado, theft, etc. Having multiple archives in separate systems is more important than trying to ensure a single copy will last a long time.
Storing less than a gigabyte is very cheap to do in multiple formats, such as a USB flash drive, external hard drive, CD, Blu-ray disc, etc. You can hedge against data corruption with PAR2 files (sketched below). Also, consider storing a copy in the cloud, e.g. Backblaze B2, AWS S3, etc. Again, I suggest creating PAR2 files and/or using an archive format that can resist damage.
Just create calendar events to periodically check the integrity of your archives. Having problems reading a CD? Use the hard drive backup to burn a new one. This is also a good time to consider whether one or more of your formats is no longer viable.
Finally, realize that a program runs within an environment, and those get replaced over time. You need to not only back up your program, but probably also want to store the operating system and tools around it.
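For the PAR2 suggestion, a minimal sketch of the create/verify cycle (assumes par2cmdline is installed; the ~10% redundancy level and file names are my own choices):

    import subprocess

    # Create ~10% recovery data alongside the archive file.
    subprocess.run(["par2", "create", "-r10", "backup.par2", "my-program.tar"], check=True)

    # At each calendar-reminder check (or after copying to fresh media), verify;
    # if blocks have rotted, repair from the recovery files.
    subprocess.run(["par2", "verify", "backup.par2"], check=True)
    # subprocess.run(["par2", "repair", "backup.par2"], check=True)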
Use mainstream media formats for physical storage. It's trivial to get a USB floppy drive for reading floppies from the early '80s, but getting hold of a new drive to read LS-120 disks from the late '90s/early '00s is pretty much impossible. Blu-ray is probably the best bet for physical media for the next 20 years. I've done some trials with SD cards, but they seem less reliable than Blu-ray.
Coding is not the problem, the hardware is.
You might like to read through the site, but if not, then I would suggest keeping it safe by storing it in multiple formats and locations. If I really wanted to keep something safe and was willing to put effort into it, I would put it on a remote service, on external physical media that I might store somewhere else safe, and on my computer, backing it up again whenever I get a new one. This of course puts extra managerial requirements on you (which for me would be difficult because of my ADHD), and you would need to keep your remote service going, or, if you are getting rid of it, have a plan for moving your stuff elsewhere.
In my case I have multiple computers so I would also make sure important to preserve stuff was backed up to all of them.
All of which reminds me I should update a bunch of my stuff.
They’re special DVD and Blu-ray discs designed for long-term storage. DVD and Blu-ray are so widely used, it seems likely you’d be able to find some equipment in 30 years that could still read them.
Yes.
There are too many use cases for physical, immutable, long-term offline storage for this niche to go unfilled, but the niche is too small (at present) to prompt the development of a replacement medium and format. So while I am sure that the materials and read/write hardware will continue to evolve (better data-longevity guarantees, read/write speed, physical durability, etc.), the implementations will remain compatible, or at least the reading ones will.
I think the disc players are the weak link. I can definitely imagine them going away nigh-entirely in a decade or three.
35 mm film is also interesting but probably costs a fortune: https://www.piql.com/services/long-term-data-storage/