How to Store Data on Paper?

pastage · 9 months ago

Bits per [cm²|cm³|kg] is an interesting measure, like the one you get with cuneiform ceramic tablets[1]. This one gets about 1 word per cm², and cuneiform is crazy dense. I have no real grasp of how Sumerian or Akkadian words worked; I think it was heavily context-based, judging from a lecture[2] at the British Museum.

I have seen people do ceramics where information was stacked in layers and had to be destroyed to extract. The ultimate form of shifting media to preserve and read information. I guess that could be done with better resolution with 3D-printed zirconia (0.1 mm³ blobs), so 1 Mb/cm³.
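A quick sanity check of that density, as a sketch only. It assumes one bit per blob and perfect packing (neither is stated above), and the helper names are made up for illustration. The 1 Mb/cm³ figure works out if each blob is about 0.1 mm on a side; read literally as 0.1 mm³ of volume per blob, it comes out closer to 10 kb/cm³.

```python
# Bits per cm^3 for a voxel-based ceramic store.
# Assumption: one bit per voxel, voxels packed with no gaps.
MM3_PER_CM3 = 10 ** 3  # 1 cm = 10 mm, so 1 cm^3 = 1000 mm^3


def bits_per_cm3_from_edge(edge_mm: float) -> float:
    """Voxels per cm^3 when each voxel is a cube with the given edge length."""
    return MM3_PER_CM3 / edge_mm ** 3


def bits_per_cm3_from_volume(volume_mm3: float) -> float:
    """Voxels per cm^3 when each voxel has the given volume."""
    return MM3_PER_CM3 / volume_mm3


print(round(bits_per_cm3_from_edge(0.1)))    # blobs 0.1 mm on a side
print(round(bits_per_cm3_from_volume(0.1)))  # blobs of 0.1 mm^3 each
```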

Edit: this idea of a cold storage is from Footfall by Niven and Pournelle, where information was stored on monoliths where layers could be incrementally extracted with tools documented on the above layers. i.e. start with 0.1 bit per m² and go down, done with the hand wavy handling of practical problems in science fiction.

[1] https://www.bookandsword.com/2016/10/29/the-information-dens...

[2] https://youtu.be/XVmsfL5LG90

pavel_lishin · 9 months ago

I don't recall the Fithp's artefacts requiring destruction to read; I thought they were just created with (presumably) lasers, writing the information in a way that would resist erosion - if one does get eroded, you just slice the eroded part away, re-revealing it again.

pastage · 9 months ago

I do not remember, and tried not to imply destruction. It is just the easiest way to do it on your own with ceramics.

dsign · 9 months ago

Akkadian is/was syllabic. The language is pretty well preserved I believe, some say there is more text in Akkadian than in classical Latin[^1].

[^1]: Can't find the source right now, so take this with a grain of salt.

tocs3 · 9 months ago

I have been thinking about this for a long time. Thanks for the link.

> The biggest advantage of character-based encodings is that they can be decoded by humans (as opposed to dot-based encodings), which means that you don’t need a camera or a scanner to recover the data.

This is an interesting point. In our post apocalyptic future scholars will be using their quills to translate archives of these (in my imagination anyway). Of course they would have to translate into binary and then into human chars.

I can imagine they will be sad they cannot listen to the MP3s.

> Adding color allows one to code more information per dot (3x more with three colors).

Is this right? Wouldn't it be base-3 encoding? Three bits of binary can count to 8. Three trits of base three can count to 27. Color has all sorts of disadvantages but maybe a much greater payoff (unless I'm mistaken).

kragen · 9 months ago

If a pixel can be printed with no colors (white), cyan, magenta, yellow, cyan and magenta (blue), magenta and yellow (red), yellow and cyan (green), or all three inks (black), that's 8 colors, 3 bits per pixel, not just 3 colors. Typically laser and inkjet printers do more or less work like this, but also have a fourth ink, which is black.
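To make the counting concrete, here is a small sketch that enumerates every on/off combination of the three inks. The color names follow the list above; treating each ink as strictly on or off (no halftoning, no black ink) is a simplifying assumption.

```python
from itertools import product
from math import log2

INKS = ("cyan", "magenta", "yellow")

# Resulting color for each (cyan, magenta, yellow) on/off combination.
NAMES = {
    (0, 0, 0): "white",
    (1, 0, 0): "cyan",
    (0, 1, 0): "magenta",
    (0, 0, 1): "yellow",
    (1, 1, 0): "blue",
    (0, 1, 1): "red",
    (1, 0, 1): "green",
    (1, 1, 1): "black",
}

# Every subset of the three inks is one printable pixel state.
combos = list(product((0, 1), repeat=len(INKS)))
for combo in combos:
    print(combo, NAMES[combo])

# Information per pixel is log2 of the number of distinguishable states.
bits_per_pixel = log2(len(combos))
print(bits_per_pixel)
```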

I am very skeptical of this idea that people will be able to write but unable to produce useful digital computers. Computers are a mathematical discovery, not an electronic invention. Electronics makes them a thousand times faster, but a computer made out of wood, flax threads, and bent copper wire would still be hundreds of times faster than a person at tabulating logarithms, balancing ledgers, calculating ballistic trajectories, casting horoscopes, encrypting messages, forecasting finances, calculating architectural measurements, or calculating compound interest. So I think automatic computation as such is likely to survive if any human intellectual tradition does.

tocs3 · 9 months ago

> I am very skeptical of this idea that people will be able to write but unable to produce useful digital computers.

I agree. When I first saw the post and the mention of humans in the reading end of the loop, I thought "maybe there is a scifi story here". Hard to imagine a scenario that left humans but not many artifacts except caches of paper (or other "printed" media). Maybe a remote tribe of uncontacted people (or another species altogether) inherits the Earth after a modern world apocalypse kills off everyone in the technologically more advanced world.

A civilization starting from scratch would still need to develop a fair bit of math and tech/science sophistication before understanding and starting to use artifacts left behind. In particular optical/color on paper scanners would have been difficult before the 20th century.

usrbinbash · 9 months ago

> In our post apocalyptic future scholars will be using their quills to translate archives of these

Imagine tomes of programming lore, dutifully transcribed by rooms of silent scribes, acolytes carrying freshly finished pages to and fro, each page beautifully illuminated with pictures of the binary saints, to ward off Beelzebug.

sweettea · 9 months ago

See also: the first part of A Canticle for Leibowitz.

adzm · 9 months ago

The inhernt errror resilience in charactre encoding of human languige is also an intersetnig point.

myself248 · 9 months ago

This is why, when pulling wire, I write out the numbers longhand on the end of each one. "SEVENTEEN" is a lot more smudge-resistant and unambiguous umop-apisdn than "L1".

mackmgg · 9 months ago

> Is this right? Wouldn't it be base-3 encoding? Three bits of binary can count to 8. Three trits of base three can count to 27. Color has all sorts of disadvantages but maybe a much greater payoff (unless I'm mistaken).

In this case they're not directly using the color to store information, they just have three differently colored QR codes overlaid on top of each other. With that method you can use a filter to separate them back out and you've got three separate QR codes worth of data in one place. The way they're added ends up using more than just three colors in that example.

If you were truly to use colored dots to store binary information without worrying about using a standard like QR, I think you'd be going from base-2 (white and black) to base-3 (red, blue, green) or more likely base-4 (white, red, blue, green) or even base-8 (if you were willing to add multiple colors on top of each other) in which case yeah you'd have way more than just 3x the data density.
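The overlay-and-filter idea can be sketched with three tiny bit layers. This is an illustration only: the layers are made-up data rather than real QR codes, and the channels are modeled as idealized, perfectly separable (r, g, b) values, which a real printer and scanner would not guarantee.

```python
# Three hypothetical 1-bit layers of the same size (stand-ins for three codes).
layer_r = [[1, 0, 1], [0, 0, 1]]
layer_g = [[0, 0, 1], [1, 0, 1]]
layer_b = [[1, 0, 0], [1, 0, 1]]


def overlay(r, g, b):
    """Combine three bit layers into one image of (r, g, b) pixels."""
    return [
        [(r[y][x], g[y][x], b[y][x]) for x in range(len(r[0]))]
        for y in range(len(r))
    ]


def separate(image, channel):
    """Recover one layer by keeping a single channel, like a color filter."""
    return [[pixel[channel] for pixel in row] for row in image]


image = overlay(layer_r, layer_g, layer_b)

# Overlaying three single-color layers produces mixed colors as well,
# so the combined image can contain more than three distinct colors.
distinct_colors = {pixel for row in image for pixel in row}
print(len(distinct_colors))
print(separate(image, 0) == layer_r)
```

Each pixel still carries exactly one bit per layer, which is where the "three codes in one place" framing comes from.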

CorrectHorseBat · 9 months ago

>In this case they're not directly using the color to store information, they just have three differently colored QR codes overlaid on top of each other. With that method you can use a filter to separate them back out and you've got three separate QR codes worth of data in one place. The way they're added ends up using more than just three colors in that example.

That's only true if you can print and read colors in a higher resolution, i.e. you don't destroy information at 3x the density with color. I'm not sure that's generally true.

>If you were truly to use colored dots to store binary information without worrying about using a standard like QR, I think you'd be going from base-2 (white and black) to base-3 (red, blue, green) or more likely base-4 (white, red, blue, green) or even base-8 (if you were willing to add multiple colors on top of each other) in which case yeah you'd have way more than just 3x the data density.

Base 8 is exactly 3x the data density (log(8)/log(2)).
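The same formula applied to the other bases mentioned in this thread, as a quick sketch. It assumes every color is equally easy to print and to tell apart, which is the optimistic case; `density_vs_binary` is just an illustrative name.

```python
from math import log2


def density_vs_binary(num_colors: int) -> float:
    """Bits carried by one dot with num_colors states, relative to a 1-bit dot."""
    return log2(num_colors)


for n in (2, 3, 4, 5, 8):
    print(n, round(density_vs_binary(n), 2))
```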

Clamchop · 9 months ago

CMYK makes more sense for printing, e.g. https://en.m.wikipedia.org/wiki/High_Capacity_Color_Barcode

spencerflem · 9 months ago

I think for that use-case (copying by quill), just writing plaintext from the start would be the move

CorrectHorseBat · 9 months ago

Adding 3 colors would make it base 5 (BW+rgb) and give log(5)/log(2) or about 2.3 times the information per dot.

LgWoodenBadger · 9 months ago

2 dots at 5 possibilities each gives 25 (5^2)

2 dots at 2 possibilities each gives 4 (2^2)

They only diverge from there. Or am I doing my math wrong?
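Both views can be checked side by side. The number of distinct messages does diverge as dots are added, but information is measured as the logarithm of that count, and the ratio of the logarithms stays fixed at log(5)/log(2). A small sketch (the function name is illustrative):

```python
from math import log2


def info_ratio(dots: int) -> float:
    """Bits stored by `dots` 5-state dots divided by bits in `dots` 2-state dots."""
    states_5 = 5 ** dots  # distinct messages with five states per dot
    states_2 = 2 ** dots  # distinct messages with two states per dot
    return log2(states_5) / log2(states_2)


for dots in (1, 2, 4, 8):
    print(dots, 5 ** dots, 2 ** dots, round(info_ratio(dots), 2))
```

The message counts grow apart, while the last column stays the same for every row.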

benhurmarcel · 9 months ago

I have this type of issue professionally too, even though we don't use paper. For regulatory reasons, the only approved format we are allowed to use for long term archiving is PDF/A. No attachments, only pages in a single PDF document.

It has proven to be an issue for including data or spreadsheets. Most colleagues just print Excel files to a PDF that gets appended, but while that complies with the regulation, it's basically unusable as-is.

anthk · 9 months ago

DjVu should be the standard format.

rickcarlino · 9 months ago

I got curious about OCR as a sort of poor man’s microfiche. I printed a test paragraph on high quality paper with a laser printer. The smallest font I could read under a USB microscope was 2.5pt, though I could probably have gone smaller if I used polymer paper. The fibers of the paper are quite apparent under a microscope. Transparency film paper was too smudgy.

lifthrasiir · 9 months ago

I pondered this from time to time and concluded that paper data storage is of very limited use, mainly because of the information density. Any remotely human-readable form is too sparse to be useful (<10 KB/page), while dot-based or color-based approaches are heavily limited by printing techniques (<500 KB/page). It is hard to preserve paper, unless you are willing to sacrifice its information density even more.

For this reason, paper is at best useful as a bootstrapping mechanism, which would allow readers to construct a mechanism to read more densely encoded data. My best guess is that the main storage of information in this case would likely be microfilms, which should be at least 100x denser than the ideal paper data storage. Higher density allows for using less dense encodings to aid readers. And as far as I know microfilms are no harder to preserve than papers.
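The per-page figures above are easy to reproduce roughly. This is a back-of-the-envelope sketch with assumed parameters that are not from the comment: an 80x66 character page, an A4 sheet with no margins, one bit per dot, no error correction, and 200 reliably recoverable dots per inch after printing and scanning.

```python
A4_INCHES = (8.27, 11.69)  # A4 width and height in inches


def text_bytes_per_page(columns: int = 80, lines: int = 66) -> int:
    """Human-readable text at one byte per character."""
    return columns * lines


def dot_bytes_per_page(dpi: int, page=A4_INCHES) -> int:
    """Raw capacity of a full-page dot grid at one bit per dot."""
    width_in, height_in = page
    dots = round(width_in * dpi) * round(height_in * dpi)
    return dots // 8


print(text_bytes_per_page())    # plain text page, in bytes
print(dot_bytes_per_page(200))  # dot grid at an assumed 200 dpi, in bytes
```

Under these assumptions, text lands in the single-digit KB range and the dot grid just under 500 KB, before any margins or redundancy are subtracted.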

pastage · 9 months ago

It is degrading too fast; microfilm archives need to be digitized now. The solvent, image chemicals, and media are all part of the problem with microfilm. Archival paper is a nice medium that can be stored for a long time. This is of course a question of how long you want to store your information; if you want to do 00500 years, it is probably good.

Or just go with metal https://rosettaproject.org/

Or try to create a culture for humans and store information in that.

xyzzy123 · 9 months ago

Metal engraving is fairly accessible these days.

Fiber laser in 100W range would do it, maybe $10k?

You could do photochemical etching, but it would be more fuss and wouldn't last as long as a laser engraving.

Probably looking at the order of 1 gig per 1000 kg if using 1 mm 316 plate (napkin math only, naive estimate). Interesting to explore.
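The napkin math can be unpacked to see what dot size it implies. This sketch rests on several assumptions that are not in the comment: "1 gig" means 10^9 bytes, 316 stainless has a density of roughly 8000 kg/m³, only one side of the plate is engraved, and each dot stores one bit.

```python
DENSITY_316_KG_M3 = 8000  # approximate density of 316 stainless steel


def plate_area_m2(mass_kg: float, thickness_mm: float) -> float:
    """Sheet area obtained from a given mass of plate at a given thickness."""
    volume_m3 = mass_kg / DENSITY_316_KG_M3
    return volume_m3 / (thickness_mm / 1000)


def implied_dot_pitch_mm(data_bytes: float, mass_kg: float,
                         thickness_mm: float, sides: int = 1) -> float:
    """Square dot pitch needed to fit the data, one bit per dot."""
    area_mm2 = plate_area_m2(mass_kg, thickness_mm) * 1e6 * sides
    bits = data_bytes * 8
    return (area_mm2 / bits) ** 0.5


area = plate_area_m2(1000, 1.0)
pitch = implied_dot_pitch_mm(1e9, 1000, 1.0)
print(area)   # square meters of 1 mm plate in 1000 kg
print(pitch)  # millimeters between dot centers
```

With these inputs, 1000 kg of 1 mm plate is about 125 m², and 1 GB on one side needs a dot pitch of roughly 0.125 mm.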

lifthrasiir · 9 months ago

Maybe. Anything that can be photographically etched and is durable enough would work well.

tokai · 9 months ago

This. The right paper will last significantly longer than microfilm.

IAmBroom · 9 months ago

> It is hard to preserve paper, unless you are willing to sacrifice its information density even more.

We have paper books from 500 years ago. Microfiche is already deteriorating.

If you keep paper dry and flat, and use pH-neutral inks and paper, it is extremely stable.

01HNNWZ0MV43FF · 9 months ago

Dry and flat... Laminated? Or will the plastic degrade quicker than the paper?

b112 · 9 months ago

As others have said, an easily OCRable font would help. However, I wonder about adding a compressor, a zip-type program specially designed for the limited character set.

If we just have text files, and maybe vector graphics for simple schematics, that's a lot of info.

ctrlc-root · 9 months ago

Those fonts do exist:

https://en.wikipedia.org/wiki/OCR-A

https://en.wikipedia.org/wiki/OCR-B

account-5 · 9 months ago

Color dot encodings are interesting; you could encode data in a floor mosaic. And with my limited understanding, the more colours, the higher the amount of data?

You could encode data in monolithic structures this way. They'd last longer than paper and give future generations lots of confusion trying to figure out the meaning.

datadrivenangel · 9 months ago

Except when the colors fade over time and people steal the purple ones to decorate their homes preferentially.

goda90 · 9 months ago

Just "backup" the data with duplication. For example you could color the floor beneath the mosaic, and the grout used for each tile, so as each layer is removed or faded, it still lasts a little longer. Duplicate your mosaic on both the floor and the ceiling. Duplicate your mosaic in multiple buildings in multiple cities.

6510 · 9 months ago

I haven't built it because it costs a bit too much for my budget, but someone some day should build the megalithic computer according to my vision: We take a river flowing down a mountain in a suitable location and carve out square canals. The AND gate is done by having a giant door attached to two blocks hollowed out from the bottom. If both blocks are submerged in water together they lift the door and water may flow into the rest of the circuitry. A grid of basins functions as the display and to store values. The input is done by putting weights onto the floating blocks thereby preventing them from lifting specific doors. I doubt it can be made large enough to run Doom, but it doesn't hurt to be ambitious.
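The gate logic described above is small enough to write down. This is only a truth-table sketch of one reading of the description: a block lifts if it is submerged and not weighted, and the door opens only when both blocks lift. The function and parameter names are invented for illustration.

```python
def door_open(block_a_submerged: bool, block_b_submerged: bool,
              weight_on_a: bool = False, weight_on_b: bool = False) -> bool:
    """Return True if water can pass the door of the two-block AND gate."""
    # A weighted block stays down even when water reaches it.
    lifts_a = block_a_submerged and not weight_on_a
    lifts_b = block_b_submerged and not weight_on_b
    # The door rises only when both blocks float together.
    return lifts_a and lifts_b


for a in (False, True):
    for b in (False, True):
        print(a, b, door_open(a, b))
```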

kragen · 9 months ago

This is a brilliant idea, not because it is practical but because it is not.

ryukoposting · 9 months ago

Fun fact: magazines actually distributed software on paper briefly in the 1980s.

https://youtu.be/mIGotStRCkA?si=toG5xeLMZzjIGTxC

It's more like a long, linear barcode, but still. More often, they put the source code in the magazine and you'd just type it into your machine.

lizknope · 9 months ago

I typed in a lot of Atari BASIC code from magazines but I never heard of this. Really cool!

Ghoelian · 9 months ago

Oh yeah, I forgot all about those! The ZX Spectrum was way before my time, but for some reason I still spent a lot of time typing code over from a magazine into a Spectrum emulator as a kid.