This reminds me of a joke idea I read somewhere: You can encode the entire Encyclopaedia Britannica using a single mark on a simple stick!
Just encode the text as ASCII codes after the decimal point of a zero (0.656168.. etc). Then just mark that ratio of the stick's length and you're done...
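A literal sketch of the joke (the zero-padding of each code to three digits is my addition so the string can be decoded unambiguously; ASCII only):

```python
# Turn text into one ratio in [0, 1) by concatenating 3-digit ASCII codes
# after "0.", and back. Assumes character codes below 1000 (ASCII).
def to_ratio(text: str) -> str:
    return "0." + "".join(f"{ord(ch):03d}" for ch in text)

def from_ratio(ratio: str) -> str:
    digits = ratio[2:]                       # drop the leading "0."
    return "".join(chr(int(digits[i:i + 3])) for i in range(0, len(digits), 3))

r = to_ratio("Ab")
print(r, from_ratio(r))                      # 0.065098 Ab
```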
I would look more into the limitations of the read tech (how accurately can we measure the stick length and the mark location) rather than inherent limitations of the medium. Very few technologies reach the actual theoretical limitations of the materials they are made of.
I just wanted to add that reality being a non-discrete geometric thing is still an assumption - we don't know for sure that it isn't discrete and a lot of quantum stuff points more toward a discrete reality than a continuous one.
Assuming continuous space/time and discrete information, I'd agree. But as far as we know, space/time aren't necessarily continuous; they may just appear that way to us. We don't know for sure that they're discrete either, but at the least I'd say it's solid evidence that things like brain scanning are definitely in the realm of possibility.
This answer from the Physics StackExchange nicely covers how time/space could appear continuous to us even if they are in fact discrete at lower levels. Also some interesting discussion in the other answers
https://physics.stackexchange.com/a/35676
I think there are at least two things wrong with this take:
One is that I don't think it follows from the premise that the continuity of the physical world precludes AI, brain scanning, etc. Even if the physical world were continuous (likely not, see below), an arbitrary degree of approximation could be attained, in principle. At the very least I would not call the footing "firm".
The second is that the universe is very likely not continuous anyway. The Bekenstein bound [1] puts an upper limit on the number of bits of information a region of space may contain. If the ruler tickmark were either measured or localized to the precision required to encode the information, the information density would cause it to collapse into a black hole. This would happen once your measurement needs to be about as precise as a Planck length, which would allow you to encode about 115 bits of information with your tickmark.
(This in and of itself is independent of the fact that you would need to construct the ruler out of objects that the universe permits; your ruler tickmark would need to be made of and measured with discrete fundamental particles, which by their very nature are quantized.)
[1] https://en.wikipedia.org/wiki/Bekenstein_bound
In theory, pi has infinite digits. You could publish a book of a trillion digits of pi, and you have barely scratched the surface: in fact, you have published precisely 0.00000% of all digits of pi.
In practice, you "only" need ~42 digits of pi to draw a circle spanning the entire known universe (diameter of 8.8 * 10^26 m) and it will deviate from the ideal circle by less than the size of a proton (0.8 * 10^−15 m).
Having a theoretically infinite precision does not mean that it makes a measurable difference.
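A quick sanity check of those figures, using the numbers quoted above and the worst-case error of rounding pi to 42 decimal places:

```python
# Error in a circle's circumference C = pi * d caused by truncating pi.
d_universe = 8.8e26       # diameter of the observable universe, meters (figure above)
pi_error = 0.5e-42        # worst-case error in pi when rounded to 42 decimal places
proton_radius = 0.8e-15   # approximate proton radius, meters (figure above)

circumference_error = d_universe * pi_error
print(circumference_error, circumference_error < proton_radius)   # ~4.4e-16 m, True
```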
I used to think there must be something 'special' to brains to distinguish us from computers. There isn't. Brains encode finite amounts of information (quantum mechanics seems to imply bounded local information). We are a huge information network ourselves -- that's what consciousness is (with some added bits like self-identity and various particular structures that dictate the character of our experience).
But that doesn't mean brains aren't special -- it means brains are special and computers are special. Even more: it seems to imply computers, AI, etc. can be as special as ourselves, sentient, and perhaps even more special in ways we haven't realized yet.
It's difficult to even imagine a physical theory with unbounded local information. It seems to open the possibility of crazy things like hypercomputation, which do not seem very well defined. (For example: at every time 1-1/n seconds (n>1) from now, flip a switch ON/OFF. What state will the switch be in at t>1s? And at exactly t=1s?)
Note: while information and information flow are themselves bounded (hence no hypercomputation), I don't know of any obvious objections to continuous time. (I'm not sure the continuity of time has any profound implications.)
This is not just a false idea, but an obviously false one, contradicted by all the laws of physics. If you were right, then any finite volume would contain an infinite amount of information, which would mean it has infinite entropy, temperature, and energy.
Also, by the same logic you apply to space, you could say that time is infinitely divisible, so you could create a computer which finishes an infinite amount of steps in a finite amount of time.
These are weird conclusions. Any attempt to measure “reality” gives some amount of uncertainty. The only way for this to lead to the relatively stable experience you perceive is if those small variations in measurement lead to relatively small differences in perception. In which case, you can truncate the resolution of your simulation and still get plausible results.
I assure you there are plenty of groups out there simulating systems that operate with similar densities (but lower volume) to the brain.
Even if space and time were continuous (which things like the Planck length would discredit), there are still discrete objects in that continuum.
Elementary particles, for example, are discrete. You could argue that they have continuous effects vis a vis the EM field and spatial positioning, but ensemble effects usually render that irrelevant at large enough scales.
Others already noted how shaky your assertion "Reality, in being geometrical, is infinitely informationally dense" is.
Instead, let me throw out another extreme (but fun) view in the opposite direction: Finitism [0].
These guys not only reject the existence of the continuum; they reject all infinities altogether! In finitism, even discrete things exist only as finite objects (and, going further, only as explicitly constructible ones – Ultrafinitism [1]).
So no infinite universe, no "set of all natural numbers", no "limits" and other ideals over infinite domains. Screw Platonism. Hello Wittgenstein (and Wolfram).
I don't know how far that theory can be taken in a practical sense – most of the body of science is built on Platonism [2] – but I have to say finitism does appeal to my CS heart and my earthly experience.
[0] https://en.wikipedia.org/wiki/Finitism
[1] https://en.wikipedia.org/wiki/Ultrafinitism
[2] https://en.wikipedia.org/wiki/Primitive_recursive_arithmetic
Reality isn't quite infinitely dense though; it's hard to imagine how you could encode data geometrically over a distance smaller than the Planck length, for example. Reality just has a very, very fine resolution.
I wouldn't say that the distinction is "unbridgeable". Keep in mind that we send packets over plenty of "geometrical" and seemingly infinitely dense analog channels all the time, like electricity over a copper wire or EM waves over the air.
The "mark on a stick" channel has a capacity like any other channel. If you're sending just one symbol, you could easily calculate the information capacity given a desired probability of bit-error.
Assuming you can put the mark in exactly the right spot, you can model the "noise" as a distribution over the values that the reader will measure. If you model this as `mark + zero-mean normal distribution` with a known variance, then your stick is just an AWGN channel.
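As a rough sketch of that calculation (the numbers here, a 1 m stick read with 0.1 mm measurement noise, are made up for illustration; treating the uniform mark position's variance as the signal power is an approximation):

```python
# Capacity of the "mark on a stick" channel modeled as AWGN: C = 0.5*log2(1 + S/N).
from math import log2

stick = 1.0      # meters; the mark can sit anywhere on the stick
sigma = 1e-4     # std dev of the reader's measurement error, meters (assumed)

signal_power = stick**2 / 12             # variance of a uniform position on [0, stick]
noise_power = sigma**2
capacity = 0.5 * log2(1 + signal_power / noise_power)
print(f"{capacity:.1f} bits per mark")   # ~11.5 bits for these numbers
```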
Depends how you define simulation. There's no reason to assume that you need, or even that it's desirable, to simulate more than "almost nothing" in order to simulate what you'd want to simulate - even a whole universe can be simulated with ease if your required fidelity is low enough.
So, if you want to simulate every possible interaction in realtime, sure. But you can increase overall capabilities by making sacrifices along several axes: duration, speed, precision, persistence of changes from a generative baseline and lots more.
In other words, it depends entirely on your purpose for simulation.
That can't possibly be true, because then there would be no point to space and time. If a single location could hold an infinite amount of information, then the rest of reality would be redundant.
>hence there is an extremely firm footing on which to reject [the Matrix]
This seems to be the opposite of the conclusion that your premise implies. I'm the one that doesn't believe in matryoshka simulated universes, but infinite information density is what would make it possible, no?
If we lived in a non-discrete universe, why would computation be unable to exploit it?
> Each letter of the message is represented in order by the natural order of prime numbers—that is, the first letter is represented by the base 2, the second by the base 3, the third by the base 5, then by 7, 11, 13, 17, etc. The identity of the letter occupying that position in the message is given by the exponent, simply: the exponent 1 meaning that the letter in that position is an A, the exponent 2 meaning that it is a B, 3 a C, 4 a D, up to 26 as the exponent for a Z. The message as a whole is then rendered as the product of all the bases and exponents. Examples. The word 'cab' can thus be represented as 2^3 x 3^1 x 5^2, or 600.
Excerpt From: Frederik Pohl. “Starburst.”
[0]: https://everything2.com/title/Encyclopedia+on+a+toothpick
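The quoted scheme is easy to try. A tiny sketch, using sympy for the prime and factoring plumbing (the helper names are mine, and it assumes lowercase a-z):

```python
# Gödel-style encoding: the i-th prime raised to the letter's alphabet position.
from sympy import prime, factorint

def encode(word):
    n = 1
    for i, ch in enumerate(word):
        n *= prime(i + 1) ** (ord(ch) - ord('a') + 1)   # a=1, b=2, ..., z=26
    return n

def decode(n):
    f = factorint(n)                       # {prime: exponent}
    return "".join(chr(f[p] - 1 + ord('a')) for p in sorted(f))

print(encode("cab"), decode(600))          # 600 cab
```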
You jest, but this is almost exactly how arithmetic coding works.
If you arrange your symbols and contexts carefully, you can even use this as a technique for progressive or lossy compression -- i.e. the more accurately you specify the ratio, the higher fidelity your result.
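To illustrate the connection, here is a toy, arithmetic-coding-flavoured encoder that narrows a message down to a single ratio in [0, 1) under a deliberately crude uniform symbol model (entirely illustrative, not a real compressor):

```python
# Each symbol narrows the interval to its slice; any point inside identifies the message.
from fractions import Fraction

ALPHABET = "abcdefghijklmnopqrstuvwxyz "            # uniform model for simplicity

def encode(msg):
    lo, width = Fraction(0), Fraction(1)
    for ch in msg:
        i = ALPHABET.index(ch)
        lo += width * Fraction(i, len(ALPHABET))     # move to the symbol's slice
        width /= len(ALPHABET)
    return lo + width / 2                            # any point inside the final interval

def decode(x, n):
    out = []
    for _ in range(n):
        i = int(x * len(ALPHABET))                   # which slice are we in?
        out.append(ALPHABET[i])
        x = x * len(ALPHABET) - i                    # zoom into that slice
    return "".join(out)

ratio = encode("cab")
print(float(ratio), decode(ratio, 3))                # ~0.0742 'cab'
print(decode(Fraction(742, 10000), 3))               # 'cac': a coarser ratio still gets the first symbols right
```

The last line shows the progressive/lossy point: fewer digits of the ratio, lower fidelity at the tail of the message.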
One theoretical objection to this idea is that distances measured have an accuracy limit of the Planck length (1.616e-35 meters). So if your number needs more precision than that, the "just mark" step can't be done.
You can do it easier than that -- just make a stick of length .65168 meters. (Same idea, just the "full 1 meter stick" is defined elsewhere, not by the length of the stick).
This reminds me of the "worst way to ask a user for a telephone number" UI, after one UI forced the user to select each digit in a 0-9 dropdown.
The specific one I'm thinking of just spat out a scrollable numeric string of Pi and made the user scroll until they found the run of digits of Pi that matched their phone number.
My (7 digit) phone number occurs after digit position 9_000_000, and occurs 21 times in the first 200_000_000 digits.
Seven digits is barely a phone number in most of the United States anymore; most places outside of rural areas require 10-digit dialing and hide 7-digit dialing behind carrier dial plans - I mean in the way a rural person can dial one, two, or three digits because all landlines in an area have the same area code, prefix, and possibly additional digits in common.
Fun read. But if we allow ourselves as much precision as we need, we don't even need a parameter. Any constant that is a normal number should suffice. Such constants already contain every possible sequence of digits you could muster -- i.e., they already contain every possible dataset.
EDIT: I replaced "transcendental" with "normal" after reading Scarblac's comment below: https://news.ycombinator.com/item?id=28699622 -- many important transcendental numbers, including π (Pi), are thought to be (but have not been proven to be!) normal.
It's not true that any transcendental number would work. A transcendental number is a number which isn't the root of any polynomial with rational coefficients. This property doesn't imply that it contains all number sequences.
Right, but any constant doesn't "encode" this information. To use a normal constant to encode information, you need to encode the location of the substring of interest. Generally, the location of the substring of interest needs to require as many bits as the substring itself (unless there's an intimate relationship between the number and the substrings of interest). So arguably it's the location that encodes the data, the number is irrelevant (and why not encode the data directly into this location?).
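If you want to play the same game at home, here is a hedged sketch that uses mpmath to generate digits of pi and search them; how far in a given number shows up, and how many digits that index itself takes to write down (the point made above), is whatever it turns out to be:

```python
# Search the first N decimal digits of pi for a digit string.
from mpmath import mp

N = 100_000                       # digits of pi to search; raise this for longer needles
mp.dps = N + 10                   # working precision, with a few guard digits
digits = mp.nstr(+mp.pi, N).replace(".", "")

needle = "8675309"                # any digit string, e.g. a 7-digit "phone number"
pos = digits.find(needle)
if pos == -1:
    print(f"not found in the first {N} digits")
else:
    print(f"found at digit {pos}: the index takes {len(str(pos))} digits vs {len(needle)} digits of data")
```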
First, this is a fun implementation and I love it.
Second, you could as easily embed an infinite size dataset into an infinitely long binary string and say that you've reproduced your dataset with a 'single' parameter! That's sort of what this is doing, with some extra steps.
> [2] “I’m not very impressed with what you’ve been doing.” As recounted by the famous physicist Freeman Dyson himself, this is how Nobel laureate Enrico Fermi started their 1953 meeting. “Well, what do you think of the numerical agreement?”, Dyson countered. To which, Fermi replied “You know, Johnny von Neumann always used to say: With four parameters I can fit an elephant, and with five I can make him wiggle his trunk. So I don’t find the numerical agreement impressive either.”
> In addition to casting doubts on the validity of parameter-counting methods and highlighting the importance of complexity bounds based on Occam’s razor such as minimum description length (that trade off goodness-of-fit with expressive power), we hope that fα may also serve as entertainment for curious data scientists
>The purpose of this paper is to show that all the samples of any arbitrary dataset X can be reproduced by a simple differentiable equation: fα(x) = sin^2(2^(xτ) arcsin(√α))
This is beautiful. The real numbers are uncountable, so seems intuitive that could approximate any other space with them... basically a hash function. But this one has an inverse! It's so cool that someone actually implemented this.
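To make that concrete, here is a minimal sketch reconstructed from the formula quoted above. The encoding convention below (quantize arcsin(√y)/π to τ bits per sample and concatenate the blocks) is my reading of the trick; the names encode/decode/tau and the numbers are mine:

```python
# f_alpha(x) = sin^2(2^(x*tau) * arcsin(sqrt(alpha))): one real parameter per dataset.
from mpmath import mp, mpf, sin, asin, sqrt, pi, floor

def encode(samples, tau):
    """Pack samples in [0, 1) into a single alpha in [0, 1)."""
    mp.prec = tau * len(samples) + 64            # enough bits to hold every block
    z = mpf(0)
    for k, y in enumerate(samples):
        w = asin(sqrt(mpf(y))) / pi              # w in [0, 1/2); values near 1 need care
        q = int(floor(w * 2**tau))               # quantize to tau bits
        z += mpf(q) / 2**((k + 1) * tau)         # append the tau-bit block
    return sin(pi * z) ** 2                      # alpha = sin^2(pi * z)

def decode(alpha, k, tau):
    """Recover the k-th sample (0-indexed) from alpha."""
    # mp.prec is still raised from encode(), which this relies on.
    return float(sin(2**(k * tau) * asin(sqrt(alpha))) ** 2)

samples = [0.25, 0.9, 0.01, 0.5]
tau = 16                                         # bits of precision per sample
alpha = encode(samples, tau)
print([round(decode(alpha, k, tau), 3) for k in range(len(samples))])
# -> approximately [0.25, 0.9, 0.01, 0.5]
```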
Stick encoding with graphite resolution (0.335 * 10^-9 meter) [1]: "Uti" (31 bits -> 3 UTF8 characters)
Stick encoding with Planck resolution (1.616255 * 10^-35 meter) [2]: "Utility ought " (115 bits -> 14 UTF8 characters)
Complete first sentence: "Utility ought to be the principal intention of every publication." [3]
It appears that this storage scheme may not be suited towards the safekeeping of literature.
[1] https://www.wolframalpha.com/input/?i=floor%28floor%28log_2%...
[2] https://www.wolframalpha.com/input/?i=floor%28floor%28log_2%...
[3] https://digital.nls.uk/encyclopaedia-britannica/archive/1441...
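For reference, the arithmetic behind those bit counts (my own re-derivation of what the WolframAlpha links compute):

```python
# Bits a mark on a 1-meter stick can hold if its position is readable to a given resolution.
from math import log2, floor

STICK = 1.0                       # meters
GRAPHITE = 0.335e-9               # graphite interlayer spacing, meters
PLANCK = 1.616255e-35             # Planck length, meters

for name, res in [("graphite", GRAPHITE), ("Planck", PLANCK)]:
    bits = floor(log2(STICK / res))
    print(f"{name}: {bits} bits -> {bits // 8} whole bytes")   # 31 bits / 115 bits
```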
Not exactly. If the 115bits are the hash/retrieval key of the actual content, then that can be a lot of information. Just have to have a big enough DB.
Reality, in being geometrical, is infinitely informationally dense (with a discrete conception of information).
This distinction between geometrical space and time, and discrete algorithmic computability is unbridgeable.
And hence there is an extremely firm footing on which to reject: AI, brain scanning readers, teleporters, etc and most sci-fi computationalism.
Almost nothing can be simulated, as in, realised by a merely computational system.
You might be interested in the Bekenstein bound (https://en.wikipedia.org/wiki/Bekenstein_bound):
> In physics, the Bekenstein bound (named after Jacob Bekenstein) is an upper limit on the thermodynamic entropy S, or Shannon entropy H, that can be contained within a given finite region of space which has a finite amount of energy—or conversely, the maximal amount of information required to perfectly describe a given physical system down to the quantum level.[1] It implies that the information of a physical system, or the information necessary to perfectly describe that system, must be finite if the region of space and the energy are finite. In computer science, this implies that there is a maximal information-processing rate (Bremermann's limit) for a physical system that has a finite size and energy, and that a Turing machine with finite physical dimensions and unbounded memory is not physically possible.
Lots of math works out well that as a continuous approximation, eg the Navier-Stokes differential equations seem to describe fluids well on everyday scales. But we know very well that water is made of molecules, so we know that this particular continuous approximation will fail at small enough scales.
By the way, the closest thing we have come to for teleportation is cutting-and-pasting of quantum states. So no classical, digital computers involved there.
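Numerically, the bound quoted above is astronomically large but finite. A rough sketch of what it gives for an ordinary-sized system (the example of 1 kg within a 1 m radius is mine, not from the quote):

```python
# Bekenstein bound in bits: I <= 2*pi*R*E / (hbar * c * ln 2), with E = M*c^2.
from math import pi, log

hbar = 1.054571817e-34    # J*s
c = 299792458.0           # m/s
R = 1.0                   # radius of the region, meters (assumed)
M = 1.0                   # mass-energy inside it, kg (assumed)

bits = 2 * pi * R * M * c / (hbar * log(2))   # one factor of c cancels against E = M*c^2
print(f"{bits:.3e} bits")                     # ~2.6e43 bits
```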
This is like saying it's impossible to build a water pump without solving the quantum mechanical interactions that govern water flow.
The "mark on a stick" channel has a capacity like any other channel. If you're sending just one symbol, you could easily calculate the information capacity given a desired probability of bit-error.
Assuming you can put the mark in exactly the right spot, you can model the "noise" as a distribution over the values that the reader will measure. If you model this as `mark + zero-mean normal distribution` with a known variance, then your stick is just an AWGN channel.
It is gratifying to know that the mathematician Georg Cantor demolished AI some hundred odd years before any engineer had thought seriously of it.
It's also a resolution of Zeno's Paradox.
Prove it.
https://math.stackexchange.com/questions/216343/does-pi-cont...
(Assuming that phone numbers are less than, say, 100 digits.)
But it would be an interesting exercise to do, I guess.
It hasn't even been proven that pi contains all number sequences.
See https://math.stackexchange.com/questions/216343/does-pi-cont... for more details.
E.g. I can't imagine that pi with all the '1' digits replaced by '2' isn't transcendental, but it clearly doesn't contain every sequence of digits.
I think you mean normal numbers.
IIRC, most real numbers are normal, and many important transcendental numbers are thought to be (but have not been proven to be) normal.
I edited my comment. Thank you for the correction!
So... use a PRNG?