Readit News
xiphias2 · 4 years ago
This is actually a really cool article, though it could certainly be better written.

The new Tesla 8- and 16-bit floating-point formats in the Dojo system support this kind of PRNG-based rounding up/down when compressing neural network parameters from higher-precision floats (it is specified in their floating-point paper).

The random rounding is needed to avoid bias in the neural network weight updates; this article improves the rounding method to be more accurate (closer to true rounding than uniform sampling) without reintroducing bias.

IanCal · 4 years ago
I would have naively thought that you would just send a 1 with probability equal to the number you want to send.

Is this compared in the post? Some of it went over my head.

rav · 4 years ago
Yes, that is covered in this half paragraph:

> Randomized rounding, where the sender sends 1 with probability x and 0 otherwise, and the receiver uses the received bit X as the estimate x', has the property that E[x'] = x. Unbiased estimators are arguably more natural for many estimation problems. Here the measure of performance would be the maximum variance for the estimate over all inputs x, so for randomized rounding the cost is 1/4 (when x = 1/2).

The "cost"/"variance" of 1/4 in the above paragraph is talked about later in the article as the "expected squared error"; the main result is a scheme that has a cost of just 0.04599, which is better than the "send a 1 with a probability x" scheme, which has a cost of 1/4 = 0.25.
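As a sanity check, here's a small Python simulation (not from the post) of that randomized-rounding scheme, confirming that the estimate is unbiased and that the variance x*(1-x) peaks at 1/4 when x = 1/2:

```python
import random

def send(x, rng):
    # Sender: transmit 1 with probability x, else 0.
    return 1 if rng.random() < x else 0

rng = random.Random(0)
x = 0.5  # worst case for the variance
n = 200_000
bits = [send(x, rng) for _ in range(n)]
mean = sum(bits) / n                          # estimates E[x'] = x
var = sum((b - mean) ** 2 for b in bits) / n  # estimates Var[x']
print(round(mean, 2), round(var, 2))  # close to 0.5 and 0.25
```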

IanCal · 4 years ago
Thank you. I found it tricky to read this the way I usually do, as I had less to latch on to.
ant6n · 4 years ago
The post doesn't put in a lot of effort to be understandable.
wk_end · 4 years ago
What are some real-world situations where you can only send a single bit but have shared randomness? That seems like a bizarre constraint. Is this just - and no shame in this - math for math’s sake?
mprime1 · 4 years ago
A real-world situation that is very similar to this: you send a friend a few bits (a seed), they plug it into Minecraft, and both of you can play in the exact same yet practically infinite world. That is possible because of the shared randomness built into the world generation on both computers.
0cVlTeIATBs · 4 years ago
This sounds halfway related to a problem my statistics professor worked on: sending data over a shared interface without detection. Note that these scenarios assume discrete time.

"Shared randomness" might be the current noise level on some radio frequency in enemy territory. You and your receiving friend can both hear everything, but as soon as one of you starts transmitting you must have a very careful strategy to avoid detection.

asxd · 4 years ago
I could imagine a scenario where you have many clients sending you information repeatedly, and you’d like to restrict the size of each packet. I think what this post is saying is that during the initial client setup you could add an arbitrary amount of shared randomness to increase accuracy while keeping the packet size down.

To more directly answer your question, the post mentions federated learning of machine learning models as an application.

JoachimSchipper · 4 years ago
Concretely, imagine a sensor node sampling a lot of low-quality sensors frequently. You can afford some setup (sharing an AES key gives you as much shared (pseudo-)randomness as you want), but you want your per-measurement overhead to be small.

… it’s not the most practical, and the single-bit version seems to be a toy problem, but I can imagine it being useful in some very particular cases?

Deleted Comment

TuringNYC · 4 years ago
This wouldn't count as "real world" but this topic reminds me of communication protocols described in the novel: https://en.wikipedia.org/wiki/The_Three-Body_Problem_(novel)
mateo1 · 4 years ago
The article clearly mentions distributed learning, and it's really not that hard to generalise to literally any partially parallelisable computing problem with bandwidth limitations between the compute nodes.

Deleted Comment

bjornsing · 4 years ago
I hate these articles that jump straight into some complex solution without stating the problem clearly... If the problem really is to send a real number using a single bit (and some shared randomness) then clearly that’s just impossible. Next!
asxd · 4 years ago
The problem is a variation of that, which makes it possible, as explained in the post. The large caveat is that the accuracy is probabilistic, and it’s expected that the receiver is getting many 1-bit messages from various sensors that are all measuring the same phenomenon.

It’s a bit like the concept of dithering in images, where visually we can represent shades of gray just by using many 1-bit pixels.

oh_my_goodness · 4 years ago
Yes, it's a variation on the problem ... but sending many bits to the receiver is a really significant variation on just sending one bit.

I'm not sure whether so many people would be intrigued to read "How to send a real number using arbitrarily many bits."

magicalhippo · 4 years ago
> If the problem really is to send a real number using a single bit (and some shared randomness) then clearly that’s just impossible. Next!

Not impossible. You can perfectly well send say, pi, by implementing a protocol which sends 1 if the number is pi and 0 if not.

And it's about as useful as the mentioned algorithms.

yshklarov · 4 years ago
But no! The very first sentence clearly states: "In this post, we'll look at the natural problem of how to communicate an estimate of a real value in [0,1], using just 1 bit." The second paragraph sketches the motivation for the problem.

How much more exposition do you want from a casual blog post? There's a link to the academic paper if you want full details (you can follow up on the numerous references to other papers, which explain applications.)

adrian_b · 4 years ago
The blog reproduces rather faithfully what the academic paper says.

The academic paper is confusing because it defines the problem as if it were a singular event: the sender chooses a real number, then sends 1 bit to the receiver, and the receiver miraculously guesses the real number.

You have to waste time carefully studying the paper to see that the statistical methods used are meaningless for a singular event; in fact a large number of transmissions of a single bit is assumed, and the accuracy (i.e. the variance) of the estimate of the real number is computed over that very large but unspecified number of transmitted bits.

In any real application, the main quality of an estimation algorithm would be determined by how many single bits must be transmitted and received to guarantee a certain accuracy (variance) of the estimate, i.e. a certain number of recovered bits of the real number, but I lost patience with the paper before discovering whether the authors even mention this anywhere.

What should be clear to everyone is that recovering N bits of a real number, i.e. reducing the variance of the estimate to that level, always requires transmitting many more single bits than N.

Nevertheless, even if more bits are sent, there are many cases where this is useful, e.g. because losing some single bits or receiving them with errors has little effect, but there may also be other reasons.
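That intuition is easy to check empirically: averaging k independent randomized-rounding bits shrinks the squared error roughly like 1/k, so each additional bit of accuracy costs many transmitted bits (a quick sketch, not from the paper):

```python
import random

rng = random.Random(1)
x = 0.3  # the real number being estimated

def estimate(k):
    # Average k independent 1-bit randomized-rounding messages.
    return sum(1 if rng.random() < x else 0 for _ in range(k)) / k

for k in (10, 100, 1000):
    mse = sum((estimate(k) - x) ** 2 for _ in range(2000)) / 2000
    print(k, round(mse, 5))  # mean squared error shrinks like x*(1-x)/k
```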

asdf3331 · 4 years ago
Some of the comments on this article are embarrassing. It's just a short, clearly written article on a pretty simple and fundamental type of problem in information theory. Similar ideas are the basis for the systems we use in data compression, transmission, and error correction, among other things.

Maybe it's better to think for a few seconds before dismissing the problem as pointless and claiming to hate it.

magicalhippo · 4 years ago
It describes the problem, but not fully. Either do it properly or link to a complete description.

In this case, unlike what the article states as the problem, the point seems not to be how to send a real number using a single bit, but how to aggregate such bits from many agents that measure some shared quantity.

That's two very, very different things.

amelius · 4 years ago
Or encode the number using timing.
zitterbewegung · 4 years ago
That doesn’t work because it requires two clocks that are synchronized.
bloaf · 4 years ago
So?

Suppose you have two clocks that are synchronized to within ±dt1, sending a message takes some known amount of time ±dt2, and you want the sender to be able to transmit within dt3 of whatever event triggered the send.

It seems to me that you'd be able to transmit at least dt3/(2*(dt1+dt2)) distinct values with the pretty naive approach of discretizing time into segments longer than the combined clock-and-message uncertainty. I'd be willing to bet that a sophisticated approach could do much better.
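Plugging in some hypothetical numbers makes the back-of-the-envelope estimate concrete (the values below are purely illustrative):

```python
# Hypothetical values: clocks synchronized to +-1 ms (dt1), message
# latency known to +-2 ms (dt2), and up to 600 ms (dt3) allowed between
# the triggering event and the transmission.
dt1, dt2, dt3 = 0.001, 0.002, 0.6

# Naive scheme: discretize the dt3 window into slots wider than the
# combined timing uncertainty, and encode the value as the slot index.
slots = dt3 / (2 * (dt1 + dt2))
print(round(slots))  # 100 slots, i.e. roughly log2(100) ~ 6.6 bits
```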

amelius · 4 years ago
Yes, you need more than just a bit, but that is the case also for the article.

Further, you could send the bit as a pulse, where the pulse-width is proportional to the number.

Aerroon · 4 years ago
The first thing I thought of based on the title was sending at a specific time that then is used as a lookup from something like PI or e.
fwip · 4 years ago
According to the paper, this is sending an estimate of a real number with a finite number of bits (possibly one). The method aims to reduce worst-case(?) error by relying on preshared state (in this case, the seed to a PRNG).
Gormisdomai · 4 years ago
What is "shared randomness" in the context of this article? When I google it I find papers on quantum computing - is that a prerequisite for this implementation?
cpeterso · 4 years ago
Shared randomness is when the sender and receiver each have a PRNG or list of random numbers that are synchronized. Both sides know what the “current” and “next” random numbers are.
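In code, this is as simple as seeding two PRNGs identically during setup; the following sketch (illustrative, not from the article) shows both sides generating the same stream with no further communication:

```python
import random

SEED = 42  # exchanged once during setup, e.g. alongside key agreement

sender = random.Random(SEED)
receiver = random.Random(SEED)

# Both sides draw their "shared" random numbers in lockstep.
sender_h = [sender.uniform(-0.5, 0.5) for _ in range(5)]
receiver_h = [receiver.uniform(-0.5, 0.5) for _ in range(5)]
print(sender_h == receiver_h)  # True
```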
lampington · 4 years ago
Thank you! I was confused by the h ∼ U[−1/2, 1/2], thinking it meant either that they both have a set of random numbers (in which case, how does the receiver know which one the sender used?) or that they both have the same single random number in that range (in which case it looks basically the same as the randomised rounding case).

Your explanation that they are both going through the same list at the same rate clarifies that. They have n random numbers and are both using the same one for each transmission/receipt event.

I guess that in practice, synchronising that would be a pain. Either they sync on packets sent/received, which means that one dropped packet messes everything up, or they sync on time, in which case there's clock skew to worry about. Or they could have sync information in the data sent ("I'm using the 1234th item in the shared randomness"), which makes a bit of a nonsense of the whole "one bit" thing.

Am I missing something there?
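For what it's worth, a classic scheme in this family is subtractive dithering, which keeps the two sides in sync simply by having each consume one shared h per transmission. This is a sketch of that idea under those assumptions, not necessarily the post's optimal scheme:

```python
import random

# Sender and receiver hold identically seeded PRNGs, so "which random
# number was used" never needs to be transmitted: both consume one h
# per message, in lockstep.
rng_tx = random.Random(7)
rng_rx = random.Random(7)

def send(x):
    h = rng_tx.uniform(-0.5, 0.5)    # shared dither value
    return 1 if x + h >= 0.5 else 0  # the single transmitted bit

def receive(bit):
    h = rng_rx.uniform(-0.5, 0.5)    # same h, regenerated locally
    return bit - h                   # unbiased estimate: E[bit - h] = x

x = 0.3
est = [receive(send(x)) for _ in range(100_000)]
print(round(sum(est) / len(est), 2))  # close to 0.3
```

Note that a dropped packet does desynchronize the two streams, which is exactly the practical concern raised above.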

londons_explore · 4 years ago
I see this as the future of machine learning.

Using a single bit to communicate weight updates during the learning process reduces bandwidth required, and allows highly parallel training.

I suspect in the future we'll even see methods of sub-1-bit weight updates to further decrease bandwidth requirements to keep massive models approximately in sync between distant learning nodes.

asxd · 4 years ago
What would a sub 1-bit weight update look like? I’m having trouble wrapping my head around the concept of (what I assume would be) a fraction of a bit?
fennecfoxen · 4 years ago
First imagine a less-than-a-bit fact. Suppose you have a set of 1024 computers and you want to pick out a specific one using many facts about it. 1 in 4 computers runs Linux and the rest run Windows. Knowing that a computer runs Windows is about half a bit of the information needed to identify it.

A less-than-a-bit update can be combined with another such update to encode a single bit. For instance, my protocol could use one bit to indicate “the computer runs Linux, or it is one of the 256 Windows computers in this list”. Given that bit, you know the computer has a 50% chance of running Windows.
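The "about half a bit" figure is just Shannon's information content, −log2(p), which a one-liner can check:

```python
import math

# Learning "this computer runs Windows" (probability 3/4) conveys
# -log2(3/4) bits toward identifying one machine out of the 1024.
p_windows = 3 / 4
info = -math.log2(p_windows)
print(round(info, 3))  # 0.415 bits, i.e. roughly half a bit
```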

londons_explore · 4 years ago
Imagine we have 2 weights we want to update. For each weight, choose a pseudo-random number known by both sender and receiver. If a "1" bit is received, update both weights with the random number. If a zero is received, don't.

After enough iterations, each weight can converge on any value.
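A toy simulation of that idea (all names and numbers below are illustrative): both sides regenerate the same pseudo-random delta each round, and the single transmitted bit says whether to apply it. Here the sender accepts a delta only when it reduces its local error, and the receiver's copy of the weights tracks the sender's exactly:

```python
import random

rng_tx = random.Random(3)
rng_rx = random.Random(3)
target = [0.5, -0.2]     # what the sender is steering the weights toward
weights_tx = [0.0, 0.0]  # sender's copy of the weights
weights_rx = [0.0, 0.0]  # receiver's copy of the weights

def err(w):
    return sum((t - a) ** 2 for t, a in zip(target, w))

for _ in range(5000):
    # Sender: draw the shared delta and test whether applying it helps.
    delta = [rng_tx.uniform(-0.01, 0.01) for _ in range(2)]
    trial = [w + d for w, d in zip(weights_tx, delta)]
    bit = 1 if err(trial) < err(weights_tx) else 0  # the only bit sent
    if bit:
        weights_tx = trial
    # Receiver: regenerate the identical delta from its own PRNG.
    delta_rx = [rng_rx.uniform(-0.01, 0.01) for _ in range(2)]
    if bit:
        weights_rx = [w + d for w, d in zip(weights_rx, delta_rx)]

print([round(w, 2) for w in weights_rx])  # converges toward target
```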