Perhaps we want any of the multicast readers to be able to read more than one element at a time, and that does complicate things somewhat. But it is hardly impossible to handle.
Again: I deeply disagree with the premise that this can't be done. And it isn't even a significant penalty if your consumers have to be async consumers that hold their reads open for a while. It's unclear what the objections are.
The smaller arrays are not "left behind" in the garbage sense - the queue will use them again and again in subsequent rounds. See the simulator. Re-use, not re-cycle - the garbage-free mantra.
If the queue re-cycled the smaller arrays, it would no longer be garbage-free.
If you still believe the smaller arrays should be re-cycled (I'd be curious why), then there is the technical problem:
Imagine a reader is just about to read from the array (e.g. it is checking whether the writer has already written). Now the OS preempts it. For how long? We don't know. In the meantime everything in the queue moves forward, and code in some other thread (probably the writer) decides to de-allocate the array (and actually does).
Now the preempted reader wakes up, and the first thing it does is read from that (de-allocated) array ...
In my case, I needed a way to book and release two-port tuples really fast to accommodate an RTP simulator. So I wrote that slotmachine data structure and have been running it in production for months, and I can confirm: yes, performance is good.
Note: I should mention that my approach is almost exactly the opposite of yours: I create a final backing slice first, then create the traversal slices over it.