Perhaps we want any of the multicast readers to be able to read more than one element at a time, and that does complicate things somewhat. But it is hardly impossible to handle.
Again: I deeply disagree with the premise that this can't be done. And it isn't even a significant penalty if your consumers have to be async consumers that hold their reads open for a while. It's unclear what the objections are.
The smaller arrays are not "left behind" in the garbage sense - the queue will use them again and again in subsequent rounds. See the simulator. Re-use, not re-cycle - the garbage-free mantra.
If the queue re-cycled the smaller arrays, it would no longer be garbage-free.
If you still believe the smaller arrays should be re-cycled (I'd be curious why), then there is the technical problem:
Imagine a reader is just about to read from the array (e.g. it is checking whether the writer has already written). Now the OS preempts it. For how long? We don't know. In the meantime everything in the queue moves forward, and code in some other thread (probably the writer) decides to de-allocate the array (and actually does).
Now the preempted reader wakes up, and the first thing it does is read from that (de-allocated) array ...
In my case, I needed a way to book and release two-port tuples really fast to accommodate an RTP simulator. So I wrote that slotmachine data structure and have been running it in production for months, and I can confirm: yes, performance is good.
Note: I should mention that my approach is almost exactly the opposite of yours: I create a final backing slice first, then create the traversal slices over it.