Calendar Queues: A Fast O(1) Priority Queue Implementation (1988)

I've had very similar experience as other commenters have stated W.R.T. calendar queues vs just good-old-fashioned std::priority_queue.

Then, one day, my team hired this ancient soviet engineer who looked like he could have been Lenin's drinking buddy. He was not impressed that I was using std::priority_queue, and he sat down and wrote a calendar queue.

I'll be damned if that thing wasn't 7 to 9 times faster. I thought I was an engineer, but next to this guy, I was just a monkey poking at the typewriter.

It is possible to make a calendar queue which will absolutely mop the floor with any other queue, but the algorithms given in these papers is just a starting point. Going from the published algorithm to an actual performant, production-ready product is always the hardest part.

throwaway81523 · 2 years ago

As mentioned, something like it already exists inside Linux. Maybe it could be pulled out and turned into an app library, if it's so much better than a heap queue. Info: https://duckduckgo.com/?q=timer+wheel+linux

I remember writing a heap queue in C++ myself because std::priority_queue had some kind of shortcoming whose specifics I don't remember. Maybe I can find that program and check what it wanted. It wasn't a performance issue, but rather, something I needed was missing from the stdlib API and I remember thinking that it was silly that they omitted it.

Ono-Sendai · 2 years ago

What you are thinking of is probably that you can't erase elements (apart from the top element) from the priority queue.

I think I reinvented and started implementing something like this, but then just ended up using std::priority_queue (the C++ standard library priority queue) which is pretty fast.

timClicks · 2 years ago

Strictly speaking, you ended up using your compiler's priority queue. The standard defines the interface and invariants, but the implementer has discretion about the implementation. There are also several knobs you can turn to tweak the performance characteristics. std::priority_queue is a container adapter that can be applied to many of the standard containers. https://en.cppreference.com/w/cpp/container/priority_queue

quuxplusone · 2 years ago

[Nit: Not "your compiler's" but "your library's." The C++ Standard Library is generally provided by the compiler vendor, but it's not built into the compiler, except for tiny pieces like `std::bad_alloc` and `std::strong_ordering`.]

The implementor has far less freedom than your answer seems to be implying. The standard specifies, for example:

https://eel.is/c++draft/priority.queue#priqueue.cons-4

> The constructor calls make_heap(c.begin(), c.end(), comp).

https://eel.is/c++draft/priority.queue#priqueue.members-5

> emplace calls push_heap(c.begin(), c.end(), comp).

And so on. In fact, if I weren't trying to "yes and" you, I'd say there is essentially no implementation freedom. In particular, the user-programmer is allowed, at any point, to extract the protected data member `c` and verify that it is in fact heapified in the same way that `std::push_heap` would have heapified it.

That said, std::priority_queue::pop is specified to behave "as if by pop_heap followed by pop_back," and in fact the vendor can do better there, by using Floyd's "bottom-up" algorithm. LLVM's libc++ switched from the naïve implementation to Floyd's version back in early 2022, thus closing a feature request that had been open for 11 years at the time:

https://github.com/llvm/llvm-project/commit/79d08e398c17e83b...

I think the other two major vendors had already switched by then, although I'm not sure.

The implementation definitely does not have the freedom to switch from the mandated heap-based PQ to any alternative kind of PQ, including but not limited to (TAOCP §5.2.3) "leftist or balanced trees, stratified trees, binomial queues, pagodas, pairing heaps, skew heaps, Fibonacci heaps, calendar queues, relaxed heaps, fishspear, hot queues, etc."

I once wrote an STL-style implementation of Fishspear, with some analysis of its pros and cons. https://quuxplusone.github.io/blog/2021/05/23/fishspear/

Sesse__ · 2 years ago

std::priority_queue is sorely missing the operation “change the priority of this element” (you need to do it using a delete and then a new insert, which is rather slow), which comes up all the time in e.g. Dijkstra's algorithm.

quuxplusone · 2 years ago

Yes, and, there are two related but different operations there:

- Look up an arbitrary element by its value, and then change that element's priority. This is often needed in real life, but is fundamentally incompatible with std::priority_queue's highly restricted design. There is no public API at all for dealing with "arbitrary elements" of a std::priority_queue; you interact only with the .top() element.

- Change the top element's priority, i.e. handle it and then throw it back down to be dealt with again sometime later. This operation is used in e.g. the Sieve of Eratosthenes.

I'm not sure which operation you're thinking of w.r.t. Dijkstra's algorithm; I'd wildly guess it's the first operation, not the second.

Changing the top element's priority is easy to graft onto the STL priority_queue's API. I've done it myself here: https://quuxplusone.github.io/blog/2018/04/27/pq-replace-top... The proper name of this operation is `pq.replace_top(value)`, and for the perfect-forwarding version, `pq.reemplace_top(args...)`.

Search `reemplace_top` in this Sieve of Eratosthenes code: https://godbolt.org/z/bvY4Mr1GE

alexhutcheson · 2 years ago

Boost.Heap has this functionality. Or if you want to stick with the standard library it’s fairly easy to use the *_heap functions from <algorithm> and just hand-code your own fix_heap(first, last, changed) function. Agree it would be more convenient to have it built-in, though.

Dead Comment

bob1029 · 2 years ago

I spent a solid few days chasing this exact damn rabbit.

I thought I could beat the PQ implementation in .NET with something like this but I never even got close.

I think my use case breaks the assumptions in this paper due to the volatility of the distribution over time.

Edit: For reference, this is the approach taken by .NET - https://en.m.wikipedia.org/wiki/D-ary_heap

neonsunset · 2 years ago

If you would like to contribute, there might be a better optimization opportunity in the current bounded Channel<T> implementation: https://github.com/dotnet/runtime/discussions/104791#discuss...

nbingham · 2 years ago

I tested two C++ implementations of Calendar Queues again the standard library priority_queue. The first implementation uses a deque as a backing container and then fills the calendar with linked lists. The second implementation just uses vectors in the calendar with no backing container.

The calendar queues have an interesting failure mode. If you are randomly inserting elements with a priority that is less than "now", and then pop some number of elements, and doing this repeatedly, then the front of the calendar empties out. As a result, the random insertions create a single value in the front of the calendar, then there are many many days that are empty after that. So subsequent pops will always have to search a lot of days in the calendar. So, these calendar queues are only fast if you keep track of "now" and only insert events after "now".

https://gist.github.com/nbingham1/611d37fce31334a1520213ce5d...

seed 1725545662 priority_queue 0.644354 calendar_queue 0.215860 calendar_queue_vector 0.405788

seed 1725545667 priority_queue 0.572672 calendar_queue 0.196812 calendar_queue_vector 0.392303

seed 1725545672 priority_queue 0.622041 calendar_queue 0.241419 calendar_queue_vector 0.413713

seed 1725545676 priority_queue 0.590372 calendar_queue 0.204428 calendar_queue_vector 0.386992

It would be nice to note in the title that this is a pdf. The algorithm is something like the timer wheels in the Linux kernel. Related to radix sorting more or less. Basically there are a bunch of buckets containing sorted lists of events. I didn't read too carefully since most people use a heap for this, which is O(log n) but likely has better constants.

rhelz · 2 years ago

kazinator · 2 years ago

I independently invented something similar around 1993 inside the scheduler of a threading implementation. I wanted to have a priority scheme whereby the ratios of priority values determined the amount of CPU quanta given to the thread. E.g. a priority 5 thread would twice the CPU time compared to a priority 10.

I called the algorithm "appointment calendar". Threads were scheduled in a calendar, with the lower priority threads (higher value) getting appointments farther in the future. The scheduler just marched through the calendar in order, taking the appointments.

kevinventullo · 2 years ago

I believe the Re-Pair algorithm used for doing linear time byte-pair encoding makes use of a similar idea: https://en.m.wikipedia.org/wiki/Re-Pair

There, instead of dates, the “priority index” reflects the frequencies of pairs seen in the input string. This leads to guaranteed O(1) runtime amortized over the input string, since the largest frequency count is bounded by the size of the input and can only decrease as merges happen.

packetlost · 2 years ago

Everything can be O(1) if you put bounds on every operation.

kevindamm · 2 years ago

The difference of note is when the problem statement has an implicit constraint that makes the bounds reasonable vs. when the bounds are arbitrary and artificially constrain the problems that can be solved.

Sorting integer elements that can only be represented by 64-bit ints and smaller? Radix sort for the win (*). Sorting strings which may be of any length? Well, saying you can do that in linear time is a bit disingenuous.

It is always important to recognize the properties of the problem domain when stating complexity bounds.

(*) of course, any vanilla comparison-based sort is a better first-implementation than radix sort, but we're talking about linear time algorithms here