dragontamer · 2 years ago
Dancing Links is a marvelous form of optimization.

Because it is built to optimize this kind of combinatorial exploration, Knuth's "Dancing Links" operates on a hyper-focused scheme where "malloc" and "free" for its linked lists are incredibly cheap.

In fact, "malloc" is like 2 or 3 assembly instructions (akin to bump-allocation), while "free" is IIRC a no-op. This is because every linked-node can only point to the immediate cell to the right, or a cell later down the line. (So a malloc is basically &curNode+1).

-------------

It's incredible that Knuth continues to work on this problem. I do think that these combinatorial problems he's gotten "stuck" on here in Volume 4 constitute some of the most elegant material in all of computer science.

I would like to see more chapters of Volume 4 come out, though!

----------

I'll have to look at this "dancing cells" lecture of his; any improvement on this concept definitely intrigues me.

dang · 2 years ago
Related:

Donald Knuth's Annual Christmas Lecture 2023 - https://news.ycombinator.com/item?id=38548712 - Dec 2023 (5 comments)

30 years of Donald Knuth's 'Christmas Lectures' are online – including 2023's - https://news.ycombinator.com/item?id=38775882 - Dec 2023 (10 comments)

JacobiX · 2 years ago
This is a very nice observation I found in the dancing links paper:

My trusty old SPARCstation 2 computer, vintage 1992, is able to perform approximately 0.39 mega-updates per second when working on large problems and maintaining the S fields. The 120 MHz Pentium I computer that Stanford computer science faculty were given in 1996 did 1.21 mega-updates per second, and my new 500 MHz Pentium III does 5.94. Thus the running time decreases as technology advances; but it remains essentially proportional to the number of updates, which is the number of times the links do their dance. Therefore I prefer to measure the performance of algorithm DLX by counting the number of updates, not by counting the number of elapsed seconds.

tylerneylon · 2 years ago
This lecture covers Knuth's latest (and simplest-so-far) algorithm to solve the XCC problem; I'll describe XCC below.

The XCC problem is designed to solve problems that are NP-hard. So algorithms that solve XCC are not known to be polynomial-time. But they're interesting because they can solve a whole swath of problems that are difficult (not known to have a poly-time algorithm) and interesting. This is in the same world as SAT-solvers, in case you already know about those.

This particular Knuth lecture shows how you can simplify his original "dancing links" algorithm to a "dancing cells" version that no longer uses as complex a linked list structure internally, but rather uses flat arrays, although it does retain some link-like structure. The dancing cells alg isn't always better than dancing links, but for most of Knuth's test cases, this new version is faster.
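To illustrate the flat-array trick, here's a hedged reconstruction of the core delete/undelete move as I understand it from the lecture (my own sketch, not Knuth's code): active elements live in a prefix of a dense array alongside a position index; "deleting" swaps an element to the boundary and shrinks the active count, and undeleting just grows the count back.

    # Hedged sketch (my reconstruction, not Knuth's code) of dancing-cells
    # style hide/unhide on a dense array plus a position index.
    class DancingSet:
        def __init__(self, n):
            self.item = list(range(n))   # dense array of elements
            self.pos = list(range(n))    # pos[x] = index of x in item
            self.size = n                # first `size` entries are active

        def hide(self, x):
            # Swap x with the last active element; shrink the active prefix.
            i, last = self.pos[x], self.size - 1
            y = self.item[last]
            self.item[i], self.item[last] = y, x
            self.pos[x], self.pos[y] = last, i
            self.size -= 1

        def unhide(self, x):
            # Undo hides in LIFO order: x is still sitting just past the
            # boundary, so restoring it is a single increment.
            assert self.item[self.size] == x
            self.size += 1

Backtracking search undoes hides in strict reverse order, which is exactly why unhide can be a single increment with no data movement at all.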

Ok, what is XCC? It's the "exact covering with colors" problem. (Knuth mentioned that XCC doesn't yet have a Wikipedia page, and maybe it should!) It's an extension of the "exact cover" problem -- you're given a set of items that should be covered, and a set of options (each a set of items) that may cover some of them. Each item must be covered exactly once (no more, no less). Now I'll add the "with colors" part: you divide items into primary items, which must be covered exactly once, and secondary items, which may be left uncovered or covered by possibly many options -- but every option covering a given secondary item must assign it the same "color."

Here's a problem to help gain some intuition for XCC: imagine trying to build an m×n word grid with valid English words. Each option is either a word placed in a row or a word placed in a column. In this case your primary items could be something like "each row needs a word and each column needs a word." Your secondary items would be the individual cells, and each letter is a "color"; thus the requirement of consistent colors ensures that you can't assign different letters to the same cell.
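To make that concrete, here's a toy sketch of how a 2x2 word square might be encoded as an XCC instance (my own illustrative encoding, not Knuth's input format):

    # Toy XCC encoding of a 2x2 word square (my own sketch, not Knuth's
    # exact input format). Primary items must be covered exactly once;
    # secondary items (the cells) carry colors (the letters), and every
    # option touching a cell must agree on its color.
    words = ["at", "to", "an", "no", "on"]

    primary = ["R0", "R1", "C0", "C1"]        # each row/column needs a word
    secondary = ["c00", "c01", "c10", "c11"]  # grid cells; colors = letters

    options = []
    for w in words:
        # Word w in row r covers primary item Rr and colors that row's cells.
        for r in range(2):
            options.append([f"R{r}", (f"c{r}0", w[0]), (f"c{r}1", w[1])])
        # Word w in column c covers primary item Cc and colors that column.
        for c in range(2):
            options.append([f"C{c}", (f"c0{c}", w[0]), (f"c1{c}", w[1])])

    # A solution picks one option per primary item with consistent cell
    # colors, e.g. rows "at"/"no" crossing columns "an"/"to".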

Another description of XCC: https://docs.rs/xcc/latest/xcc/

Word squares: https://en.wikipedia.org/wiki/Word_square

abhgh · 2 years ago
I am yet to read up on XCC, so this is helpful, thanks. One thing I found useful as an application from Knuth's lecture (I'm not sure if it was edited out because it was an audience discussion; I attended the 2023 talk in person, which I commented on in another thread [1]) is that it can produce multiple SAT solutions, as opposed to just one, which seems to be the norm now (definitely true for z3, which I have used a bit). This is important in practice because there are often downstream constraints you only become aware of later (e.g., hidden "soft" costs you didn't factor in), and because of them you need to go back and run the solver again to find an alternative solution. This is time-consuming, but the worse part is that there is no good way to exclude the first solution except by adding another constraint, e.g. original constraints + not(first solution) (see the sketch below).

Edit: as an extension of the above, the second solution might not work well either, in which case you'd need to repeat the process until you find a solution you like. Very painful in practice.

[1] https://news.ycombinator.com/item?id=38775882
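Here's roughly what that blocking-clause loop looks like with z3's Python API -- a sketch of the pattern with toy constraints, not production code:

    # Sketch of the "original constraints + not(first solution)" workaround
    # using z3's Python API (toy constraints, just to show the pattern).
    from z3 import Bools, Or, Not, Solver, sat

    a, b, c = Bools("a b c")
    s = Solver()
    s.add(Or(a, b), Or(Not(b), c))   # original constraints

    while s.check() == sat:
        m = s.model()
        print(m)
        # Block exactly this assignment so the next check() finds a new one.
        s.add(Or([v() != m[v] for v in m.decls()]))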

cb321 · 2 years ago
There's no Wikipedia page called "XCC" (or even just "XC"), as he mentioned in his talk, but there is this: https://en.wikipedia.org/wiki/Exact_cover which has a link to a Knuth algo (EDIT: and a section on what they call "generalized exact cover").
svat · 2 years ago
The 'exact cover' problem has been studied for decades, possibly centuries. The generalization to "primary" and "secondary" items is Knuth's terminology, but the idea itself is a natural one and has been considered before. It is specifically "exact covering with colors" (XCC, not "XC") that is Knuth's innovation/discovery, which a large part of the recent volume discusses, and about which he said there isn't a Wikipedia article yet (because it hasn't yet caught on and been discussed in non-Knuth sources).
caprock · 2 years ago
Here's the more direct link to the Stanford page:

https://online.stanford.edu/donald-e-knuth-lectures

radicality · 2 years ago
It’s been many years, but I remember really enjoying implementing dancing links for sudoku solving when I was practicing for job interviews. I wrote about it on my blog; there is some very old (probably crappy) Java code for it too.

https://rafal.io/posts/solving-sudoku-with-dancing-links.htm...

https://github.com/rafalio/dancing-links-java/tree/master

jsenn · 2 years ago
Though not for the same purpose, some of the tricks described remind me of the “compact slot map” described here: https://gamedev.stackexchange.com/a/33905, including the “swap and pop” for deleting from the middle of the array.
cb321 · 2 years ago
The Briggs & Torczon paper focuses on set representation rather than key-value style access, but of course one can get associative access with just a 2nd dense array of satellite data or, equivalently, wider-than integers in the one dense array (depending upon memory layout desires). Then you basically have a direct access index (like only 256 possible things, so use an 8-bit byte-sized index array of 256 entries) paired with a dense table.
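A minimal sketch of that sparse+dense pairing (my own toy version, not the paper's code): membership, insert, and clear are all O(1), and the dense array doubles as the place to hang satellite data.

    # Minimal sketch of a Briggs & Torczon style sparse set over the
    # universe 0..n-1 (my own toy version, with optional satellite data).
    class SparseSet:
        def __init__(self, n):
            self.sparse = [0] * n   # sparse[x] = claimed index into dense
            self.dense = []         # dense[i] = (element, satellite data)

        def __contains__(self, x):
            # Tolerates stale junk in sparse: only believe it if the
            # dense entry it points at points back.
            i = self.sparse[x]
            return i < len(self.dense) and self.dense[i][0] == x

        def add(self, x, data=None):
            if x not in self:
                self.sparse[x] = len(self.dense)
                self.dense.append((x, data))

        def get(self, x):
            # Associative access via the satellite slot in the dense array.
            return self.dense[self.sparse[x]][1] if x in self else None

        def clear(self):
            # The classic version just resets a size counter; truncating
            # the dense list plays the same role here. sparse is untouched.
            self.dense.clear()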

This always struck me as very similar to early ideas in database indexing a la Ted Codd & crew (as perhaps more recently popularized by the Python 3.6 hash table implementation). So, while Aho & Briggs & Torczon all deserve ongoing credit for popularization, I wonder if anyone out in HN-land knows of a reference to a database paper from the 60s or 70s introducing this? (For more complete attribution.)