It is sometimes thought that extremely long-running Turing machine programs must be deeply complicated, or "spaghetti code". This new reigning champion program is a counterexample. All things considered, it is relatively simple.
There are three states: A, B, C. (States are just like goto targets in C.) State B passes control to A and C, but states A and C don't "know about" each other; they only pass control back to B. This is a sort of modular construction, whereas in true spaghetti code each state would be able to pass to all the others.
This program has some other interesting features. It never prints a blank (that is, whenever it scans a 0, it prints 1, 2, or 3). Additionally, every instruction changes either the state or the color -- there are no "lazy instructions" like B1 -> 1LB that just move position without changing anything else.
There is some debate in the bbchallenge project regarding the current long-running champions: do their properties reflect those of the actual longest-running machine of that size, or are their properties just the easiest to automatically search for and prove things about, a kind of streetlamp effect? There's no way to know until the entire search space has been ruled out, either definitively or heuristically. (Every size past BB(5, 2) contains chaotic, pseudorandom machines that are expected to run forever, but can't be proven to do so without big advances in number theory.)
Though I suspect that no long-running machine can be utterly chaotic, at least when you look at the patterns it produces on the tape. To run for a long time, a machine must simulate some sort of higher-level rule: if it were dumping symbols on the tape at random, then it would very soon reach a halting configuration, a cyclical configuration, or some other simplified pattern. Still, one can have long-running machines that do simulate something chaotic on a higher level, spending an inordinate amount of time between each high-level step until it halts.
I agree completely: all of these kinds of conjectures are shaped by what is detectable. If there are any "dark matter" programs out there, they will by definition be difficult to find. That said, I find it entirely plausible that the champions will win by exploiting complex and exotic mathematical facts, while the implementations of the math do not themselves need to be complex or exotic at the code level. More rambling thoughts about this: https://nickdrozd.github.io/2021/09/25/spaghetti-code-conjec...
Handwaving here, but I think the longest-running machines can't follow a specific structure in general.
Rough sketch of an argument:
Let's construct a number by concatenating all intermediate states of the machine. The number of digits of this number should correspond to the runtime (sketchy). We only care about the halting machines, so it's finite. We know that it must be unique, because if a smaller machine computed the same number, we could get a bigger number by simply running the smaller program and then doing other nonsense. That means the biggest programs are Kolmogorov-optimal, and the numbers themselves should be K-trivial and thus nearly but not quite computable. Since they're not computable, the programs themselves can't follow a structure to generate them (since that would in turn be computable for larger values).
At a high level, wouldn’t you expect Champions to be chaotic in the limit? As in, the halting problem tells us that any increasing sequence of Champions is not computable.
       0    1    2    3
  A  1RB  3LB  1RZ  2RA
  B  2LC  3RB  1LC  2RA
  C  3RB  1LB  3LC  2RC
The triple (t', d, s') in row s, column t specifies the transition taken in state s with symbol t under the tape head: the machine overwrites symbol t with t', moves the head Left/Right according to direction d, and changes state from s to s', halting if s' == Z.
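For concreteness, here is a minimal simulator sketch for this encoding. It is my own illustration, not part of the champion's definition: the tape size and step cap are arbitrary toy values, chosen so the head cannot run off the array within the cap.

#include <stdio.h>
#include <string.h>

#define TAPE  2000002               /* toy tape; the real champion outruns any array */
#define LIMIT 1000000ULL            /* with this cap the head stays in bounds */

int main(void)
{
    char prog[] = "1RB3LB1RZ2RA_2LC3RB1LC2RA_3RB1LB3LC2RC";
    const char *rows[3];
    rows[0] = strtok(prog, "_");    /* one 12-character row per state */
    rows[1] = strtok(NULL, "_");
    rows[2] = strtok(NULL, "_");

    static unsigned char tape[TAPE];
    long pos = TAPE / 2;
    int state = 0;                  /* state 0 is A */

    for (unsigned long long steps = 1; steps <= LIMIT; steps++) {
        const char *t = rows[state] + 3 * tape[pos]; /* the (t', d, s') triple */
        tape[pos] = t[0] - '0';                      /* overwrite t with t' */
        pos += (t[1] == 'R') ? 1 : -1;               /* move head in direction d */
        if (t[2] == 'Z') {                           /* halting transition */
            printf("halted after %llu steps\n", steps);
            return 0;
        }
        state = t[2] - 'A';                          /* switch to state s' */
    }
    printf("no halt within the step limit (no surprise for this machine)\n");
    return 0;
}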
This is 3*4*log2(4*2*log2(4+1)) or about 64 bits of information.
Meanwhile, in only 49 bits, BBλ(49) far exceeds Graham's number [1].
Counting the number of distinct TMs is not a simple task. Yours is the broadest way to count (all tables of values where each cell can hold any (symbol, direction, state) combination), but it significantly overcounts the bits needed to describe an arbitrary TM. For the BB(3, 4) case, there are only about 600 billion distinct TMs under Tree Normal Form, aka Brady's algorithm (https://nickdrozd.github.io/2022/01/14/bradys-algorithm.html), which comes out to <40 bits.
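(Quick arithmetic check on that last figure: log2(6 × 10^11) ≈ 39.1, hence the <40 bits.)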
Similarly, 2^n would be a huge overcount of the number of n-bit closed terms, which is given by OEIS sequence A114852 [1]. There are 36140122614 ~ 2^35 closed terms of size 49 bits (length of a straightforward self-delimiting encoding). So in that sense it's still a smaller lambda term computing an incomprehensibly larger output.
Do you know of any (hand-wavy is ok) intuitive explanation for why this machine will halt, beyond the inductive proofs given in your article? Just watching the machine run, I would guess at some kind of infinite behavior. It is remarkable that this is not the case.
Hmm, so I think the "1R" in "1RZ" doesn't matter then for this program and is arbitrarily chosen: since the machine halts there, it doesn't matter what's still written on the tape or where the tape head moves?
(Even the writing of the 1 doesn't matter, though a 0 would have been suboptimal, I guess. A 2 was already written on the tape there; it's being replaced by a 1, but the 2 counted just as much for "number of symbols on the tape".)
I was curious as to how it works, so I implemented it here:
turingmachine.io/?import-gist=c862f28918f3d889f964797694d28fcc
If you run it for a bit you can see what's going on: state B turns 0's into 2's and 2's into 1's, transitioning to C, and state C turns 3's into 2's and transitions to A. So you just iteratively lengthen your run of 3's exponentially, since it takes a full pass through all the 3's to fix a 2 into a 1.
The "states" (A, B, C) correspond to goto targets. The "colors" (0, 1, 2, 3) are runtime data. At each state, the current color is read, and an instruction is executed (print some color, move left or right, goto some state) based on which color is there. Transliterated into C it looks like this:
#include "machine.h"
int main(void)
{
A:
switch (SCAN) {
case 0:
WRITE(1); RIGHT; goto B;
case 1:
WRITE(3); LEFT; goto B;
case 2:
WRITE(1); RIGHT; goto H;
case 3:
WRITE(2); RIGHT; goto A;
}
B:
switch (SCAN) {
case 0:
WRITE(2); LEFT; goto C;
case 1:
WRITE(3); RIGHT; goto B;
case 2:
WRITE(1); LEFT; goto C;
case 3:
WRITE(2); RIGHT; goto A;
}
C:
switch (SCAN) {
case 0:
WRITE(3); RIGHT; goto B;
case 1:
WRITE(1); LEFT; goto B;
case 2:
WRITE(3); LEFT; goto C;
case 3:
WRITE(2); RIGHT; goto C;
}
H:
HALT;
}
Question: are there any opportunities to rewrite this logic in a more "structured" style, or to make any other optimizations?
I love these puzzles. GNU C supports labels as values for computed goto. This is useful for direct-threaded dispatch. You trade off a branch instruction for an address lookup, but it makes the code more structured.
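A sketch of what direct-threaded dispatch could look like for this machine (hypothetical: it relies on the GCC/Clang labels-as-values extension, and the toy tape size and step cap are my additions so it actually terminates):

#include <stdio.h>

#define TAPE 2000002  /* toy tape; the real champion outruns any array */

int main(void)
{
    static unsigned char tape[TAPE];
    long pos = TAPE / 2;
    unsigned long long steps = 0;

    /* dispatch[state][symbol] replaces the switch tree:
       each step is a single indirect jump through this table. */
    static void *dispatch[3][4] = {
        { &&A0, &&A1, &&A2, &&A3 },
        { &&B0, &&B1, &&B2, &&B3 },
        { &&C0, &&C1, &&C2, &&C3 },
    };

#define STEP(s) do { if (++steps > 1000000) goto H; goto *dispatch[s][tape[pos]]; } while (0)

    STEP(0);                                  /* start in state A */
A0: tape[pos] = 1; pos++; STEP(1);
A1: tape[pos] = 3; pos--; STEP(1);
A2: tape[pos] = 1; pos++; goto H;             /* 1RZ: halt */
A3: tape[pos] = 2; pos++; STEP(0);
B0: tape[pos] = 2; pos--; STEP(2);
B1: tape[pos] = 3; pos++; STEP(1);
B2: tape[pos] = 1; pos--; STEP(2);
B3: tape[pos] = 2; pos++; STEP(0);
C0: tape[pos] = 3; pos++; STEP(1);
C1: tape[pos] = 1; pos--; STEP(1);
C2: tape[pos] = 3; pos--; STEP(2);
C3: tape[pos] = 2; pos++; STEP(2);
H:  printf("stopped after %llu steps\n", steps);
    return 0;
}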
Each row is a state and each column is the symbol that was just read off the tape. So the first row & first column mean "I have just read symbol 0 and am currently in state A".
The cell in the table describes which actions to perform. The first row & first column has "1RB" which means: "replace the symbol on the tape with '1', shift 1 symbol to the right on the tape and switch to state 'B'".
There's a brief explanation at [0]. (Note that "1RZ" is understood to be a halting transition, since state "Z" has no rules.) Wikipedia also has a more elaborated example of a TM state table [1]. You can see the trace of this particular TM at [2].
Each triplet seems to be the symbol to write, the direction to move in, and the next state. You can think of the behavior of a Turing machine as a function of the current state and the symbol just read, applied iteratively.
Why not? The idea that the only valid way to publish a scientific result is in a (so-called) peer-reviewed journal is a relic from 200 years ago when the scientific community was small enough to fit within the monkeysphere [1]. It is being clung to only because it is profitable for a small group of powerful academics and publishing houses, not because it has any actual merit in terms of the advancement of science. In fact, it's probably in no small measure responsible for contemporary replication crises. I'm as staunch an advocate of the scientific method as you could hope to find (I'm writing a book about it!) but IMHO traditional peer review is long past its sell-by date.
Because Discord is a proprietary glorified IRC SaaS; its contents are, by nature, ephemeral and under control of the vendor. I'd expect such links to rot very quickly.
Collaborating on Discord is fine. Important results, including the citations backing them, should really be published, or at least replicated, in a more durable medium that's less susceptible to link rot and easier to archive. Today, that's PDFs in paper repositories, or even regular blog posts.
To be fair, the responsibility for the replication crises is much trickier. It’s based on human motivations and poor science:
1. Outright fraud
2. Lack of independent verification of results before building up a body of work resulting in a bunch of garbage. Contributes to making #1 “easy”.
3. Financial pressures to publish or perish, contributing to 1 & 2. If peer review didn't exist you'd still see something similar around producing "recognized" results. This is also probably why we haven't made any major breakthroughs in particle physics, which now has a much longer-term thinking phase to come up with anything.
The biggest problem with peer review is actually the establishment of an orthodoxy that discourages being upended, by controlling the career success of anyone who dares challenge it: you have to support the orthodoxy to get career success, and you fail to get recognition if your idea challenges the peer reviewers' ideas and careers. That said, such pressures existed before, but at least you could try to build your own following, and it was competing orthodoxies instead of a single one winning out.
FWIW, it's a public Discord server, you can find the invite link at the top-right of https://bbchallenge.org.
Also, I'd consider these more as attributions than citations. All the major arguments supporting the results have been replicated in the blog post (in a more rigorous form), so it can stand on its own: the Discord links just provide historical context for those interested.
As a member of these chats: it's often like hitting on an idea on a break-room blackboard and working it out, except the interaction can be cited. That's a positive change, if we can follow through and add to the literature in due time. Here's hoping.
That's fine, but the citation shouldn't be in the form of a Discord link, or at least not exclusively in that form. Make a copy of the relevant messages, host them elsewhere, and include that copy in the citation. Discord has been pretty good so far but being a durable archive has never been their core selling point so I don't trust that to continue indefinitely.
I understand the complaint here, but a lot of recent impressive progress in mathematics has been by rapid collaboration + iteration (e.g. the project to improve Zhang's prime gap bound), and it may be the case that different communication tools are not easily fungible with Discord in this regard. You have to go where the people actually are.
But Discord isn't citable. Somebody needs to archive the Discord and make that available through the proper channels (e.g. a website, a book, the Internet Archive).
Finding bigger busy beaver numbers is not exactly foundational. More like recreational. If it were foundational it would be peer reviewed in a journal article, not posted on a blog.
> If it were foundational it would be peer reviewed in a journal article, not posted on a blog.
What I think you are doing here is to DEFINE "foundational work" as something that gets published in a journal.
I don't mind if you use that definition, but if you do then the fact that all foundational work is published in journals is not insightful or informative, it is merely the definition.
If, on the other hand, you intended for the statement to say something meaningful about how important scientific and mathematical work is communicated, then you need a different definition of "foundational". And you would have to look to that definition to decide whether this work was foundational, because if it was then it disproves your hypothesis: it would be a counter example that illustrates that some foundational work is shared in places other than journals.
There are only so many Turing machines that we can describe with a not-too-large number of symbols, such as 1RB3LB1RZ2RA_2LC3RB1LC2RA_3RB1LB3LC2RC. The fact that some of these can take such a crazily large number of steps before halting is mind-blowing to me.
There are 2^60 such 3 state 4 symbol Turing machines. A 49-bit lambda term whose output (normal form) exceeds Graham's Number should blow your mind even more.
I don’t know why, but these probably-useless-results (which frankly I don’t 100% understand) intrigue me more than the incredibly-useful LLM advances. I guess because I’m naturally drawn to “simple” mathematical truths more than “messy” engineering results.
An n-state s-symbol TM can transition to at most s other states (or halt). Thus, for s = 4 or s = 2, only the smallest of TMs could be spaghetti-like.
[1] https://oeis.org/A333479
[1] https://oeis.org/A114852
I calculate 3*4*log2(4*2*log2(4+1)) ≈ 51.
I (a layperson) would have thought the calculation would be 3*4*log2(4*2*4) = 60.
Is it perhaps 3*4*log2(4*2*log2(3*3*4-1)) ≈ 64?
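For reference, the straightforward count needs no nested logs (a sketch, counting halt as a fourth next-state):

    choices per cell = 4 symbols * 2 directions * (3 states + halt) = 32 = 2^5
    cells = 3 states * 4 symbols = 12
    machines = 32^12 = 2^60, i.e. 60 bits

which agrees with the 2^60 figure quoted elsewhere in the thread.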
> So you just iteratively lengthen your run of 3's exponentially
But the really tricky part to understand is why it eventually stops, after an unimaginably large number of steps.
Here's a tangent to explore:
A BitGrid[1] (my personal hobby horse) only has 4 bits of state per cell, so a 4x4 grid of cells can't count more than 2^64, no matter what. Finding the highest it could count would be interesting. For small grids, the edge connections would dominate the outcome.
[1] https://esolangs.org/wiki/Bitgrid
[2] https://github.com/mikewarot/Bitgrid
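(Sanity check on that bound: 4 × 4 cells × 4 bits = 64 bits of state, so at most 2^64 distinct configurations.)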
Because A and C only jump to B, it is possible to structure this using only loops and one boolean. Let us use Rust to demonstrate, as it lacks goto:
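Here is a sketch of what that might look like. The Tape scaffolding, toy tape size, and step cap are my own assumptions, added just to make it self-contained and terminating:

// Structured rewrite of the champion: one outer loop, one boolean.
struct Tape {
    cells: Vec<u8>,
    pos: usize,
    steps: u64,
}

impl Tape {
    fn scan(&self) -> u8 { self.cells[self.pos] }
    fn write(&mut self, c: u8) { self.cells[self.pos] = c; self.steps += 1; }
    fn left(&mut self) { self.pos -= 1; }
    fn right(&mut self) { self.pos += 1; }
    fn done(&self) -> bool { self.steps > 1_000_000 } // give up long before the real halt
}

fn main() {
    let mut t = Tape { cells: vec![0; 2_000_002], pos: 1_000_001, steps: 0 };
    // The single boolean: does B hand control to C next, or to A?
    let mut in_c = false;
    'b: loop {
        if t.done() { break; }
        if !in_c {
            // State A: loops to itself on 3, exits to B on 0/1, halts on 2.
            loop {
                match t.scan() {
                    0 => { t.write(1); t.right(); break; }
                    1 => { t.write(3); t.left(); break; }
                    2 => { t.write(1); t.right();           // the 1RZ halt
                           println!("halted after {} steps", t.steps);
                           return; }
                    _ => { t.write(2); t.right(); }         // 3 -> 2, stay in A
                }
            }
        } else {
            // State C: loops to itself on 2/3, exits to B on 0/1.
            loop {
                match t.scan() {
                    0 => { t.write(3); t.right(); break; }
                    1 => { t.write(1); t.left(); break; }
                    2 => { t.write(3); t.left(); }
                    _ => { t.write(2); t.right(); }
                }
            }
        }
        // State B: loops to itself on 1, then picks the next branch.
        loop {
            match t.scan() {
                0 => { t.write(2); t.left(); in_c = true; continue 'b; }
                1 => { t.write(3); t.right(); }
                2 => { t.write(1); t.left(); in_c = true; continue 'b; }
                _ => { t.write(2); t.right(); in_c = false; continue 'b; }
            }
        }
    }
    println!("gave up after {} steps", t.steps);
}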
Of course it is possible to rewrite this as a single loop if you are willing to accept two bits of extra state rather than one.
[0] https://bbchallenge.org/story#turing-machines
[1] https://en.wikipedia.org/wiki/Turing_machine#Formal_definiti...
[2] https://bbchallenge.org/1RB3LB1RZ2RA_2LC3RB1LC2RA_3RB1LB3LC2...
Despite a CS undergrad I don’t recall really learning any of these canonical representations of TMs before.
[1] https://en.wikipedia.org/wiki/Dunbar%27s_number