qsort · 2 years ago
These reviews are brutal. It's basically science-speak for "the paper is utter trash".

"The main claim [...] is both somewhat obvious and previously already stated"

"Many pieces of writing are overly assertive and inaccurate."

"I do not think it deserves spending half a page demonstrating that {0^n 1^n} is not in the regular language."

Animats · 2 years ago
Yes. Anything with finite memory is not Turing-complete, and transformers are finite-state machines. So what? Anything with infinite memory is un-buildable.

Yes, there's a school of philosophy that claims that AI cannot be built with digital hardware because it needs infinite state. Penrose is the leading exponent of this argument.[1] This isn't taken too seriously any more, now that finite machines are doing so well at AI.

This is close to theology. Man must be special, right? Only humans can play chess, right? Play Go? Pass the Turing test? Create art and music? We're running out of boxes to check. Common sense (as in getting through the next 30 seconds without a major screwup) and manipulation in unstructured settings are still in bad shape, but we only need squirrel-level AI for that.

On that subject, there was a "what does the cerebellum do" article on HN a few days ago. That's a big issue. The lower mammal brains are mostly cerebellum. The weak areas in AI at present are mostly cerebellum functions. That area has never gotten enough attention in AI research. Today, though, the amount of hardware to do a low-end mammal cerebellum equivalent doesn't seem at all unreasonable.

[1] https://sortingsearching.com/2021/07/18/roger-penrose-ai-ske...

colechristensen · 2 years ago
Penrose claims that human intelligence relies on quantum effects and points to certain structures in neurons that make it at least a plausible argument.

It would follow that comparable intelligence would not be possible with current digital computer architectures.

I also think a lot of people are a little too fooled by the “intelligence” of current models, which are mimics with enormous libraries of “knowledge”. It is not too hard to be frustrated by the enormous limitations of current models. Impressive, yes; smart, no. Though refuting claims of their intelligence really pushes us to refine the philosophical definition of intelligence, which has been rather vague so far. We can’t just say “I know it when I see it” because computers are getting very good at pretending.

kaba0 · 2 years ago
I don’t think we are running out of boxes. We’re just trying to determine the minimum requirement that would constitute general AI, and that bound is hard to find. It’s quite trivial to find bounds that are more than minimal, say, writing a novel research paper, making a discovery/invention, etc.
sacado2 · 2 years ago
The issue is that intelligence itself is an ill-defined concept, and an unofficial but broadly shared rough definition is "what separates humans (and some animals) from inanimate things". So, as soon as a machine can do something (whether it's having good memory, proving things, playing chess, translating texts, drawing a picture, driving a car, anything) it no longer belongs to "intelligence". By that very definition, AI is an oxymoron.
drowsspa · 2 years ago
> This is close to theology

Yeah, even the arguments reek of basically "God of the gaps" and "moving the goalposts" stuff

baq · 2 years ago
Engineering POV makes sense, but Turing-completeness is well-defined in CS i.e. maths, and maths deals with unbuildable stuff all the time. You can't engineer a result into CS unless you prove your engineering in a mathematical fashion, e.g. the famous four colour theorem proof.
b4ke · 2 years ago
Isn’t genetic material an infinite state machine as long as the host system has the requisite support continuity?

“Ai” is just the current moniker for a persistent assault on the original general intelligence (belief not required).

Pretty sure religion is an accurate representation of a generative intelligence, but I digress to knowing little.

— James

Tainnor · 2 years ago
> Yes. Anything with finite memory is not Turing-complete, and transformers are finite-state machines. So what? Anything with infinite memory is un-buildable.

I think that's missing the point. A Turing Machine description is independent of the memory provided to it by the hardware and it doesn't need to be rewritten every time the hardware improves. That's very different from a FSM which is restricted in memory a priori.

I don't really know enough about transformers to understand where they fit in here, though some reviewers do seem to suggest that allowing transformers access to infinite memory does make them also Turing Complete.

3abiton · 2 years ago
This is such a philosophical rabbit hole that I will get into when time allows it.
jakderrida · 2 years ago
I don't even have a PhD. I just find OpenReview's feedback fun to read, for the same reason I love watching Gordon Ramsay give feedback.
qsort · 2 years ago
Me neither, but I know what the process is like from having co-authored some stuff at work. This is not the kind of review you get if the paper is simply not good enough. Those are some seriously pissed off reviewers. Compare for example with this:

https://openreview.net/forum?id=cXs5md5wAq

Cacti · 2 years ago
These are just jokes with reviewers having some fun after doing 20 other reviews. The review process isn't really needed for a paper this bad, but they have to go through the motions, and so people have fun with it.

It’s not a big deal. Shitty papers exist.

zubairq · 2 years ago
I actually really appreciate these brutal reviews
axiom92 · 2 years ago
Welcome to one of the most hated parts of the academia.
Imnimo · 2 years ago
The reviews are pretty harsh, but after reading the paper, I feel they may be too generous. Somehow this paper spends several pages proving things that are both trivial and irrelevant, and spends zero words explaining the architecture of their model. This is borderline crackpot territory.
davesque · 2 years ago
Yeah, I sort of agree. It hardly provides any details about the model they supposedly used to achieve only two benchmark results, which are themselves poorly described. It comes across as written by someone who only dabbles in the topic but seems to believe they've taken some great leap ahead of the entire field.
im3w1l · 2 years ago
Sounds consistent with some non-scientists working in the field, maybe for some smaller software company? They find something cool, and try to write a paper about it, which is something they aren't very experienced with.
utopcell · 2 years ago
Agreed. Why is this on HN again ?
davesque · 2 years ago
Something I noticed when skimming the paper that is also called out by one of the reviewers:

"Moreover, in the submission the authors considered only Transformers with input and output of bounded lengths, which are quite strange since Turing machines do not pose constraints on the tape length. If the length is constrained in Transformers, they clearly do not match Turing machines."

Azertinv · 2 years ago
I don't think the comparison applies in this way. A Turing machine may operate on an unconstrained tape, but it only looks at a single cell at each step. This find-and-replace technique operates on a bounded context but, in the same way as a Turing machine, it can "technically" operate on an unconstrained tape ("technically" because the find transformer wouldn't be able to look up an infinite tape in the real world). I guess you could say that a Turing machine has a bounded input length of 1.
davesque · 2 years ago
Okay, actually I think I just misunderstood the argument in the paper. After taking a second look, I think their argument goes as follows:

A typical autoregressive transformer model operates on a fixed-size context that acts as both input and a queue to which the output of the model is appended. If a model operates on a context of length k with n possible symbols in its alphabet, then there are n^k possible contexts.

If you run such a model for n^k iterations, then

1) if the <eos> (end of sequence) symbol appears in the context, then the model halts.

2) if the <eos> symbol never appears in the context, then you know the model never halts, because it must have repeated a context string by the pigeonhole principle and so must be in an infinite loop.

Therefore, the halting problem is decidable for transformer models with finite-length contexts that are autoregressive in this way, which would imply a contradiction if we claimed they were Turing complete (because the halting problem is known to be undecidable for Turing-complete systems).

I'm not entirely sure this proves anything (it sounds believable I guess?), but at least I think it describes the argument they are making.
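That halting check can be sketched in a few lines of Python. The `step` function below is a hypothetical stand-in for one decoding step of the model (it is not from the paper); the point is only that a finite context space makes loop detection possible:

```python
def halts(step, context, eos="<eos>"):
    """Decide halting for a finite-context autoregressive process.

    `step` maps a context (a tuple of symbols) to the next symbol.
    The context acts as a fixed-size queue: append the new symbol,
    drop the oldest. Since there are only finitely many contexts,
    any non-halting run must revisit one (pigeonhole), so this
    loop itself always terminates.
    """
    seen = set()
    while context not in seen:
        seen.add(context)
        sym = step(context)
        if sym == eos:
            return True  # model emitted <eos>: it halts
        context = context[1:] + (sym,)
    return False  # a context repeated: the model is in an infinite loop

# Toy example: a "model" over alphabet {0, 1} with context length 2
# that just copies its last symbol forever -- it never emits <eos>.
print(halts(lambda ctx: ctx[-1], (0, 1)))  # False
```

This is exactly why the decider only works for a fixed context length: for a true Turing machine the set of configurations is unbounded, and no such `seen` set can cover them.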

It is sort of interesting how it highlights the difference between a system that can write to any position in memory vs. one that can only append to memory while being required to delete the first memory cell (where memory is a queue).

However, the overall paper is awful and pretty hard to take seriously.

Tainnor · 2 years ago
This response by the authors to the reviews to me looks like they didn't really understand the objection:

> First, it's worth noting that different reviewers sometimes gave opposite critiques of the paper, e.g. Reviewer erh8: The conclusion in this paper is questionable... It contradicts to [1], which shows that Transformers are Turing-complete Reviewer bz3o: The main claim that transformers with finite attention span are not computationally universal is both somewhat obvious and previously already stated

If I'm reading the reviews correctly, the claim by both reviewers was that transformers are actually Turing complete, but one reviewer added that they're "obviously" not Turing complete if you restrict their memory a priori (which I would agree is obvious). So there isn't really a contradiction between the reviews.

From briefly skimming the paper, this does indeed look to me like researchers who aren't really familiar with theoretical CS trying to handwave their way into something that looks ground-breaking. But while you can absolutely get away with vague-ish descriptions in the more experimental parts of CS, you absolutely can't get away with them in computability theory - that field is rigorous, and basically maths.

inciampati · 2 years ago
Transformers with fixed precision and any amount of memory are not Turing complete. They're bounded by a variant of first order logic with counting quantifiers. https://arxiv.org/abs/2301.10743

This supports the intuitive idea that they "learn circuits for things" that I've heard a few others mention.

If they have infinite precision then I suppose they can simulate a Turing machine in their infinitely precise states...

Tainnor · 2 years ago
It appears that it's exactly the distinction between fixed and arbitrary precision that decides whether or not transformers are Turing Complete: https://openreview.net/forum?id=HyGBdo0qFm&noteId=r1lxKiAJlV

I think the larger point though is that "is X Turing Complete?" depends on the exact assumptions made and that's a point that the authors of the paper IMHO failed to understand while reading the reviews.

DoubleDerper · 2 years ago
Hear me out. Try three transformers.
_lqaf · 2 years ago
pseudosavant · 2 years ago
The link to that Onion article was worth a double up vote!
compacct27 · 2 years ago
Can’t. Three Body Problem.
junipertea · 2 years ago
Is the discussion about the paper, or about how it was unilaterally rejected?

Deleted Comment

K0balt · 2 years ago
Meh, let’s see an open license implementation.

Unfortunately, I don’t find the acceptance or rejection of a paper in this field has much predictive value these days. By and large, the reviewers tend to be more and more like Wikipedia editors as time goes on.

bee_rider · 2 years ago
Whether or not the review process works, open review is a slightly different process. I’d expect a skeptic of the review process to celebrate the idea of reviewers “showing their work.”
morthaur · 2 years ago
> Let’s consider a simple problem that finite automata cannot solve: recognizing the language L = {a^n b^n | n ≥ 0}. This language consists of strings with n ’a’s followed by n ’b’s.

> A regular expression or finite automaton can recognize strings in this language up to a certain length. For example, the regular expression a*b* can recognize strings in L up to length 2.

That regex makes no guarantee that the number of a's matches the number of b's, which doesn't match their language definition. I think they wanted (ab)*, which does, and can match any string in their language.

Tainnor · 2 years ago
Wow, that's wild. Not only are they writing down things everybody in that field should know, but they fail to even properly understand the very basics of regular languages. Literally what you learn in the first week of an intro to CS course.

What they wrote makes absolutely no sense, I have no idea where this confusion came from.

They're also at least imprecise (or possibly again confused) when they write that "no regular expression [...] can recognize all strings in L" - they should have written "exactly the strings in L", because you can of course write a regex (or FSM) that recognises every string.

However, they definitely shouldn't have written (ab)* because this doesn't recognise L either (it does keep the number of as and bs balanced, but they're in the wrong order) - and with good reason, that language is simply not regular: as they do correctly write, it would require arbitrarily large memory (formally, that's the Myhill-Nerode theorem).

(Also, as a side-note, it's "an automaton", not "an automata", and the fact that they have no spacing around the | in their set notation is also a minor pet peeve of mine. But now I'm splitting hairs.)
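A quick check with Python's `re` module makes the mismatch concrete: the paper's regex `a*b*` over-accepts relative to L = {a^n b^n | n ≥ 0}, and the suggested fix `(ab)*` matches a different language again:

```python
import re

# a*b* accepts any run of a's followed by any run of b's,
# with no guarantee the counts match.
print(re.fullmatch(r"a*b*", "aabb") is not None)   # True: in L, accepted
print(re.fullmatch(r"a*b*", "aaab") is not None)   # True: accepted, but NOT in L

# (ab)* keeps the counts balanced but in the wrong order for L.
print(re.fullmatch(r"(ab)*", "aabb") is not None)  # False: in L, rejected
print(re.fullmatch(r"(ab)*", "abab") is not None)  # True: accepted, but NOT in L
```

No regex fixes this, of course: {a^n b^n} is the textbook example of a non-regular language, since matching the counts requires unbounded memory.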

bee_rider · 2 years ago
Out of curiosity, what do people think of the comments to the reviewers by the authors?

I was pretty surprised to see them challenge the reviewers. Maybe open review is different, but I was trained to try to find ways to defer to the reviewers or, basically, placate them if possible. It looks like the authors have tried to argue back, for example by finding a contradiction between the reviews… it seems like a risky strategy to me. Then again I haven’t ever received feedback this negative, thank goodness.

krackers · 2 years ago
After skimming the paper I kind of agree with the reviewers on this one, there's too much time spent going over undergrad level TCS material, and not much in the way of related papers cited (for instance, not one mention of circuit complexity). It's a good introduction to the material for people a bit rusty in the field, but if I as a layman can nod along with my meager TCS knowledge then those parts are probably a bit _too_ trivial to include (at least for a journal submission. An addendum or blog post would be fine).

It may be quite possible that they did stumble upon a model that beats GPT-4 on their tasks, in which case they should release the code/model and let those results stand for themselves. But as a _paper_ it's not very good, since the main novel contribution (the "Find/Replace" modification to the transformer architecture) has less than a page dedicated to describing it.

Imnimo · 2 years ago
I think there are two situations in which you might see this kind of rebuttal:

1) The reviewers were truly unreasonable, and the authors feel they have no hope of winning them over. Instead, they try to highlight the flaws in the reviews to play to the Area Chair, and hope the reviewers are overruled. It's a longshot, but worth a try if you don't feel there's another path.

2) The authors aren't really familiar with the review process, and are just upset with the negative feedback. They're just looking to argue for the sake of arguing, and not as part of a strategy to get accepted. I think that's what we're seeing here.

light_hue_1 · 2 years ago
I've been a meta-reviewer, reviewer, and author at ICLR many times (not involved in reviewing this paper).

The authors did not respond to reviewers properly at all! Dear readers, please do not do this if you submit.

Negative feedback from reviewers is good. It's the point of the review system. If the reviewers aren't going to give negative feedback, we need to remove them. This is not extreme negative feedback by any means. And had the authors actually been correct, they had a shot at convincing a reviewer, or at least an attentive meta-reviewer.

That being said, the instructions for how to reply are clear. Authors must respond to each reviewer individually. Point by point. And then engage in a discussion with the reviewers about their responses. They did none of this. It's the mark of a nutjob who wants to avoid being challenged.

Click any of the papers on the first page here to see how it's supposed to work: https://openreview.net/group?id=ICLR.cc/2022/Conference

Complaining about the quality of your reviews to your meta-reviewer is... poor form. And it does not make it more likely your paper will be accepted. We're looking for reasons to accept based on the quality of your work. The perceived shortcomings of reviewers aren't an indication that your work is good. Many meta-reviewers will tell authors to knock it off next time, because this isn't helping their case.