lsy · a year ago
Note that for the purposes of this paper a “problem” just means a formally decidable problem or a formal language, and the proof is that by creatively arranging transformers you can make individual transformer runs behave like individual Boolean circuits. However, this is a long way from any practical application of transformers: for one thing, most problems we care about are not stated as formal languages, and we already have an exceptionally more efficient way to implement Boolean circuits.
shawntan · a year ago
If a "problem we care about" is not stated as a formal language, does it mean it does not exist in the hierarchy of formal languages? Or is it just as yet unclassified?
tsimionescu · a year ago
It means that there are two problems: one, to formalize the problem as stated while capturing all relevant details, and two, solving the resulting formal problem. Until you solve problem one, you can't use formal methods to say anything about the problem (it's not even clear a priori that a problem is even solvable).

Unfortunately, the task of formalizing an informal problem is itself an informal problem that we don't know how to formalize, so we can't say much about it. So overall, we can't say much about how hard the general problem "given a problem statement from a human, solve that problem" is, whether any particular system (including a human!) can solve it, and how long that might take with what resources.

wslh · a year ago
My 2 cents: Since LLMs (Large Language Models) operate as at least a subset of Turing machines (which recognize recursively enumerable languages), the chain-of-thought (CoT) approach could be equivalent to, or even more expressive than, that subset. In fact, CoT could well be equivalent to a full Turing machine.

If we leave CoT aside for a moment, it's worth exploring the work discussed in the paper "Neural Networks and the Chomsky Hierarchy"[1], which analyzes how neural networks (including LLMs) map onto different levels of the Chomsky hierarchy, with a particular focus on their ability to recognize formal languages across varying complexity.

[1] https://ar5iv.labs.arxiv.org/html/2207.02098v1

julienreszka · a year ago
> most problems we care about are not stated as formal languages

then a way would be to translate them to formal language

lgessler · a year ago
I liked Yoav Goldberg's snarky quote tweet:

> next paper: transformers can solve any problem but on some of them they may compute indefinitely and never provide an answer

> (and you cannot tell in advance which is which!!)

https://twitter.com/yoavgo/status/1835802380589203802

teqsun · a year ago
It reminds me of the Busy Beaver problem.
sigmoid10 · a year ago
>Remarkably, constant depth is sufficient.

How would that be remarkable, when it is exactly what the Universal Approximation Theorem already states? Since transformers also use fully connected layers, none of this should really come as a surprise. But from glancing at the paper, they don't even mention it.

nexustext · a year ago
It's 'remarkable' because (a) academic careers are as much about hype as science, (b) arxiv doesn't have peer review process to quash this, (c) people take arxiv seriously.
logicchains · a year ago
>How would that be remarkable, when it is exactly what he Universal Approximation Theorem already states

Only with infinite precision, which is highly unrealistic. Under realistic assumptions, fixed-depth transformers without chain-of-thought are very limited in what they can express: https://arxiv.org/abs/2207.00729 . Chain of thought increases the class of problems which fixed-depth transformers can solve: https://arxiv.org/abs/2310.07923

IshKebab · a year ago
The universal approximation theorem has no practical relevance.
larodi · a year ago
I'm waiting for the people of AI to discover syllogism and inference in the original Prolog sense, which this CoT abomination basically tries to achieve. Interestingly, if all logical content were translated to rules, and then only rules were fed into the LLM training set, what would the result be? And could the probabilistic magic be made to actually follow reason, without all the dice?
trescenzi · a year ago
Right we’ve now gotten to the stage of this AI cycle where we start using the new tool to solve problems old tools could solve. Saying a transformer can solve any Formally decidable problem if given enough tape isn’t saying much. It’s a cool proof, don’t mean to deny that, but it doesn’t mean much practically as we already have more efficient tools that can do the same.
marcosdumay · a year ago
What I don't get is... didn't people prove that in the 90s for any multi-layer neural network? Didn't people prove transformers are equivalent in the transformers paper?
sunir · a year ago
I was thinking about the graphrag paper and prolog. I’d like to extract predicates. The source material will be inconsistent and contradictory and incomplete.

Using the clustering (community) model, an llm can summarize the opinions as a set of predicates which don’t have to agree and some general weight of how much people agree or disagree with them.

The predicates won’t be suitable for symbolic logic because the language will be loose. However an embedding model may be able to connect different symbols together.

Then you could attempt multiple runs through the database of predicates because there will be different opinions.

Then one could attempt to reason using these loosely stitched predicates. I don’t know how good the outcome would be.

I imagine this would be better in an interactive decision making tool where a human is evaluating the suggestions for the next step.

This could be better for planning than problem solving.

larodi · a year ago
Hm... a RAG over a DB of logical rules actually may be interesting. And loosely stitched predicates can easily be put to work with some random dice when deciding which inference step to take.

Chris Coyne of OKCupid and Keybase (https://chriscoyne.com/) produced Context Free (https://www.contextfreeart.org/) before all that. It is grammar-based inference with a probabilistic choice of the next rule to apply. Very inspiring, and not only because of the aesthetic side of the results. Digging further you find ProbLog, which allows probabilities on rules (https://dtai.cs.kuleuven.be/problog/tutorial/basic/08_rule_p...)

So how about we start thinking of AI as a combination of the graphical probabilistic whatever, which compresses the information from the training set in a very lossy manner, hooked up, internally or externally, to a discrete logical core whenever CoT is needed. This construct could then benefit from both worlds.
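A minimal sketch of that "loosely stitched predicates plus dice" idea, loosely in the spirit of ProbLog's probabilistic rules (the rule set and the `infer` helper here are hypothetical, invented for illustration):

```python
import random

# Each rule: (premises, conclusion, probability of firing when premises hold).
# Contradictory opinions can coexist with different weights.
rules = [
    (("penguin",), "bird", 1.0),
    (("bird",), "flies", 0.9),     # the majority opinion
    (("penguin",), "flies", 0.0),  # a dissenting opinion, weight zero
]

def infer(facts, rules, rng, rounds=5):
    """Forward-chain for a few rounds, rolling the dice on each rule."""
    facts = set(facts)
    for _ in range(rounds):
        for premises, conclusion, p in rules:
            if all(q in facts for q in premises) and rng.random() < p:
                facts.add(conclusion)
    return facts

beliefs = infer({"penguin"}, rules, random.Random(0))
```

Running `infer` several times with different seeds gives the "multiple runs through the database of predicates" mentioned upthread: different dice, different conclusions.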

pkoird · a year ago
I've said this before and I'll say it again: Any sufficiently advanced LLM is indistinguishable from Prolog.
detourdog · a year ago
I’m surprised that understanding how a thought unfolds is being considered irrelevant to the answer. I have done a lot of problem solving, in groups and alone. How thoughts develop seems fundamental to understanding the solutions.

The story regarding the banning of terms that can be used with a reasoning system is a big red flag to me.

This sort of knee jerk reaction displays immature management and an immature technology product.

larodi · a year ago
a little late to reply, but perhaps you'll see this. does it not make an impression on you that lots of these articles on AI that get published are very childish? not in the math sense, but in the reasoning sense. besides, most of them are anything but interdisciplinary. I've almost never encountered prompt engineers who actually tried to delve into what GPTs do, and then these CoT guys, they don't know a thing about predicate logic, yet try to invent it anew.

On your comment re: banning tokens/terms, we are on the same page. We can agree all of this is very immature, and many of the people also, including this lot of kids who seem to put out one paper per hour. You see, the original seq2seq paper is 8 pages, topic included. Can you imagine? But Sutskever was not a child back then, he was already deep into all this. We can easily state/assume the LLM business is in its infancy. It may easily stay there for a century until everyone levels up.

wodenokoto · a year ago
But didn't we already know that NN can solve any computable problem? The interesting thing is if they can be trained to solve any (computable) problem.
imhoguy · a year ago
I don't know why I have read "HN", indeed HN can solve any problem.
tossandthrow · a year ago
Feed-forward NNs can approximate all functions f: X -> Y only for compact domains.

But recurrent neural networks can solve any computational problem given enough precision.

roboboffin · a year ago
Does that mean that when we reduce the precision of a NN, for example using bfloat16 instead of float32, we reduce the set of computational problems that can be solved?

How would that compare with a biological neural network, with presumably near-infinite precision?

wodenokoto · a year ago
First day of introduction to NNs we were asked to create all the logic gates using artificial neurons, and then told "If you have all gates, you can do all computations".

I've got to admit, I'm sort of taking that at face value, because I don't know enough computer science to (a) discern whether that is true and (b) know what "f: X -> Y only for compact domains" means.
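For what it's worth, that classroom exercise can be sketched in a few lines: each basic gate is a single threshold neuron (fire if the weighted sum plus bias is positive), and the weights below are one conventional choice, not the only one:

```python
def neuron(weights, bias, inputs):
    """Single threshold unit: output 1 if w . x + b > 0, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) + bias > 0)

def AND(a, b):  return neuron([1, 1], -1.5, [a, b])
def OR(a, b):   return neuron([1, 1], -0.5, [a, b])
def NOT(a):     return neuron([-1], 0.5, [a])
def NAND(a, b): return NOT(AND(a, b))

# NAND is functionally complete, so stacking these gives any Boolean
# circuit. XOR needs two layers -- the classic sign that depth matters:
def XOR(a, b):  return AND(OR(a, b), NAND(a, b))
```

That's the "all gates, hence all computations" claim: any Boolean circuit can be wired out of these units, though the theorem says nothing about whether training would ever find those weights.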

logicchains · a year ago
Only NNs of infinite size or precision. Under more realistic assumptions, transformers without chain of thought are actually limited in what they can solve: https://arxiv.org/abs/2207.00729
nopinsight · a year ago
In the words of an author:

"What is the performance limit when scaling LLM inference? Sky's the limit.

We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed. Remarkably, constant depth is sufficient.

http://arxiv.org/abs/2402.12875 (ICLR 2024)"

https://x.com/denny_zhou/status/1835761801453306089

ec109685 · a year ago
Is this the infinite monkey Shakespeare trope?
throwup238 · a year ago
More like the universal approximation theorem extended to computation rather than network complexity: https://en.wikipedia.org/wiki/Universal_approximation_theore...
nopinsight · a year ago
A key difference is that the way LMMs (Large Multimodal Models) generate output is far from random. These models can imitate or blend existing information, and imitate or blend known reasoning methods from the training data. The latter is a key distinguishing feature of the new OpenAI o1 models.

Thus, the signal-to-noise ratio of their output is generally way better than infinite monkeys.

Arguably, humans rely on similar modes of "thinking" most of the time as well.

CamperBob2 · a year ago
Yeah. Monkeys. Monkeys that write useful C and Python code that needs a bit less revision every time there's a model update.

Can we just give the "stochastic parrot" and "monkeys with typewriters" schtick a rest? It made for novel commentary three or four years ago, but at this point, these posts themselves read like the work of parrots. They are no longer interesting, insightful, or (for that matter) true.

tsimionescu · a year ago
One question, if anyone knows the details: does this prove that there exists a single LLM that can approximate any function to arbitrary precision given enough CoT, or does it prove that for every function, there exists a Transformer that fits those criteria?

That is, does this prove that a single LLM can solve any problem, or that for any problem, we can find an LLM that solves it?

jstanley · a year ago
Doesn't the latter imply the former?

If it's possible to find an LLM for any given problem, then find an LLM for the problem "find an LLM for the problem and then evaluate it" and then evaluate it, and then you have an LLM that can solve any problem.

It's the "Universal Turing Machine" for LLMs.

I wonder what's the LLM equivalent of the halting problem?

shawntan · a year ago
Theoretical results exist that try to quantify the number of CoT tokens needed to reach different levels of computational expressibility: https://arxiv.org/pdf/2310.07923

TL;DR: Getting to Turing completeness can require polynomially many CoT tokens with respect to the input problem size. For a field that constantly harps on parallelism and compute efficiency, this requirement seems prohibitive.

We really need to get away from constant depth architectures.

benkuykendall · a year ago
> Getting to Turing completeness can require polynomial CoT tokens, wrt the input problem size.

So, as stated, this is impossible since it violates the Time Hierarchy Theorem.

The actual result of the paper is that any poly-time computable function can be computed with poly-many tokens. Which is... not a particularly impressive bound? Any non-trivial fixed neural network can, for instance, compute the NAND of two inputs. And any polynomial computable function can be computed with a polynomial number of NAND gates.

Deleted Comment

ljsprague · a year ago
Skynet's the limit.
__loam · a year ago
> We have mathematically proven that transformers can solve any problem

We should require that you've passed an algorithms and a thermodynamics class before you can post.

nopinsight · a year ago
To be clear I think the tweet is a bit exaggerated (and the word ‘performance’ there doesn’t take into account efficiency, for example) but I don’t have the time to read the full paper (just skimmed the abstract and conclusion). I quoted the tweet by an author for people to discuss since it’s still a fairly remarkable result.
bonoboTP · a year ago
This is an accepted ICLR paper by authors from Stanford, Toyota and Google. That's not a guarantee for anything, of course, but they likely know basic algorithms and the second law. You can certainly argue against their claims, but you need to put in the legwork.
riku_iki · a year ago
> Remarkably, constant depth is sufficient.

I think the article also says log(n) embedding size (width?) is required, where n is the size of the input.

candiddevmike · a year ago
> We have mathematically proven that transformers can solve any problem, provided they are allowed to generate as many intermediate reasoning tokens as needed.

That seems like a bit of a leap here to make this seem more impressive than it is (IMO). You can say the same thing about humans, provided they are allowed to think across as many years/generations as needed.

Wake me up when an LLM figures out stable fusion or room-temperature superconductors.

krackers · a year ago
I think you're misrepresenting the study. It builds upon previous work that examines the computation power of the transformer architecture from a circuit-complexity perspective. Previous work showed that the class of problems that a "naive" Transformer architecture could compute was within TC0 [1, 2] and as a consequence it was fundamentally impossible for transformers to solve certain classes of mathematical problems. This study actually provides a more realistic bound of AC0 (by analyzing the finite-precision case) which rules out even more problems, including such 'simple' ones as modular parity.

We also had previous work that hinted that part of the reason why chain-of-thought works from a theoretical perspective is that it literally allows the model to perform types of computations it could not under the more limited setting (in the same way jumping from FSMs to pushdown automata allows you to solve new types of problems) [3].

[1] https://news.ycombinator.com/item?id=35609652 [2] https://blog.computationalcomplexity.org/2023/02/why-cant-li... [3] https://arxiv.org/abs/2305.15408
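A toy illustration (mine, not the papers' construction) of why intermediate tokens change what's expressible: parity of n bits is a classic problem outside AC0, i.e. hard for constant-depth circuits in one shot, yet it becomes a chain of constant-size steps if the model is allowed to write down a running result after each bit, the way CoT externalizes state:

```python
def parity_with_cot(bits):
    """Compute parity by emitting the running parity after each bit.

    Each appended value plays the role of one chain-of-thought token:
    a constant amount of work per step, with the sequence carrying the state.
    """
    state = 0
    chain = []
    for b in bits:
        state ^= b        # one constant-size "decoding step"
        chain.append(state)
    return state, chain

answer, chain = parity_with_cot([1, 0, 1, 1])
# answer == 1, chain == [1, 1, 0, 1]
```

The point is not that XOR is hard in Python, but that a bounded-depth, bounded-precision circuit cannot do this in a single parallel pass, while a model that may emit linearly many intermediate tokens can.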

Horffupolde · a year ago
It is actually impressive.

One could argue that writing enabled chain of thought across generations.

Veedrac · a year ago
> Wake me up when a LLM figures out stable fusion or room temperature superconductors.

Man, the goalposts these days.

whimsicalism · a year ago
it's a TCS result.

seems like many commenting don't know about computability

WalterSear · a year ago
> You can say the same thing about humans

1. Holy shit.

2. You can't apply Moore's law to humans.

aurareturn · a year ago
> You can say the same thing about humans, provided they are allowed to think across as many years/generations as needed.

Isn’t this a good thing since compute can be scaled so that the LLM can do generations of human thinking in a much shorter amount of time?

Say humans can solve quantum gravity in 100 years of thinking by 10,000 really smart people. If one AGI is equal to 1 really smart person. Scale enough compute for 1 million AGI and we can solve quantum gravity in a year.

The major assumption here is that transformers can indeed solve every problem humans can.

DarkNova6 · a year ago
The more interesting question is whether the ability to reason and solve problems scales linearly or logarithmically.

Deleted Comment

m3kw9 · a year ago
Sort of like a quantum superposition state? So here's an idea: use quantum computing to produce all possible inferences, and some not-yet-invented algorithm to collapse them to the final result.

Deleted Comment

tooltower · a year ago
Constant depth circuits can solve everything? I feel like I missed some important part of circuit complexity. Or this is BS.
shawntan · a year ago
Using CoT implicitly increases the depth of the circuit. But yes, poorly worded.
whimsicalism · a year ago
CoT means you're adding loops
JSDevOps · a year ago
So given infinite time and resources it can solve any problem? Hardly groundbreaking, is it?
mrbungie · a year ago
The "We need more Nvidia GPUs and we will reach AGI" theorem.
dilyevsky · a year ago
Infinite Token Theorem
JTyQZSnP3cQGa8B · a year ago
Rendered useless by the infinite money problem.
bilekas · a year ago
Nice quip, but in reality it's the exact same thing, right?
zeofig · a year ago
I have faith that Nvidia can sell them infinity gpus
imhoguy · a year ago
Now it is time to prove can it loop into creating an infinite number of paperclips? /s https://en.wikipedia.org/wiki/Instrumental_convergence#Paper...
mg · a year ago
Has it been publicly benchmarked yet, if this approach:

    Hello LLM, please solve this task: <task>
Can be improved by performing this afterwards?

    for iteration in range(10):
        Hello LLM, please solve this task: <task>
        Here is a possible solution: <last_reply>
        Please look at it and see if you can improve it.
        Then tell me your improved solution.

lorepieri · a year ago
Not sure if it has been benchmarked, but I've called this technique the "for-loop of thought". :)
bachback · a year ago
for coding tasks see

https://aider.chat/docs/leaderboards/

the question is how would you define "improve" and "solve". RLHF in a way delegates this to humans.

Kiro · a year ago
Isn't that the whole reason that o1 works?
ben_w · a year ago
I think o1 is more like "pretend you're doing a job interview, think step by step and show your working".

I tried something similar to the suggested iterative loop on a blog post I'd authored but wanted help copy-editing; the first few iterations were good, but then it got very confused, decided the blog post wasn't actually a blog post to be edited, and concluded that what I really wanted to know was the implications of Florida something something Republican Party.

Benchmark would be neat, because all I have is an anecdote.

eykrehbein · a year ago
BruteforceLLM