I am highly skeptical of LLMs as a mechanism to achieve AGI, but I also find this paper fairly unconvincing, bordering on tautological. I feel similarly about this as to what I've read of Chalmers - I agree with pretty much all of the conclusions, but I don't feel like the text would convince me of those conclusions if I disagreed; it's more like it's showing me ways of explaining or illustrating what I already believed.
On embodiment - yes, LLMs do not have corporeal experience. But it's not obvious that this means that they cannot, a priori, have an "internal" concept of reality, or that it's impossible to gain such an understanding from text. The argument feels circular: LLMs are similar to a fake "video game" world because they aren't real people - therefore, it's wrong to think that they could be real people? And the other half of the argument is that because LLMs can only see text, they're missing out on the wider world of non-textual communication; but then, does that mean that human writing is not "real" language? This argument feels especially weak in the face of multi-modal models that are in fact able to "see" and "hear".
The other flavor of argument here is that LLM behavior is empirically non-human - e.g., the argument about not asking for clarification. But that only means that they aren't currently matching humans, not that they couldn't.
Basically all of these arguments feel like they fall down to the strongest counterargument I see proposed by LLM-believers, which is that sufficiently advanced mimicry is not only indistinguishable from the real thing, but at the limit in fact is the real thing. If we say that it's impossible to have true language skills without implicitly having a representation of self and environment, and then we see an entity with what appears to be true language skills, we should conclude that that entity must contain within it a representation of self and environment. That argument doesn't rely on any assumptions about the mechanism of representation other than a reliance on physicalism. Looking at it from the other direction, if you assume that all that it means to "be human" is encapsulated in the entropy of a human body, then that concept is necessarily describable with finite entropy. Therefore, by extension, there must be some number of parameters and some model architecture that completely encode that entropy. Questions like whether LLMs are the perfect architecture or whether the number of parameters required is a number that can be practically stored on human-manufacturable media are engineering questions, not philosophical ones: finite problems admit finite solutions, full stop.
Again, that conclusion feels wrong to me... but if I'm being honest with myself, I can't point to why, other than to point at some form of dualism or spirituality as the escape hatch.
> sufficiently advanced mimicry is not only indistinguishable from the real thing, but at the limit in fact is the real thing
I am continually surprised at how relevant and pervasive one of Kurt Vonnegut’s major insights is: “we are what we pretend to be, so we must be very careful about what we pretend to be”
Everyone in the "life imitates art, not the other way around" camp (and also neo-platonists/gnostics i.e. https://en.wikipedia.org/wiki/Demiurge ) is getting massively validated by the modern advances in AI right now.
Isn't any formal "proof" or "reasoning" that shows that something cannot be AGI inherently flawed, because we have a hard time formally describing what AGI is anyway.
Like your argument: embodiment is missing in LLMs, but is it needed for AGI? Nobody knows.
I feel we first have to do a better job defining the basics of intelligence; then we can define what it means to be an AGI, and only then can we prove that something is, or is not, AGI.
It seems that we skipped step 1 because it's too hard, and jumped straight to step 3.
It's a real mess.
Yep, this is a big part of it. Intelligence and consciousness are barely understood beyond "I'll know it when I see it", which doesn't work for things you can't see - and in the case of consciousness, most definitions are explicitly based on concepts that are not only invisible but ineffable. And then we have no solid idea whether these things we can't really define, detect, or explain are intrinsically linked to each other or have a causal relationship in either direction. Almost any definition you pick is going to lead to some unsatisfying conclusions vis a vis non-human animals or "obviously not intelligent" forms of machine learning.
To me LLMs seem to most closely resemble the regions of the brain used for converting speech to abstract thought and vice-versa, because LLMs are very good at generating natural language and knowing the flow of speech. An LLM is similar to if you took Wernicke's and Broca's areas and stuck a regression between them. The problem is that the regression in the middle is just a brute force of the entire world's knowledge instead of a real thought.
I think the major lessons from the success of LLMs are two: 1) the astonishing power of a largely trivial association engine based only on the semantic categories inferred by word2vec, and 2) that so much of the communication ability of the human mind requires so little rational thought, since LLMs demonstrate essentially none of the skills of Kahneman and Tversky's System 2 thinking (logic, circumspection, self-correction, reflection, etc.).
I guess this also disproves Minsky's 'Society of Mind' conjecture - a large part of human cognition (System 1) does not require the complex interaction of heterogeneous mental components.
>that sufficiently advanced mimicry is not only indistinguishable from the real thing, but at the limit in fact is the real thing.
While "sufficiently" does a lot of the heavy lifting here, the "indistinguishable" criterion implicitly means there must be no way to tell that it is not the real thing. The belief that it is the real thing comes from the intuition that anything that can be everything a person must be, must also have that fundamental essence of being a person. I don't think people could really conceive an alternative without resorting to prejudice, which they could equally apply to machines or people.
I take the arguments such as in this paper to be instead making the claim that because X cannot be Y you will never be able to make X indistinguishable from Y. It is more a prediction of future failure than a judgment on an existing thing.
I end up looking at some of these complaints from the point of view of my sometimes profession of Game Developer. When I show someone a game in development to playtest they will find a bunch of issues. The vast majority of those issues, not only am I already aware of, but I have a much more detailed perspective of what the problem is and how it might be fixed. I have been seeing the problem, over and over, every day as I work. The problem persists because there are other things to do before fixing the issue, some of which might render the issue redundant anyway.
I feel like a lot of the criticisms of AI are like this: they are like the playtesters pointing out issues in the current state, where those working on the problems are generally well aware of the particular issues and have a variety of solutions in mind that might help.
Clear statements of deficiencies in ability are helpful as a guide to measure future success.
I'm also in the camp that LLMs cannot be an AGI on their own; on the other hand, I do think the architecture might be extended to become one. There is an easy out for any criticism: to say, "Well, it's not an LLM anymore".
In a way that ends up with a lot of people saying:
- The current models cannot do the things we know the current models cannot do
- Future models will not be able to do those things if they are the same as the current ones
- Therefore the things that will be able to do those things will be different
That is true, but hardly enlightening.
> Future models will not be able to do those things if they are the same as the current ones
I think a lot of people disagree with this. People think if we just keep adding parameters and data, magic will happen. That’s kind of what happened with ChatGPT after all.
One of the issues here is that future-focused discussions often lead to wild speculation because we don’t know the future. Also, there’s often too much confidence in people’s preferred predictions (skeptical or optimistic) and it would be less heated if we admitted that we don’t know how things will look even a couple of years out, and alternative scenarios are reasonable.
So I think you’re right, it’s not enlightening. Criticism of overconfident predictions won’t be enlightening if you already believe that they’re overconfident and the future is uncertain. Conversations might be more interesting if not so focused on bad arguments of the other side.
But perhaps such criticism is still useful. How else do you deflate excessive hype or skepticism?
> LLMs do not have corporeal experience. But it's not obvious that this means that they cannot, a priori, have an "internal" concept of reality, or that it's impossible to gain such an understanding from text.
I would argue it is (obviously) impossible the way the current implementation of models work.
How could a system which produces a single next word based upon a likelihood and a parameter called a "temperature" have a conceptual model underpinning it? Even theoretically?
Humans and animals have an obvious conceptual understanding of the world. Before we "emit" a word or a sentence, we have an idea of what we're going to say. This is obvious when talking to children, who know something and have a hard time saying it. Clearly, language is not the medium in which they think or develop thoughts, merely an imperfect (and often humorous) expression of it.
Not so with LLMs!! Generative LLMs do not have a prior concept available before they start emitting text. That the "temperature" can chaotically change the output as the tokens proceed just goes to show there is no pre-existing concept to reference. It looks right, and often is right, but generative systems are basically always hallucinating: they do not have any concepts at all. That they are "right" as often as they are is a testament to the power of curve fitting and compression of basis functions in high dimensionality spaces. But JPEGs do the same thing, and I don't believe they have a conceptual understanding of pictures.
Transformer models have been shown to spontaneously form internal, predictive models of their input spaces. This is one of the most pervasive misunderstandings about LLMs (and other transformers) around. It is of course also true that the quality of these internal models depends a lot on the kind of task it is trained on. A GPT must be able to reproduce a huge swathe of human output, so the internal models it picks out would be those that are the most useful for that task, and might not include models of common mathematical tasks, for instance, unless they are common in the training set.
Have a look at the OthelloGPT papers (can provide links if you're interested). This is one of the reasons people are so interested in them!
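To make "internal model" concrete: the OthelloGPT line of work trains small probes on the network's hidden activations to see whether the board state can be read back out of them. Below is a minimal, self-contained sketch of that probing setup; the arrays are random placeholders standing in for activations you would actually extract from a trained model, and all sizes and names are made up for illustration.

```python
# Sketch of the linear-probing idea behind the OthelloGPT results: if a model
# only tracked surface statistics, a simple linear readout of its hidden
# states should not recover the underlying board; if it has an internal world
# model, it often can. The arrays below are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

n_positions, d_model = 5000, 512          # hypothetical sizes
rng = np.random.default_rng(0)

# Stand-ins: hidden_states[i] = residual-stream activation after move i,
# board_state[i] = contents of one fixed square (0=empty, 1=mine, 2=theirs).
hidden_states = rng.normal(size=(n_positions, d_model))
board_state = rng.integers(0, 3, size=n_positions)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, board_state, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# With real activations, accuracy well above chance (~1/3 here) is taken as
# evidence that the board is linearly decodable from the hidden state.
print("probe accuracy:", probe.score(X_test, y_test))
```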
> How could a system which produces a single next word based upon a likelihood and a parameter called a "temperature" have a conceptual model underpinning it? Even theoretically?
Could a creature that simply evolved to survive and reproduce possibly have a conceptual model underpinning it? Model training and evolution are very different processes, but they are both ways of optimizing a physical system. It may be the case that evolution can give rise to intelligence and model training can’t, but we need some argument to prove that.
> generative systems are basically always hallucinating: they do not have any concepts at all. That they are "right" as often as they are is a testament to the power of curve fitting and compression of basis functions in high dimensionality spaces
It's refreshing to read someone who "got it". Sad that before my upvote the comment was grayed out.
Any proponent of conceptual or other wishful/magical thinking should come with proofs, since it is the hypothesis that diverges from the definition of an LLM.
The argument would be that that conceptual model is encoded in the intermediate-layer parameters of the model, in a different but analogous way to how it's encoded in the graph and chemical structure of your neurons.
> I would argue it is (obviously) impossible the way the current implementation of models work.
> How could a system which produces a single next word based upon a likelihood and a parameter called a "temperature" have a conceptual model underpinning it? Even theoretically?
Any probability distribution over strings can theoretically be factored into a product of such a "probability that the next token is x given that the text so far is y". Now, whether a probability distribution over strings can be efficiently computed in this form is another question. But, if we are being so theoretical that we don't care about the computational cost (as long as it is finite), then "it is next token prediction" can't preclude anything which "it produces a probability distribution over strings" doesn't already preclude.
As for the temperature, given any probability distribution over a discrete set, we can modify it by adding a temperature parameter: just take the log of the probabilities according to the original distribution, scale them all by a factor (the inverse of the temperature), then exponentiate each of these, and then normalize to produce a probability distribution.
So, the fact that they work by next token prediction, and have a temperature parameter, cannot imply any theoretical limitation that wouldn’t apply to any other way of expressing a probability distribution over strings, as far as discussing probability distributions in the abstract, over strings, rather than talking about computational processes that implement such probability distributions over strings.
But also like, going between P(next token is x | initial string so far is y) and P(the string begins with z), isn't that computationally costly? Well, in one direction anyway. Because like, P(next token is x | string so far is y) = P(string begins with yx) / P(string begins with y).
Though, one might object to P(string starts with y) over P(string is y)?
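A toy sketch of both points, for anyone who wants to see them concretely; the three-string distribution here is made up for illustration and is not anything a real LLM produces.

```python
# (1) Any distribution over strings factors into next-token conditionals via
#     P(next = x | prefix = y) = P(starts with y+x) / P(starts with y).
# (2) Temperature is just a reweighting of those conditionals:
#     scale the log-probs by 1/T, re-exponentiate, renormalize.
import math

p_string = {"ab": 0.5, "ac": 0.3, "ba": 0.2}   # tiny distribution over whole strings

def p_prefix(prefix):
    return sum(p for s, p in p_string.items() if s.startswith(prefix))

def next_token_dist(prefix, temperature=1.0):
    tokens = {s[len(prefix)] for s in p_string
              if s.startswith(prefix) and len(s) > len(prefix)}
    cond = {t: p_prefix(prefix + t) / p_prefix(prefix) for t in tokens}
    logits = {t: math.log(p) / temperature for t, p in cond.items()}
    z = sum(math.exp(v) for v in logits.values())
    return {t: math.exp(v) / z for t, v in logits.items()}

print(next_token_dist("a"))                   # {'b': 0.625, 'c': 0.375}
print(next_token_dist("a", temperature=0.5))  # sharper: favors 'b' even more
```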
It's only because you can essentially put the LLMs in a simulation that you can have this argument. We can imagine the human brain also in a simulation which we can replay over and over again, adjusting various parameters of the physical brain to change the temperature. These sorts of arguments can never distinguish between LLMs and humans.
On that point, I would dispute the premise that "it's impossible to have true language skills without implicitly having a representation of self and environment". I don't see any contradiction between the following two ideas:
1. LLMs inherently lack any form of consciousness, subjective experience, emotions, or will
2. A sufficiently advanced LLM with sufficient compute resources would perform on par with human intelligence at any given task, insofar as the task is applicable to LLMs
> How could a system which produces a single next word based upon a likelihood and a parameter called a "temperature" have a conceptual model underpinning it? Even theoretically?
You're limiting your view of their capabilities based on the output format.
> Not so with LLMs!! Generative LLMs do not have a prior concept available before they start emitting text.
How do you establish that? What do you think of othellogpt? That seems to form an internal world model.
> That the "temperature" can chaotically change the output as the tokens proceed
Changing the temperature forcibly makes the model pick words it thinks fit worse. Of course it changes the output. It's like an improv game with someone shouting "CHANGE!".
Let's make two tiny changes.
One, let's tell a model to use the format
<innerthought>askjdhas</innerthought> as the voice in their head, and <speak>blah</speak> for the output.
Second, let's remove temperature and keep it at 0, so we're not playing a game where we force them to choose different words.
Now what remains of the argument?
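For concreteness, here is roughly what those two changes might look like in practice. This uses the OpenAI chat API purely as an example backend; the tag convention, prompt wording, and model choice are made up for illustration.

```python
# Sketch of the two proposed changes: a private "inner thought" channel plus
# temperature-0 decoding, so the model is not forced to swap in worse-fitting
# words. Prompt wording and tag format are hypothetical.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM = (
    "Before answering, think privately inside <innerthought>...</innerthought>. "
    "Then give the user-visible reply inside <speak>...</speak>. "
    "Never reveal the contents of <innerthought>."
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",   # any chat model would do
    temperature=0,         # no sampling game forcing different word choices
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": "Is 1009 prime?"},
    ],
)

# Only the <speak> part would be shown to the user; the <innerthought> part
# approximates a "prior concept" formed before the visible answer is emitted.
print(resp.choices[0].message.content)
```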
> On embodiment - yes, LLMs do not have corporeal experience.
My own thought on this (as someone who believes embodiment is essential) is to consider the rebuttals to Searle's Chinese Room thought experiment.
For now (and the foreseeable future) humans are the embodiment of LLMs. In some sense, we could be seen as playing the role of a centralized AI's nervous system.
Rebuttals of Chinese rooms are also rebuttals of embodiment as a requirement! To say the system of person+books speaks Chinese is to say that good enough emulation of a process has all the qualities of the emulated process, and can substitute for it. Embodiment then cannot be essential, because we could emulate it instead.
The crux of the video game analogy seems to be that when you go close to an object, the resolution starts blurring and the illusion gets broken, and there is a similar thing that happens with LLMs (as of today) as well. This is, so far, reasonable based on daily experience with these models.
The extension of that argument being made in the paper is that a model trained on language tokens spewed by humans is incapable of actually reaching that limit where the illusion never breaks down in resolution. That also seems reasonable to me. They use the word "languaging" in verb form, as opposed to "language" as a noun, to express this.
Why are LLMs incapable of reaching that limit? It's very easy to imagine video games getting to that point. We have all the data to see objects right down to the atomic level, which is plenty more than you'd need for a game. It's mostly a matter of compute. Why then should LLMs break down if they can at least mimic the smartest humans? We don't need "resolution" beyond that.
There are many finite problems that absolutely do not admit finite solutions. Full stop.
I think the deeper point of the paper is that you simply cannot generate an intelligent entity by just looking at recorded language. You can create a dictionary, and a map - but one must not mistake this map for the territory.
The human brain is a finite solution, so we already have an existence proof. That means a lot for our confidence in the solvability of this kind of problem.
It is also not universally impossible to reconstruct a function of finite complexity from only samples of its inputs and outputs. It is sometimes possible to draw a map that is an exact replica of the territory.
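A tiny concrete case of that last point: a degree-n polynomial is a finite-complexity function that is pinned down exactly by any n+1 distinct input/output samples, so the "map" recovered from samples really is a replica of the "territory" (up to floating-point error in the sketch below; the coefficients and sample points are made up).

```python
# Reconstructing a finite-complexity function exactly from finitely many
# input/output samples: a degree-3 polynomial is determined by 4 points.
import numpy as np

true_coeffs = np.array([2.0, -3.0, 0.5, 1.0])   # 2x^3 - 3x^2 + 0.5x + 1
xs = np.array([-1.0, 0.0, 1.0, 2.0])            # 4 distinct sample inputs
ys = np.polyval(true_coeffs, xs)                # the only data we keep

recovered = np.polyfit(xs, ys, deg=3)           # exact up to float error
print(np.allclose(recovered, true_coeffs))      # True
```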
> I feel similarly about this as to what I've read of Chalmers - I agree with pretty much all of the conclusions, but I don't feel like the text would convince me of those conclusions if I disagreed;
My limited experience of reading Chalmers is that he doesn't actually present evidence - he goes on a meandering rant and then claims to have proved things that he didn't even cover. It was the most infuriating read of my life; I heavily annotated two chapters and then finally gave up and donated the book.
I haven't read any Chalmers so I can't comment on his writing style. I have seen him in several videos on discussion panels and on podcasts.
One thing I appreciate is he often states his premises, or what modern philosophers seem to call "commitments". I wouldn't go so far as to say he uses air-tight logic to reason from these premises/commitments to conclusions - but at the least his reasoning doesn't seem to stray too far from those commitments.
I think it would be fair to argue that not all of his commitments are backed by physical evidence (and perhaps some of them could be argued to go against some physical evidence). And so you are free to reject his commitments and therefore reject his conclusions.
In fact, I think the value of philosophers like Chalmers is less in their specific commitments and conclusions and more in their framing of questions. It can be useful to list out his commitments and find out where you stand on each of them, and then to do your own reasoning using logic to see what conclusions your own set of commitments forces you into.
>> Again, that conclusion feels wrong to me... but if I'm being honest with myself, I can't point to why, other than to point at some form of dualism or spirituality as the escape hatch.
I like how Chomsky, who doesn't have any spirituality at all, the big degenerate materialist, deals with it:
As far as I can see, all of this [he's speaking about the Loebner Prize and the Turing test in general] is entirely pointless. It's like asking how we can determine empirically whether an aeroplane can fly, the answer being: if it can fool someone into thinking that it's an eagle under some conditions.
https://youtu.be/0hzCOsQJ8Sc?si=MUXpmIwAzcla9lvK&t=2052
(My transcript)
He's right, you know. It should be possible to tell whether something is intelligent just as easily as it is to say that something is flying. If there are endless arguments about it, then it's probably not intelligent (yet). Conversely, if everyone can agree it is intelligent then it probably is.
Because it's not easy to tell whether something is flying. Definitions like that fall apart every time we encounter something out of the ordinary. If you take the criterion of "there's no discussion about it", then you're limiting the definition to that which is familiar, not that which is interesting.
Is an ekranoplan flying? Is an orbiting spaceship flying? Is a hovercraft flying? Is a chicken flapping its wings over a fence flying?
Your criterion would suggest the answer of "no" to any of those cases, even though those cover much of the same use cases as flying, and possibly some new, more interesting ones.
And I don't think an AGI must be limited to the familiar notion of intelligence to be considered an AGI, or, at the very least, to open up avenues that were closed before.
Everyone seems to want to discuss whether there’s some fundamental qualia preventing my toaster from being an AGI, but no one is interested in acknowledging that my toaster isn’t an AGI. Maybe a larger toaster would be an AGI? Or one with more precise toastiness controls? One with more wattage?
The only thing this paper proves is that folks at Trinity College in Dublin are poor, envious, anthropocentric drunkards, ready to throw every argument at defending their crown of creation, without actually understanding the linguistics concepts they use to make their argument.
Not much new here. The basic criticism is that LLMs are not embodied; they have no interaction with the real world. The same criticism can be applied to most office work.
Useful insight: "We (humans) are always doing more than one thing." This is in the sense of language output having goals for the speaker, not just delivering information. This is related to the problem of LLMs losing the thread of a conversation. Probably the only reasonably new concept in this paper.
Standard rant: "Humans are not brains that exist in a vat..."
"LLMs ... have nothing at stake." Arguable, in that some LLMs are trained using punishment. Which seems to have strong side effects. The undesirable behavior is suppressed, but so is much other behavior. That's rather human-like.
"LLMs Don’t Algospeak". The author means using word choices to get past dumb censorship algorithms. That's probably do-able, if anybody cares.
The optimization process adjusts the weights of a computational graph until the numeric outputs align with some baseline statistics of a large data set. There is no "punishment" or "reward"; gradient descent isn't even necessary, as there are methods for modifying the weights in other ways, and the optimization still converges to a desired distribution, which people claim is "intelligent".
The converse is that people are "just" statistical distributions of the signals produced by them but I don't know if there are people who claim they are nothing more than statistical distributions.
I think people are confused because they do not really understand how software and computers work. I'd say they should learn some computability theory to gain some clarity but I doubt they'd listen.
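A toy illustration of the earlier point that gradient descent isn't strictly necessary: a deliberately silly random-search optimizer on a three-number "model" still converges to the target statistics. Everything here is a made-up minimal example, not how anyone actually trains an LLM.

```python
# Nudge weights by something other than gradient descent and still converge:
# a single logit vector is fit to a target distribution by random search.
import numpy as np

rng = np.random.default_rng(0)
target = np.array([0.7, 0.2, 0.1])          # "baseline statistics" to match

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss(logits):
    return np.sum((softmax(logits) - target) ** 2)

logits = np.zeros(3)
for _ in range(5000):
    candidate = logits + rng.normal(scale=0.05, size=3)  # random perturbation
    if loss(candidate) < loss(logits):                   # keep it if it helps
        logits = candidate

print(softmax(logits).round(3))  # ~ [0.7, 0.2, 0.1]: converged without gradients
```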
If you really want to phrase it that way, organisms like us are "just" distributions of genes that have been pushed this way and that by natural selection until they converged to something we consider intelligent (humans).
It's pretty clear that these optimisation processes lead to emergent behaviour, both in ML and in the natural sciences. Computability theory isn't really relevant here.
Good summary of some of the main "theoretical" criticism of LLMs but I feel that it's a bit dated and ignores the recent trend of iterative post-training, especially with human feedback. Major chatbots are no doubt being iteratively refined on the feedback from users i.e. interaction feedback, RLHF, RLAIF. So ChatGPT could fall within the sort of "enactive" perspective on language and definitely goes beyond the issues of static datasets and data completeness.
Sidenote: the authors make a mistake when citing Wittgenstein to find similarity between humans and LLMs. Language modelling on a static dataset is mostly not a language game (see Bender and Koller's section on distributional semantics and caveats on learning meaning from "control codes")
IIRC DPO doesn't have human feedback in the loop.
It does; that's what the "direct preference" part of DPO means: you just avoid training an explicit reward model on it like in RLHF and instead directly optimize for the log probability of preferred vs. dispreferred responses.
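For reference, the DPO objective is small enough to sketch in a few lines. The log-probabilities below are placeholder numbers; in practice they are summed token log-probs from the policy being trained and from a frozen reference model.

```python
# Minimal sketch of the DPO loss: no reward model, just log-probabilities of
# the preferred (chosen) vs. dispreferred (rejected) response under the policy
# being trained and under a frozen reference model.
import numpy as np

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # How much more the policy prefers "chosen" over "rejected",
    # measured relative to the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    return -np.log(1.0 / (1.0 + np.exp(-margin)))   # -log(sigmoid(margin))

# Placeholder log-probs for one preference pair:
print(dpo_loss(policy_chosen_logp=-45.0, policy_rejected_logp=-52.0,
               ref_chosen_logp=-48.0, ref_rejected_logp=-50.0))
```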
The authors of this paper are just another instance of the AI hype being used by people who have no connection to it, to attract some kind of attention.
"Here is what we think about this current hot topic; please read our stuff and cite generously ..."
> Language completeness assumes that a distinct and complete thing such as `a natural language' exists, the essential characteristics of which can be effectively and comprehensively modelled by an LLM
Replace "LLM" by "linguistics". Same thing.
> The assumption of data completeness relies on the belief that a language can be quantified and wholly captured by data.
That's all that a baby has, who becomes a native speaker of their surrounding language. Language acquisition does not imply totality of data. Not every native speaker recognizes exactly the same vocabulary and exactly the same set of grammar rules.
Babies have feedback and interaction with someone speaking to them. Would they learn to speak if you just dumped them in front of a TV and never spoke to them? I'm not sure.
But anyway I agree with you. This is just a confused HN comment in paper form.
I personally don’t get much value out of the paper, but it is orders of magnitude more substantive and thoughtful than a median “confused Hacker News comment”.
> Babies have feedback and interaction with someone speaking to them. Would they learn to speak if you just dumped them in front of a TV and never spoke to them? I'm not sure.
Feedback and interaction are not vital for acquisition, at least for second-language learning, according to one theory.
And if that's good enough for adults, it might be good enough for sponge-brain babies.
https://en.wikipedia.org/wiki/Input_hypothesis
They are two researchers/assistant professors working with cognitive science, psychology, and trustworthy AI. The paper is peer reviewed and has been accepted for publication in the Journal of Language Sciences.
You should publish your critique of their research in that same journal.
P.s. if you find any grave mistakes, you can contact the editor in chief, who happens to be a linguist.
Their critique is written here, in plain English. Any fault with it you can just mention. The "I won't read your comment unless you get X journal to publish it" stance seems really counterproductive. Presumably even the great Journal of Language Sciences is not above making mistakes or publishing things that are not perfect.
No thanks; that would be at least twice removed from Making Stuff.
(Once removed is writing about Making Stuff.)
The "efficient journal hypothesis" -- if something is written in a paper in a journal, then it's impossible for anyone to know any better, since if they knew better, they would already have published the correction in a journal.
The parent comment I responded to is speculative and does not argue on the merits. We can do better here.
Are there people who ride the hype wave of AI? Sure.
But how can you tell from where you sit? How do you come to such a judgment? Are you being thoughtful and rational?
Have you considered an alternative explanation? I think the odds are much greater that the authors’ academic roots/training is at odds with what you think is productive. (This is what I think, BTW. I found the paper to be a waste of my time. Perhaps others can get value from it?)
But I don’t pretend to know the authors’ motivations, nor will I cast aspersions on them.
When one casts shade on a person like the comment above did, one invites and deserves this level of criticism.
That's a lot of thinking they've done about LLMs, but how much did they actually try LLMs? I have long threads where ChatGPT refines solutions to coding problems. Their example of losing the thread after printing a tiny list of 10 philosophers seems really outdated. Also, it seems LLMs utilize nested contexts as well, for example when they can break their own rules while telling a story or speaking hypothetically.
For a paper submitted on July 11, 2024, and with several references to other 2024 publications, it is indeed strange that it gives ChatGPT output from April 2023 to demonstrate that “LLMs lose the thread of a conversation with inhuman ease, as outputs are generated in response to prompts rather than a consistent, shared dialogue” (Figure 1). I have had many consistent, shared dialogues with recent versions of ChatGPT and Claude without any loss of conversation thread even after many back-and-forths.
Most LLM critics (and singularity-is-near influencers) don't actually use the systems enough to have relevant opinions about them. The only really good sources of truth are the chatbot-arena from lmsys and the comment section of r/localllama (I'm quoting Karpathy); both are "wisdom of the crowd", and often the crowd on r/localllama is getting that wisdom by spending hours with one hand on the keyboard and another under their clothes.
There is a lot of frustration here over what appears to be essentially this claim:
> ...we argue that it is possible to offer generous interpretations of some aspects of LLM engineering to find parallels with human language learning. However, in the majority of key aspects of language learning and use, most specifically in the various kinds of linguistic agency exhibited by human beings, these small apparent comparisons do little to balance what are much more deep-rooted contrasts.
Now, why is this so hard to stomach? This is the argument of this paper. To feel like this extremely general claim is something you have to argue against means you believe in a fundamental similarity between our linguistic agency and the model's. But is embodied human agency something that you really need the LLMs to have right now? Why? What are the stakes here? The ones actually related to the argument at hand?
This is ultimately not that strong a claim! To the point that it's almost vacuous... Of course the LLM will never learn that the stove is "hot" the way you did when you were a curious child. How can this still be too much to admit for someone? What is lost?
It makes me feel a little crazy here that people constantly jump over the text at hand whenever something gets a little too philosophical, and the arguments become long pseudo-theories that aren't relevant to the argument.
https://en.m.wikipedia.org/wiki/Enactivism
“Enactivism”, really? I wonder if these complaints will continue as LLMs see wider adoption - the old "first they ignore you, then they ridicule you, then they fight you…" trope that is halfway accurate. Any field that focuses on building theories on top of theories is in for a bad time.
Where I work, there's a somewhat haphazardly divided org structure, where my team has some responsibility to answer the executives' demands to "use AI to help our core business". So we applied off-the-shelf models to extract structured context from mostly unstructured text - effectively a data engineering job - and thereby support analytics and create more dashboards for the execs to mull over.
Another team, with a similar role in a different part of the org, has jumped (feet first) into optimizing large language models to turn them into agents, without consulting the business about whether they need such things. RAG, LoRA, and all this optimization is well and good, but this engineering focus has found no actual application, except wasting several million bucks hiring staff to do something nobody wants.