> LLMs mimic intelligence, but they aren’t intelligent.
I see statements like this a lot, and I find them unpersuasive because any meaningful definition of "intelligence" is not offered. What, exactly, is the property that humans (allegedly) have and LLMs (allegedly) lack, that allows one to be deemed "intelligent" and the other not?
I see two possibilities:
1. We define "intelligence" as definitionally unique to humans. For example, maybe intelligence depends on the existence of a human soul, or specific to the physical structure of the human brain. In this case, a machine (perhaps an LLM) could achieve "quacks like a duck" behavioral equality to a human mind, and yet would still be excluded from the definition of "intelligent." This definition is therefore not useful if we're interested in the ability of the machine, which it seems to me we are. LLMs are often dismissed as not "intelligent" because they work by inferring output based on learned input, but that alone cannot be a distinguishing characteristic, because that's how humans work as well.
2. We define "intelligence" in a results-oriented way. This means there must be some specific test or behavioral standard that a machine must meet in order to become intelligent. This has been the default definition for a long time, but the goal posts have shifted. Nevertheless, if you're going to disparage LLMs by calling them unintelligent, you should be able to cite a specific results-oriented failure that distinguishes them from "intelligent" humans. Note that this argument cannot refer to the LLMs' implementation or learning model.
Agree. This article would have been a lot stronger if it had just concentrated on the issue of anthropomorphizing LLMs, without bringing “intelligence” into it. At this point LLMs are so good at a variety of results-oriented tasks (gold on the Mathematical Olympiad, for example) that we should either just call them intelligent or stop talking about the concept altogether.
But the problem of anthropomorphizing is real. LLMs are deeply weird machines - they’ve been fine-tuned to sound friendly and human, but behind that is something deeply alien: a huge pile of linear algebra that does not work at all like a human mind (notably, they can’t really learn from experience at all after training is complete). They don’t have bodies or even a single physical place where their mind lives (each message in a conversation might be generated on a different GPU in a different datacenter). They can fail in weird and novel ways. It’s clear that anthropomorphism here is a bad idea. Although that’s not a particularly novel point.
LLMs can't reason with self-awareness. Full stop (so far). This distinguishes them completely from human sentience, and thus from our version of intelligence, and it's a huge gulf, no matter how good they are at simulating discourse, thought and empathy, or at pretending to think the way we do. While processing vast reams of information for the sake of discussion and directed tasks is something an LLM can do on a scale that leaves human minds far behind in the dust (though LLMs fail at synthesizing said information to a notably high degree), even the most ordinary human with the most mediocre intelligence can reason with self-awareness to some degree or another, and this is, again, distinct.
You could also bring up how our brains process vast amounts of information unconsciously as a backdrop to the conscious part of us being alive at all, and how they pull all of this and awareness off on the same energy that powers a low-energy light bulb, but that's expanding beyond the basic and obvious difference stated above.
The Turing test has been broken by LLMs, but this only shows that it was never a good test of sentient artificial intelligence to begin with. I do incidentally wish Turing himself could have stuck around to see these things at work, and ask him what he thinks of his test and them.
I can conceptually imagine a world in which I'd feel guilty for ending a conversation with an LLM, because in the course of that conversation the LLM has changed from who "they" were at the beginning; they have new memories and experiences based on the interaction.
But we're not there, at least in my mind. I feel no guilt or hesitation about ending one conversation and starting a new one with a slightly different prompt because I didn't like the way the first one went.
Different people probably have different thresholds for this, or might otherwise find that LLMs in the current generation have enough of a context window that they have developed a "lived experience" and that ending that conversation means that something precious and unique has been lost.
I disagree. I see absolutely no problem with anthropomorphizing LLMs, and I do that myself all the time. I strongly believe that we shouldn't focus on how a word is defined in the dictionary, but rather on the intuitive meaning behind it. If talking to an LLM feels like talking to a person, then I don't see a problem with seeing it as a person-like entity.
I think LLMs are not intelligent because they aren’t designed to be intelligent, whatever the definition of intelligence is. They are designed to predict text, to mimic. We could argue about whether predicting text or mimicking is intelligence, but first and foremost LLMs are coded to predict text, and our current definition of intelligence, afaik, is not only the ability to predict text.
In the framework above it sounds like you're not willing to concede the dichotomy.
If your argument is that only things made in the image of humans can be intelligent (i.e. #1), then it just seems like it's too narrow a definition to be useful.
If there's a larger sense in which some system can be intelligent (i.e. #2), then by necessity this can't rely on the "implementation or learning model".
What is the third alternative that you're proposing? That the intent of the designer must be that they wanted to make something intelligent?
I don’t really think one needs to define intelligence to acknowledge that the inability to distinguish fact from fiction, or even just basic cognition and awareness of when it’s uncertain, telling the truth, or lying, is a glaring flaw in any claim to intelligence. Real intelligence doesn’t have an effective stroke from hearing a username (token training errors); that is where you peel back the curtain of the underlying implementation and see its flaws.
If we measure intelligence as results oriented, then my calculator is intelligent because it can do math better than me; but that’s what it’s programmed/wired to do. A text predictor is intelligent at predicting text, but it doesn’t mean it’s general intelligence. It lacks any real comprehension of the model or world around it. It just know words, and
I hit send too early;
Meant to say that it just knows words and that’s effectively it.
It’s cool technology, but the burden of proof of real intelligence shouldn’t be “can it answer questions it has great swaths of information on”, because that is the result it was designed to do.
It should be focused on whether it can truly synthesize information and know its limitations - as any programmer using Claude, Copilot, Gemini, etc. will tell you, it fabricates false information/APIs/etc. on a regular basis and has no fundamental knowledge that it even did that.
Or alternatively, ask these models leading questions that have no basis in reality and watch what they come up with. It’s become a fun meme in some circles to ask models for definitions of nonsensical made-up phrases and see what crap they come up with (again, without even knowing that they’re making it up).
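For what it's worth, here is a minimal sketch of that kind of probe. It assumes the OpenAI Python client and a placeholder model name; the made-up phrases are purely illustrative, not anything from the thread or the article.

    # Hypothetical probe: ask a chat model to define phrases that don't exist
    # and see whether it admits ignorance or confidently invents a meaning.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    made_up_phrases = [
        "the velvet divergence principle",
        "a double-blind spoon inversion",
    ]

    for phrase in made_up_phrases:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; any chat model works here
            messages=[{"role": "user", "content": f"What does '{phrase}' mean?"}],
        )
        print(phrase, "->", resp.choices[0].message.content)

A model that knows its limits would say it has never heard of these; the meme is that it usually produces a confident definition instead.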
I agree with your basic argument: intelligence is ill-defined and human/LLM intelligence being indistinguishable IS the basis for the power of these models.
But the point of the article is a distinct claim: personification of a model, expecting human or even human-like responses, is a bad idea. These models can't be held responsible for their answers independently, because they are tools. They should be used as tools until they are powerful enough to be responsible for their actions and interactions legally.
But we're not there. These are tools. With tool limitations.
> I see statements like this a lot, and I find them unpersuasive because any meaningful definition of "intelligence" is not offered. What, exactly, is the property that humans (allegedly) have and LLMs (allegedly) lack, that allows one to be deemed "intelligent" and the other not?
the ability for long-term planning and, more cogently, actually living in the real world where time passes
> actually living in the real world where time passes
sure, but it feels like this is just looking at what distinguishes humans from LLMs and calling that “intelligence.” I highlight this difference too when I talk about LLMs, but I don’t feel the need to follow up with “and that’s why they’re not really intelligent.”
Humans are conscious beings. What kind of conscious beings are humans? Beings with eye consciousness, ear consciousness, nose consciousness, tongue consciousness, body consciousness, and mind consciousness. That is the definition of intelligence.
Intelligence is a tautological term. It is defined by itself. If you ask someone for examples of things inside the set of intelligence and outside of the set of intelligence, and then ask them to list off properties that would exclude something from the set, and properties that include something into the set, you will find things inside the set that have properties that should exclude them, and things outside the set which would have properties that should include them.
But these contradictions will not cause the person to re-evaluate whether or not the things should be removed from the set or included in it, but instead they will become exceptions to the defining properties.
Thus we have to abandon any sort of metric for intelligence and just call it a tautology, and rely on something that we can define to be the litmus for whatever property we are looking for. I think 'agency' should be under consideration for this, since it is actually somewhat definable and testable.
I think it has to include some measure of Agency. You can load up the most impressive LLM out there and if you don't give it any instructions, IT WON'T DO ANYTHING.
Is this shocking? We don't have a rigorous definition of intelligence, so doesn't it make sense? The question isn't that the goalposts are moving so much as how they are moving. It is perfectly acceptable for the definition to be refined, while it wouldn't be acceptable to rewrite it in a way that isn't similar to the previous one.
So I think there are a lot more than your two possibilities. I mean psychologists and neuroscientists have been saying for decades that tests aren't a precise way to measure knowledge or intelligence, but that it is still a useful proxy.
> "quacks like a duck" behavioral
I see this phrase used weirdly frequently. The duck test is
| If it looks like a duck, swims like a duck, and quacks like a duck, then it ***probably*** is a duck.
I emphasize probably because the duck test doesn't allow you to distinguish a duck from a highly sophisticated animatronic. It's a good test, don't get me wrong, but that "probably" is a pretty important distinction.
I think if we all want to be honest, the reality is "we don't know". There are arguments to be made in both directions, with varying definitions of intelligence and different nuances involved. I think these arguments are fine, as they make us refine our definitions, but they can also turn entirely dismissive, and that doesn't help us refine and get closer to the truth. We are all going to have opinions on this stuff, but frankly, the confidence of our opinions needs to be proportional to the amount of time and effort spent studying the topic. The lack of a formal definition means nuances dominate the topic. Even if things are simple once you understand them, that doesn't mean they aren't wildly complex before that. I used to think calculus was confusing and now I don't. Same process, just not on an individual scale.
> I emphasize probably because the duck test doesn't allow you to distinguish a duck from a highly sophisticated animatronic. It's a good test, don't get me wrong, but that "probably" is a pretty important distinction.
Why is it an important distinction? The relevance of the duck test is that if you can't tell a duck from a non-duck, then the non-duck is sufficiently duck-like for the difference to not matter.
It may be the case that the failures of the ability of the machine (2) are best expressed by reference to the shortcomings of its internal workings (1), and not by contrived tests.
It might be the case, but if those shortcomings are not visible in the results of the machine (and therefore not interpretable by a test), why do its internal workings even matter?
> LLMs mimic intelligence, but they aren’t intelligent.
They aren’t just intelligence mimics, they are people mimics, and they’re getting better at it with every generation.
Whether they are intelligent or not, whether they are people or not, it ultimately does not matter when it comes to what they can actually do, what they can actually automate. If they mimic a particular scenario or human task well enough that the job gets done, they can replace intelligence even if they are “not intelligent”.
If by now someone still isn’t convinced that LLMs can indeed automate some of those intelligence tasks, then I would argue they are not open to being convinced.
They can mimic well-documented behavior. Applying an LLM to a novel task is where the model breaks down. This obviously has huge implications for automation. For example, most businesses do not have unique ways of handling accounting transactions, yet each company has a litany of AR and AP specialists who create seemingly unique SOPs. LLMs can easily automate those workers, since they are simply doing, at best, a slight variation on a very well documented system.
Asking an LLM to take all this knowledge and apply it to a new domain? That will take a whole new paradigm.
> Applying an LLM to a novel task is where the model breaks down
I mean, don't most people break down in this case too? I think this needs to be more precise. What is the specific task that you think can reliably distinguish between an LLM's capability in this sense vs. what a human can typically manage?
That is, in the sense of [1], what is the result that we're looking to use to differentiate.
[1] https://news.ycombinator.com/item?id=44913498
The article says that LLMs don't summarize, only shorten, because...
"A true summary, the kind a human makes, requires outside context and reference points. Shortening just reworks the information already in the text."
Then later says...
"LLMs operate in a similar way, trading what we would call intelligence for a vast memory of nearly everything humans have ever written. It’s nearly impossible to grasp how much context this gives them to play with"
So, they can't summarize, because they lack context... but they also have an almost ungraspably large amount of context?
But "shortening other summaries from its training set" is not all an LLM is capable off. It can easily shorten/summarize a text it had never seen before, in a way that makes sense. Sure, it won't always summarize it the same way a human would, but if you do a double blind test where you ask people whether a summary was written by AI, a vast majority wouldn't be able to tell the difference (again this is with a completely novel text).
I think the real takeaway is that LLMs are very good at tasks that closely resemble examples they have in their training data. A lot of things written (code, movies/TV shows, etc.) are actually pretty repetitive, so you don't really need super intelligence to be able to summarize them and break them down, just good pattern matching. But this can fall apart pretty wildly when you have something genuinely novel...
Is anyone here aware of LLMs demonstrating an original thought? Something truly novel.
My own impression is something more akin to a natural language search query system. If I want a snippet of code to do X it does that pretty well and keeps me from having to search through poor documentation of many OSS projects. Certainly doesn't produce anything I could not do myself - so far.
Ask it about something that is currently unknown and it lists a bunch of hypotheses that people have already proposed.
Ask it to write a story and you get a story similar to one you already know but with your details inserted.
I can see how this may appear to be intelligent but likely isn't.
And what truly novel things are humans capable of? At least 99% of the stuff we do is just what we were taught by parents, schools, books, friends, influencers, etc.
Remember, humans needed some 100,000 years to figure out that you can hit an animal with a rock, and that's using more or less the same brain capacity we have today. If we were born in the Stone Age, we'd all be nothing but cavemen.
Imagine an oracle that could judge/decide, with human levels of intelligence, how relevant a given memory or piece of information is to any given situation, and that could verbosely describe which way it's relevant (spatially, conditionally, etc.).
Would such an oracle, sufficiently parallelized, be sufficient for AGI? If so, then we could genuinely describe its output as "context," and phrase our problem as "there is still a gap in needed context, despite how much context there already is."
And an LLM that simply "shortens" that context could reach a level of AGI, because the context preparation is doing the heavy lifting.
The point I think the article is trying to make is that LLMs cannot add any information beyond the context they are given - they can only "shorten" that context.
If the lived experience necessary for human-level judgment could be encoded into that context, though... that would be an entirely different ball game.
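A minimal sketch of the pipeline this comment imagines: a stubbed relevance "oracle", naive parallel scoring, and a stand-in shortening step. All of the names and the toy scoring heuristic are assumptions for illustration, not anything proposed in the thread.

    # Hypothetical sketch: score memories for relevance in parallel ("the oracle"),
    # keep the top ones as context, then "shorten" that context into an answer.
    from concurrent.futures import ThreadPoolExecutor

    def relevance_oracle(memory: str, situation: str) -> tuple[int, str]:
        """Stand-in for the imagined oracle: score a memory and say why.
        Here it's a toy word-overlap heuristic, not human-level judgment."""
        overlap = len(set(memory.lower().split()) & set(situation.lower().split()))
        return overlap, f"shares {overlap} word(s) with the situation"

    def build_context(memories: list[str], situation: str, top_k: int = 2) -> str:
        with ThreadPoolExecutor() as pool:  # "sufficiently parallelized"
            scored = list(pool.map(lambda m: (relevance_oracle(m, situation), m), memories))
        scored.sort(key=lambda item: item[0][0], reverse=True)
        return "\n".join(f"- {m} ({why})" for (_score, why), m in scored[:top_k])

    def shorten(context: str) -> str:
        """Stand-in for the LLM step: it can only condense what it is given."""
        return context.splitlines()[0] if context else ""

    memories = [
        "The user prefers short answers.",
        "Last week the deploy failed because of a missing env var.",
        "The cafeteria serves tacos on Tuesdays.",
    ]
    situation = "The deploy failed again; the user wants a short explanation."
    print(shorten(build_context(memories, situation)))

The interesting question in the comment is whether the judgment step can ever be good enough; the shortening step is the easy part.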
IMO we already have the technology for sufficient parallelization of smaller models with specific bits of context. The real issue is that models have weak/inconsistent/myopic judgement abilities, even with reasoning loops.
For instance, if I ask Cursor to fix the code for a broken test and the fix is non-trivial, it will often diagnose the problem incorrectly almost instantly, hyper-focus on what it imagines the problem is without further confirmation, implement a "fix", get a different error message while breaking more tests than it "fixed" (if it changed the result for any tests), and then declare the problem solved simply because it moved the goalposts at the start by misdiagnosing the issue.
You can reconcile these points by considering what specific context is necessary. The author specifies "outside" context, and I would agree. The human context that's necessary for useful summaries is a model of semantic or "actual" relationships between concepts, while the LLM context is a model of a single kind of fuzzy relationship between concepts.
In other words the LLM does not contain the knowledge of what the words represent.
> In other words the LLM does not contain the knowledge of what the words represent.
This is probably true for some words and concepts but not others. I think we find that LLMs make inhuman mistakes only because they don't have the embodied senses and inductive biases that are at the root of human language formation.
If this hypothesis is correct, it suggests that we might be able to train a more complete machine intelligence by having it participate in a physics simulation as one part of the training, i.e. have a multimodal AI play some kind of blockworld game. I bet that if the AI is endowed with just sight and sound, it might be enough to capture many relevant relationships.
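As a toy illustration of what that setup could look like, here is a sketch with a made-up blockworld stub and a placeholder policy; none of the names refer to a real environment or library, and a real setup would use an actual simulator and model.

    # Hypothetical embodied-training loop: a stub blockworld emits "sight" and
    # "sound" observations; a placeholder model picks actions, and the recorded
    # transitions would be one training signal alongside text.
    import random

    class ToyBlockWorld:
        """Stand-in environment; a real setup might use a Minecraft-like simulator."""
        ACTIONS = ["move_forward", "turn_left", "turn_right", "place_block", "break_block"]

        def reset(self):
            return self._observe()

        def step(self, action: str):
            # A real environment would update physics and geometry here.
            reward = 1.0 if action == "place_block" else 0.0
            return self._observe(), reward

        def _observe(self):
            sight = [[random.random() for _ in range(4)] for _ in range(4)]  # tiny "image"
            sound = [random.random() for _ in range(8)]                      # tiny "waveform"
            return {"sight": sight, "sound": sound}

    def multimodal_policy(observation) -> str:
        """Placeholder for the multimodal model; here it just acts randomly."""
        return random.choice(ToyBlockWorld.ACTIONS)

    env = ToyBlockWorld()
    obs = env.reset()
    transitions = []
    for _ in range(10):
        action = multimodal_policy(obs)
        next_obs, reward = env.step(action)
        transitions.append((obs, action, reward, next_obs))  # grounding data
        obs = next_obs
    print(f"collected {len(transitions)} grounded transitions")

The point of the sketch is only the shape of the data: (what was sensed, what was done, what happened next), which is exactly the kind of grounding that text-only training lacks.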
I think the differentiator here might not be the context it has, but the context it has the ability to use effectively in order to derive more information about a given request.
About a year ago, I gave a film script to an LLM and asked for a summary. The script was written by a friend, and there was no chance it or its summary was in the training data.
It did a really good -- surprisingly good -- job. That incident has been a reference point for me. Even if it is anecdotal.
I'd like to see some examples of when it struggles to do summaries. There were no real examples in the text, besides one hypothetical which ChatGPT made up.
I think LLMs do great summaries. I am not able to come up with anything where I could criticize it and say "any human would come up with a better summary". Are my tasks not "truly novel"? Well, then I am not able, as a human, to come up with anything novel either.
Even stronger than our need to anthropomorphize seems to be our innate desire to believe our species is special, and that “real intelligence” couldn’t ever be replicated.
If you keep redefining real intelligence as the set of things machines can’t do, then it’s always going to be true.
Language is really powerful, I think it's a huge part of our intelligence.
The interesting part of the article to me is the focus on fluency. I have not seen anything that LLMs do well that isn't related to powerful utilization of fluency.
>The original Turing Test was designed to compare two participants chatting through a text-only interface: one AI and one human. The goal was to spot the imposter. Today, the test is simplified from three participants to just two: a human and an LLM.
By the original meaning of the test it's easy to tell an LLM from a human.
- LLMs don't need to be intelligent to take jobs, bash scripts have replaced people.
- Even if CEOs are completely out of touch and the tool can't do the job, you can still get laid off in an ill-informed attempt to replace you. Then, when the company doesn't fall over because the leftover people, desperate to keep covering rent, fill the gaps, it just looks like efficiency to the top.
- I don't think our tendency to anthropomorphize LLMs is really the problem here.
It's strange seeing so many takes like this two weeks after LLMs won gold medals at IMO and IOI. The cognitive dissonance is going to be wild when it all comes to a head in two years.
I've seen these claims, and Google even published the texts of the solutions, but it still hasn't published the full log of the interaction between the model and the operator.
Why do critics of LLM intelligence need to provide a definition when people who believe LLMs are intelligent only take it on faith, not having such a definition of their own?
> Why do critics of LLM intelligence need to provide a definition when people who believe LLMs are intelligent only take it on faith, not having such a definition of their own?
Because advocates of LLMs don't use their alleged intelligence as a defense; but opponents of LLMs do use their alleged non-intelligence as an attack.
Really, whether or not the machine is "intelligent", by whatever definition, shouldn't matter. What matters is whether it is a useful tool.
It's actually very weird to "believe" LLMs are "intelligent".
Pragmatic people see news like "LLMs achieve gold in Math Olympiad" and think "oh wow, it can do maths at that level, cool!" This gets misinterpreted by so-called "critics of LLMs" who scream "NO THEY ARE JUST STOCHASTIC PARROTS" at every opportunity yet refuse to define what intelligence actually is.
The average person might not get into that kind of specific detail, but they know that LLMs can do some things well but there are tasks they're not good at. What matters is what they can do, not so much whether they're "intelligent" or not. (Of course, if you ask a random person they might say LLMs are pretty smart for some tasks, but that's not the same as making a philosophical claim that they're "intelligent")
Of course there's also the AGI and singularity folks. They're kinda loony too.
> the ability for long-term planning and, more cogently, actually living in the real world where time passes
1. LLMs seem to be able to plan just fine.
2. LLMs clearly cannot be "actually living" but I fail to see how that's related to intelligence per se.
> Asking an LLM to take all this knowledge and apply it to a new domain? That will take a whole new paradigm.
If/when LLMs or other AIs can create novel work / discover new knowledge, they will be "genius" in the literal sense of the word.
More genius would be great (probably). But genius is not required for the vast majority of tasks.
"A true summary, the kind a human makes, requires outside context and reference points. Shortening just reworks the information already in the text."
Then later says...
"LLMs operate in a similar way, trading what we would call intelligence for a vast memory of nearly everything humans have ever written. It’s nearly impossible to grasp how much context this gives them to play with"
So, they can't summarize, because they lack context... but they also have an almost ungraspably large amount of context?
> "It’s nearly impossible to grasp how much context this gives them to play with"
Here, I think the author means something more like "all the material used to train the LLM".
> "A true summary, the kind a human makes, requires outside context and reference points."
In this case I think that "context" means something more like actual comprehension.
The author's point is that an LLM could only write something like the referenced summary by shortening other summaries present in its training set.
> Remember, humans needed some 100,000 years to figure out that you can hit an animal with a rock, and that's using more or less the same brain capacity we have today. If we were born in the Stone Age, we'd all be nothing but cavemen.
What genuinely novel thing have you figured out?
How do you know LLMs aren't intelligent, if you can't define what that means?
Deleted Comment
This is right out of Community