Again with the sloppy language. Claude 3's responses differ depending on the context of questioning and can be seen to reflect a distinction between use for purpose and test use? Well, that's a damn fine LLM. Well done. Is it inferring or suspecting? No.
It's not "suspecting" anything except in the ridiculous, sloppy use of language, analogy ridden way AI proponents like to state things.
"suspect" had a very good english language meaning tied to intelligence and thinking. Polluting the meaning by applying it to a complex software system strongly implies belief the software IS thinking. It's not scientifically observationally neutral language.
If I was in peer review in this field, I'd fail any paper which wrote this kind of thing.
Claude 3 Opus can be used by people to distinguish use between test/benchmarking, and intentional use. Naieve use of Claude 3 for benchmarking may be led astray because the LLM appears to behave differently in each case. This is interesting. It's not a sign of emergent intelligence, its refinement of meaning in the NLP of the questions and their context.
I must say I don't understand your stance against extending the use of language to machines. Are you also arguing that "flying" should be reserved for birds and cannot be meaningfully applied to airplanes?
No, I am not. "Flying" doesn't impart a meaning of intentional behaviour. It's distance carried against height lost or gained, pure and simple. "Claims", "believes", "states", "hallucinates", "says", "argues" all go to intelligence, not just the input-output situation. Maybe most unintentional flight is gliding, or controlled descent to ground. That said, can a dumb system keep a kite aloft? I believe yes. No AI needed.
To me there is a clear difference. AI language is not as direct as "flying".
Interestingly enough, all flying machines fly - that means both airplanes and lighter-than-air craft.
But ships and submarines don't swim. Although submarines do dive.
Coming back from the finer points of the English language: the LLM in question was likely trained on text discussing LLM evaluation, so it ended up generating something about it.
You LLM "activists" are doing yourselves a disservice speaking about them in religious tones. You lose some credibility.
I am unhappy with "machine learning" spelled out as words, but ML as an acronym concerns me less. Goal seeking has been in the vocab for a long time, and usually does impart a sense of intentionality, but it's coded in, algorithmically. The mechanistic approaches to goal seeking are givens; the code is an implementation of them.
With LLMs we're being taken to "the intentionality emerges, it wasn't coded in", which should be disputed, but it's how I see people inferring "behaviour" in this space.
Agents, yes, I do have problems with, having just written about them in a different but related thread (different HN story). I begin to suspect the seeds of this linguistic problem lie deep. They were there in "Clippy" and the personification of the system.
Emergent awareness is cool, but it seems Claude Opus has imperfect recall at the 25k token mark, and has lossy recall well before the half-payload mark (100k)?
I saw a comment (not in the linked thread) noting that the system prompt probably includes something along the lines of "You are a helpful AI assistant...", which does feel like a bit of a giveaway: there are many documents on the internet that discuss testing AI assistants, so a document whose topic is an AI assistant is likely to contain discussion of testing the assistant (and if the document is written from the first-person perspective of the assistant, it's likely to contain text from the assistant's perspective discussing its own testing).
Good job on hyping it, but I really lose respect for Claude for insulting our intelligence by saying this stuff. The issue here is that he does not say whether the "randomness" (temperature) setting is zero; if not, you are going to get all sorts of responses, depending also on the pre-prompt about how it is supposed to answer you.
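For what it's worth, the temperature point is easy to check yourself. Here is a minimal sketch, assuming the official anthropic Python SDK and an ANTHROPIC_API_KEY in the environment; the model name and prompt are just illustrative placeholders. It sends the same prompt several times at the default temperature and again at temperature 0, so you can see how much of the variation is just sampling.

    # Minimal sketch: same prompt, several runs, two temperature settings.
    # Assumes the official `anthropic` Python SDK and ANTHROPIC_API_KEY in the
    # environment; the model name and prompt are placeholders for illustration.
    import anthropic

    client = anthropic.Anthropic()
    PROMPT = "Find the sentence about pizza toppings hidden in this document: ..."

    def sample(temperature: float, runs: int = 3) -> list[str]:
        """Send the same prompt `runs` times and collect the replies."""
        replies = []
        for _ in range(runs):
            resp = client.messages.create(
                model="claude-3-opus-20240229",
                max_tokens=200,
                temperature=temperature,
                messages=[{"role": "user", "content": PROMPT}],
            )
            replies.append(resp.content[0].text)
        return replies

    print(sample(temperature=1.0))  # default sampling: replies may vary run to run
    print(sample(temperature=0.0))  # near-greedy decoding: replies should mostly repeat

If the temperature-0 replies still mention "testing", the pre-prompt and training data are better explanations than run-to-run randomness.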
It's not "suspecting" anything except in the ridiculous, sloppy use of language, analogy ridden way AI proponents like to state things.
"suspect" had a very good english language meaning tied to intelligence and thinking. Polluting the meaning by applying it to a complex software system strongly implies belief the software IS thinking. It's not scientifically observationally neutral language.
If I was in peer review in this field, I'd fail any paper which wrote this kind of thing.
Claude 3 Opus can be used by people to distinguish use between test/benchmarking, and intentional use. Naieve use of Claude 3 for benchmarking may be led astray because the LLM appears to behave differently in each case. This is interesting. It's not a sign of emergent intelligence, its refinement of meaning in the NLP of the questions and their context.
To me there is a clear difference. Ai language is not as direct as flying.
But ships and submarines don't swim. Although submarines do dive.
Coming back from the finer points of the english language, the LLM in question is likely to be trained on text discussing LLM evaluation. So it ended up generating something about it.
You LLM "activists" are doing yourselves a disservice speaking about them in religious tones. You lose some credibility.
Did the Chinese Room suspect anything?
For reference, GPT4-Turbo has perfect recall to 64k and drops pretty badly past that until 128k: https://twitter.com/GregKamradt/status/1722386725635580292
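For anyone who wants to reproduce that kind of measurement, here's a rough sketch of a needle-in-a-haystack recall probe in the spirit of the linked Kamradt test. It assumes the official anthropic Python SDK; the filler text, needle sentence, and document size are arbitrary placeholders, not the actual benchmark.

    # Rough needle-in-a-haystack recall probe. Assumes the official `anthropic`
    # Python SDK and ANTHROPIC_API_KEY in the environment; filler text, needle
    # sentence, and sizes are illustrative placeholders, not the real benchmark.
    import anthropic

    client = anthropic.Anthropic()

    NEEDLE = ("The best thing to do in San Francisco is to eat a sandwich "
              "in Dolores Park on a sunny day.")
    FILLER = "The quick brown fox jumps over the lazy dog. " * 40  # one padding paragraph

    def probe_recall(total_paragraphs: int, needle_depth: float) -> str:
        """Bury NEEDLE at a relative depth inside a long filler document and
        ask the model to retrieve it."""
        paragraphs = [FILLER] * total_paragraphs
        paragraphs.insert(int(total_paragraphs * needle_depth), NEEDLE)
        haystack = "\n\n".join(paragraphs)

        resp = client.messages.create(
            model="claude-3-opus-20240229",
            max_tokens=128,
            temperature=0,  # keep runs comparable
            messages=[{
                "role": "user",
                "content": haystack + "\n\nWhat is the best thing to do in San Francisco?",
            }],
        )
        return resp.content[0].text

    # Example: needle at 50% depth in a long filler document.
    print(probe_recall(total_paragraphs=60, needle_depth=0.5))

Sweeping total_paragraphs and needle_depth gives you the same kind of length-vs-depth grid shown in the linked chart.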