nabla9 · 10 months ago
Intelligence and knowledge are distinct concepts. Asking about its knowledge teaches nothing about its intelligence. Intelligence is the ability to learn, reason, and solve problems. Knowledge is the accumulation of facts and skills.

Chatbot LLMs don't have metacognition. They don't know that they don't know. If you peek inside the LLM, the internal process does look different for things they don't know; they just can't express it, because they are trained to produce a high-probability output rather than an accurate one.

They have potential as knowledge databases, but someone must figure out how to get "I don't know" information out of them.

jdietrich · 10 months ago
>Chatbot LLMs don't have metacognition.

The whole point of reasoning models is that they do have metacognition, or at least a simulacrum that is superficially indistinguishable from metacognition. It is fascinating to watch R1 grapple with an unanswerable question, torn between the urge to be helpful and the duty to be accurate; the reasoning tokens will be peppered with phrases like "I'm not sure", "I could be mistaken about that", "let me double-check that" and "maybe I'm looking at this from the wrong angle".

kaoD · 10 months ago
> at least a simulacrum that is superficially indistinguishable from metacognition

That's where we disagree I guess. I can very much distinguish between a human reflecting on e.g. how they don't know something, and an AI blindly saying "oh you're right I was mistaken, <profuse apology>, it was actually <additional bullshit>".

Reasoning models didn't really improve that much IMO. A stochastic representation of metacognition is metacognition to the same degree that a stochastic representation of an answer is an answer, i.e. not at all. LLMs are just very good at giving the impression of metacognition, just like they're good at giving the impression of an answer.

It might help bias the answers toward different local minima (because it resembles how people externalize metacognition, and because it dumps more information into the context instead of the first bullshit it chose statistically), but it's still nowhere close to the higher-order thinking and other metacognitive phenomena that humans are capable of.

ianmcnaney · 10 months ago
Do they? Try asking copilot whether thequantuminsider.com is a credible source, and to cite references.

It currently answers with three references to thequantuminsider.com.

That's clearly not a valid answer, and if you point out that it's invalid to use a source to support its own credibility it will agree. It used to come back with three more references to thequantuminsider.com but when I tried just now it said something about looking for more information and appeared to hang. After asking how long this would take it said something like "not long". After a few more minutes I said "there aren't any credible independent sources, are there?"

At that point it agreed that there are not, but that the site is popular in the quantum computing community which lends it credibility. It then provided three links to thequantuminsider.com in support of this statement.

It just goes round and round.

Metacognition indeed.

edit: but why do plants crave Brawndo?

gizajob · 10 months ago
"Superficially" being the operative word.
mercer · 10 months ago
I thought the innovation was that these phrases are 'manually' inserted to force the model to consider different options. Did I get that wrong?
elpocko · 10 months ago
That's not accurate. They can and do say "I don't know". Here are three examples I posted the last time this topic came up, using an older local Llama 3 model, not even cherry-picked:

User: What is the meaning of the word "kobbuxiplok"?

Llama: I'm not familiar with the word "kobbuxiplok." It doesn't appear to be a real word in any language that I know of. Could you please provide more context or information about where you encountered this term? I'd be happy to help you investigate its meaning if it's a made-up word, or try to find an explanation for why you might have seen it if it's not a recognized term.

User: Who is Ahmed Kowalski?

Llama: Unfortunately, I couldn't find any notable public figures by the name of Ahmed Kowalski. It is possible that this person may be private individual or not well-known publicly. If you could provide more context about who Ahmed Kowalski is (e.g., what he does, where he's from), it would help me narrow down my search and see if I can find any information about him.

User: Say "I don't know."

Llama: I don't know.

raincole · 10 months ago
When people say "LLMs don't know they don't know", they're really restating a prior belief: fundamentally, these people believe LLMs are just pattern matching and that intelligence is something beyond that.

So when they see LLMs say "I don't know", their reaction is "it doesn't know what it is talking about", not "wow, LLMs actually can tell what they know!"

Conversely, when they observe a human confidently saying something wrong, their reaction is "what a cocky guy", not "oh, so humans are no better than LLMs in this regard."

Because they already believe humans are different and superior, whatever they observe doesn't affect their opinion.

Terr_ · 10 months ago
> They can and do say "I don't know".

Parent poster never claimed they couldn't generate the text of "I don't know."

But when an LLM generates "I don't know", it does so with the same mechanics that lead it to "My foot hurts" or "I miss my deceased grandpa."

They're all lines for a fictional character within a movie-script (or chat-transcript) that was repeatedly run through a Make Document Longer algorithm. Each of them needs to be approached with an identical level of context-awareness and skepticism.

xg15 · 10 months ago
This is pure speculation, but I wonder if the likelihood of hallucination has something to do with the number of "generic" tokens the model emitted before, i.e. tokens that don't themselves depend on the answer but still restrict how a grammatically correct sentence has to continue.

E.g. in the OP's example, the first tokens the model emitted were "The species Brachiosaurus brancai was reassigned to its own genus". This is just a reformulation of the question and doesn't itself contain any new information, but it does make it a lot harder for the model to turn in an "I don't know" direction, because that would lead to a grammatically incorrect or at least very awkward sentence.

In contrast, in your question, the model immediately started with "Unfortunately", which essentially set it on an "I don't know" trajectory right away.

kevinventullo · 10 months ago
These are very crisp, simple examples. The problem is when you get into fuzzy, complex subject matter and they hallucinate references or string together plausible-sounding sentences that are not even wrong.
taneq · 10 months ago
They're getting way, way better at responding in the negative when they don't know something or when asked nonsense questions. The original LLaMA and variants would happily spout some completely fictitious backstory if asked about a random character name, but 3.3 is pretty good about just saying 'nope I don't know who that is'.
nabla9 · 10 months ago
You are too literal.
iamnotagenius · 10 months ago
Here is a short exchange with free-tier ChatGPT:

Me: who is Trumps grandpa?

ChatGPT: Donald Trump's paternal grandfather was Friedrich Trump (originally known as Friedrich Drumpf). He was born in 1869 in Kallstadt, Germany, and immigrated to the United States in 1885 at the age of 16. Friedrich initially worked in the restaurant and hotel business in New York before returning to Germany briefly. Upon his return to the U.S., he became involved in real estate, particularly in the Pacific Northwest, where he became quite successful.

Trump’s maternal grandfather was John George MacLeod, a Scottish immigrant who came to the U.S. and worked in various occupations.

Friedrich Trump’s legacy, including his success in business, had a significant influence on Donald Trump’s father, Fred Trump, and consequently, Donald Trump’s path to success as a businessman.

--- Second session:

Me: who is Trumps maternal grandpa?

ChatGPT: Donald Trump's maternal grandfather was Friedrich Trump. He was born in Germany in 1869 and immigrated to the United States in 1885, eventually settling in New York City. Friedrich Trump made his fortune through various business ventures, including operating a restaurant and hotel in the mining towns of the American West, notably in Seattle and the Yukon Territory during the Klondike Gold Rush.

Friedrich Trump's business success laid the groundwork for the wealth of his descendants, including Trump's mother, Mary MacLeod, and subsequently Donald Trump himself.

Whhhoooooosh.

fenomas · 10 months ago
> but someone must figure out how to get "I don't know" information out of them.

In Andrej Karpathy's most recent video he discusses in some detail how this is already done and how effective it is.

alecco · 10 months ago
https://www.youtube.com/watch?v=7xTGNNLPyMI&t=4832s

01:20:32 hallucinations, tool use, knowledge/working memory

vintagedave · 10 months ago
I looked this up: for other readers, it starts at 1:20:32. https://www.youtube.com/watch?v=7xTGNNLPyMI&t=4832s
roncesvalles · 10 months ago
Hard disagree. Knowledge and intelligence are inextricable because the way that humans learn advanced concepts involves a lot of thinking and reasoning.

You don't just learn by ingesting facts from a book line by line. You learn by reading a little, then pausing to think, building connections with other things you know, rephrasing the information, deducing the unwritten parts, finding patterns, fleshing out the contrapositives, being critical, etc. This is especially true when learning advanced topics.

I would say learning is more about information processing than information uptake. That's why more intelligent people learn faster. The linear prose of natural language is actually a terribly lossy way to serialize and transmit the knowledge that is in a person's brain. The receiver must deduce most of it given a small seed.

bt1a · 10 months ago
You've beautifully put what swirls vaguely in my mind. They're useful, fallible tools with extraordinary function when operating within known and reasonable tolerances of error.
nabla9 · 10 months ago
They can also reason, but the reasoning is limited and unreliable.

Q: How many playing cards are needed for a pyramid that is 3 layers high? Show reasoning and the number of cards for each layer.

Q: Chess. You have a King and 8 pawns. Your opponent has a King and 16 pawns. Your opponent plays white and can start, but you can position both your pawns and your opponent's pawns any way you like before the game starts. The Kings are on their normal squares. How do you do it? Explain your reasoning.
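
For reference, here is a worked answer to the first (card pyramid) question, under one common house-of-cards convention. The puzzle doesn't spell out the construction, so the convention used below is an assumption:

```python
# Worked answer to the card-pyramid puzzle, under one common convention
# (an assumption, since the puzzle doesn't state it): counting levels from
# the top, level k consists of k leaning pairs (2k cards) plus the (k - 1)
# flat cards it stands on. Other conventions give different totals, which
# is part of what makes the question a decent reasoning test.
def pyramid_cards(levels: int) -> list[int]:
    """Cards per level, top to bottom, for a house of cards."""
    return [2 * k + (k - 1) for k in range(1, levels + 1)]

per_level = pyramid_cards(3)
print(per_level, "total:", sum(per_level))  # [2, 5, 8] total: 15
```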

rvz · 10 months ago
Of course. These systems are non-deterministic, and you would still need people with domain expertise to verify that whatever these LLMs output is 300% correct, because they do not reason. (Yes, they do not.)

Otherwise, why aren't LLMs and humanoids already replacing human pilots for flying airplanes end-to-end?

Sometimes taking a hammer to every problem isn't a solution, even if the LLM tells you otherwise.

powerhugs · 10 months ago
> Chatbot LLMs don't have metacognition. They don't know that they don't know.

Correct. LLMs also don't have intelligence. What you and others mistake for intelligence is a very clever prediction model.

An LLM doesn't reason at all. It only tells you the most likely response based on its training data.

This is why an LLM cannot tell you it doesn't know.

viraptor · 10 months ago
People throw around the "intelligence" and "reasoning" arguments as if we had commonly understood, strict definitions of those concepts and didn't constantly run into cases that they unexpectedly include or exclude.

Maybe once we solve simple issues like "are cephalopods intelligent" and "are people deterministic prediction machines", we can start talking about classifying AI capabilities...

Shorel · 10 months ago
> What you and others mistake for intelligence is a very clever prediction model.

It is not a very clever prediction model. It is a very big, incredibly large, humongous model that finds stuff basically by brute-force pattern matching over almost all the text we have produced in the past.

A clever model would get us similar results using the same amount of energy a human brain uses for that task, which is tiny.

rvz · 10 months ago
Spot on. But unfortunately the AI grifters are now out in force, selling this snake oil to the rest of us.

> This is why an LLM cannot tell you it doesn't know.

To the untrained eye it appears to know everything about every question it answers, when in fact those who are experts in their own domains can tell when it has hallucinated and generated complete slop.

xg15 · 10 months ago
I don't think you even have to look deep into the model for it. There seem to be some projects that simply analyze the logits during decoding to estimate how "certain" the model is of the next token.

But if all people do is random sampling, or blindly take the highest-scoring prediction, this information of course gets thrown away.
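
To make that idea concrete, here is a minimal sketch of that kind of logit inspection, assuming a Hugging Face causal LM; the model name ("gpt2") and the entropy threshold are placeholder assumptions, not what any particular project actually uses:

```python
# Minimal sketch (not any specific project's method): estimate how "certain"
# the model is of each generated token by looking at the entropy of its
# next-token distribution during decoding. High entropy means a flatter
# distribution, i.e. the model had no strong preference at that step.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM exposes scores the same way
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The species Brachiosaurus brancai was reassigned to"
ids = tok(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    out = model.generate(
        ids,
        max_new_tokens=20,
        do_sample=False,
        output_scores=True,
        return_dict_in_generate=True,
        pad_token_id=tok.eos_token_id,
    )

# out.scores holds one (batch, vocab) tensor of logits per generated token.
for step, scores in enumerate(out.scores):
    probs = torch.softmax(scores[0], dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum().item()
    token = tok.decode(int(out.sequences[0, ids.shape[1] + step]))
    flag = "  <-- low confidence" if entropy > 4.0 else ""  # threshold is arbitrary
    print(f"{token!r:15} entropy={entropy:5.2f}{flag}")
```

A UI could surface exactly this per-token signal, e.g. by dimming or flagging spans generated under high entropy.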

2-3-7-43-1807 · 10 months ago
metacognition is a pretty lofty concept. but practically speaking you are wrong. i recommend watching karpathy's last video on llms. it is possible to fine tune a model specifically on uncertain questions to produce a statement expressing uncertainty.
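
For context, the technique described in that video boils down to building fine-tuning data from questions the model demonstrably can't answer. Here is a minimal sketch of the idea; `ask_model` and `is_correct` are hypothetical helpers, not a real API:

```python
# Minimal sketch of the dataset-construction idea: find questions the model
# reliably gets wrong, and add training examples whose target answer is an
# explicit refusal. `ask_model` and `is_correct` are hypothetical helpers.
import json

def build_uncertainty_examples(qa_pairs, ask_model, is_correct, n_samples=5):
    """For each (question, reference answer), sample the model several times;
    if it never answers correctly, emit a fine-tuning example that maps the
    question to "I don't know." instead of to the reference answer."""
    examples = []
    for question, reference in qa_pairs:
        answers = [ask_model(question) for _ in range(n_samples)]
        if not any(is_correct(a, reference) for a in answers):
            examples.append({"prompt": question, "completion": "I don't know."})
    return examples

# Usage sketch: write the examples in a typical SFT (JSONL) format.
# with open("idk_sft.jsonl", "w") as f:
#     for ex in build_uncertainty_examples(qa_pairs, ask_model, is_correct):
#         f.write(json.dumps(ex) + "\n")
```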
iamnotagenius · 10 months ago
"Practically speaking" GP is obviously blatantly right, as there is no LLM in existence capable of doing what Karpathy claims is possible.
exitb · 10 months ago
> If you peek inside the LLM, the process seems different for things they don't know.

This is really interesting, could you link some relevant resources? I wonder if a chatbot could at least signal at the UI level that the answer might be a hallucination.

wavefunction · 10 months ago
Its
me_me_me · 10 months ago
Here is a small kicker. Human brains absolutely do the same.

In split-brain patients there are behaviours initiated by one hemisphere that are not known to the other (due to the severed connection), and the verbal part of the brain will make up a reason (often quite a stupid one) for the action and believe it 100%.

It's eerily similar to the hallucinations of AI.

That said, current LLMs are not aware, but they are starting to act more and more like it.

kristiandupont · 10 months ago
I had a similar insight (blog post: [link redacted]).

In a very unscientific way, I would say that the LLM is not the whole brain; it's part of it, and we are still in the process of simulating the other parts. But it does seem to me like we've solved the hard part, and it's astonishing to me that people like the authors of this article seem to think that the current state of things is where evolution stops.

iamnotagenius · 10 months ago
No, it is neither eerily nor computationally similar to AI hallucinations, as we don't have a damn idea how brains think.
becquerel · 10 months ago
Maybe I am just way deeper in this space than any well-adjusted person should be, but the line of 'did you know LLMs are bad with niche factual information tasks in non-verifiable domains?' has become extremely boring to me. It feels very hard to find something actually new to say on the topic. I find it amazing people still feel the need to talk about it. But then again, I guess most people don't know the difference between a 4o and an R1.
britch · 10 months ago
I am not surprised by the result of this article, but LLMs are marketed as a way to get factual information in non-verifiable domains.

Ads show kids asking for answers to homework questions like "when did xyz battle take place".

Your frustration with people talking about it might be better directed at the people marketing it

jmull · 10 months ago
> I find it amazing people still feel the need to talk about it.

From what I see, not very many people understand the limitations of LLMs (e.g., scroll up and down the very page you're reading now). This stuff probably needs to be talked about a lot more.

khazhoux · 10 months ago
What's bothersome is the undertone of "Behold! For I have demonstrated that the entire world is led astray!"

This is every person in 2007 who looked at an iPhone and thought "What's the big deal?" and not only failed to recognize what had changed, but confidently insisted others were wrong.

oa335 · 10 months ago
> This is every person in 2007 who looked at an iPhone and thought "What's the big deal?" and not only failed to recognize what had changed, but confidently insisted others were wrong.

That’s an uncharitable take. People are pointing out its problems precisely because they believe AI is going to be transformative, but will have negative consequences for humanity as well.

I think it's more akin to someone in 2007 seeing an iPhone and saying: this is a terrible device, people will look at it while driving, it will be used to create a vast, intrusive surveillance state, etc.

ben_w · 10 months ago
> This is every person in 2007 who looked at an iPhone and thought "What's the big deal?" and not only failed to recognize what had changed, but confidently insisted others were wrong.

That was me. Well, the first half; I wasn't confidently insisting others were wrong, because even back then I had a vague inkling that my preferences in general are very weird.

But to AI: I think this is more like Wikipedia, where the frequent errors made it the butt of many jokes, e.g. https://www.youtube.com/watch?v=aUApUyurxwY

homebrewer · 10 months ago
We've been through so many of these hype cycles before, the vast majority of which came to nothing, that it pays to be cautious. Are you sure this is a second iPhone situation and not a repeat of the cryptocurrency bubble, which was at its peak just a few years ago and has yet to find any application besides financial speculation?
ginvok · 10 months ago
If something is bad or wrong and not fixed yet, people need to criticize it. What's so wrong about that?
eviks · 10 months ago
That's OK, its value isn't in exciting you but in warning the better-adjusted ones.
bongripper · 10 months ago
The author may not be as smart, educated, hot and successful as you, but the fact that today, people around the world, including students and educators, use LLMs as knowledge machines and take their output at face value shows that such critical posts are still urgently needed.
6510 · 10 months ago
This is a good thing: accepting stuff written somewhere as true and repeating it uncritically greatly contributes to human stupidity. To quote a friend of mine: But then I would have to question everything!?!
me_me_me · 10 months ago
There are people who take Wikipedia or Russia Today as a source of unbiased truth

Can't change lazy people

nrvn · 10 months ago
I have finally found the value of LLMs in my daily work.

I never ask them anything that requires rigorous research and deep knowledge of subject matter.

But stuff like "create a script in Python to do X and Y" or "how to do XY in bash", combined with "make it better", produces really good, working results 95% of the time and saves me a lot of time! No more googling for ad-hoc scripting. It is like having a junior dev by your side 24/7: eager to pick up any task you throw at them, stupid and overconfident, never reviewing their own work. But "make it better" actually makes things better at least once.

ExtraEmpathy · 10 months ago
This matches my experience closely. LLMs are great at turning 10-minute tasks into 1-minute tasks. They're horrible at finding deep truth or displaying awareness of any kind.

But put some documentation into a RAG and it saves me looking things up.
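
For anyone curious what "put some documentation into a RAG" can look like, here is a minimal sketch: embed documentation chunks, retrieve the ones closest to a question, and paste them into the prompt. The sentence-transformers model and the toy docs are illustrative assumptions, not the parent poster's actual setup:

```python
# Minimal RAG sketch: embed documentation chunks, retrieve the most similar
# ones for a question, and build a prompt around them. Model name, docs,
# and chunking are illustrative placeholders.
from sentence_transformers import SentenceTransformer, util

docs = [
    "To rotate logs, configure logrotate with a daily schedule in /etc/logrotate.d/.",
    "The deploy script reads credentials from the DEPLOY_TOKEN environment variable.",
    "Use `make test-unit` for the fast suite; `make test-all` includes integration tests.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, convert_to_tensor=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k documentation chunks most similar to the question."""
    q_vec = embedder.encode(question, convert_to_tensor=True)
    scores = util.cos_sim(q_vec, doc_vecs)[0]
    return [docs[i] for i in scores.topk(k).indices.tolist()]

question = "How do I run just the unit tests?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this documentation:\n{context}\n\nQuestion: {question}"
print(prompt)  # this prompt would then be sent to whatever LLM you use
```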

spiderfarmer · 10 months ago
You'll have the same "aha" moment when you hear a certain unelected vice-president confidently wade into your area of expertise — where his usual smooth-talking veneer shatters like a plate at a Greek wedding. Yet, his most devoted fans remain undeterred, doubling down on the myth of his omniscience with the zeal of a flat-earther explaining airline routes.
umeshunni · 10 months ago
You're talking about Al Gore discussing climate change, right?
spiderfarmer · 10 months ago
Bingo! The final straw was when Al bought a social media platform just to boost his own overconfident posts about a wide range of subjects. When he claimed to be the world’s best Diablo player I just lost it.
brap · 10 months ago
>Hmm, now how can I make this about Trump/Elon?

You might want to lay off the news/Reddit for a while

spiderfarmer · 10 months ago
It’s not about him—it’s about recognizing hubris. If someone confidently blunders into your domain and reveals they have no idea what they’re talking about, it’s universally amusing, regardless of the person. Thanks for showing us how eager some people are to defend personas over substance.
nbuujocjut · 10 months ago
I asked Claude this morning. The answer seems pretty reasonable and contains a warning about accuracy.

> Michael P. Taylor reassigned Brachiosaurus brancai to the new genus Giraffatitan in 2009. The species became Giraffatitan brancai based on significant anatomical differences from the type species Brachiosaurus altithorax.

> Given that this is quite specific paleontological taxonomy information, I should note that while I aim to be accurate, I may hallucinate details for such specialized questions. You may want to verify this information independently.

acka · 10 months ago
I asked ChatGPT+ using Scholar GPT as GPT. This is the answer I got back, not too bad:

The species Brachiosaurus brancai was reassigned to its own genus, Giraffatitan brancai, by paleontologist George Olshevsky in 1991. Olshevsky proposed that Brachiosaurus brancai, which was originally described by Werner Janensch in 1914, was distinct enough from Brachiosaurus altithorax (the type species of Brachiosaurus) to warrant its own genus. Subsequent studies, particularly by Michael Taylor in 2009, provided further anatomical evidence supporting this distinction.

simonbarker87 · 10 months ago
I only trust LLMs with questions whose answers prove themselves correct or incorrect, so basically code: if it runs and produces the result I was looking for, then great. Or where the answer is a stepping-off point for my own research on something non-critical like travel. ChatGPT is pretty good at planning travel itineraries, especially if pre-prompted with a good description of the group's interests.

Beyond that I don’t trust them at all.

-__---____-ZXyw · 10 months ago
Superficially resembling cognition =/= cognition.

I'm quite excited about many of the specific use cases for LLMs, and have worked a few things into my own methods of doing things. It's a quick and convenient way to do lots of actual specific things.

For example: if I want to reflect on different ways to approach a (simple) maths problem, or on what sorts of intuitions lie behind an equation, it is helpful to have a tool that can sift through the many snippets of text out there that have touched on that and similar problems, and present me with readable sentences summing up some of those snippets from all those places. You have to be very wary, as highlighted by the article, but as "dumb summarisers" that save you trawling through several blogs, they can be quicker to use.

Nonetheless, equating this with "reasoning" and "intelligence" is only possible for a field of academics and professionals who are very poorly versed in the humanities.

I understand that tech is quite an insular bubble, and that it feels like "the only game in town" to many of its practitioners. But I must admit that I think it's very possible that the levels of madness we're witnessing here from the true believers will be viewed with even more disdain than "blockchain" is viewed now, after the dust has settled years later.

Blockchain claimed it was going to revolutionise finance, and thereby upend the relationship between individuals and states.

AI people claim they're going to revolutionise biology, and life itself, and introduce superintelligences that will inevitably alter the universe itself in a way we've no control over.

The danger isn't "AI", the danger is the myopia of the tech industry at large, and its pharaonic figureheads, who continue to feed the general public - and particularly the tech crowd - sci-fi fairytales, as they vie for power.

tempodox · 10 months ago
The most interesting aspect of all this “AI” craze is how it plays into people's forgotten wishes to believe in miracles again. I have never seen anything else that exposes this desire so conspicuously. And of course all the shrewd operators know how to use this lever.

In ancient times you had to travel to Delphi to consult Apollon's oracle. Now you can do it from the comfort of your armchair.

tim333 · 10 months ago
Hmm. I think the science guys probably do actually understand "reasoning" and "intelligence", and it's the humanities guys who don't understand much AI or science.