Language models can explain neurons in language models

Of note:

"... our technique works poorly for larger models, possibly because later layers are harder to explain."

And even for GPT-2, which is what they used for the paper:

"... the vast majority of our explanations score poorly ..."

Which is to say, we still have no clue as to what's going on inside GPT-4 or even GPT-3, which I think is the question many want an answer to. This may be the first step towards that, but as they also note, the technique is already very computationally intensive, and the focus on individual neurons as a function of input means that they can't "reverse engineer" larger structures composed of multiple neurons nor a neuron that has multiple roles; I would expect the former in particular to be much more common in larger models, which is perhaps why they're harder to analyze in this manner.

ryandvm · 3 years ago

Funny that we never quite understood how intelligence worked and yet it appears that we're pretty damn close to recreating it - still without knowing how it works.

I wonder how often this happens in the universe...

olalonde · 3 years ago

The battery (Voltaic Pile, 1800) and the telegraph (1830s-1840s) were both invented before the electron was discovered (1897).

brookst · 3 years ago

Imitation -> emulation -> duplication -> revolution is a very common pattern in nature, society, and business. Aka “fake it til you make it”.

Think of business / artistic / cultural leaders nurturing protégés despite not totally understanding why they’re successful.

Of course those protégés have agency and drive, so maybe not a perfect analogy. But I’m going to stand by the point intuitively even if a better example escapes me.

techolic · 3 years ago

> that we're pretty damn close to recreating it

Is that evident already or are we fitting the definition of intelligence without being aware?

samiskin · 3 years ago

Evolution created intelligence without even being intelligent itself

deepsun · 3 years ago

Yep, we don't know all constituents of buttermilk, nor how bread stales (there's too much going on inside). But it doesn't prevent us to judge their usefulness.

fennecfoxy · 3 years ago

I feel like OAI's approach is kind of wrong. GPT4 is still just text transformation/completion with multi-headed attention for better prediction of the next word that should follow (versus only looking at the previous word).

In human brains, language is only a way to communicate thoughts in concept form, though we also seem to use language to communicate abstract thoughts to ourselves to break them apart/down in a way (imo).

I'd love to see someone train a model on the level of GPT4 to generate abstract thoughts/ideas based on input/context and then pair this model with GPT4 co-operatively and continue to train, such that the flow of abstract ideas is parsed by GPT. But like...how do you even train a model that operates on abstract ideas, there doesn't seem to be any way to do this.

gowld · 3 years ago

Starting a fire is easy to do even if you don't know how it works.

pmontra · 3 years ago

We are probably creating something that looks like our intelligence but it works in a different way.

An example: we are not very good at creating flight, the one birds do and humans always regarded as flight, and yet we fly across half the globe in one day.

Going up three meters and landing on a branch is a different matter.

Grimblewald · 3 years ago

Not that weird if you think about it, our intelligence simultaneously measly and amazing as it is, was the product of trial, error, and sheer dumb luck. We could think of ourselves as monkeys with typewriters, eventually we'll get it right.

bluepoint · 3 years ago

No matter what it is probably easier to inspect a language model while it works, than the language module of a human while he speaks.

runlaszlorun · 3 years ago

We don’t know…

PaulHoule · 3 years ago

I like the idea. Note that LLMs have some skill at decoding sequential dense vectors in the human brain

https://pub.towardsai.net/ais-mind-reading-revolution-how-gp...

so why not have them decode sequential dense vectors of their own activations?

As for the majority scoring poorly, they suggest that most neurons won't have clear activation semantics so that is intrinsic to the task and you'd have to move to "decoding the semantics of neurons that fire as a group"

Imnimo · 3 years ago

I don't think this is showing LLMs performing decoding. They're just using the LLM to propose possible words. The decoding is done by using another model to score how well a proposed word matches brain activity, and using that score to select a most likely sequence given the proposals from the LLM.

gitfan86 · 3 years ago

We know that complex arrangements of neurons are triggered based on input and generating output that appears to have some intelligence to many humans.

The more interesting question is why are intelligence/beauty/consciousness emergent properties that exist in our minds.

otabdeveloper4 · 3 years ago

There is no evidence that intelligence runs on neurons. Yes, there are neurons in brains, but there's also lots of other stuff in there too. And there are creatures that exhibit intelligent properties even though they have hardly any neurons at all. (An individual ant has only something like 250000 neurons, and yet they're the only creatures beside humans that managed to create a civilization.)

dennisy · 3 years ago

Nature created humans to understand nature. We created GPT4 to understand ourselves.

talentedcoin · 3 years ago

There is no evidence that any of those are emergent properties. It’s no more or less logical than asserting they were placed there by a creator.

imranq · 3 years ago

I suspect that there's a sweet spot that combines a collection of several "neurons" and a human-readable explanation given a certain kind of prompt. However, this "three-body problem" will probably need some serious analytical capability to understand at scale

kfrzcode · 3 years ago

here's a clue, start your research

Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering, semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant, labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon the state of the art in 9 out of the 12 tasks studied. For instance, we achieve absolute improvements of 8.9% on commonsense reasoning (Stories Cloze Test), 5.7% on question answering (RACE), and 1.5% on textual entailment (MultiNLI).

nr2x · 3 years ago

My favorite line: “Our overall procedure is quite compute intensive.”

Should that be measured in number of nuclear power plants needed to run the computation? Or like, fractions of a small star’s output?

mnky9800n · 3 years ago

Yes but they are from openai so they can just write papers that say whatever they want to say without minding the metrics and then pretend like it is some kind of science.

rvz · 3 years ago

> Which is to say, we still have no clue as to what's going on inside GPT-4 or even GPT-3, which I think is the question many want an answer to.

Exactly. Especially:

> ...the technique is already very computationally intensive, and the focus on individual neurons as a function of input means that they can't "reverse engineer" larger structures composed of multiple neurons nor a neuron that has multiple roles;

This paper just brings us no closer to explainability in black box neural networks and is just another excuse piece by OpenAI to try to please the explainability situation that has been missing for decades in neural networks.

It is also the reason why they cannot be trusted in the most serious of applications which such decision making requires lots of transparency rather than a model regurgitating nonsense confidently.

jahewson · 3 years ago

> It is also the reason why they cannot be trusted in the most serious of applications which such decision making requires lots of transparency rather than a model regurgitating nonsense confidently.

Like say, in court to detect if someone is lying? Or at an airport to detect drugs?

ketzo · 3 years ago

Is it really fair to say this brings us “no closer” to explainability?

This seems like a novel approach to try to tackle the scale of the problem. Just because the earliest results aren’t great doesn’t mean it’s not a fruitful path to travel.

Deleted Comment

TaylorAlexander · 3 years ago

> the explainability situation that has been missing for decades in neural networks.

Is this true? I thought explainability for things like DNNs for vision made pretty good progress in the last decade.

pmarreck · 3 years ago

> It is also the reason why they cannot be trusted in the most serious of applications which such decision making requires lots of transparency rather than a model regurgitating nonsense confidently.

Doesn't this criticism also apply to people to some extent? We don't know what the purpose of individual brain neurons is.

LLMs are quickly going to be able to start explaining their own thought processes better than any human can explain their own. I wonder how many new words we will come up with to describe concepts (or "node-activating clusters of meaning") that the AI finds salient that we don't yet have a singular word for. Or, for that matter, how many of those concepts we will find meaningful at all. What will this teach us about ourselves?

ly3xqhl8g9 · 3 years ago

First of all, our own explanations about ourselves and our behaviour are mostly lies, fabrications, hallucinations, faulty re-memorization, post hoc reasoning:

"In one well-known experiment, a split-brain patient’s left hemisphere was shown a picture of a chicken claw and his right hemisphere was shown a picture of a snow scene. The patient was asked to point to a card that was associated with the picture he just saw. With his left hand (controlled by his right hemisphere) he selected a shovel, which matched the snow scene. With his right hand (controlled by his left hemisphere) he selected a chicken, which matched the chicken claw. Next, the experimenter asked the patient why he selected each item. One would expect the speaking left hemisphere to explain why it chose the chicken but not why it chose the shovel, since the left hemisphere did not have access to information about the snow scene. Instead, the patient’s speaking left hemisphere replied, “Oh, that’s simple. The chicken claw goes with the chicken and you need a shovel to clean out the chicken shed”" [1]. Also [2] has an interesting hypothesis on split-brains: not two agents, but two streams of perception.

[1] 2014, "Divergent hemispheric reasoning strategies: reducing uncertainty versus resolving inconsistency", https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4204522

[2] 2017, "The Split-Brain phenomenon revisited: A single conscious agent with split perception", https://pure.uva.nl/ws/files/25987577/Split_Brain.pdf

haldujai · 3 years ago

I'm not understanding the connection between your paragraphs here even after reading the first article.

Even if you accept classic theory (e.g. hemispheric localization and the homunculus) which most experts don't all this suggests is that the brain tries to make sense of the information it has and in sparse environments it fills in.

How does this make our behavior "mostly lies, fabrications, hallucinations, faulty re-memorization, post hoc reasoning" as most humans don't have a severed corpus callosum.

The discussion starts with:

"In a healthy human brain, these divergent hemispheric tendencies complement each other and create a balanced and flexible reasoning system. Working in unison, the left and right hemispheres can create inferences that have explanatory power and both internal and external consistency."

westurner · 3 years ago

Not supported by neuroimaging. Promoted without evidence or sufficient causal inference.

https://www.health.harvard.edu/blog/right-brainleft-brain-ri... :

> But, the evidence discounting the left/right brain concept is accumulating. According to a 2013 study from the University of Utah, brain scans demonstrate that activity is similar on both sides of the brain regardless of one's personality.

> They looked at the brain scans of more than 1,000 young people between the ages of 7 and 29 and divided different areas of the brain into 7,000 regions to determine whether one side of the brain was more active or connected than the other side. No evidence of "sidedness" was found. The authors concluded that the notion of some people being more left-brained or right-brained is more a figure of speech than an anatomically accurate description.

Here's wikipedia on the topic: "Lateralization of brain function" https://en.wikipedia.org/wiki/Lateralization_of_brain_functi...

Furthermore, "Neuropsychoanalysis" https://en.wikipedia.org/wiki/Neuropsychoanalysis

Neuropsychology: https://en.wikipedia.org/wiki/Neuropsychology

Personality psychology > ~Biophysiological: https://en.wikipedia.org/wiki/Personality_psychology

MBTI > Criticism: https://en.wikipedia.org/wiki/Myers%E2%80%93Briggs_Type_Indi...

Connectome: https://en.wikipedia.org/wiki/Connectome

rounakdatta · 3 years ago

Phantoms in the Brain is a fascinating book that deals with exactly this topic.

BaculumMeumEst · 3 years ago

that is absolutely fascinating and also makes me extremely uncomfortable

jmfldn · 3 years ago

"LLMs are quickly going to be able to start explaining their own thought processes better than any human can explain their own."

There is no "their" and there is no "thought process" . There is something that produces text that appears to humans like there is something like thought going on (cf the Eliza Effect), but we must be wary of this anthropomorphising language.

There is no self reflection, but if you ask an LLM program how "it" knows something it will produce some text.

marshray · 3 years ago

> There is no self reflection, but if you ask an LLM program how "it" knows something it will produce some text.

To be clear, you're saying that we should just dismiss out-of-hand any possibility that an LM AI might actually be able to explain its reasoning step-by-step?

I find it kind of charming actually how so many humans are just so darn sure that they have their own special kind of cognition that could never be replicated. Not even with 175,000,000,000 calculations for every word generated.

jpasmore · 3 years ago

As we don't know for sure what is happening 100% within a neural network, we can say we don't believe that they're thinking and we would still need to define the word thinking. Once LLM's can self-modify, the word "thinking" will be more accurate than it is today.

And when Hinton says at MIT, "I find it very hard to believe that they don't have semantics when they consult problems like you know how I paint the rooms how I get all the rooms in my house to be painted white in two years time," I believe he's commenting on the ability of LLM's to think on some level.

emporas · 3 years ago

Very true. In my opinion, in case there is a way to extract "Semantic Clouds of Words", i.e given a particular topic, navigate semantic clouds word by word, find some close neighbours of that word, jump to a neighbour of that word and so on, then LLMs might not seem that big of a deal.

I think LLMs are "Semantic Clouds of Words" + grammar and syntax generator. Someone could just discard the grammar and syntax generator, just use the semantic cloud and create the grammar and syntax by himself.

For example, in writing a legal document, a slightly educated person on the subject, could just use the relevant words put into an empty paper, fill in the blanks of syntax and grammar, alongside with the human reasoning which is far superior than any machine reasoning, till today at least.

The process of editing the GPT* generated documents to fix reasoning is not a negligible task anyway. Sam Altman mentioned that: "the machine has some kind of reasoning", not a human reasoning ability by any means.

My point is, that LLMs are two programs fused into one, "word clouds" and "syntax and grammar", sprinkled with some kind of poor reasoning. Their word clouding ability, is so unbelievable stronger than any human it fills me with awe every time i use it. Everything else is, just whatever!

callesgg · 3 years ago

The text output of a llm is the thought process. In this context the main difference between humans and llms, is that llms can’t have internalized thoughts. There are of course other differences to, like the fact that humans have a wider gamut of input: visuals, sound, input from other bodily functions. And the fact that we have live training.

icholy · 3 years ago

Or maybe the human thought process isn't as sophisticated as we imagined.

greenhearth · 3 years ago

This is it. This comprehension of the chats is symptom of something like linguistic pareidolia. It's an enforced face that is composed of some probabilistic incidents and wistfulness.

codehalo · 3 years ago

Pride comes before the fall, and the AI comes before humility.

circuit10 · 3 years ago

Clearly there is some process going on to decide what word to pick. Why can’t we call that thinking?

nerpderp82 · 3 years ago

What if you ask it to emit the reflexive output, then feed that reflexive output back into the LLM for the conscious answer?

What if you ask it to synthesize multiple internal streams of thought, for an ensemble of interior monologues, then have all those argue with each other using logic and then present a high level answer from that panoply of answers?

mrcode007 · 3 years ago

If the Gödel incompleteness theorem applies here, then the explanations are likely … incomplete or self-referential.

AlexCoventry · 3 years ago

The Goedel Incompleteness Theorem has no straightforward application to this question.

drdeca · 3 years ago

What leads you to suspect that Gödel incompleteness may be relevant here?

There's no formal axiom system being dealt with here, afaict?

Do you just generally mean "there may be some kind of self-reference, which may lead to some kind of liar-paradox-related issues"?

fnovd · 3 years ago

So is the word "word" but that seems to have worked out OK so far. I can explain the meaning of "meaning" and that seems to work OK too. Being self-referential sounds a lot more like a feature than a bug. Given that the neurons in our own heads are connected to each other and not any ground truth, I think LLMs should do just fine.

PartiallyTyped · 3 years ago

As long as lazy evaluation exists, self-reference is fine, no?

Hofstadter talks about something similar in his books.

wizeman · 3 years ago

That's probably one of the reasons why you'd use GPT-4 to explain GPT-2.

Of course, if you were trying to use GPT-4 to explain GPT-4 then I think the Gödel incompleteness theorem would be more relevant, and even then I'm not so sure.

chrisco255 · 3 years ago

Are there any examples of an LLM developing concepts that do not exist or cannot be inferred from its training set?

fnovd · 3 years ago

"Cannot be inferred from its training set" is a pretty difficult hurdle. Human beings can infer patterns that aren't there, and we typically call those hallucinations or even psychoses. On the other hand, some unconfirmed, novel patterns that humans infer actually represent groundbreaking discoveries, like for example much of the work of Ramanujan.

In a real sense, all of the future discoveries of mathematics already exist in the "training set" of our present understanding, we just haven't thought it all the way through yet. If we discover something new, can we say that the concept didn't exist, or that it "couldn't be inferred" from previous work?

I think the same would apply to LLMs and their understanding of the way we encode information using language. Given their radically different approach to understanding the same medium, they are well poised to both confirm many things we understand intuitively as well as expose the shortcomings of our human-centric model of understanding.

ftxbro · 3 years ago

I'm really curious what kind of concept you might have in mind. Can you give any example of a concept that if an LLM developed that concept then it would meet your criteria? It might sound like a sarcastic question but it's hard to agree on the meanings of "concepts that do not exist" or "concepts that cannot be inferred" maybe you can give some examples.

EDIT: I see below you gave some examples, like invention of language before it existed, and new theorems in math that presumably would be of interest to mathematicians. Those ones are fair enough in my opinion. The AI isn't quite good enough for those ones I think, but I also think newer versions trained with only more CPU/GPU and more parameters and more data could be 'AI scientists' that will make these kinds of concepts.

sebzim4500 · 3 years ago

It is by definition impossible for an LLM to develop a concept that 'cannot be inferred from its training set'.

On the other hand, that is an incredibly high bar.

PeterisP · 3 years ago

Tautologically, every concept that anything (LLM, or human, or alien) develops can be inferred from the input data(e.g. training set), because it was.

sgt101 · 3 years ago

The training sets are so poorly curated we will never know...

Sharlin · 3 years ago

I’m sure LLMs are quickly going to learn to hallucinate (or let’s use the proper word for what they’re doing: confabulate) plausible-sounding but nonsense explanations of their thought processes at least as well as humans.

Aerbil313 · 3 years ago

I don't think we invent new words just for AI to explain its thought process to us better. AI may explain more elaborately in our language instead.

elwell · 3 years ago

And if the LLM is the explainer, it can lie to us if 'needed'.