anon291 · a year ago
So it seems 'obvious' to me that a network about 50 layers deep (for example) can only reason about symbolic questions for 50 'steps' (in quotes because it's not a step as we think about it). It only seems there's more complexity because it's 50 steps in one or more learned subspaces that the model has been trained in (which might mean the model can accomplish more than one 'human step' in its 'step'). Humans (well, intelligent humans at least) obviously seem able to reason beyond that many steps, but we all know it requires real thinking and deliberation, and perhaps a notepad, to do that.

It's quite something to expect ChatGPT, for example, to correctly do 4-digit multiplications without any thought or recourse to 'paper', when very few human beings can do that.

radarsat1 · a year ago
This is true but you have to also consider the autoregressive component. In your example, it's 50 steps per iteration of the model, where the model is executed once for each token in the output.

So, practically speaking, it's a bit more complicated to calculate how much the model can "think". Of course, once a token is output the model is committed to it (in the most basic scenario), but that doesn't mean it is not still "thinking" as it produces subsequent tokens.

> perhaps a notepad

Exactly, the context and previously output tokens can be considered such a notepad since they are input for the next steps of the model.
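
As a rough illustration of how that extra budget scales (this only counts layer passes and says nothing about what a transformer can actually compute per pass):

    # Rough illustration only: treat each layer as one sequential "step"
    # and ignore attention width, parallelism, etc.
    depth = 50                    # layers per forward pass (assumed, per the example)
    direct = depth * 1            # answer emitted immediately
    with_notepad = depth * 200    # 200 chain-of-thought tokens first
    print(direct, with_notepad)   # 50 10000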

anon291 · a year ago
So part of my general issue with this kind of thinking is that, if we take this as the main means of creating complexity, then shorter prompts are worse for reasoning than longer ones, because longer ones automatically give the model more 'space' to think. Now, I realize that the research community knows this, but I like papers like this one that explicitly seek ways to enable the model to 'breathe' a bit.
Closi · a year ago
Agreed - prompt engineering encourages LLMs to do this too (i.e. asking the LLM to explain the steps it will take to solve a problem before answering - e.g. Zero-Shot CoT: 'Let's think step by step').
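
A minimal sketch of that zero-shot CoT pattern (the question is just illustrative; no particular API is assumed):

    # Zero-shot CoT: append a trigger phrase so the model generates its
    # reasoning before the final answer, instead of answering immediately.
    question = ("A juggler has 16 balls. Half are golf balls, and half of the "
                "golf balls are blue. How many blue golf balls are there?")

    direct_prompt = f"Q: {question}\nA:"
    cot_prompt = f"Q: {question}\nA: Let's think step by step."
    # The second prompt typically elicits "16 / 2 = 8 golf balls, 8 / 2 = 4
    # blue golf balls" before "The answer is 4."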
blackbear_ · a year ago
This paper does indeed follow your intuition to investigate the limits of transformers on compositional tasks (i.e., those that require multi-step reasoning, including your multiplication example): https://arxiv.org/abs/2305.18654

> Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching, without necessarily developing systematic problem-solving skills. To round off our empirical study, we provide theoretical arguments on abstract multi-step reasoning problems that highlight how autoregressive generations' performance can rapidly decay with increased task complexity.

visarga · a year ago
Maybe the Skill Mix paper is relevant here. They define a list of 100 skills, and then randomly sample tuples of n skills (usually less than 6) and generate a test example using those skills. Apparently only GPT-4 (at the time of the paper) was able to compose 5 skills, the other models just 3 or 2. Beyond 5 skills even GPT-4 was doing much worse.

The interesting finding of the paper is that GPT-4 couldn't have seen all the (topic, skill-tuple) combinations in the training set. If you have 10,000 examples on a topic, and use 5 out of 100 skills, you would need 100^5 training examples to cover all combinations. In conclusion, GPT-4 generalizes to new skill combinations, and thus is not a stochastic parrot.

https://arxiv.org/abs/2310.17567
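
A quick back-of-the-envelope check of those numbers (showing both the ordered and unordered counts; either way the scale dwarfs any plausible amount of per-topic training data):

    from math import comb

    skills = 100
    print(skills ** 5)      # 10000000000 ordered 5-tuples
    print(comb(skills, 5))  # 75287520 unordered 5-tuples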

anon291 · a year ago
Ah good... This is definitely a research path I've been looking into. Great to see someone else has already gone there!
visarga · a year ago
You are missing an important detail here - number of tokens - yes, you have 50 "steps" in network depth, but you could have extra tokens. Assuming you don't run out of tape, there is no reason for LLMs to be limited to simple operations.
danielmarkbruce · a year ago
This doesn't make a lot of sense when you consider how backprop works. Layers aren't limited to working independently.

This also doesn't make a lot of sense when you consider models are autoregressive.

082349872349872 · a year ago
Edsger Dijkstra had a precise English style; even though his mother tongue was Dutch, I find he made better use of English than many native speakers.

In one of the EWDs, he reminisced that, as children, they were taught to never begin to speak a sentence unless they already knew how they were going to finish it.

I'd bet these two observations have a causal connection.

zoogeny · a year ago
When I was a young man I was taking a language course while I was temporarily living in a foreign country. There was an older man in the course (not elderly, more like mid-fifties) who was very bad at the new language we were both learning. Yet I noticed he had, what seemed to me, a magic power: he could always make people laugh. He would often whisper something to one of our classmates and they would always get a giant smile on their face or even laugh out loud.

I was intensely curious and I spent some time wondering how he did it. One day, out of the blue, he invited me out to lunch after class. We just chatted for most of the lunch, exchanging backgrounds and stories. Then his face took on a serious expression and he slowly and carefully began to explain something to me as if he was passing on some wisdom.

He said that he never spoke a single sentence without fully saying the sentence in his mind. He said he would often think of the words several times in his mind, revising the phrase until he was happy. He would imagine saying the words to the person in front of him and he would imagine their reaction. And he would continue to revise until he felt confident the person who heard the words he would say would react in the way he wanted them to react. If he could not imagine the person reacting how he wanted them to react, he would not say anything at all.

It was clear to me that he was passing along this advice but also that he was calling me out a bit. He was letting me know that I spoke without thinking. I say what pops into my head. It was like he read my mind, honestly: he knew exactly what I was curious about, and he answered the question I had for him that I never asked.

I wish I could say that I learned the lesson. When I have tried the technique it has rewarded the effort. But I haven't formed it into a habit and I still tend to let my mouth race ahead of my mind.

MattPalmer1086 · a year ago
That actually sounds like hell to me, a complete absence of spontaneity and being in the moment.

I used to obsessively try to figure out what to say before I said it. I am socially awkward, and it did not help at all. I love writing because it is asynchronous and I can figure things out precisely and edit my thoughts.

But in social situations it is a complete hindrance.

Cthulhu_ · a year ago
I've observed two things. One, writing is different to speaking, because it's async, you can think before you write, you can edit, etc.

But second, speaking in a non-native language makes you think harder about what you're about to say. Fewer colloquialisms, more focus on making sure your meaning is understood, more sensitivity in case you might offend someone, perhaps?

It's not new either; a lot of science and whatnot has been done in people's non-native language, like French, German, Latin, etc. Another factor there is the lingo of the field; I can't simply say "Kubernetes is een open-bron houder orkestratiesysteem voor het automatiseren van de inzet, schalen, en het beheer van zachte waren" (a word-for-word Dutch rendering of "Kubernetes is an open-source container orchestration system for automating the deployment, scaling, and management of software") without confusing half my native-speaking audience.

wara23arish · a year ago
I love reading his EWDs. I had a professor who worked with him who mentioned he made his students use pens while taking his tests. To make it less likely for the students to make mistakes??
float4 · a year ago
> he made his students use pens while taking his tests

This is very common in the Netherlands, I think that's why it was a rule of his.

In general, the Dutch education system seems to be against pencils (at least this was the case until recently; I'm Dutch and in my mid-20s). You're taught to write using a fountain pen, not a pencil. In high school, you're allowed to switch to ballpoint but absolutely not to pencil. In university, you can write with pretty much anything you want, but... not with a pencil. If you do take your test with a pencil, there's genuinely a chance your teacher will give you a 0, although most of the time they'll probably be forgiving.

I majored in CS in the Netherlands and every test was done with good old pen and paper. Students still make mistakes all the time, which is why everyone uses a scrap sheet.

westurner · a year ago
Perhaps to make it easier to determine how to correct instruction.

- "Guidelines for keeping a laboratory notebook" (2019) https://news.ycombinator.com/item?id=19123430#19126809

torginus · a year ago
I also learned English from textbooks, and one of the strangest things I encountered was that native speakers routinely confuse "their, there, they're", which I never thought was a mistake I could make. It would be like confusing 'wet' and 'vet'. So there's definitely a difference in how native and non-native speakers use the language.
qup · a year ago
The people who confuse that mostly have not done very much reading. Audibly, those words are identical.
leobg · a year ago
Even crazier:

“Could of”.

Like “You could of said so”.

ricardobeat · a year ago
Is that even possible, or just hyperbole? I'd bet the latter. I wouldn't be surprised if some people are able to fully unravel entire paragraphs of conversation in their head in a couple of seconds, but that's not something you could teach to children in general.
mannykannot · a year ago
I don't think it is feasible, at least for conversation, but as an aspirational goal for children, along the lines of "put your toys away when you've finished playing with them", it is not a bad one.

It's not unusual for me to think I know how I am going to end a sentence, but then find that I can't get there.

h34t · a year ago
In Dutch (and German) the verb often goes at the end of a sentence, so the advice is rather practical.
ted_bunny · a year ago
German children would with you disagree.
caddy · a year ago
I also wonder if it has anything to do with the process of learning a new language in general. I've thought more thoroughly about how English works since I've been learning French (not that I'm very eloquent in either)
fennecbutt · a year ago
Unfortunately from experience that just gives enough of a delay that you get talked over in a group setting and never get a chance to speak anyway.
dcrimp · a year ago
I had this thought the other day that the whole chain-of-thought reasoning pattern, which contributes to improved performance in LLM-based systems, seems to sit parallel to Kahneman's two-system model of the mind that he covers in 'Thinking, Fast and Slow'.

Haven't read it in a few years, but I recall the book suggests that we use 'System 1' in our brains primarily for low-effort, low-computation thinking - like 1+1=? or "the sky is ____".

It then suggests that we use a 'System 2' for deliberate, conscious, high-cognitive tasks. Dense multiplication, reasoning problems, working with tools - generally just decision-making. Anything that requires focus or brain power. Our brain escalates tasks from S1 to S2 if they feel complex or dangerous.

Maybe I'm being too cute, but it feels like the critique that "LLMs aren't intelligent because they are stochastic parrots" is an observation that they are only equipped to use their 'System 1'.

When we prompt an LLM to think step-by-step, we allow it a workspace to write down its thoughts, which it can then consider in its next token prediction - a rudimentary System 2, like a deliberation sandbox.

We do a similar thing when we engage our System 2 - we hold a diorama of the world in the front of our mind, where we simulate what the environment will do if we proceed with a given action - what our friend might respond to what we say, how the sheet steel might bend to a force, how the code might break, how the tyres might grip. And we use that simulation to explore a tree of possibilities and decide an action that rewards us the most.

I'm no expert, but this paper seems to recognise a similar framework to the above. Perhaps a recurrent deliberation/simulation mechanism will make its way into models in the future, especially the action models we are seeing in robotics.

airstrike · a year ago
I'll preface this by saying I know this may sound entirely made up, unscientific, anecdotal, naive, or adolescent even, but luckily nobody has to believe me...

A few weeks back I was in that limbo state where you're neither fully awake nor fully asleep and for some reason I got into a cycle where I could notice my fast-thinking brain spitting out words/concepts in what felt like the speed of light before my slow-thinking brain would take those and turn them into actual sentences

It was like I was seeing my chain of thought as a list of ideas that was filled impossibly fast before it got summarized into a proper "thought" as a carefully selected list of words

I have since believed, as others have suggested in much more cogent arguments before me, that what we perceive as our thoughts are, indeed, a curated output of the brainstormy process that immediately precedes it

giva · a year ago
Well, this sounds weird to me in the sense that I don't feel that I think in _words_. I only convert my thoughts into words when I need to speak or write them down; so when I need to communicate them to others, when I need to remember them for later, or when I am stuck and I need to clear things up.

I was actually convinced it was the same for most people, and that for this reason "Rubber duck debugging"[1] is a thing.

1) https://en.wikipedia.org/wiki/Rubber_duck_debugging

nico · a year ago
There is a technique for achieving this state of consciousness, it’s called noting

This is an awareness that advanced meditators seek, practice and develop to perceive “reality as it is”

If you are curious, you might find related discussions, and a great welcoming community at r/streamentry on Reddit

Also the book Mastering the Core Teachings of the Buddha talks about it quite a bit, including instructions on how to do it

dicroce · a year ago
This is fascinating. I had another experience that I think sheds light on some of this. One day I was in my office and the lights were off. I turned around and looked at the dark shape on top of my coworkers desk. For a few seconds I stared blankly and then suddenly I had a thought: PC, it's his PC. Then I started to think about that period of time just before I realized what I was looking at... The only word I can describe what it felt like is: unconscious. Is it possible that consciousness is just a stream of recognition?
theaussiestew · a year ago
I have this too. My cognitive processes are not related to my thinking brain, which I define as the part of my mental process which produces the sounds of words in my mind. Instead, I've observed that first, my subconscious processes concepts at a much more fine grained level, much like the latent space of a machine learning model. Only substantially after, let's say 10ms after, do thoughts arise, which are just pointers to the already processed subconscious process. A very rough analogy would be the inference of an LLM in words, vs all the processing of embeddings that happens internally.
andai · a year ago
I forget the name but I remember reading about this as a recognized process in neurology. We usually only hear the thought that wins, but there are many generated simultaneously, and there is a selection process.

Possibly related, I had a similar experience last night, where my mind simulated a fully realistic conversation between two people, with audio and video, except that the sentences made no sense. I thought that was interesting. My explanation was "the language part of your brain is too tired cause you've been using it all day."

Swizec · a year ago
> I got into a cycle where I could notice my fast-thinking brain spitting out words/concepts in what felt like the speed of light before my slow-thinking brain would take those and turn them into actual sentences

The way I’ve seen this described by psychologists is that System 1 is driving the car while System 2 panics in the back seat, screaming out explanations for every action and shouting directions to the driver so it can feel in control. The driver may listen to those directions, but there’s no direct link between System 2 in the backseat and System 1 holding the wheel.

Various experiments have shown that in many situations our actions come first and our conscious understanding/explanation of those actions comes second. Easiest observed in people with split brain operations. The wordy brain always thinks it’s in control even when we know for a fact it couldn’t possibly have been because the link has been surgically severed.

Being super tired, on the edge of sleep, or on drugs can disrupt these links enough to let you observe this directly. It’s pretty wild when it happens.

Another easy way, for me, is to get up on stage and give a talk. Your mouth runs away presenting things and you’re in the back of your head going “Oh shit no that’s going in the wrong direction and won’t make the right point, adjust course!”

mirror_neuron · a year ago
It’s hard (impossible?) to know if we’re talking about the same thing or not, but I experience something like this all the time, without being on the edge of sleep. We might both be wrong, but it’s relatable!
pictureofabear · a year ago
This seems like it might upend Descartes' "cogito, ergo sum" ("I think therefore I am") in that the process for forming thoughts in a language is not indicative that we exist, rather it merely indicates that we have evolved a brain that can produce and interpret language.

Seems like we're dismantling a lot of what Descartes came up with these days.

melagonster · a year ago
From a positive perspective, it seems clear that our thinking/mind is not just language and is always faster than sentence formation.
JoBrad · a year ago
I had a similar experience when I was put under during surgery a few years ago. Later I learned that they used ketamine in their concoction.
allemagne · a year ago
I occasionally reach a similar state near sleep where I will be half-dreaming that I'm reading from a page of a book where the words materialize/"come into focus" right before my eyes into what is usually vaguely grammatically correct nonsense.
marmaduke · a year ago
> curated output of the brainstormy process that immediately precedes it

Daniel Dennett gives a nice albeit more detailed version of your idea in his book Consciousness Explained, could be worth a read

samstave · a year ago
Mandelthought psyt.
HarHarVeryFunny · a year ago
> it feels like the critique that "LLMs aren't intelligent because they are stochastic parrots" is an observation that they are only equipped to use their 'System 1'.

I wouldn't say LLMs aren't intelligent (at all) since they are based on prediction which I believe is the ability that we recognize as intelligence. Prediction is what our cortex has evolved to do.

Still, intelligence isn't an all or nothing ability - it exists on a spectrum (and not just an IQ score spectrum). My definition of intelligence is "degree of ability to correctly predict future outcomes based on past experience", so it depends on the mechanisms the system (biological or artificial) has available to recognize and predict patterns.

Intelligence also depends on experience, minimally to the extent that you can't recognize (and hence predict) what you don't have experience with, although our vocabulary for talking about this might be better if we distinguished predictive ability from experience rather than bundling them together as "intelligence".

If we compare the predictive machinery of LLMs vs our brain, there is obviously quite a lot missing. Certainly "thinking before speaking" (vs LLM fixed # steps) is part of that, and this Q* approach and tree-of-thoughts will help towards that. Maybe some other missing pieces such as thalamo-cortical loop (iteration) can be retrofitted to LLM/transformer approach too, but I think the critical piece missing for human-level capability is online learning - the ability to act then see the results of your action and learn from that.

We can build a "book smart" AGI (you can't learn what you haven't been exposed to, so maybe unfair to withhold the label "AGI" just because of that) based on current approach, but the only way to learn a skill is by practicing it and experimenting. You can't learn to be a developer, or anything else, just by reading a book or analyzing what other people have produced - you need to understand the real world results of your own predictions/actions, and learn from that.

RandomLensman · a year ago
Defining intelligence as prediction leaves out a lot of other things that humans would see as intelligence in other humans (e.g., creating a novel); also, quite simple organisms make predictions (e.g., a predator jumping at prey makes a prediction about positions).
hackerlight · a year ago
> online learning - the ability to act then see the results of your action and learn from that.

I don't think that should be necessary, if you are talking about weight updates. Offline batch mode Q-learning achieves the same thing.

By online learning, did you mean working memory? I'd agree with that. Whether it's RAG, ultra-long context, an LSTM-like approach, or something else is TBD.

Grimblewald · a year ago
I'd say intelligence is a measure of how well you can make use of what you have. An intelligent person can take some pretty basic principles a really long way, for example. Similarly, they can take a basic comprehension of a system and build on it rapidly to get predictions for that system that defy the level of experience they have. Anyone can gather experience, but not everyone can push that experience's capacity to predict beyond what it should enable.
iteygib · a year ago
To me, it is one of those things like defining what 'art' is, as in creating a model in our heads around a concept. We take our definitions and then use those to construct models like AI that simulate our model well enough.

In other words, I personally do not believe any system we develop will be truly 'intelligent', since intelligence is a concept we created to help explain ourselves. We can't even truly define it, yet we try to test technologies we develop to see if they possess it. It is a bit nonsensical to me.

kderbe · a year ago
Andrej Karpathy makes this same point, using the same book reference, in his "[1hr Talk] Intro to Large Language Models" video from Nov. 2023.

Here is a link to the relevant part of his presentation: https://youtu.be/zjkBMFhNj_g?t=2120

biosed · a year ago
Weren't most of the claims in that book refuted, some even by the author? I really enjoyed it and found some great insights, only to be later told by a friend in that sphere that the book was not correct and even the author had "retracted" some of the assertions.
mannykannot · a year ago
It might still be a useful concept in developing LLMs.
jerpint · a year ago
He won a Nobel prize for his work, so I'm not sure how much of it would be refuted.
tasty_freeze · a year ago
People often say that LLMs aren't really thinking because they are just producing a stream of words (tokens, really) reflexively, based on a window of previous text either read or from their own responses. That is true.

But when talking, I often have the experience of not knowing what I'm going to say until I hear what I've said. Sometimes I do have deliberative thought and planning, trialing phrases in my head before uttering them, but apparently I'm mostly an LLM that is just generating a stream of tokens.

Workaccount2 · a year ago
This is something that is easily observable by anyone at virtually any moment, yet at the same time is something that escapes 99% of the population.

When you are talking to someone in normal conversation, you are both taking in the words you are saying at the same time.

OJFord · a year ago
I'm currently reading it for the first time, completely coincidentally/not for this reason, and on a few occasions I've thought 'Gosh that's just like' or 'analogous to' or 'brilliant description of that problem' for LLMs/generative AI or some aspect of it. I wish I could recall some examples.
glial · a year ago
I think of COT as a memory scratchpad. It gives the LLM some limited write-only working memory that it can use for simple computations (or associations, in its case). Now suppose an LLM had re-writeable memory... I think every prompt-hack, of which COT is one example, is an opportunity for an architecture improvement.
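
A minimal sketch of that scratchpad view, with generate() standing in for any LLM call (the interface here is hypothetical):

    def answer_with_scratchpad(question, generate, steps=5):
        # Append intermediate "thoughts" to the context so that later token
        # predictions can condition on them: write-only working memory.
        context = "Question: " + question + "\nScratchpad:\n"
        for _ in range(steps):
            thought = generate(context + "- ")  # hypothetical LLM call
            context += "- " + thought + "\n"
        return generate(context + "Final answer:")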
HarHarVeryFunny · a year ago
I think of COT more as a type of planning or thinking before you speak. If you just open your mouth and start talking, which is what a plain LLM does, then you may talk yourself into a corner with no good way to get out of it, or find yourself saying something that really makes no sense. COT effectively allows the LLM to see the potential continuations of what it is considering saying, and pick one that makes sense!

I think lack of COT or any ability to plan ahead is part of why LLMs are prone to hallucinate - if you've already run your mouth and said "the capital of australia is", then it's a bit late to realize you don't know what it is. The plain LLM solution is to do what they always do and predict next word using whatever it had in the training set, such as names of some australian cities and maybe a notion that a capital should be a large important city. IOW it'll hallucinate/bullshit a continuation word such as "Melbourne". With COT it would potentially have the ability to realize that "the capital of australia is" is not a good way to start a sentence when you don't know the answer, and instead say "i don't know". Of course the other cause of hallucinations is that the LLM might not even know what it doesn't know, so might think that "Melbourne" is a great answer.

kouru225 · a year ago
Feel like this is better represented as the default mode network: https://en.m.wikipedia.org/wiki/Default_mode_network

There are questions we know the answers to and we just reflexively spit them out, but then there are questions that are new to us and we have to figure them out separately.

Recent research has shown that new memories are recorded in the brain differently depending on how unique the memory is: https://www.quantamagazine.org/the-usefulness-of-a-memory-gu...

bun_at_work · a year ago
I have a similar view to you and not much to add to your comment, other than to reference a couple books that you might like if you enjoyed 'Thinking, Fast and Slow'.

'The Righteous Mind' by Jonathan Haidt. Here, Haidt describes a very similar two-system model he calls the Elephant-rider model.

'A Thousand Brains: A New Theory of Intelligence' by Jeff Hawkins. Here Jeff describes his Thousand Brains theory, which has commonality with the 2-system model described by Kahneman.

I think these theories of intelligence help pave the way for future improvements on LLMs for sure, so just want to share.

iteygib · a year ago
How does evolutionary instinct factor into the system model? Fight-or-flight responses, reflexes, etc. 'Thinking' does have consequences in terms of evolutionary survival in some circumstances, as in spending too much time deliberating/simulating.
eightysixfour · a year ago
This is a common comparison in the LLM world. I actually think it is closer to the Left/Right Brain differences described in The Master and His Emissary, but that's for a blog post later.
thwarted · a year ago
This sounds similar to the A Brain/B Brain concept that was described by, I believe, Marvin Minsky. I don't know how this might be related to Kahneman's work.
dougmwne · a year ago
I had the same thought from Thinking, Fast and Slow.

Another variation of this seems to be the “thought loop” that agents such as Devin and AutoGPT use.

machiaweliczny · a year ago
It’s a bit over my head for now but seems like GFlowNets are tackling this problem a bit.
dcrimp · a year ago
interesting, hadn't come across these. Will be doing some more reading up on them.
toisanji · a year ago
that is the approach also taken in this paper for building LLM agents with metacognition: https://replicantlife.com/
emmender2 · a year ago
Thinking step-by-step requires 100% accuracy in each step. If you are 95% accurate in each step, after the 10th step the accuracy of the reasoning chain drops to 59%. This is the fundamental problem with LLMs for reasoning.

Reasoning requires deterministic symbolic manipulation for accuracy. Only then can it be composed into long chains.
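
The arithmetic behind that claim, assuming each step's error is independent:

    p = 0.95
    for n in (1, 5, 10, 20):
        print(n, round(p ** n, 3))  # 1 0.95, 5 0.774, 10 0.599, 20 0.358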

throwuwu · a year ago
You’ve never made a mistake in your reasoning?

Tongue in cheek, but this has been considered and has resulted in experiments like tree-of-thought and various check-your-work and testing approaches. Thinking step by step is really just another way of saying "make a plan" or "use an algorithm", and when humans do either, they need to periodically re-evaluate what they've done so far and ensure it's correct.

The trick is training the model to do this as a matter of course and to learn which tool to apply at the right time, which is what the paper is about with respect to interspersed thoughts.

trenchgun · a year ago
> Reasoning requires deterministic symbolic manipulation for accuracy

No, that is automation. Automated reasoning is a thing, indeed. And I can kind of see a world where there is a system which uses LLM for creative thinking, augmented with automated reasoning systems (think datalog, egg, SMT-solver, probabilistic model checking etc).
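
One concrete flavor of that pairing, sketched with the Z3 SMT solver (pip install z3-solver): the LLM proposes a reasoning step and a deterministic checker verifies it.

    from z3 import Ints, Solver, And, Not, Implies, unsat

    # Proposed step: "if a > b and b > c, then a > c".
    # Verify it by asserting the negation and checking for counterexamples.
    a, b, c = Ints("a b c")
    s = Solver()
    s.add(Not(Implies(And(a > b, b > c), a > c)))
    assert s.check() == unsat  # no counterexample: the step is sound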

hesdeadjim · a year ago
I dream of a world where the majority of humans could come close to 59% after attempting a ten-step logical process.

YetAnotherNick · a year ago
Another RL paper with a terrible baseline. They used 0-shot, non-instruction-tuned Mistral for GSM8k, which has a very specific output format. They got 11% accuracy after improving it, while few-shot prompting achieves 37% [1]. GPT-4 could get ~97% with prompting.

[1]: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...
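
For context on why the output format matters so much here: GSM8k is scored on the final numeric answer, so few-shot prompts usually demonstrate both the working and the answer format. A rough sketch (the exemplar is made up, not taken from the benchmark):

    exemplar = ("Q: Tom has 3 boxes with 4 apples each. How many apples in total?\n"
                "A: 3 * 4 = 12. The answer is 12.\n\n")
    prompt = exemplar + "Q: <test question>\nA:"
    # A non-instruction-tuned model given no exemplars often emits free-form
    # text the scorer can't parse, which deflates the zero-shot baseline.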

hiddencost · a year ago
FWIW, if they're serious scientists, taking a known method and baseline and improving it is good science. Extensions to get state of the art are probably possible, but their goal is to measure just the impact of their change in a simple setting. Let the engineers do the munged system combinations and get SoTA.
YetAnotherNick · a year ago
I am not talking about SoTA. I am talking about a deliberately poor baseline. GSM8k consists of two things: solving the problem and getting the output format correct. Getting the output format correct gives 30% accuracy for the same model, where they got 11%. SoTA is 97%.
adlpz · a year ago
Any relation to OpenAI's rumored Q* (i.e. q-star) model? Authors of this paper don't seem affiliated.

Just a name coincidence?

smusamashah · a year ago
I think it's just a play on the same hyped-up term.
HarHarVeryFunny · a year ago
I was thinking the same. The STaR paper this is an extension of came out in 2022, so it's at least possible this is what q-star is based on too, but maybe with Q standing for something else.
pawnty · a year ago
This is the missing piece for training AI that has the ability to reason. There are so many tasks whose answers are known but whose reasoning steps are missing. With this method, we can use less annotated data to reach that ability.

The interesting part (I imagine): the generated thought could be hard for a human to understand while still being much more helpful for getting the correct answer! If that happens, we have created something more intelligent than ourselves.

kjqgqkejbfefn · a year ago
This is basically what I tried this morning at the prompt level (awful results), but the sketchy idea I had in mind went further by introducing control-flow "meta-tokens" to help the LLM renavigate its context. In this perspective the context would be rethought as a self-editing structured mind-map, with the linear aspect of the context at a time T standing for the execution trace of the exploration of this mind-map so far. Some of those meta-tokens would have side effects on the context - highlighting, structuring, summarizing, forgetting, and so on, some of its parts. This could allow for native structured output without using a syntactic format such as JSON, programmatic constructs in the style of LMQL, implementing memory, etc. The goal: not just to give logical/reasoning abilities to an LLM, but to give it the means to come up with its own cognitive architecture. Implementing structured output (using a <label name="stuff">...</label> token) to also implement memory/scratchpads would bring inspectability of those cognitive structures for free. Of course I have no idea how to implement this (I'm an ML tourist).
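
A very rough sketch of what such a self-editing context might look like, purely to make the idea concrete (all tags and their semantics are invented here):

    # Invented notation, just to illustrate: meta-tokens interleaved with
    # ordinary generated text, each with a side effect on the working
    # context instead of merely extending it.
    context = (
        '<label name="plan">outline the argument</label>\n'
        "...ordinary generated text...\n"
        '<summarize target="plan"/>\n'  # side effect: collapse earlier detail
        '<forget target="plan"/>\n'     # side effect: drop it from the working context
    )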
thesz · a year ago
They do not cite [1], a paper on (learned) variable computation in RNNs, applied to language modeling, that predates their work by almost 8 years.

[1] https://openreview.net/pdf?id=S1LVSrcge

Microsoft also had something similar at that time, but for image recognition: a CNN at the input and then variable computation at classification.