I feel like there should be an LLM architecture that includes "scratch space" - tokens the model can write to and read from which do not constitute part of its output. The trouble with current architectures is that they can only do a finite amount of computation per output token - they get one forward pass and then have to output something. Chain-of-thought reasoning allows the model to devote more computation to finding the answer, storing intermediate results in its output tokens. But this is silly - most of the intermediate tokens are not providing useful information towards solving the problem, they're just wasted computation:
>There are 16 balls in total.
>Half of the balls are golf balls.
>That means that there are 8 golf balls.
>Half of the golf balls are blue.
>That means that there are 4 blue golf balls.
For the number of forward passes being done to generate this text, only a few tokens are actually helpful - most are grammatical filler. Further, the model is losing information by being forced to project its state down to a single output token. What's more, the most probable one-step output may not even be the most informative or helpful one!
It'd be much nicer if the model could write arbitrary, continuous-valued tokens to a private scratch space and then attend to those tokens as though they were words in the prompt while generating the actual output, potentially performing several forward passes per output token when necessary.
In short, if chain-of-thought prompting is such a good idea, we should bake it into the model. Obviously all of this is FAR easier said than done.
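To make that concrete, here's roughly what the decode loop could look like. This is a sketch only: the backbone is assumed to be any causal transformer that maps embeddings to hidden states, every name here (ScratchpadDecoder, to_scratch, k_scratch) is invented for illustration, and the genuinely hard part - training the model to actually use the scratch vectors - isn't addressed at all.
```
import torch
import torch.nn as nn


class ScratchpadDecoder(nn.Module):
    """Sketch: before emitting each real token, write K continuous
    'scratch' vectors that later positions can attend to but which are
    never projected to the vocabulary."""

    def __init__(self, backbone: nn.Module, d_model: int, vocab_size: int, k_scratch: int = 4):
        super().__init__()
        self.backbone = backbone                       # any causal transformer: embeddings -> hidden states
        self.to_scratch = nn.Linear(d_model, d_model)  # writes a continuous scratch vector from the last hidden state
        self.lm_head = nn.Linear(d_model, vocab_size)  # only applied when emitting an actual token
        self.k_scratch = k_scratch

    def next_token_logits(self, embeds: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        """embeds: (batch, seq, d_model), a mix of token and scratch embeddings."""
        # K extra forward passes, each appending one private scratch vector.
        # (No KV caching here, purely for clarity.)
        for _ in range(self.k_scratch):
            h = self.backbone(embeds)                  # (batch, seq, d_model)
            scratch = self.to_scratch(h[:, -1:, :])    # continuous, never decoded to a word
            embeds = torch.cat([embeds, scratch], dim=1)
        # Final pass produces the distribution over the real next token.
        h = self.backbone(embeds)
        return self.lm_head(h[:, -1, :]), embeds
```
The point is just that the scratch vectors stay continuous and private: they get appended to the context and attended to, but only the final pass goes through the LM head.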
On the other hand, if it represents its scratch space in English, it's a lot easier to see how it justifies its answer and to tell where it's gone wrong. Debuggability seems pretty important?
Maybe it just needs more training at "thinking out loud" so it does it without prompting?
> arbitrary, continuous-valued tokens to a private scratch space
I'm with skybrian. Please don't use private scratch spaces. The one saving grace of current LLMs when it comes to understanding them is that they still generally need to "think out loud" by outputting more text. Remove that and you end up with a truly inscrutable black box, which has terrible implications for AI interpretability, with knock-on effects for AI safety.
Is it really that big of a deal if AI leapfrogs us?
Everyone else in the field is worried about safety, alignment, and bias.
Google used this excuse to execute slowly. Now they've got the "deer in headlights" look, with their single biggest cash cow clearly in the cross hairs.
And here I am excited by the possibility of AI out-evolving us.
Is this something one could try to quickly implement alongside NanoGPT? Seems like a pretty straightforward, concrete idea, once you decide where you want those tokens to fit into downstream attention layer inputs. Evaluating relative performance at a small scale could give an indication of whether it's worth trying at larger scales, unless it's one of those things that doesn't help until your model is huge.
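For what it's worth, the cheapest thing to bolt onto nanoGPT is probably not a full read/write scratchpad but a fixed bank of learned continuous embeddings prepended to the sequence - closer to a small bank of learned memory slots than a true writable scratch space. Sketch below, with all names invented:
```
import torch
import torch.nn as nn


class ScratchSlots(nn.Module):
    """A fixed bank of learned, continuous embeddings prepended to the
    sequence so every attention layer can attend to them; they are sliced
    off again before the LM head and never appear in the output."""

    def __init__(self, n_scratch: int, n_embd: int):
        super().__init__()
        self.slots = nn.Parameter(0.02 * torch.randn(n_scratch, n_embd))

    def prepend(self, tok_emb: torch.Tensor) -> torch.Tensor:
        # tok_emb: (batch, seq, n_embd) -> (batch, n_scratch + seq, n_embd)
        b = tok_emb.size(0)
        return torch.cat([self.slots.expand(b, -1, -1), tok_emb], dim=1)

    def strip(self, hidden: torch.Tensor) -> torch.Tensor:
        # Drop the scratch positions before projecting to the vocabulary.
        return hidden[:, self.slots.size(0):, :]
```
You'd call prepend right after the token/position embeddings and strip right before the LM head; with ordinary causal masking every real token can already attend back to the slots, since they sit at the front of the sequence.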
Yes, IIUC it had something like a separate scratch space, and training examples teaching it to "think" in terms of symbolic expressions and Python programs.
Question: A needle 35 mm long rests on a water surface at 20 °C. What force over and above the needle’s weight
is required to lift the needle from contact with the water surface? σ = 0.0728 N/m.
<work>
σ = 0.0728 N/m
σ = F/L
0.0728 = F/(2 × 0.035)
F = 0.0728(2 × 0.035)
calculate.py
```
f = 0.0728*(2*0.035)
with open("output.txt", "w") as file:
    file.write(str(round(f, 5)))
```
«run: "calculate.py"»
«read: "output.txt"»
0.0051
</work>
Answer: F = 0.0051 N
So here's a trick that worked for the Clue question:
step 1:
Hi, I'm going to ask you some questions soon. But instead of answering the questions, I want you to instead write out instructions for yourself to help you reason through the question and come up with the best answer
step 2: [provide clue question]
step 3: Now follow the instructions you have just written to answer the question.
.... The answer to the question is: (a) Yes; Colonel Mustard was in the observatory with the candlestick
Edit: mixed results for the apple question with this technique
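If you'd rather script this than paste it by hand, it's just two completions where the second prompt embeds the first answer. Rough sketch - complete() is a placeholder for whatever API call you're using, and the prompt wording is paraphrased from the steps above:
```
def self_instruct_answer(question: str, complete) -> str:
    """Two-pass version of the trick above. `complete` is a stand-in for
    whatever function maps a prompt string to the model's completion."""
    # Step 1: ask for instructions, not an answer.
    instructions = complete(
        "I'm going to ask you a question. Instead of answering it, write out "
        "instructions for yourself to help you reason through the question "
        "and come up with the best answer.\n\n"
        f"Question: {question}\n\nInstructions:"
    )
    # Steps 2-3: give back the question plus the self-written instructions.
    return complete(
        f"Question: {question}\n\n"
        f"Instructions:\n{instructions}\n\n"
        "Now follow the instructions you have just written to answer the question.\n\nAnswer:"
    )
```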
I feel like within 6 months the models will have adapted to not need these "clever" tricks. Presumably, if for many cases the trick is to say "Let's think step by step", that's something the model can learn to do on its own without the prompt.
The really interesting thing will be feeding alternative data into these models, whether it's a certain structured corpus, siloed enterprise data, or personal data.
It seems that ChatGPT is incapable of whatever it is we experience as the “ohhhhhh!” eureka moment.
I give it simple riddles that it doesn’t solve. I then point out the obvious answer and it just doubles down like that really stubborn friend I had in high school. It never does the, “ohhhh! Aha! Yes that’s the answer.”
Note that this was originally published in September 2022, before text-davinci-003 was released in November 2022, which lets you do whatever you want with much less effort.
Can you explain more what you mean by “do whatever you want without as much effort”? Is it because text-davinci-003 accepts more tokens for the prompt? Something else?
I was trying to get davinci-003 to convert text to SQL, and it worked with a very simple prompt like "convert this text into SQL". I could get it to work with all their other models too, but they all required a few examples in the prompt.
tl;dr -- LLMs are bad at basic arithmetic and logic (as their opening examples with math word problems show), but they do much better if instead of asking them for the answer, you ask for code to compute the answer. Then evaluate or run the code to get the answer.
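A bare-bones version of that loop looks something like this. complete() is again a placeholder for the LLM call, the prompt wording is made up, and running model-generated code like this needs proper sandboxing in anything beyond a toy:
```
import subprocess
import sys
import tempfile


def answer_via_code(question: str, complete, timeout: float = 10.0) -> str:
    """Ask the model for a program instead of an answer, then run the
    program and read off its output. `complete` is a stand-in for your
    LLM call and is assumed to return plain Python source."""
    code = complete(
        f"Question: {question}\n"
        "Write a short, self-contained Python program that computes the answer "
        "and prints only the final result. Reply with code only."
    )
    # Write the generated program to a temp file and run it in a subprocess.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=timeout
    )
    return result.stdout.strip()
```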
It doesn't make sense to be on that page because it's not a technique to make GPT answer a prompt better.
What you are suggesting is an abstraction layer higher. Figuring out what your prompt should do is different from trying to make a prompt more reliable.
Combining these with LLMs does sound quite interesting; I don't know why they haven't been used much.
This is doable, but it introduces a sequential dependency that would make training significantly slower.
See section 3.1.1 here: https://galactica.org/static/paper.pdf
The <work> example quoted above is from that paper.
Training ANNs is still a single-shot exercise.
Paper: https://arxiv.org/abs/2211.10435
GitHub: https://github.com/reasoning-machines/pal
See https://news.ycombinator.com/item?id=34422122 and https://news.ycombinator.com/item?id=34422627