They say they trained it on databases they'd bought access to, etc. And it seems that way. Because how does ChatGPT:
1. Do what you ask instead of continuing your instructions?
2. Use such nice, helpful language, as opposed to just some random average of what people say?
3. And most of all — how does it have a structure where it helpfully restates things, summarizes things, warns you against doing dangerous stuff… No way is it just continuing the most probable random Internet text!
Contrary to what the other commenters are saying, RLHF, while powerful, isn't the only way to get an LLM to follow instructions — plain supervised fine-tuning on instruction/response pairs also gets you much of the way there.
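To make that concrete, here's a toy sketch (my own illustration, not anyone's actual training pipeline) of what supervised instruction-tuning data prep looks like: concatenate a prompt and the desired response, and mask the loss so the model only learns to predict the response tokens. The whitespace "tokenizer" is a stand-in.

```python
# Toy illustration of instruction-tuning data prep: concatenate
# prompt + response, and mask the loss so only response tokens are
# trained on. No real model or tokenizer here — just the data shape.
def build_example(instruction, response, tokenize=lambda s: s.split()):
    prompt_toks = tokenize(f"User: {instruction} Assistant:")
    resp_toks = tokenize(response)
    tokens = prompt_toks + resp_toks
    # 0 = ignore in the loss, 1 = train on this token
    loss_mask = [0] * len(prompt_toks) + [1] * len(resp_toks)
    return tokens, loss_mask

tokens, mask = build_example("Summarize this.", "Here is a summary.")
```

Train on enough pairs shaped like this and the model learns to answer the instruction rather than continue it, before any RLHF enters the picture.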
A better approach would be to use a Markov chain built from sampling English text letter by letter. An even better approach would be to build your stats from some source of English words in IPA transcription, with syllable boundaries etc. marked, then map from IPA to spelling via some kind of lookup table. We use a similar process in reverse in my research group for building datasets for Bayesian phylogenies of language families.
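The letter-level Markov chain version can be sketched in a few lines. This is a minimal toy, assuming a tiny placeholder corpus rather than a real text sample, and it generates pseudo-words by following observed letter-to-letter transitions:

```python
import random
from collections import defaultdict

# Count which letter follows which in a sample text, then walk the
# chain to produce new "words". The corpus is a toy placeholder.
corpus = "the quick brown fox jumps over the lazy dog and runs away"

counts = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    counts[a].append(b)  # duplicates preserve transition frequencies

def fake_word(length=6, seed=None):
    rng = random.Random(seed)
    ch = rng.choice([c for c in corpus if c != " "])
    out = [ch]
    for _ in range(length - 1):
        followers = counts.get(ch)
        if not followers:
            break
        ch = rng.choice(followers)
        if ch == " ":  # treat a space as the end of the word
            break
        out.append(ch)
    return "".join(out)

print(fake_word(seed=0))
```

With a real corpus and higher-order contexts (pairs or triples of letters instead of single letters), the output starts to look considerably more English-like.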
This is better for more advanced topics, like dynamic programming (well, advanced for me, anyway). I started out taking over an hour to solve the first problem in the problem sets, but they build on each other slowly, so I was soon able to see solutions within minutes. It took me a weekend to go through chapters I was struggling with, and I did really well on my coding interview the next day.