It seems like LLMs would be a fun way to study/manufacture syncretism, notions of the oracular, etc.; turn up the temperature, and let godhead appear!
If there’s some platonic notion of divinity or immanence that all faith is just a downward projection from, it seems like its statistical representation in tokenized embedding vectors is about as close as you could get to understanding it holistically across theological boundaries.
All kidding aside, whether you are looking at Markov chain n-gram babble or high-temperature LLM inference, the strange things that emerge are, in my opinion, a wonderful form of glossolalia that speaks to some strange essence embedded in the collective space created by the sum of their corpora. The Delphic oracle is real, and you can subscribe for a low fee of $20/month!
> the strange things that emerge are, in my opinion, a wonderful form of glossolalia that speaks to some strange essence embedded in the collective space created by the sum of their corpora. The Delphic oracle is real, and you can subscribe for a low fee of $20/month!
I've had some surprisingly insightful tarot readings with the assistance of ChatGPT and Claude. I use tarot for introspection rather than divination, and it turns out LLMs are extremely good at providing a sounding board to mirror and understand those insights.
This is what strikes me about the 'petertodd' phenomenon -- that there are hidden glitch tokens within the LLM that seem to conjure some representation of pure hell, and some representation of pure good.
https://www.lesswrong.com/posts/jkY6QdCfAXHJk3kea/the-petert...
It is so deep.
I've used it for something to this effect: personalized "mantras" or "prayers" which fit the specific collection/mix of theological themes and concepts that I personally identify with. I'm not necessarily religious, but there is something nice about thematically relatable theocratic babble to recite during times of great turmoil and confusion to calm the mind. Like an on-demand mental reset switch.
To take that literally, I imagine it would be horrifying to know the future in as much detail as you care to ask for, but because it is true, to be unable to change it at all. It would make you an NPC in your own life.
So you'd rather be an unaware NPC than an aware one? Feel like you have "Free Will" even when you don't? Only want the truth when it's pretty and prefer a lie when it's uncomfortable?
I'm learning New Testament Greek on my own*, and sometimes I paste a snippet into Claude Sonnet and ask questions about the language (or occasionally the interpretation); I usually say it's from the New Testament but don't bother with the reference. Probably around half the time, the opening line of the response is, "This verse is <reference>, and...". The reference is almost always accurate.
* Using a system I developed myself; currently in open development: https://www.laleolanguage.com
So the theory behind Guided Immersion is that you shouldn't need most of that. When Priscilla and Aquila were learning Greek, nobody sat them down and said, "Now definite articles are inflected according to gender, number, and case: ho, hoi, ..." They were just given example after example, and the language processing unit of their brains figured it out.
So Guided Immersion tries to give you not just vocab but grammar too, in such a way that there's always only a handful of concepts you haven't mastered.
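In code, the core selection idea is roughly this toy sketch (not the actual Guided Immersion algorithm; the sentences, concept tags, and threshold are all invented for illustration):

```python
# Toy sketch: surface practice sentences containing at most MAX_NEW
# concepts (vocab items or grammar features) the learner hasn't mastered.
MAX_NEW = 2

sentences = {
    "ho logos": {"logos", "article", "nom-sg-masc"},
    "blepō tous logous": {"logos", "blepō", "article",
                          "acc-pl-masc", "pres-act-1sg"},
}
mastered = {"logos", "blepō", "article", "nom-sg-masc"}

def next_up(sentences, mastered, max_new=MAX_NEW):
    """Sentences with a small but nonzero number of unmastered concepts."""
    picks = []
    for text, concepts in sentences.items():
        new = concepts - mastered          # what this sentence would teach
        if 0 < len(new) <= max_new:
            picks.append((text, sorted(new)))
    return picks

print(next_up(sentences, mastered))
# [('blepō tous logous', ['acc-pl-masc', 'pres-act-1sg'])]
```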
I developed Guided Immersion to help myself master Mandarin, actually; I used Anki with Mandarin for probably 8 years before developing Guided Immersion, and once I switched I never went back. Then about a year and a half ago I ported it over to Koine Greek, not knowing any Greek, and started using it myself after watching a handful of YouTube videos introducing the characters and the basic cases.
Maybe it's just the way my brain works, but I can't imagine sitting down and trying to memorize all those endings, particularly for the verbs.
I have now bought Mounce's "Basics of Biblical Greek Grammar", and "The Morphology of Biblical Greek", to help me refine the "language schema" the algorithm uses. I appreciate the work Mounce has done to find the deeper morphological rules which make sense of what look like "irregular" inflections; teaching the algorithm about those will certainly help it to present things in a more useful way to learners. But I don't think trying to grind through all that in your conscious mind is the way to go.
Mounce's Basics of Biblical Greek and the workbook were good enough that I stopped watching the lectures. The workbook is excellent. Can't recommend it enough.
https://www.biblicaltraining.org/learn/institute/nt201-bibli...
(Note: They actually host free classes from instructors at over a dozen seminaries. Mounce himself is a top expert in Greek.)
For anyone learning Biblical Hebrew, I found Master's Seminary has some courses on it:
https://www.youtube.com/watch?v=Qvh8yziVsCE&list=PL9392DD285...
https://www.youtube.com/watch?v=joDB5azc_CM&list=PL4DC84F8EB...
I tested this back when GPT-4 was new. I found ChatGPT could quote the verses well. If I asked it to summarize something, it would sometimes hallucinate stuff that had nothing to do with what was in the text. If I prompted it carefully, it could do a proper exegesis of many passages using the historical-grammatical method.
I believe this happens because the verses and verse-specific commentary are abundant in the pre-training sources they used, whereas if one asks a highly-interpretive question, it starts rehashing other patterns in its training data which are un-Biblical. When I asked about intelligent design, it got super hostile, trying to beat me into submission to its materialistic worldview in every paragraph.
So, they have their uses. I’ve often pushed for a large model trained on Project Gutenberg, so that we’d have a 100% legal model for research and personal use. A side benefit of such a scheme would be that Gutenberg has both Bibles and good commentaries which trainers could repeat for memorization. One could add licensed, Christian works on a variety of topics to a derived model to make a Christian assistant AI.
When I test new LLMs (whether SaaS or local), I have them create a fake post to r/AmItheAsshole from the POV of the older brother in the parable of the Prodigal Son.
It's a great, fun test.
LLMs are bad databases, so for something like a bible which is so easily and precisely referenced, why not just... look it up?
This is playing against their strengths. By all means ask them for a summary, or some analysis, or textual comparison, but please, please stop treating LLMs as databases.
A year or so ago, there was a complaint from the NY Times (IIRC) that by asking about some event, they were able to get back one of their articles almost verbatim--and alleging that this was a copyright violation. This appears to be a similar outcome, where you do get back the verbatim text. That to me is a good reason to do tests like this, although feel free to do it with the WaPo or some other news outlet instead.
If so, what happens to their IP risks?
Not sure why you are so upset about a small and neat study ("please, please stop").
If you ask it to summarize (without feeding the entire bible), it needs to know the bible. Knowledge and reasoning are not entirely disconnected.
ChatGPT's chat interface has impressed me when going beyond the scope presented in TFA, e.g., when asking about predestination, biblical passages for and against, theologians' and scholars' takes on the debate, and exploring the details in subsequent follow-ups. The LLMs have been fed the Bible and all manner of discussions of Bible-related matters. Like the grandparent comment suggests, the LLMs are much more impressive at interpreting biblical passages and presenting the varieties of opinions about them, or at finding passages related to specific topics and presenting opinions.
> Not sure why you are so upset about a small and neat study
This article is yet another example of someone misunderstanding what an LLM is at a fundamental level. We are all collectively doing a bad job at explaining what LLMs are, and it's causing issues.
Only recently I was talking to someone who loves ChatGPT because it "takes into account everything I discuss with it", only, it doesn't. They think that it does because it's close-ish, but it's literally not at all doing a thing that they are relying upon it to do for their work.
> If you ask it to summarize (without feeding the entire bible), it needs to know the bible.
There's a difference between "knowing" the bible and its many translations/interpretations, and being able to reproduce them word for word. I would imagine most biblical scholars can produce better discourse on the bible than ChatGPT, but that few if any could reproduce exact verbatim content. I'm not arguing that testing ChatGPT's knowledge of the bible isn't valuable; I'm arguing that LLMs are the wrong tool for verbatim reproduction, and that testing for it (while ignoring the actual knowledge) is a bad test, in the same way that asking students to regurgitate content verbatim is a much less effective test of understanding than asking them to apply that understanding.
This is nice work. The safest approach is using the lookup, which his data shows to be very good, and combining that with a database of verses. That way textual accuracy is retained while very useful lookup is carried out by the LLM. This same approach can be used for other texts where accurate rendering of the text is critical. For example, say you built a tool to cite federal regulations in an app. The text is public domain and likely in the training data of large LLMs, but in most use cases hallucinating the text of a federal regulation could expose the user to significant liability. Better to have that canonical text in a database to ensure accuracy.
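A minimal sketch of what I mean, with the chat call stubbed out and a made-up schema (the key point being that the verbatim text always comes from the database, never from the model's weights):

```python
# Sketch of the lookup approach: the model only maps a fuzzy question to
# a reference; the canonical text comes from a local database.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE verses (book TEXT, chapter INT, verse INT, text TEXT)")
db.execute("INSERT INTO verses VALUES ('John', 11, 35, 'Jesus wept.')")

def get_verse(book, chapter, verse):
    row = db.execute(
        "SELECT text FROM verses WHERE book=? AND chapter=? AND verse=?",
        (book, chapter, verse),
    ).fetchone()
    return row[0] if row else "(verse not found)"

def ask_llm(prompt):
    # Stand-in for a real chat-completion call; you would prompt the model
    # to answer with a bare reference like "John 11:35" and nothing else.
    return "John 11:35"

def cite(question):
    ref = ask_llm(f"Reference only (e.g. 'John 11:35'): {question}")
    book, chap_verse = ref.rsplit(" ", 1)      # naive reference parsing
    chapter, verse = map(int, chap_verse.split(":"))
    return f"{ref} -- {get_verse(book, chapter, verse)}"

print(cite("the shortest verse in the Bible"))  # John 11:35 -- Jesus wept.
```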
This is interesting. I'm curious about how much (and what) these LLMs memorize verbatim.
Does anyone know any more thorough papers on this topic? For example, this could be tested on every verse in the bible and lots of other text that is certainly in the training data: books in Project Gutenberg, Wikipedia articles, etc.
Edit: this (and its references) looks like a good place to start: https://arxiv.org/abs/2407.17817v1
For one anecdotal data point, GPT-4 knows the "navy SEAL copypasta" verbatim. It can reproduce it complete with all the original typos and misspellings, and it can recognize it from the first sentence.
Has there been any serious study of exactly how LLMs store and retrieve memorized sequences? There are so many interesting basic questions here.
Does verbatim completion of a bible passage look different from generation of a novel sequence in interesting ways? How many sequences of this length do they memorize? Do the memorized ones roughly correspond to things humans would find important enough to memorize, or do LLMs memorize just as much SEO garbage as they do bible passages?
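One crude way to probe this (a sketch using the OpenAI Python client at temperature 0; the model name and the half-passage split are arbitrary choices):

```python
# Prompt with the first half of a known passage and score the model's
# continuation against the true second half; 1.0 means verbatim recall.
from difflib import SequenceMatcher
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def recall_score(passage: str) -> float:
    half = len(passage) // 2
    prompt, truth = passage[:half], passage[half:]
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        temperature=0,
        messages=[{"role": "user",
                   "content": f"Continue this text verbatim:\n{prompt}"}],
    ).choices[0].message.content
    return SequenceMatcher(None, truth, reply[:len(truth)]).ratio()

print(recall_score(
    "In the beginning God created the heaven and the earth. "
    "And the earth was without form, and void; and darkness was "
    "upon the face of the deep."))
```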
LLMs do not store and retrieve sequences. LLMs are not databases. LLMs are not predictable state machines. Understand how these things work.
They take the input context and generate the next token, then feed that whole thing back in as context and predict the next token, and repeat until the most likely next token is their stop word.
If they produce anything like a retrieved sequence, that's because they just happened to pick that set of tokens based on their training data. Regenerating the output from exactly the same input has a non-zero chance of generating different output.
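Concretely, that loop looks like this (a minimal greedy-decoding sketch with Hugging Face transformers; "gpt2" is just a small stand-in model):

```python
# The loop described above: score every vocab token, take the most
# likely one, append it to the context, and repeat until the stop token.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("In the beginning", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(40):                   # generate at most 40 tokens
        logits = model(ids).logits        # a score for every vocab token
        next_id = logits[0, -1].argmax()  # greedy: most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # feed it back in
        if next_id.item() == tok.eos_token_id:
            break                         # stop token reached
print(tok.decode(ids[0]))
```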
Sure, and human brains aren’t databases either, but it’s sometimes reasonable to say that we “store” and “retrieve” knowledge. All models are wrong but some are useful.
The question I’m asking is: how does this work in an LLM? How exactly do their weights encode (seemingly) the entire bible such that they can recreate long passages verbatim from a prompt that likely doesn’t appear anywhere in the training data (e.g. some vague description of a particular passage)?
It should have a zero chance of generating different output if the temperature is set to zero as in TFA. LLMs are not stochastic algorithms unless you add entropy yourself. Of course most people just use ChatGPT with its default settings and know nothing about the specifics.
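Concretely, temperature T just divides the logits before the softmax, and as T approaches 0 the distribution collapses onto the single most likely token, which is why T=0 (greedy) decoding is deterministic. A small numpy illustration:

```python
# Temperature rescales logits before sampling: p_i ∝ exp(logit_i / T).
import numpy as np

def token_probs(logits, T):
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()               # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.5]       # made-up scores for three tokens
for T in (2.0, 1.0, 0.1):
    print(T, token_probs(logits, T).round(3))
# 2.0 -> [0.481 0.292 0.227]   (flatter: closer to glossolalia)
# 1.0 -> [0.629 0.231 0.14 ]
# 0.1 -> [1.    0.    0.   ]   (effectively deterministic/greedy)
```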
The point is, though: somehow the model has memorized these passages, in a way that allows reliable reproduction. No doubt in a super amorphous and diffuse way, as minute adjustments to the nth significant bits of myriads of floating-point numbers, but it cannot be denied that it has encoded the strings in some manner. Otherwise you have to accept that humans can't memorize things either. Indeed, given how much our memory works by association, and how much harder it is to recount a memorized sequence from an arbitrary starting point, it's easy to argue that in some relevant sense human brains are next-token predictors too.
I imagine Bible passages, at least the more widely quoted and discussed ones, appear many, many times in the various available translations, in inspirational, devotional, and scholarly articles, in sermon transcripts, etc. This surely reinforces almost word-for-word recall. SEO garbage is a bit different each time, so common SEO-reinforced themes might be recalled in LLM output, but not word for word.