You mention that a language model can't model language, only text, because it only observes examples of text, not language. However, I would argue that these text examples are in fact representations of human language ability. The text that language models are trained on is a product of human language ability. This text, while not a perfect representation of language, is a direct outcome of language use and thus carries within it the patterns and structures that language models can learn.
Moreover, models like Transformers do have a sort of latent space: the embedding space. This is a continuous space where words and phrases are represented as vectors. The distances and directions between vectors in this space capture semantic and syntactic relationships, which suggests that these models are indeed learning some aspects of language, not just text.
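To make that concrete, here's a toy Go sketch with hand-made 2-d vectors (real embedding spaces are learned and have hundreds of dimensions; these numbers are mine, purely illustrative) showing the kind of directional structure I mean, where "king" - "man" + "woman" lands on "queen":

    package main

    import "fmt"

    // Hand-made toy "embeddings": axis 0 loosely encodes royalty,
    // axis 1 loosely encodes gender. Purely illustrative values.
    var emb = map[string][]float32{
        "king":  {1, 1},
        "queen": {1, -1},
        "man":   {0, 1},
        "woman": {0, -1},
    }

    func main() {
        // king - man + woman, computed component-wise
        v := []float32{
            emb["king"][0] - emb["man"][0] + emb["woman"][0],
            emb["king"][1] - emb["man"][1] + emb["woman"][1],
        }
        fmt.Println(v) // [1 -1], which is exactly emb["queen"]
    }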
On top of this, many of these models are trained on diverse data sources, including transcriptions of spoken conversations. This introduces elements of pragmatics, or how language is used in real-world conversations, into the training data.
Finally, the translation capabilities of these models further suggest that they are capturing something beyond mere text. They are able to translate between different languages, which implies that they are learning some underlying linguistic structures that are shared across languages.
Again, it's important to stress that these models are far from capturing the full complexity of human language ability. However, to say that they only model text and not language seems to me an oversimplification. They are learning patterns and structures in the data that are intrinsically tied to human language use, and so in a sense, they are modeling aspects of language.
Writing code from scratch to process and search 200k unstructured documents -- parsing, cleaning, chunking, calls to the OpenAI embedding API, serialization, linear search with cosine similarity, and the actual time to debug, test, and run all of it -- took me less than 3 hours in Go. The flat binary representation of all the vectors is under 500 MB. I even went ahead and made it mmap-friendly for the fun of it, even though I could have read it all into memory.
Even the dumb linear search I wrote takes just 20-30ms per query on my MacBook for the 200k documents. The search results are fantastic.
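For anyone curious, the core of such a scan fits in a few lines of Go. This is a minimal sketch under my own assumptions (L2-normalized embeddings stored back to back in one flat []float32, which is also what makes the on-disk file mmap-friendly), not the parent's actual code:

    package main

    import "fmt"

    // cosine assumes both vectors are L2-normalized, so the dot
    // product alone is the cosine similarity.
    func cosine(a, b []float32) float32 {
        var s float32
        for i := range a {
            s += a[i] * b[i]
        }
        return s
    }

    // bestMatch linearly scans a flat buffer holding vectors of
    // length dim laid out back to back (the same layout works
    // over an mmap'd file) and returns the index of the vector
    // most similar to the query.
    func bestMatch(flat, query []float32, dim int) (best int, bestSim float32) {
        bestSim = -2 // below the valid cosine range [-1, 1]
        for i := 0; (i+1)*dim <= len(flat); i++ {
            if sim := cosine(flat[i*dim:(i+1)*dim], query); sim > bestSim {
                best, bestSim = i, sim
            }
        }
        return best, bestSim
    }

    func main() {
        // Three toy 2-d unit vectors, back to back in one slice.
        flat := []float32{1, 0, 0, 1, 0.6, 0.8}
        idx, sim := bestMatch(flat, []float32{0.6, 0.8}, 2)
        fmt.Printf("best=%d sim=%.2f\n", idx, sim) // best=2 sim=1.00
    }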
Encryption is just one example. Its ability across the entirety of math and science would be equally powerful.
I don't think we're on the threshold of AGI, but it's interesting that the wanna-be AIs are running into human issues...
I see all of the half-demos where it doesn't complete anything; I've tried it myself and... well, if we're being honest, it was shite. I've seen a whole load of tweet threads saying what it could be used for...
Literally just looking for one example of a successful run. Anything at all.
I can definitely see that there may be potential (if not this, then the ideas that come off the back of it), but even I don't have a real use case for it yet; I'm just tinkering.
I guess my XY question: Am I being suckered into the web3 of AI? Lots of buzz, no use case.
Furthermore, due to the autoregressive nature of GPT models, the longer auto-gpt runs (the more it works, the more tasks it performs), the more the chance of things going off the right path grows, roughly exponentially in the number of steps, and once it drifts it is "doomed" for the rest of the run [1].
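Back-of-the-envelope version of that argument (my numbers, assuming a fixed, independent per-step success probability p, which is the simplification LeCun uses): an n-step run stays on track with probability p^n, which decays geometrically:

    package main

    import (
        "fmt"
        "math"
    )

    func main() {
        // Assume each generation step independently stays "on track"
        // with probability p; a whole n-step run then succeeds with
        // probability p^n.
        p := 0.99
        for _, n := range []float64{10, 100, 500} {
            fmt.Printf("p=%.2f n=%4.0f  P(on track) = %.3f\n", p, n, math.Pow(p, n))
        }
        // p=0.99 n=  10  P(on track) = 0.904
        // p=0.99 n= 100  P(on track) = 0.366
        // p=0.99 n= 500  P(on track) = 0.007
    }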
Thus, the chance of this being actually useful for anything beyond what a simple prompt can already do in a tool like ChatGPT is very low.
The end result is an impressive concept but a practically unusable tool. And the problem, in general, is that as auto-gpt improves (which it will, at an impressive pace), so will our ambition in using it, which will lead to constant disappointment: how we feel about it today will be roughly how we feel about it in the future. Always needing "just a bit more", but never really there.
We already have a "baby AGI" that has been deployed in a production environment for a few years: it's called Tesla self-driving. It was supposed to get us from point A to point B completely autonomously. And for 6 years now it has been "almost there", but never really there (and arguably never will be).
What this does, though, is create and inflate a giant FOMO, and the best way of dealing with FOMO (long term) is to stay on firm ground, observe, and wait for clarity and the right action.
[1] Watch in particular Yann LeCun's presentation at https://www.youtube.com/watch?v=x10964w00zk
A single brute-force nearest-neighbor query over n points is O(n), and an all-pairs search is O(n^2), but approximate algorithms (e.g. LSH, HNSW) answer queries in sublinear time.
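Here's a minimal sketch of one such approximate approach, random-hyperplane LSH, in Go (a single hash table, no multi-probing; all parameters are arbitrary): each vector is hashed to a k-bit bucket by the signs of k random projections, and a query is compared only against its own bucket instead of all n points:

    package main

    import (
        "fmt"
        "math/rand"
    )

    // signHash maps a vector to a k-bit bucket key: bit i is the
    // sign of its dot product with random hyperplane i. Vectors at
    // a small angle tend to share a bucket.
    func signHash(v []float64, planes [][]float64) uint32 {
        var h uint32
        for i, p := range planes {
            var dot float64
            for j := range v {
                dot += v[j] * p[j]
            }
            if dot > 0 {
                h |= 1 << i
            }
        }
        return h
    }

    func main() {
        const dim, bits, n = 8, 12, 10000
        rng := rand.New(rand.NewSource(1))

        // k random hyperplanes with Gaussian components.
        planes := make([][]float64, bits)
        for i := range planes {
            planes[i] = make([]float64, dim)
            for j := range planes[i] {
                planes[i][j] = rng.NormFloat64()
            }
        }

        // Index: bucket key -> ids of vectors hashed to that key.
        buckets := map[uint32][]int{}
        data := make([][]float64, n)
        for id := range data {
            v := make([]float64, dim)
            for j := range v {
                v[j] = rng.NormFloat64()
            }
            data[id] = v
            key := signHash(v, planes)
            buckets[key] = append(buckets[key], id)
        }

        // A query only scores its own bucket, a tiny fraction of n
        // (exact ranking within the bucket is omitted here).
        cand := buckets[signHash(data[42], planes)]
        fmt.Printf("candidates checked: %d of %d\n", len(cand), n)
    }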