I think the "... is all you need" title here is particularly misleading as the paper does in fact use a BERT model for generating the vectors.
So if the implication was that no language model was needed at all and you can just do nearest neighbour on string similarity and patch results together, that implication was clearly wrong.
I think what the paper does show though is that there are methods that can make language models topic-specific without fine-tuning and that yield competitive results even with older models.
Yeah, I thought the same -- it struck me at first blush as if it was some kind of super simple architecture that didn't use transformers, and then in the diagram I saw they used BERT to produce the embeddings!
Good information retrieval is a problem we have been trying to solve for thousands of years, so even if that's all LLMs are doing, that's still a great achievement.
Of course a more explicit approach like this paper is a really good step in that direction by making it easier to trace information provenance. It might still be nontrivial to answer why the model selected this specific piece of information, and why it was composed in this specific way, but it seems trivial to say where the model got the information from. Which is really all we demand from humans too.
Was the training data quality checked?
If so, then LLMs are search engines for catalogs, like Yahoo once was, rather than a good search engine for SEO-optimized click farms.
Google search once was great too but then ads and SEO killed it.
Transformers solve compositional reasoning tasks by reducing multi-step compositional reasoning into linearized subgraph matching without problem-solving skills. They can solve problems when they have reasoning graphs in the memory.
LLMs do logic by mimicking logical structures at the text level (which is why they often need to be told to go step by step to get correct answers), so this one may have the same ability as long as memories are properly utilized.
Funny, I write Clojure for my day job and fun, so I have tried to use ChatGPT to generate code. If anything, it sucks at paren matching. It reminded me of stable diffusion's "six finger problem".
Seems like a common pattern: state-of-the-art models being replaced by an information retrieval layer (top 10 results) fed into a much lighter model that does something with that plus the original input. Cool result!
This is definitely my bet on where things are going. And not just this particular example - I believe we will identify many recurring submodules and patterns in neural networks that can be extracted into conventional code, leaving a lightweight neural glue layer orchestrating them. This should be more efficient, faster to train, more interpretable, and more reliable, so better for users. But less mysterious, so worse for VCs.
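To make that pattern concrete, here is a rough sketch of "retrieve the top k, then hand them plus the original input to a lighter model". It is not the paper's method: the `embed` function is a toy bag-of-words stand-in for a real encoder such as BERT, and `build_input`, the sample documents, and the prompt layout are all made up for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, documents, k=10):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_input(query, documents, k=10):
    """Concatenate the retrieved passages with the original input.

    The result is what a (hypothetical) lighter model would consume.
    """
    context = "\n".join(top_k(query, documents, k))
    return f"{context}\n\nQuery: {query}"


docs = [
    "BERT produces contextual embeddings for text",
    "Clojure is a Lisp dialect on the JVM",
    "Nearest neighbour search finds similar embeddings",
]
print(build_input("similar embeddings search", docs, k=1))
```

In a real system the brute-force `sorted` call would be replaced by an approximate nearest-neighbour index, but the shape of the pipeline stays the same.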
Yeah, that actually sounds amazing to me. If we could limit the LLM to somehow only act as a "reasoning" rather than a "knowledge" layer, such that all the non-trivial domain knowledge has to come from the information retrieval layer, in a fully referenced way, that could potentially "solve" the hallucination problem, no?
Even more than that, I wonder if we could then apply something like this to power some sort of "fact provenance" for the web as a whole, e.g. by populating Wikidata with referenced facts (preferably with extensive human QA).
Yeah, and, on top of that, I think this can lead to smaller (and snappier) agent models, because we no longer have to encode every single piece of information into models. As we carve out more and more parameters and input data, AI development will get more accessible, and we'll get more novel applications. (I'm certainly dreaming here.)
This approach can probably handle most of the queries search engines and Siri-type chatbots handle. The big GPT-type engines can be reserved for the hard problems. Something along those lines is needed to keep the cost of search down. There's an estimate that using a large language model for search is 10x more expensive than existing search engines. Yet few queries really need that much heavy machinery.
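One way to picture "reserve the big engines for the hard problems" is a confidence-gated fallback. The sketch below is only an assumption about how such routing could look: `cheap_lookup`, the word-overlap confidence score, the `threshold` value, and the `expensive_model` callable are all hypothetical placeholders, not anything from the paper or a real search stack.

```python
def cheap_lookup(query, index):
    """Return (answer, confidence) from a toy keyword index.

    Confidence is the Jaccard overlap between query words and key words.
    """
    words = set(query.lower().split())
    best_answer, best_score = None, 0.0
    for key, answer in index.items():
        key_words = set(key.lower().split())
        union = words | key_words
        if not union:
            continue  # nothing to compare
        score = len(words & key_words) / len(union)
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer, best_score


def answer(query, index, expensive_model, threshold=0.5):
    """Try the cheap path first; call the expensive model only when unsure."""
    result, confidence = cheap_lookup(query, index)
    if confidence >= threshold:
        return result, "cheap"
    return expensive_model(query), "expensive"


index = {"capital of france": "Paris"}


def big_model(query):
    # Placeholder for a call to a large language model.
    return "LLM answer"


print(answer("capital of france", index, big_model))
print(answer("prove this theorem about graphs", index, big_model))
```

The cost argument then comes down to how often the cheap path clears the threshold, which is an empirical question this toy cannot answer.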
The big advantage here would be the ability to attribute entire blocks of text back to a specific source and cross domains just by building a database of embeddings. The downside is that these networks are probably not as creative, as they're limited to only the data that's available. It might work best to use something like this as an expert system for a GPT-like agent to refer to when needed.
The obvious immediate question: is it as creative? A lot of creativity is left behind when you increase the token size (let's be real, it's just that). As an example, creating a new word like "dickstracted"[1] would never happen in this model.
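A minimal sketch of the attribution idea, assuming each stored block keeps a pointer to where it came from. Everything here is illustrative: the `Entry` fields, the hand-written 2-d vectors, and the source names are invented, and a real system would get its vectors from an encoder rather than typing them in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    vector: tuple  # embedding of the text block (hand-written toy values)
    text: str      # the block that gets copied into the output
    source: str    # where the block came from


def nearest(query_vector, database):
    """Return the entry whose vector is closest (squared Euclidean distance)."""
    def sq_dist(entry):
        return sum((q - v) ** 2 for q, v in zip(query_vector, entry.vector))
    return min(database, key=sq_dist)


def generate_with_sources(query_vectors, database):
    """Copy one block per query vector and record each block's source."""
    picked = [nearest(qv, database) for qv in query_vectors]
    text = " ".join(entry.text for entry in picked)
    sources = [entry.source for entry in picked]
    return text, sources


# Switching domains means switching the database, not retraining anything.
toy_db = [
    Entry((0.9, 0.1), "The cat sat on the mat.", "doc-a"),
    Entry((0.1, 0.9), "Then it fell asleep.", "doc-b"),
]

text, sources = generate_with_sources([(1.0, 0.0), (0.0, 1.0)], toy_db)
print(text)
print(sources)
```

Because every output segment is a lookup result, the "where did this come from" question is answered by construction; the "why this block" question is still left to the retrieval scores.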
Over the years I've come to realize that copy-paste has probably been a net negative for me, and I almost never do it anymore. If you copy-paste and then change a couple of names by hand to match a different pattern, the subtle errors you can introduce by making a mistake can take forever to catch. In code review it always looks plausible unless the reviewer is _very_ careful. Furthermore, it means you are duplicating code, which is sometimes totally fine, but forcing yourself not to copy-paste makes you consider what the abstraction would be and whether it would be worth it.
When you are copy-pasting code you don't really understand, retyping it gives you time to understand it and maybe catch existing bugs in the code you are copying.
Generally the repository he was working in, but really it was any application that he had open on his machine. He would remember where the words, or portions of words, that he needed were, go to them, and copy and paste what he needed.
Just in case you're thinking this: He was not copying large portions of code from stack overflow or anything like that.
He was writing code line by line, a few copy and pastes at a time. Oftentimes he would copy and paste single characters to maintain his flow.
Wait, I need you to please elaborate on this. Where was he sourcing all code he pasted? Did he have a "snippet file" like a painter with a palette or something?
Typically in the code repository with other files that shared a similar pattern context that he was working with.
Sometimes he would need a portion of a word and he would remember that it was in an email he had open, and he would alt+tab and grab the portion of the word from the email, then alt+tab back to the editor and paste the word portion in.
He would go to extreme lengths to not have to move his hands to the home row on the keyboard.
I (used to) work with a colleague who was just the opposite; she did (and still does) ONLY do copy/paste with the mouse. It is excruciating to watch when pairing or on a video meeting.
I get that people have different workflows, but not taking advantage of even the most minimal functionality of one's tools is something I think I will never understand.
Many years ago when I used windows, I had a virus once that killed my keyboard. It was pretty fun to work around that... I ended up using the symbol browser utility to copy individual letters and then right click to paste them places.
My way of doing imperative coding for data science with Python is to write a piece of code in Sublime Text, copy and paste it to iTerm, run it, and get back to the editor. But of course I mapped shift+Enter to do all of that for me. I much prefer this setup to Jupyter Notebooks.
I program in almost all of my script-y languages, like Python, in a similar way - I'm on Linux, and I edit everything in Sublime, save it to a file and then run it in a separate terminal. If the command gets complex I'll create a little bash script file, make it executable, and run that.
I just alt-tab from editor to terminal, check output, etc, and back. That way I have a bunch of Unix text-processing tools (grep, sed, etc...) always available. I'm too reliant on print debugging things as I go along, but it's a deeply ingrained habit.
The paper also highlights that evaluating language models is difficult, and that we tend to end up with models that game the evaluation benchmarks.
Quantum computers would have something to say about this, assuming they ever materialize.
Faith and Fate: Limits of Transformers on Compositionality https://arxiv.org/abs/2305.18654
However, the "parenthesis" can be any symbol. Even grammatical clauses are one sort of "parenthesis" in the way I'm thinking about them.
And systems that allow you to "talk" to a PDF via top results of vector search being added to the prompt are also pretty underwhelming.
[1] https://www.urbandictionary.com/define.php?term=Dickstracted
I once worked with a programmer who, the vast majority of the time, would only input text into a text editor via copy and paste.
Think anti-vim. His fingers were locked on the mouse and ctrl+c/v. It was incredible to watch, and his programming speed was very impressive.
Also explains why he was so fast