NicholasD43 (u/NicholasD43)

stillpointlab · a month ago

> Embeddings are crucial here, as they efficiently identify and integrate vital information—like documents, conversation history, and tool definitions—directly into a model's working memory.

I feel like I'm falling behind here, but can someone explain this to me?

My high-level view of embeddings is that I send some text to the provider, they tokenize the text and then run it through some NN that spits out a vector of numbers of a particular size (configurable in this case, with options including 768, 1536 and 3072). I can then use those embeddings in places like a vector DB where I might want to do some kind of similarity search (e.g. cosine similarity). I can also cluster on that similarity, which can give me some classification capabilities.
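
To make the similarity-search part concrete, here is a minimal sketch of cosine similarity in plain Python. The 3-dimensional vectors are made up for illustration; real embeddings would come from a provider and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (assumed values, not output from any real model).
query = [1.0, 0.0, 1.0]
doc_a = [2.0, 0.0, 2.0]  # points in the same direction as the query
doc_b = [0.0, 1.0, 0.0]  # orthogonal to the query

sim_a = cosine_similarity(query, doc_a)  # close to 1: very similar
sim_b = cosine_similarity(query, doc_b)  # 0: unrelated
```

Note that only the direction of the vectors matters here, not their length, which is why `doc_a` scores as a near-perfect match despite being twice as long as `query`.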

But how does this translate to these things going "directly into a model's working memory"? My understanding is that with RAG I just throw a bunch of the embeddings into a vector DB as keys, but the text I ultimately send in the context to the LLM is the source text that the keys represent. I don't actually send the embeddings themselves to the LLM.
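
Roughly, the flow as I understand it looks like the sketch below. `toy_embed` is a hypothetical stand-in for a provider's embedding call (it just counts words), and the list `index` plays the role of the vector DB; neither is a real API.

```python
import math
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding endpoint: a bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = [
    "The refund policy allows returns within 30 days.",
    "Our office is closed on public holidays.",
]

# The "vector DB": each embedding is a key that points back to its source text.
index = [(toy_embed(d), d) for d in docs]

def retrieve(question, k=1):
    """Rank stored texts by similarity to the question and return the top k."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(question):
    """Only plain text goes into the prompt; the vectors never reach the LLM."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What is the refund policy?")
```

The embeddings are used only to decide which text to pick; what ends up in `prompt` is an ordinary string.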

So what is this marketing stuff about "directly into a model's working memory"? Is my mental view wrong?

NicholasD43 · a month ago

You're right on this. "Advanced" RAG techniques are all complete marketing BS; in the end, all you're doing is passing the text into the model's context window.