It really surprises me that embeddings seem to be one of the least discussed parts of the LLM stack. Intuitively you would think they would have enormous influence over the network's ability to infer semantic connections, yet people rarely seem to talk about them.
absolutely. the first time i learned more deeply about embeddings i was like "whoa... at least a third of the magic of LLMs comes from embeddings". Understanding that words were already semantically arranged in such a useful pattern demystified LLMs a little bit for me. they're still wondrous, but it feels like the curtain has been pulled back a tiny bit for me
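That "semantically arranged" intuition can be made concrete with a tiny sketch: words that mean similar things end up as nearby vectors, and directions in the space can encode relationships. The 4-d vectors below are made-up toy values purely for illustration (real embeddings have hundreds or thousands of learned dimensions), but the classic king - man + woman ≈ queen arithmetic still falls out:

```python
import numpy as np

# Toy 4-d "embeddings" -- hand-picked illustrative values, NOT from a real model.
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.9, 0.1, 0.0]),
    "woman": np.array([0.1, 0.1, 0.9, 0.0]),
    "apple": np.array([0.0, 0.1, 0.0, 0.9]),  # unrelated word, far from the rest
}

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means orthogonal.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Related words sit closer together than unrelated ones.
print(cosine(emb["king"], emb["queen"]))   # high
print(cosine(emb["king"], emb["apple"]))   # low

# Vector arithmetic captures the gender relationship:
target = emb["king"] - emb["man"] + emb["woman"]
nearest = max(emb, key=lambda w: cosine(emb[w], target))
print(nearest)  # "queen"
```

This pre-existing geometric structure is exactly what the rest of the network gets to build on, which is part of why the embedding layer carries so much of the "magic".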