I feel like I'm falling behind here, but can someone explain this to me?
My high-level view of embedding is that I send some text to the provider, they tokenize the text and then run it through some NN that spits out a vector of numbers of a particular size (it looks to be variable in this case: 768, 1536, or 3072). I can then use those embeddings in places like a vector DB where I might want to do some kind of similarity search (e.g. cosine similarity). I can also cluster on that similarity, which can give me some classification capabilities.
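To make the similarity-search step concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are made-up toy values, not real embeddings (those would have 768 or more dimensions), and the variable names are just for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings (assumed values).
query = [1.0, 0.0, 1.0]
doc_a = [2.0, 0.0, 2.0]  # same direction as query, different length
doc_b = [0.0, 1.0, 0.0]  # orthogonal to query

sim_a = cosine_similarity(query, doc_a)  # close to 1.0
sim_b = cosine_similarity(query, doc_b)  # 0.0
```

Only the direction of the vectors matters here, not their length, which is why doc_a scores as a near-perfect match even though its values are twice as large.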
But how does this translate to these things going "directly into a model's working memory"? My understanding is that with RAG I just throw a bunch of the embeddings into a vector DB as keys, but the ultimate text I send in the context to the LLM is the source text that the keys represent. I don't actually send the embeddings themselves to the LLM.
So what is this marketing stuff about "directly into a model's working memory"? Is my mental view wrong?
1) At the end of the day, we are still sending raw text to the LLM as input and getting text back as the response.
2) RAG/embedding is just a way to identify a "certain chunk" to include in the LLM input so that you don't have to dump the entire ground-truth document into the LLM. Take Everlaw, for example: all of their legal docs are stored as embeddings, and a RAG/tool call retrieves the relevant documents to feed into the LLM input.
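The two points above can be sketched roughly as follows. Everything here is hypothetical: `toy_embed` is a word-count stand-in for a real provider's embedding call, `VOCAB` and `CHUNKS` are made-up data, and `INDEX` is a plain list playing the role of a vector DB. The point is only that the embeddings are used to pick a chunk, while the prompt that would go to the LLM contains the chunk's text.

```python
import math

# Hypothetical vocabulary for the toy embedding below.
VOCAB = ["contract", "lease", "tenant", "patent", "invention", "claim"]

def toy_embed(text):
    """Stand-in for a real embedding API: word counts over VOCAB."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no overlap with VOCAB at all
    return dot / (norm_a * norm_b)

# Made-up source chunks; the "vector DB" maps embedding -> source text.
CHUNKS = [
    "the lease requires the tenant to give notice",
    "the patent claim covers the invention",
]
INDEX = [(toy_embed(chunk), chunk) for chunk in CHUNKS]

def retrieve(question, k=1):
    """Return the text of the k chunks most similar to the question."""
    q = toy_embed(question)
    ranked = sorted(INDEX, key=lambda item: cosine_similarity(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(question, k=1):
    """Assemble the plain-text prompt; no vectors are included."""
    context = "\n".join(retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("what notice must the tenant give under the lease")
```

The resulting `prompt` is an ordinary string containing the lease chunk and the question, which matches the mental model in the thread: the vectors never leave the retrieval step.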
So in that sense, what do these non-foundation-model startups mean when they say they are training or fine-tuning models? Where is the line between feeding something into the LLM as input and having it baked into the model weights?
Say right now I have an e-commerce site with 20K MAU. All metrics are going to Amplitude and we can use that to see DAU, retention, and purchase volume. At what point in my startup lifecycle do we need to enlist the services?
On one hand, if the leaving co-founder retains all equity, it creates a sandbagging situation on a cap table that's no longer useful to the business. On the other hand, it feels right for the leaving co-founder to enjoy some upside for the years they put in.
You really haven't thought about it hard enough if you haven't tried writing it down.
I have a whole system of journals that I use to collect my thoughts across various subjects I dabble in. Algorithms: there's a journal for that. Abstract algebra? There's a journal for that. Etc.
At work? I use a bullet journal... I add sections for projects I'm working on. When I'm refactoring an old area of the code or investigating a hard-to-diagnose error, I start writing. I ask questions, get answers, and update my project journal. It helps me clarify the issue, and I find that once I can explain the system or the error clearly, the answers (or how to find them) become obvious.
It may seem quaint, eccentric, or out-dated but it's a practical, reliable tool. Ask questions and write down the answers. Eventually a coherent narrative and a full thought will form before you.
My team of 6 engineers has a social app at around 1,000 DAU. The previous stack has several machines serving APIs and several machines handling different background tasks. Our tech lead is forcing everyone to move to separate Lambdas using CDK to handle each of these tasks. The debugging, deployment, and architecting of shared stacks for Lambdas are taking a toll on me -- all in the name of separation of concerns. Should I push back on this, and if so, how?