This part is what I want to understand. How does the llm “frame” an answer?
1. How do LLM/RAG systems generate an answer given a list of documents and a question? I can do BM25 to get a list of documents, but after that, what is the logic/algorithm that generates an answer from that list?
2. For small models like this, how much data do you need to fine-tune for a specific use case? For example, if I need this model to be knowledgeable about HTML/CSS, there is a lot of documentation online that I can feed it. But if it is a very specific topic, like types of banana, there may be only a couple of Wikipedia pages. So is fine-tuning directly dependent on the quantity of data alone?
then your query is converted into an embedding and the top N chunks are returned via similarity search (cosine, dot product, or some other method) - this has advantages over BM25, which is purely lexical
then you can do some processing, or just hand over all the chunks as context, saying "here are some documents, use them to answer this question" + your query, to the LLM
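a minimal sketch of that retrieve-then-prompt step, assuming sentence-transformers for the embeddings (the model name, sample chunks, and prompt wording are just placeholders, and the final LLM call is left abstract):

```python
# Minimal RAG sketch: embed chunks, retrieve by cosine similarity, stuff into a prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works here

chunks = [
    "HTML describes the structure of a web page.",
    "CSS controls the presentation and layout of HTML elements.",
    "The Cavendish is the most widely exported banana cultivar.",
]
# normalize so a dot product equals cosine similarity
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(query, top_n=2):
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q                      # cosine similarity per chunk
    best = np.argsort(scores)[::-1][:top_n]      # indices of the top-N chunks
    return [chunks[i] for i in best]

query = "What does CSS do?"
context = "\n\n".join(retrieve(query))
prompt = (
    "Here are some documents. Use them to answer the question.\n\n"
    f"{context}\n\nQuestion: {query}\nAnswer:"
)
# the prompt then goes to whatever LLM you're running (llama.cpp, an API, etc.)
# answer = llm(prompt)
print(prompt)
```

the "generation" part is nothing more than that: the model just continues the prompt, grounded by whatever chunks you pasted in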
feels like a cool project/toy
Rather not write it myself
The very few people I know who have had this happen were all computer users, and virtually all were victims of social engineering such as "hey, I'm from the IT department, I'm sending you an email, could you please...". A friend of mine exposed sensitive data of thousands of her bank's customers this way.