Dontizi commented on Show HN: I created an AI search engine for the Quebec Civil Code   codex-nuvia.ca/app... · Posted by u/Dontizi
sandra_vu · 6 months ago
That is so cool. Care to share a clickable link?
Dontizi · 6 months ago
I don’t know why it’s not working, but here it is: https://www.codex-nuvia.ca/app
Dontizi commented on Show HN: Open-Source DocumentAI with Ollama   rlama.dev/... · Posted by u/Dontizi
andai · 9 months ago
This appears to do no chunking. It just shoves the entire document (entire book, in my case) into the embedding request to Ollama. So it's only helpful if all your documents are small (i.e. no books).

The embedding model (bge-m3 in this case) has a sequence length of 8192 tokens, so while rlama tries to embed the whole book, Ollama can only fit the first few pages into the embedding request.

Then, when retrieving, it retrieves the entire document instead of the relevant passage (because there is no chunking), but truncates it to the first 1000 characters, i.e. the first half-page of the table of contents.

As a result, when queried, the model says: "There is no direct mention of the Buddha in the provided documents." (The word Buddha appears 44,121 times in the documents I indexed.)

A better solution (and, as far as I can tell, what every other RAG does) is to split the document into chunks that can actually fit the context of the embedding model, and then retrieve those chunks -- ideally with metadata about which part of the document it's from.
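For illustration, here is a minimal sketch of that approach, assuming a local Ollama instance and its `/api/embeddings` endpoint with bge-m3; the chunk size, function names, and similarity search are made up for the example and are not rlama's actual implementation:

```python
import math
import requests

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama embeddings endpoint
EMBED_MODEL = "bge-m3"   # ~8192-token sequence length
CHUNK_CHARS = 2000       # illustrative size that fits well under the model's context

def chunk_document(text: str, size: int = CHUNK_CHARS) -> list[dict]:
    """Split a document into fixed-size chunks, keeping positional metadata."""
    return [
        {"text": text[i:i + size], "start": i, "end": min(i + size, len(text))}
        for i in range(0, len(text), size)
    ]

def embed(text: str) -> list[float]:
    """Embed one chunk with Ollama; only the chunk (not the whole book) is sent."""
    resp = requests.post(OLLAMA_URL, json={"model": EMBED_MODEL, "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def index(text: str) -> list[dict]:
    """Embed every chunk, so the whole book is searchable, not just the first pages."""
    chunks = chunk_document(text)
    for c in chunks:
        c["embedding"] = embed(c["text"])
    return chunks

def retrieve(query: str, chunks: list[dict], k: int = 3) -> list[dict]:
    """Return the k most similar chunks instead of the entire document."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, c["embedding"]), reverse=True)[:k]
```

With chunks indexed like that, a query about the Buddha would surface the passages where the term actually occurs rather than a truncated table of contents.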

---

I'd also recommend showing the search results to the user (I think just having a vector search engine is already an extremely useful feature, even without the AI summary / question answering), and altering the prompt to provide references (e.g. based on chunk metadata like the page number).
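As a sketch of that second suggestion, assuming each retrieved chunk carries a hypothetical `page` field in its metadata (not rlama's real API):

```python
def build_prompt(query: str, retrieved: list[dict]) -> str:
    """Cite each retrieved chunk by page number so the answer can reference its sources."""
    context = "\n\n".join(
        f"[source: page {c['page']}]\n{c['text']}" for c in retrieved
    )
    return (
        "Answer the question using only the sources below, and cite the page "
        "numbers you used.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )
```

The same `retrieved` list can be displayed to the user directly, which already makes the tool useful as a plain vector search engine.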

Dontizi · 9 months ago
I have just implemented chunking with overlap for larger documents: texts are now split into smaller chunks so your RAG can reach all of your documentation. It's currently in the testing phase, and I'd like to experiment with different models to optimize the process. Once I confirm that everything is working correctly, I'll merge the PR into the main branch, and you'll just need to update Rlama with `rlama update`.
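For anyone following along, chunking with overlap is typically just a sliding window; a minimal sketch, with illustrative sizes rather than rlama's actual defaults:

```python
def chunk_with_overlap(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    """Slide a fixed-size window over the text, stepping by size - overlap so a
    passage that straddles a chunk boundary still appears intact in one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```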
