Dontizi (u/Dontizi) - Readit News

Dontizi commented on Show HN: I created an AI search engine for the Quebec Civil Code codex-nuvia.ca/app... · Posted by u/Dontizi

sandra_vu · 6 months ago

That is so cool. Care to share a clickable link?

Dontizi · 6 months ago

I don’t know why the link isn’t clickable, but here it is: https://www.codex-nuvia.ca/app

Dontizi · 6 months ago

https://www.codex-nuvia.ca/app

Dontizi commented on Show HN: Open-Source DocumentAI with Ollama rlama.dev/... · Posted by u/Dontizi

andai · 9 months ago

This appears to do no chunking. It just shoves the entire document (entire book, in my case) into the embedding request to Ollama. So it's only helpful if all your documents are small (i.e. no books).

The embedding model (bge-m3 in this case) has a maximum sequence length of 8192 tokens, so although rlama tries to embed the whole book, Ollama can only fit the first few pages into the embedding request.
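To put rough numbers on that truncation, here is a small sketch. It assumes whitespace-separated words approximate tokens (bge-m3's real subword tokenizer usually produces more tokens than words, so this overstates how much of a book gets embedded). `coverage` and `crude_tokens` are hypothetical helpers, not part of rlama or Ollama.

```python
import math

MAX_TOKENS = 8192  # bge-m3's maximum sequence length


def crude_tokens(text: str) -> list[str]:
    # Assumption: whitespace split as a rough stand-in for the model's
    # real subword tokenizer, which usually yields more tokens than this.
    return text.split()


def coverage(text: str, max_tokens: int = MAX_TOKENS) -> tuple[float, int]:
    """Return (fraction of tokens inside one window, windows needed to cover all)."""
    tokens = crude_tokens(text)
    if not tokens:
        return 1.0, 0
    seen = min(len(tokens), max_tokens)
    return seen / len(tokens), math.ceil(len(tokens) / max_tokens)


book = "word " * 200_000  # a hypothetical 200k-word book
frac, windows = coverage(book)
print(f"{frac:.1%} embedded; {windows} windows needed")  # 4.1% embedded; 25 windows needed
```

In other words, a single embedding request sees only a few percent of a book-length document; covering it fully needs many separate chunks.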

Then, at retrieval time, it returns the entire document instead of the relevant passage (because there is no chunking), but truncates it to the first 1000 characters, i.e. the first half-page of the Table of Contents.

As a result, when queried, the model says: "There is no direct mention of the Buddha in the provided documents." (The word Buddha appears 44,121 times in the documents I indexed.)

A better solution (and, as far as I can tell, what every other RAG does) is to split the document into chunks that actually fit within the embedding model's context window, and then retrieve those chunks, ideally with metadata about which part of the document each chunk comes from.
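A minimal sketch of that chunk-then-retrieve approach, assuming page-based chunking and a toy bag-of-words vector in place of real bge-m3 embeddings. `Chunk`, `chunk_pages`, `embed`, `cosine`, and `retrieve` are hypothetical names for illustration, not rlama's actual code.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # document name
    page: int    # page the chunk came from
    index: int   # position of the chunk within the document


def chunk_pages(source: str, pages: list[str], max_words: int = 200) -> list[Chunk]:
    """Split each page into chunks of at most max_words words, keeping metadata."""
    chunks: list[Chunk] = []
    for page_no, page in enumerate(pages, start=1):
        words = page.split()
        for start in range(0, len(words), max_words):
            text = " ".join(words[start:start + max_words])
            chunks.append(Chunk(text, source, page_no, len(chunks)))
    return chunks


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a real system would call an embedding
    # model such as bge-m3 here instead.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    """Return the k chunks most similar to the query, not whole documents."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)[:k]


pages = ["Table of Contents ...", "The Buddha taught the middle way.", "Notes on translation."]
top = retrieve("what did the Buddha teach", chunk_pages("sutras.txt", pages), k=1)
print(top[0].page, top[0].text)  # 2 The Buddha taught the middle way.
```

Because each chunk carries its `source` and `page`, the retrieved passage can be traced back to where it came from instead of defaulting to the start of the document.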

---

I'd also recommend showing the search results to the user (I think a vector search engine on its own is already an extremely useful feature, even without the AI summary / question answering), and altering the prompt so the model provides references (e.g. based on chunk metadata such as page numbers).
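Both suggestions could look roughly like the sketch below. It assumes the vector search already returns chunks carrying `source` and `page` metadata; `Hit`, `format_results`, and `build_prompt` are hypothetical, and the prompt wording is only an example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    source: str   # document the chunk came from
    page: int     # page number from the chunk's metadata
    text: str
    score: float  # similarity score from the vector search


def format_results(hits: list[Hit]) -> str:
    """Render the raw search results so the user can inspect them directly."""
    return "\n".join(
        f"{i}. [{h.source}, p. {h.page}] ({h.score:.2f}) {h.text}"
        for i, h in enumerate(hits, start=1)
    )


def build_prompt(question: str, hits: list[Hit]) -> str:
    """Number each retrieved chunk and ask the model to cite those numbers."""
    context = "\n\n".join(
        f"[{i}] ({h.source}, page {h.page})\n{h.text}"
        for i, h in enumerate(hits, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their bracketed number, e.g. [1].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


hits = [Hit("sutras.txt", 12, "The Buddha taught the middle way.", 0.83)]
print(format_results(hits))  # 1. [sutras.txt, p. 12] (0.83) The Buddha taught the middle way.
print(build_prompt("What did the Buddha teach?", hits))
```

Numbering the sources in the prompt gives the model a simple citation scheme, and the same numbers can be shown next to the search results the user sees.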

Dontizi · 9 months ago

I have just implemented chunking with overlap, which splits larger documents into smaller chunks so that your RAG can access all of their content. It's currently in testing, and I’d like to experiment with different models to optimize the process. Once I confirm that everything works correctly, I can merge the PR into the main branch, and you’ll just need to update Rlama with `rlama update`.
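Chunking with overlap is typically a sliding window. A minimal sketch, assuming word-based windows (a real implementation would more likely count tokens or characters); `chunk_with_overlap`, `size`, and `overlap` are hypothetical and not taken from the rlama PR.

```python
def chunk_with_overlap(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words
    with the previous window so passages spanning a boundary stay intact."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_with_overlap("a b c d e f g", size=4, overlap=2))
# ['a b c d', 'c d e f', 'e f g']
```

A larger overlap lowers the chance that a relevant passage is split across two chunks, at the cost of more chunks to embed and store.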