Are your RAG models hallucinating? Maybe the issue isn't the LLM, but the retrieval phase.
I built RAGsplain (ragsplain.com) to help debug this. You upload documents (PDFs, audio, YouTube links), choose a retrieval method (semantic, keyword, or hybrid), and see the exact chunks of context passed to the model — match scores included.
Turns out: when retrieval is bad, even the best model can’t think straight.
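To make "chunks with match scores" concrete, here's a minimal sketch of what semantic retrieval looks like under the hood. This is not RAGsplain's code; the embedding model and the example texts are just assumptions for illustration:

    # Minimal semantic-retrieval sketch: embed chunks and a query,
    # then rank chunks by cosine similarity (the "match score").
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

    chunks = [
        "Refunds are processed within 14 days of the return request.",
        "Our office is open Monday through Friday, 9am to 5pm.",
        "Shipping is free for orders over $50.",
    ]
    query = "How long do refunds take?"

    # Normalized embeddings make the dot product equal to cosine similarity.
    chunk_vecs = model.encode(chunks, normalize_embeddings=True)
    query_vec = model.encode([query], normalize_embeddings=True)[0]

    scores = chunk_vecs @ query_vec
    for score, chunk in sorted(zip(scores, chunks), reverse=True):
        print(f"{score:.3f}  {chunk}")

If the top-scoring chunks don't actually contain the answer, that's the retrieval problem showing up before the LLM ever sees the prompt.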
It’s open and free to use. Would love feedback from folks building RAG pipelines.
For a quick overview of how it works and the first few steps, here's a RAGsplain demo video:
https://youtu.be/BwuA_e7Xn74