We validated the obvious thing: the best approach depends on your use case (see the sketch after this list):

1. Keyword/grep/regex/BM25 works best (fastest, cheapest, most accurate) when you know exactly what you're looking for.
2. Semantic search works best with unstructured data when you're not exactly sure what you're looking for.
3. Text2SQL works best when you have a few pre-defined queries with limited joins the agent can use to fetch structured data.
4. Knowledge graphs work best when you need to find info across unstructured + structured data with constraints that go beyond semantic similarity (e.g. find arXiv papers by author X that discuss novel knowledge graph methods, were published in the past 3 years, and don't mention Neo4j).
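For intuition, here's a minimal sketch of that decision logic as a retrieval router. The feature flags and backend names are hypothetical stand-ins, not how Papr actually routes queries:

```python
from enum import Enum

class Backend(Enum):
    KEYWORD = "keyword"          # grep/regex/BM25
    SEMANTIC = "semantic"        # vector search over unstructured text
    TEXT2SQL = "text2sql"        # pre-defined queries over structured data
    KNOWLEDGE_GRAPH = "kg"       # multi-hop across structured + unstructured

def route(query: str, *, exact_terms: bool, structured_target: bool,
          multi_hop: bool) -> Backend:
    """Pick a retrieval backend from coarse query features.

    The boolean features are illustrative placeholders for whatever
    classifier or model would actually make this call.
    """
    if multi_hop:              # constraints beyond semantic similarity
        return Backend.KNOWLEDGE_GRAPH
    if structured_target:      # a known table/schema answers the question
        return Backend.TEXT2SQL
    if exact_terms:            # user knows the exact string or identifier
        return Backend.KEYWORD
    return Backend.SEMANTIC    # fuzzy intent over unstructured data

# e.g. "arXiv papers by author X on novel KG methods, past 3 years, no Neo4j"
# -> multi_hop=True -> Backend.KNOWLEDGE_GRAPH
```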
So we ended up building a simple add/search API that predicts where the data should come from and what the user will need this week/today, and caches it ahead of time. It's accurate and it's fast.
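A toy sketch of what that add/search surface could look like. The predict-and-cache step here is a naive most-frequent-queries heuristic, purely illustrative of the shape, not Papr's actual prediction model:

```python
import time
from collections import Counter

class MemoryStore:
    """Toy add/search API: store items, predict hot queries, pre-warm a cache."""

    def __init__(self, ttl_seconds: float = 86_400):
        self.items: list[str] = []
        self.query_counts: Counter[str] = Counter()
        self.cache: dict[str, tuple[float, list[str]]] = {}
        self.ttl = ttl_seconds

    def add(self, text: str) -> None:
        self.items.append(text)
        self.cache.clear()  # new data invalidates cached results

    def _retrieve(self, query: str) -> list[str]:
        # Stand-in for real retrieval (e.g. the router sketched above).
        return [t for t in self.items if query.lower() in t.lower()]

    def search(self, query: str) -> list[str]:
        self.query_counts[query] += 1
        hit = self.cache.get(query)
        if hit and time.time() - hit[0] < self.ttl:
            return hit[1]  # fast path: answer was predicted and cached
        results = self._retrieve(query)
        self.cache[query] = (time.time(), results)
        return results

    def prewarm(self, top_k: int = 10) -> None:
        """Predict what the user will need (here: their most frequent past
        queries) and cache those results before they're asked for."""
        for query, _ in self.query_counts.most_common(top_k):
            self.cache[query] = (time.time(), self._retrieve(query))
```

The point of the sketch is the fast path: if the prediction is right, search never touches the retrieval backend at query time.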
To solve this, everyone is engineering context: trying to figure out what to put into context to get the best answer using RAG, agentic search, hierarchy trees, etc. At Papr we tested almost every option that exists. These methods work in simple use cases but not at scale. That's why MIT's report says 95% of AI pilots fail, and why we're seeing a thread about vectors not working.
Instead of humans engineering context, we've built a model to predict the right context. Our model ranks #1 on Stanford's STaRK benchmark, which measures retrieval on complex real-world queries (not a useless needle-in-a-haystack benchmark). It's also super fast because the context is predicted in advance, which is essential for a ton of use cases like voice conversations. Try it out on papr.ai, our open source chat app, or use Papr's memory APIs to create your own experiences.
We've also developed a retrieval loss formula and shown that Papr's memory APIs get better with more data, not worse like other retrieval systems today. It's a similar pattern to LLMs: the more data, the better.
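The post doesn't spell the formula out, but the LLM analogy suggests the power-law shape familiar from scaling laws. A purely illustrative form (an assumption, not Papr's published formula):

```latex
% Illustrative only: retrieval loss falling as a power law in corpus size N,
% mirroring LLM scaling laws. L_0 and \alpha > 0 would be fitted constants.
% This is an assumed shape, not Papr's actual retrieval loss formula.
L(N) = L_0 \, N^{-\alpha}, \qquad \alpha > 0
```

Under a shape like this, loss keeps dropping as N grows, whereas retrieval systems that degrade with scale would show loss flattening or rising instead.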