If there were one problem I had to pick to trace in LLMs, it would be hallucination. Better tracing of "how much" or "why" a model hallucinated could help correct the problem. Given the explanation in this post about hallucination, could the degree of hallucination be reported as part of the response to the user?
I face this quite often in RAG use cases: how do I know whether the model is giving the right answer or hallucinating beyond my RAG sources?
With that many tables, you might want to use views: https://cube.dev/docs/reference/data-model/view
In my use case, it's going to be exposed to various kinds of stakeholders, and user queries will vary widely. I can't pre-create views/aggregations for every scenario.
With a semantic layer, you get the added benefit of writing queries in JSON instead of raw SQL. LLMs are much more consistent at writing a small JSON query than hundreds of lines of SQL.
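For context, a Cube-style query is a small JSON object (the cube and member names below are hypothetical); the semantic layer compiles it into the full SQL, including any joins defined in the data model:

```json
{
  "measures": ["orders.total_revenue"],
  "dimensions": ["customers.region"],
  "filters": [
    {
      "member": "orders.status",
      "operator": "equals",
      "values": ["completed"]
    }
  ],
  "timeDimensions": [
    {
      "dimension": "orders.created_at",
      "dateRange": "last 30 days",
      "granularity": "week"
    }
  ]
}
```

An LLM only has to fill in measure/dimension names from the documented model, which is a much smaller surface area to get wrong than hand-written SQL with joins and group-bys.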
We[0] use Cube[1] for this. It's the best open-source semantic layer, but there are a couple of closed-source options too.
My schema has 90+ tables and 2,500+ columns, all well documented.
From your experience, does Cube look like a fit? My use cases will definitely involve joins.
Something along similar lines that many may like: Research Rabbit - https://www.researchrabbit.ai/
I'd love to spend my time working on such articles when I'm retired :)