That’s basically the only important context. If you can’t deliver that, it doesn’t matter how well thought through, extensible, or scalable it is.
So this is just a marketing op-ed?
I really don't get what drives companies like Dropbox to throw their carefully built up reputation under the bus like this.
Google did the second type and then got caught lying about bids. Doing so is really tempting but also generally fraud.
6 years ago an intern in ChromeOS using Gem5 found an optimization in how Android’s ART emits code that would help all in-order ARM cores (A-5x) to the tune of 10%. A simple fix. He prototyped it. It worked. The fix was a dozen lines. It never shipped…
>Ed Francis studies the evolution of military technology over at his YouTube channel, Armoured Archives. But this week, Francis says five years’ worth of research stored on Google Drive has become inaccessible thanks to Google’s automated error.
Francis says the file in question was simply a collection of data on various tanks for a coming video on how military vehicles have evolved across historical conflicts. But Google’s automated systems deemed the file a terrorist threat, resulting in a complete lockdown of his YouTube, GMail, and Google Drive accounts.
https://www.vice.com/en/article/qj8yj7/google-locks-historia...
Having a shitty algorithm kill your whole Google account with no way to reach a human to fix the problem is one thing.
Having a shitty algorithm report you to the police for taking pictures of your child's first bath is a bridge too far.
* Can't remember who said it but it was at a town hall this year
I don't understand why you think tracking user access rights would be infeasible and would not scale. There is a query. You search for matching documents in your vector database / index. Once you have found the list of potentially relevant documents, you check which ones the current user can access. You pass to the LLM only the ones the user can see.
This is very similar to how banks provide phone-based services. The operator on the other end of the line can only see your account details once you have authenticated yourself. They can't accidentally tell you someone else's account balance, because they themselves don't have access to it until they have typed in the information you give them to authenticate yourself. You can't trick the operator into giving you someone else's account balance, because they can't see anyone's balance without authenticating first.
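A minimal sketch of the search-then-filter flow described above, in Python. Everything named here is hypothetical: the `Doc` type, the `allowed_users` set standing in for real access rights, and the word-overlap scoring standing in for a vector search.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    allowed_users: set = field(default_factory=set)  # hypothetical ACL


def search(index, query, top_k):
    """Toy stand-in for a vector search: rank documents by shared query words."""
    words = set(query.lower().split())
    scored = [(len(words & set(d.text.lower().split())), d) for d in index]
    matching = [s for s in scored if s[0] > 0]
    ranked = sorted(matching, key=lambda s: s[0], reverse=True)
    return [d for _, d in ranked[:top_k]]


def retrieve_for_user(index, query, user, top_k=10):
    """Search first, then keep only the documents this user may read."""
    candidates = search(index, query, top_k)
    return [d for d in candidates if user in d.allowed_users]


index = [
    Doc("a", "quarterly revenue forecast", {"alice"}),
    Doc("b", "revenue by region", {"alice", "bob"}),
    Doc("c", "lunch menu", {"bob"}),
]

# Only these documents would be handed to the LLM as context for bob.
context = retrieve_for_user(index, "revenue forecast", "bob")
print([d.doc_id for d in context])  # prints ['b']
```

Document "a" ranks highest for the query but never reaches the model when bob asks, because the filter runs before anything is passed on.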
A basic implementation will return the top, let's say 1000, documents and then run the more expensive access check on each of them. Most of the time, that check eliminates all of your search results.
Your search must be access-aware so that it does a reasonable job of pre-filtering the content down to documents the user has access to. At that point you can apply post-filtering with the "100% sure" access check.
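A rough sketch of the difference, in Python. The comment above doesn't say how the search becomes access-aware; storing coarse group tags alongside each indexed document is one assumption among several possible designs. All names here (`has_access`, `groups`, `acl`) are hypothetical, and index order stands in for relevance ranking.

```python
def has_access(user, doc):
    """Stand-in for the expensive, authoritative permission check."""
    return user in doc["acl"]


def matches(query, doc):
    """Toy relevance test: substring match instead of a real search."""
    return query.lower() in doc["text"].lower()


def post_filter_only(index, query, user, top_k):
    """Take the top_k matches, then run the access check on each."""
    hits = [d for d in index if matches(query, d)][:top_k]
    return [d["id"] for d in hits if has_access(user, d)]


def access_aware(index, query, user, user_groups, top_k):
    """Pre-filter on group tags stored in the index, then verify each hit."""
    hits = [
        d for d in index
        if matches(query, d) and d["groups"] & user_groups  # cheap pre-filter
    ][:top_k]
    return [d["id"] for d in hits if has_access(user, d)]  # "100% sure" check


# 1000 matching documents that only another team can read,
# followed by one matching document that bob can read.
index = [
    {"id": i, "text": "budget plan", "groups": {"finance"}, "acl": {"alice"}}
    for i in range(1000)
]
index.append(
    {"id": 1000, "text": "budget for team offsite", "groups": {"eng"}, "acl": {"bob"}}
)

print(post_filter_only(index, "budget", "bob", top_k=1000))        # prints []
print(access_aware(index, "budget", "bob", {"eng"}, top_k=1000))   # prints [1000]
```

With post-filtering alone, the one document bob can read never makes it into the top 1000, so the access check discards everything. The pre-filter is allowed to be approximate, since the authoritative check still runs on whatever survives it.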