So my understanding is that the database/index Copilot uses had already crawled this file, so of course it would not need to access the file directly to report the information in it.
But then, how do you fix that? Do you tie audit reports to reads of the underlying database/index itself? Or are we just instructing the LLM to do something like...
"If you are accessing knowledge pinky promise you are going to report it so we can add an audit log"
This really needs some communication from Microsoft on exactly what happened here and how it is being addressed. As of right now, it should raise alarm bells for any company using Copilot where people have access to sensitive data that needs to be strictly monitored.
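For what it's worth, the "tie audit reports to accessing the database directly" option doesn't have to rely on the model self-reporting at all. Here's a minimal sketch of the idea: a retrieval wrapper that writes an audit record for every document it hands to the model, before the content ever reaches the LLM. The class and method names are hypothetical, not Copilot's actual internals.

    import json, time, uuid

    class AuditedIndex:
        """Wraps an index so every document served to the model is logged first.

        The audit record is written by the retrieval layer itself, not by the
        model, so a forgetful or malicious prompt cannot skip it.
        """

        def __init__(self, index, audit_sink):
            self._index = index       # any object with .search(query) -> list[dict]
            self._audit = audit_sink  # e.g. an open file, queue, or SIEM client with .write(str)

        def search(self, query, user_id):
            results = self._index.search(query)
            for doc in results:
                record = {
                    "event_id": str(uuid.uuid4()),
                    "timestamp": time.time(),
                    "user": user_id,
                    "doc_id": doc.get("id"),
                    "source_path": doc.get("path"),
                    "action": "content_served_to_llm",
                }
                self._audit.write(json.dumps(record) + "\n")
            return results

If the audit event is emitted at this layer, "the index already crawled the file" stops being a loophole: the log records what was served to the model, not what the model chose to admit to.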
This would make it so that even a compromised downstream service wouldn't actually be able to exfiltrate the authentication token, and all its misdeeds would be logged by the proxy service, making post-incident remediation easier (and making it possible to definitively prove whether anything bad actually happened).
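A minimal sketch of what that proxy could look like, assuming a Flask app and the `requests` library; the upstream URL, header names, and log format are illustrative, not any particular vendor's interface:

    import json, logging, time

    import requests
    from flask import Flask, Response, request

    UPSTREAM = "https://api.example.com"        # real credentialed endpoint (hypothetical)
    SECRET_TOKEN = "loaded-from-a-vault-here"   # never handed to downstream services

    app = Flask(__name__)
    logging.basicConfig(filename="proxy-audit.log", level=logging.INFO)

    @app.route("/proxy/<path:path>", methods=["GET", "POST", "PUT", "DELETE"])
    def forward(path):
        # Log the request before forwarding, so even a malicious caller leaves a trail.
        logging.info(json.dumps({
            "ts": time.time(),
            "caller": request.remote_addr,
            "method": request.method,
            "path": path,
        }))
        upstream_resp = requests.request(
            method=request.method,
            url=f"{UPSTREAM}/{path}",
            headers={"Authorization": f"Bearer {SECRET_TOKEN}"},  # token attached only here
            params=request.args,
            data=request.get_data(),
            timeout=30,
        )
        # The token never appears in anything returned to the caller.
        return Response(upstream_resp.content, status=upstream_resp.status_code)

Downstream services only ever see the proxy's URL, so compromising one of them exposes no long-lived credential, and the proxy's log is a complete record of what was actually requested for post-incident review.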
The parent states:
>Not only are the summaries better than those produced by our human agents...
Now, since they have not mentioned what it took to actually verify that the AI summaries were in fact better than the human agents', I'm sceptical they did the necessary due diligence.
Why do I think this? Because I have actually tried to do such a verification. In order to verify that an AI summary is actually correct, you have to engage in the incredibly tedious task of listening to the original recording literally second by second and making sure that what is said does not conflict with the AI summary in question. Not only did the AI summary fail this test, it failed on the very first recording I tested.
The AI summary stated that "Feature X was going to be in Release 3, not 4", whereas in the recording it is stated that the feature will be in Release 4, not 3, literally the opposite of what the AI said.
I'm sorry, but the fact that the AI summary is nicely formatted and has not missed a major topic of conversation means fuck all if the details that are discussed are spectacularly wrong from a decision-tracking perspective, as in literally the opposite of what was stated.
And I know "why" the AI summary fucked up: in that instance the topic of conversation was precisely that there was some confusion about which release the feature was going to be in; that's why the issue was a major item on the meeting agenda in the first place. Predictably, the AI failed to follow the convoluted discussion and "came to" the opposite conclusion.
In short, no fucking thanks.
It just has to be as good as a call center worker with 3-5 minutes working off their own memory of the call, not as good as the ground truth of the call. It's probably going to make weirder mistakes when it does make them, though.