So my understanding is that the database/index Copilot uses had already crawled this file, so of course it would not need to access the file directly to report the information in it.
But then, how do you fix that? Do you tie audit reports to reads of the underlying database/index itself? Or are we just instructing the LLM to do something like...
"If you are accessing knowledge pinky promise you are going to report it so we can add an audit log"
This really needs some communication from Microsoft on exactly what happened here and how it is being addressed. As of right now, it should raise alarm bells for any company using Copilot where people have access to sensitive data that needs to be strictly monitored.
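For what it's worth, the "tie audit reports to accessing the database directly" option doesn't have to rely on the model self-reporting at all. Here's a minimal sketch of the idea: a retrieval wrapper that writes an audit record for every document it hands to the model, before the content ever reaches the LLM. The class and method names are hypothetical, not Copilot's actual internals.

    import json, time, uuid

    class AuditedIndex:
        """Wraps an index so every document served to the model is logged first.

        The audit record is written by the retrieval layer itself, not by the
        model, so a forgetful or malicious prompt cannot skip it.
        """

        def __init__(self, index, audit_sink):
            self._index = index       # any object with .search(query) -> list[dict]
            self._audit = audit_sink  # e.g. an open file, queue, or SIEM client with .write(str)

        def search(self, query, user_id):
            results = self._index.search(query)
            for doc in results:
                record = {
                    "event_id": str(uuid.uuid4()),
                    "timestamp": time.time(),
                    "user": user_id,
                    "doc_id": doc.get("id"),
                    "source_path": doc.get("path"),
                    "action": "content_served_to_llm",
                }
                self._audit.write(json.dumps(record) + "\n")
            return results

If the audit event is emitted at this layer, "the index already crawled the file" stops being a loophole: the log records what was served to the model, not what the model chose to admit to.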
This would make it so that even a compromised downstream service wouldn't actually be able to exfiltrate the authentication token, and all its misdeeds would be logged by the proxy service, making post-incident remediation easier (and making it possible to definitively prove whether anything bad actually happened).
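A minimal sketch of what that proxy could look like, assuming a Flask app and the `requests` library; the upstream URL, header names, and log format are illustrative, not any particular vendor's interface:

    import json, logging, time

    import requests
    from flask import Flask, Response, request

    UPSTREAM = "https://api.example.com"        # real credentialed endpoint (hypothetical)
    SECRET_TOKEN = "loaded-from-a-vault-here"   # never handed to downstream services

    app = Flask(__name__)
    logging.basicConfig(filename="proxy-audit.log", level=logging.INFO)

    @app.route("/proxy/<path:path>", methods=["GET", "POST", "PUT", "DELETE"])
    def forward(path):
        # Log the request before forwarding, so even a malicious caller leaves a trail.
        logging.info(json.dumps({
            "ts": time.time(),
            "caller": request.remote_addr,
            "method": request.method,
            "path": path,
        }))
        upstream_resp = requests.request(
            method=request.method,
            url=f"{UPSTREAM}/{path}",
            headers={"Authorization": f"Bearer {SECRET_TOKEN}"},  # token attached only here
            params=request.args,
            data=request.get_data(),
            timeout=30,
        )
        # The token never appears in anything returned to the caller.
        return Response(upstream_resp.content, status=upstream_resp.status_code)

Downstream services only ever see the proxy's URL, so compromising one of them exposes no long-lived credential, and the proxy's log is a complete record of what was actually requested for post-incident review.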
The parent states:
>Not only are the summaries better than those produced by our human agents...
Now, since they have not mentioned what it took to actually verify that the AI summaries were in fact better than the human agents', I'm sceptical they did the necessary due diligence.
Why do I think this? Because I have actually tried to do such a verification. In order to verify that an AI summary is actually correct, you have to engage in the incredibly tedious task of listening to the original recording literally second by second and making sure that what is said does not conflict with the AI summary in question. Not only did the AI summary fail this test, it failed on the very first recording I tested.
The AI summary stated that "Feature X was going to be in Release 3, not 4", whereas in the recording it is stated that the feature will be in Release 4, not 3, literally the opposite of what the AI said.
I'm sorry, but the fact that the AI summary is nicely formatted and has not missed a major topic of conversation means fuck all if the details that are discussed are spectacularly wrong from a decision-tracking perspective, as in literally the opposite of what was stated.
And I know "why" the AI summary fucked up: in that instance the topic of conversation was precisely that there was some confusion about which release the feature was going to be in; that's why the issue was a major item on the meeting agenda in the first place. Predictably, the AI failed to follow the convoluted discussion and "came to" the opposite conclusion.
In short, no fucking thanks.
It just has to be as good as a call center worker with 3-5 minutes working off their own memory of the call, not as good as the ground truth of the call. It's probably going to make weirder mistakes when it does make them, though.