stonepresto (u/stonepresto)

stonepresto commented on I used o3 to find a remote zeroday in the Linux SMB implementation sean.heelan.io/2025/05/22... · Posted by u/zielmicha

lyu07282 · 3 months ago

Even a rudimentary exploit can be a significant time investment. It is absolutely not common practice to develop, publish, or demand such exploits from researchers to demonstrate memory corruption vulnerabilities. Everyone thinks they are an expert in infosec, it's so funny.

stonepresto · 3 months ago

Well, in another subthread the author said he did in fact make a crashing PoC. I guess it depends on the customer's standards, but I would say in the vast majority of cases (especially for nuanced memory corruptions in which the ability to make something exploitable depends on your ability to demonstrate control of the heap) a crashing PoC is the bare minimum. In most VDPs, BBPs, or red team engagements you are required to provide some sort of proof to back up your claim, otherwise you'll be laughed out of the room.

I'm curious which sector of infosec you're referring to in which vulnerability researchers are not required to provide proofs of concept? Maybe internal product VR where there is already an established trust?

stonepresto commented on I used o3 to find a remote zeroday in the Linux SMB implementation sean.heelan.io/2025/05/22... · Posted by u/zielmicha

seanheelan · 3 months ago

I honestly hadn’t anticipated someone would think I hadn’t bothered to verify the vulnerability is real ;)

Since you’re interested: the bug is real but it is, I think, hard to exploit in real world scenarios. I haven’t tried. The timing you need to achieve is quite precise and tight. There are better bugs in ksmbd from an exploitation point of view. All of that is a bit of a “luxury problem” from the PoV of assessing progress in LLM capabilities at finding vulnerabilities though. We can worry about ranking bugs based on convenience for RCE once we can reliably find them at all.

stonepresto · 3 months ago

I'm too much of a skeptic to not do so lol. Great post though overall, don't let my assholery dissuade you! I was pleasantly surprised that it was actually a researcher behind the news story and there was some real evidence / scientific procedure. I thought you had a lot of good insights into how to use LLMs in the VR space specifically, and I'm glad you did benchmarking. It's interesting to see how they're improving.

Yeah race conditions like that are always tricky to make reliable. And yeah I do realize that the purpose of the writeup was more about the efficacy of using LLMs vs the bug itself, and I did get a lot out of that part, I just hyper-focused on the bug because it's what I tend to care the most about. In the end I agree with your conclusion, I believe LLMs are going to become a key part of the VR workflow as they improve and I'm grateful for folks like yourself documenting a way forward for their integration.

Anyways, solid writeup and really appreciate the follow-up!

stonepresto commented on I used o3 to find a remote zeroday in the Linux SMB implementation sean.heelan.io/2025/05/22... · Posted by u/zielmicha

seanheelan · 3 months ago

Hi, author here. Yes, I built a PoC. Yes, it triggered a KASAN report/crash.

stonepresto · 3 months ago

Thank you! I'm really happy to hear you did that. But why not mention that in your blog post? I understand not wanting to include a PoC for responsible disclosure reasons, but including it would have added a lot of credibility to your work for assholes like me lol

stonepresto commented on I used o3 to find a remote zeroday in the Linux SMB implementation sean.heelan.io/2025/05/22... · Posted by u/zielmicha

lyu07282 · 3 months ago

Are you saying you want PoCs that trigger a crash from the use-after-free or you would only be satisfied by full on RCE PoCs?

stonepresto · 3 months ago

PoCs should at least trigger a crash, overwrite a register, or have some other provable effect, the point being to determine:

1) If it is actually a UAF or if there is some other mechanism missing from the context that prevents UAF.

2) The category and severity of the vulnerability. Is it even a DoS, RCE, or is the only impact causing a thread to segfault?

This is all part of the standard vulnerability research process. I'm honestly surprised it got merged in without a PoC, although with high profile projects even the suggestion of a vulnerability in code that can clearly be improved will probably end up getting merged.

stonepresto commented on I used o3 to find a remote zeroday in the Linux SMB implementation sean.heelan.io/2025/05/22... · Posted by u/zielmicha

stonepresto · 3 months ago

I know there were at least a few kernel devs who "validated" this bug, but did anyone actually build a PoC and test it? It's such a critical piece of the process yet a proof of concept is completely omitted? If you don't have a PoC, you don't know what sort of hiccups would come along the way and therefore can't determine exploitability or impact. At least the author avoided calling it an RCE without validation.

But what if there's a missing piece of the puzzle that the author and devs missed or assumed o3 covered, but in fact was out of o3's context, that would invalidate this vulnerability?

I'm not saying there is, nor am I going to take the time to do the author's work for them, rather I am saying this report is not fully validated which feels like a dangerous precedent to set with what will likely be an influential blog post in the LLM VR space moving forward.

IMO the idea of PoC || GTFO should be applied more strictly than ever before to any vulnerability report generated by a model.

The underlying perspective that o3 is much better than previous or other current models still remains, and the methodology is still interesting. I understand the desire and need to get people to focus on something by wording it a specific way, it's the clickbait problem. But dammit, do better. Build a PoC and validate your claims, don't be lazy. If you're going to write a blog post that might influence how vulnerability researchers conduct their research, you should promote validation and not theoretical assumption. The alternative is the proliferation of ignorance through false-but-seemingly-true reporting, versus deepening the community's understanding of a system through vetted and provable reports.

stonepresto commented on Comcast says hackers stole data of close to 36M Xfinity customers techcrunch.com/2023/12/19... · Posted by u/thunderbong

RajT88 · 2 years ago

"Starting today" - there's no date on that notice. But the URI suggests it was authored on the 15th. Apparently not released for 4 days?

stonepresto · 2 years ago

The part of the URI that suggests it's the 15th of December is a GET param, which just means wherever this link was retrieved from is where that date is coming from.
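To illustrate the point about the GET param, here is a minimal Python sketch. The URL, path, and parameter name `date` are all made up for the example (the real link is not reproduced here). It only shows that a date in the query string is separate from the document path and says nothing about when the file was written.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL: the host, path, and "date" parameter name are
# assumptions for illustration, not the actual link from the notice.
url = "https://example.com/notice.pdf?date=2023-12-15"

parts = urlsplit(url)
params = parse_qs(parts.query)

# The path identifies the document; the query string is extra data
# supplied by whoever generated the link.
print(parts.path)   # /notice.pdf
print(params)       # {'date': ['2023-12-15']}
```

The same file would be served if the link were generated with a different value in that parameter, so the value tells you about the link, not the PDF.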

The PDF could have been authored at any time.

Looks like the created date embedded in the metadata is as follows:

2023-12-18T21:21:19.000Z

Created with MS Word. But even that isn't definitive.
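For a rough comparison of the two dates, a small Python sketch is below. It assumes the date hinted at in the URI is 15 December 2023 (the year is my assumption, taken from the metadata timestamp) and uses only the standard library to parse the embedded creation timestamp quoted above.

```python
from datetime import date, datetime, timezone

# Creation timestamp as quoted from the PDF metadata above.
created = datetime.strptime(
    "2023-12-18T21:21:19.000Z", "%Y-%m-%dT%H:%M:%S.%fZ"
).replace(tzinfo=timezone.utc)

# Assumption: the date suggested by the URI is 15 December 2023.
uri_date = date(2023, 12, 15)

# Days between the URI date and the metadata creation date.
delta_days = (created.date() - uri_date).days
print(delta_days)  # 3
```

Metadata fields like this can be edited or reset by the authoring tool, so the gap is only a hint.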

stonepresto commented on National Crime Agency response to Meta's rollout of end-to-end-encryption nationalcrimeagency.gov.u... · Posted by u/webmaven

beej71 · 2 years ago

It's a good point. But someone would eventually see it in the dev tools and Meta's credibility would be shot forever.

It's an open question if that impacts their bottom line at all.

stonepresto · 2 years ago

Agreed. I think their bottom line probably is built off of how it would affect their user base. My hunch is given the immensity of the user base, it wouldn't cause enough of a significant exodus for Meta to care either way. But that's speculation, not sure if that can be backed up with evidence from past events.

stonepresto commented on National Crime Agency response to Meta's rollout of end-to-end-encryption nationalcrimeagency.gov.u... · Posted by u/webmaven

rascul · 2 years ago

> I doubt end-to-end encryption blocks that.

Isn't it supposed to?

stonepresto · 2 years ago

What's to stop them from having hooks in their app that can bundle up all the decrypted messages, re-encrypt, and phone home? Certainly it wouldn't be default behavior, but it's possible and would allow them to answer warrants.