Readit News
rdtsc · a year ago
> We plan to work with the anti-malware ecosystem to take advantage of these integrated features to modernize their approach, helping to support and even increase security along with reliability.

> Providing safe rollout guidance, best practices, and technologies to make it safer to perform updates to security products.

> Reducing the need for kernel drivers to access important security data.

They are being as diplomatic as they can, but it's definitely a slap to CS. Read as "they don't know how to roll things out, they need guidance on basic QA practices, we'll happily teach them...". Then, they list a set of facilities running in user-mode to avoid needing to run as many things in kernel mode.

I would be interested to hear what the water cooler discussion about CS was like inside Microsoft, especially in teams that needed to respond to customers saying "Your Windows OS is broken, our hospital patients are suffering...".

nimbius · a year ago
This isn't even the first time it's happened. Crowdstrike has killed an OS every month for the past four months.

At this point they are a threat actor. If you haven't kicked their amateur-hour software out of your infrastructure by now, chances are good senior management and engineering have at least considered it formally.

https://en.wikipedia.org/wiki/CrowdStrike#Severe_outage_inci...

metadat · a year ago
That incident list is damning. Is senior leadership asleep at the wheel, or how can this many incidents possibly happen every 30 days for months on end? If leadership really cared, they'd make sure post-mortems and other best practices are in place to reduce the frequency.

Unfortunately, the executive disconnect isn't new. It's actually uncommon that they care about the reality for end users and customers (which is antithetical to my entire ethos, hence why I get paid the medium bucks). Why bother waking up and going to work every day unless you are contributing in some way to sustaining a better future for everyone? It's actually great for marketing and it's already going to be a tough 100+ years from today for our children, even with our collective care.

P.s. People can be so selfish, it kind of breaks my brain but not really. Have you seen the CO2 emissions visualization from NASA this week? It was a wakeup call for me.

'Tremendous' NASA Video Shows CO2 Spewing from US into Earth's Atmosphere https://www.newsweek.com/nasa-video-carbon-dioxide-co2-emiss...

It's concerning.. and caught no traction.. http://news.ycombinator.com/item?id=41064029

surfingdino · a year ago
Never assume malice where incompetence will suffice. I have worked on teams where we could not get basics like test or integration environments signed off for months, yet the managers expected us to go to production. Suffice to say production was also not signed off for half a year and we had to improvise. I wonder if something similar was at play at CS?
hinkley · a year ago
Staffing problems?

Management often sees, “I have a dozen people on this.” When in fact the bus number was three, you laid one off, another quit and the third is sick or having life struggles.

jgalt212 · a year ago
> This isn't even the first time it's happened. Crowdstrike has killed an OS every month for the past four months.

Yeah, but doesn't MS have to sign every kernel mode driver? They've allowed Crowdstrike's foot gun to continue to live in the kernel.

whiplash451 · a year ago
Or maybe CrowdStrike is dealing with the hardest threats and hence ends up having to roll out stuff rapidly against zero-days?

Not a CS fanboy, but just wanted to suggest an alternative to sheer incompetence

wannacboatmovie · a year ago
From your linked article:

> A Hacker News user claimed that

Nice to see Wikipedia has devolved even further into a dumpster fire in that they are now citing random HN posts as authoritative sources of facts.

f001 · a year ago
I can tell you they’re quite unhappy about it. Have a friend working there who frustratedly says it wasn’t their fault every time it comes up. Which is quite often and at every social occasion since.
mns · a year ago
I noticed this at work and in some other contexts last week. We weren't affected by this, but most of the people who brought it up, even technical people (other fields, not security or OS or anything like that), think that this was a Microsoft and Windows issue. They all seem surprised to hear that Microsoft wasn't the root cause, because no one knows or understands what CrowdStrike is or does.
fishywang · a year ago
But it's kind of their fault? They designed the API that way, they decided what can be done in userland and what must be done via kernel. They at least _allowed_ it to happen every time.
thejournalizer · a year ago
Honestly most of the conversations were about getting everyone back online.
holsta · a year ago
> they need guidance on basic QA practices

Microsoft has a loooong history of botched (security) updates, so I'm not hopeful they can teach Crowdstrike much.

cogman10 · a year ago
And they've learned a lot from it. For example, MS no longer universally deploys updates across the world, they have a slower rollout to avoid just such an incident.
drdec · a year ago
>> they need guidance on basic QA practices

> Microsoft has a loooong history of botched (security) updates, so I'm not hopeful they can teach Crowdstrike much.

Experience is the best teacher

Rinzler89 · a year ago
Do you happen to have a list of that "loooong history" of botched (security) updates?

I can only find a couple of examples after googling, which is a bit smaller than the "loooong history" you're talking about, so unless Microsoft is paying Google to delete results, maybe you're mistaken.

SoftTalker · a year ago
Yes, quite the epitome of throwing stones from a glass house.
gnfargbl · a year ago
It didn't read as particularly diplomatic to me. In particular, this paragraph..

> It is possible today for security tools to balance security and reliability. For example, security vendors can use minimal sensors that run in kernel mode for data collection and enforcement limiting exposure to availability issues. The remainder of the key product functionality includes managing updates, parsing content, and other operations can occur isolated within user mode where recoverability is possible.

...was about as close to tetchy as a post like this would ever get. Basically they are saying "there was no good reason at all why CrowdStrike had to put so much code inside the actual kernel." And with the benefit of hindsight, it's a strong point.
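
A minimal sketch of the split described in the quoted paragraph, assuming a made-up update format: the risky parsing runs in a separate user-mode process, so a crash there is recoverable and the caller keeps its last known good state. `PARSER_SRC`, `parse_isolated`, and the JSON layout are hypothetical names for illustration, not anything from Microsoft's or CrowdStrike's code.

```python
import json
import subprocess
import sys

# Hypothetical parser, run in its own process. The index is deliberately
# unchecked, so a malformed update crashes this child process only.
PARSER_SRC = r"""
import json, sys
data = json.loads(sys.stdin.read())
fields = data["fields"]
print(json.dumps({"rule": fields[data["index"]]}))
"""

def parse_isolated(raw: str, last_good: dict) -> dict:
    """Parse an update in a child process; fall back to last_good on any failure."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", PARSER_SRC],
            input=raw, capture_output=True, text=True, timeout=30,
        )
    except subprocess.TimeoutExpired:
        return last_good
    if proc.returncode != 0:
        # The parser crashed or rejected the input; the host keeps running.
        return last_good
    return json.loads(proc.stdout)
```

The design choice here is that the component that touches untrusted content is the one that is allowed to die.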

ffhhj · a year ago
> there was no good reason at all why CrowdStrike

Their business is corporate spyware to surveil employees, of course they'll use any tactic to make it work, that's the why. And their EULA states there is no liability for the company:

https://www.crowdstrike.com/terms-conditions/

Dirty policies on top of dirty practices.

blackoil · a year ago
MS should have something like Project Zero for Windows applications and drivers. Any app installed on more than 1-5% of PCs should be tested and fuzzed and ... And the vendor then pressured into fixing the issues. Even if it is not technically their fault, it is definitely an optics problem for MS; half of the world refers to it as the Windows blue screen issue.
fragmede · a year ago
Raymond Chen: That Time We Bought EVERYTHING at Egghead.

https://youtu.be/6m_Im7J9Iaw?si=q8jLBefEdgm-PrrZ

MBCook · a year ago
> And the vendor then pressured into fixing the issues

How would Microsoft apply pressure? Short of publicly shaming them what power do they have?

lupusreal · a year ago
Why are they being diplomatic, instead of plainly stating their contempt and revoking CS's driver/etc signing keys? Doing so would help to repair the reputational harm that CrowdStrike inflicted on Windows.

Are their lawyers telling them they can't impede CrowdStrike even though CrowdStrike is breaking Microsoft's product? They should do it anyway and dare CS to take it to court so they can publicly humiliate CS by dragging all the dirty details of their incompetence out.

Aeolun · a year ago
People are free to install kernel modules. It shouldn’t be up to microsoft to stop them from doing so.
cratermoon · a year ago
Microsoft tried to push back on vendors wanting kernel access in 2006 <https://arstechnica.com/information-technology/2006/10/7998/>

Microsoft has (somewhat correctly IMNSHO) pointed at the EU agreement that forced them to open the kernel up to third parties as being a factor in the CrowdStrike catastrophe. <https://www.theregister.com/2024/07/22/windows_crowdstrike_k...>

oneeyedpigeon · a year ago
From the latter:

> However, nothing in that undertaking would have prevented Microsoft from creating an out-of-kernel API for it and other security vendors to use. Instead, CrowdStrike and its ilk run at a low enough level in the kernel to maximize visibility for anti-malware purposes. The flip side is this can cause mayhem should something go wrong.

> The Register asked Microsoft if the position reported by the Wall Street Journal was still the IT titan's stance on why a CrowdStrike update for Windows could cause the chaos it did. Redmond has yet to respond.

mrguyorama · a year ago
>Especially in teams needed to respond to customers about "Your windows OS is broken, our hospital patients are suffering...".

The reality is that this experience has been built into working for Microsoft since at least the 90s. The entire point of the compatibility effort they put into Windows 95 and later came from people blaming a Windows upgrade for breaking software that barely worked in the first place.

Windows Vista was known for lots of BSODs, but at least 60% of them were entirely due to nVidia GPU drivers crashing.

thebytefairy · a year ago
It's a little ironic they are taking the high ground on safe rollout practices when they had an Azure/365 outage caused by a bad config at the same time as the CS incident. Though to be fair, it only affected US central.
naasking · a year ago
People wouldn't need CS if Windows was better designed to begin with...
rty32 · a year ago
Care to elaborate?

How would a better designed Windows eliminate the business & compliance need for installing software like CS? And why hasn't that already happened?

I would think Microsoft and CS' customers have an incentive to not have such third party software on their system if possible.

notepad0x90 · a year ago
I must disagree with that take. Your last quoted sentence is in response to all the supposed self-proclaimed experts asking "why does it need kernel access"; the ones before that are there to limit their own liability.

What I've heard from people in the industry is not this silly "oh no, crowdstrike is so incompetent" b.s. that is being spread on sites like HN and reddit, but more of an empathetic "it could have been us" sentiment. The same goes for this write-up: Microsoft knows they have caused their share of outages. It is a technical write-up, but in part it is there to cover their bases for the government investigations and lawsuits that will arise from this incident.

And in part, they are also responsible for recovering from third-party driver errors and repeated boot failures caused by faulty drivers.

retrochameleon · a year ago
CrowdStrike blamed their test software, but in the same breath revealed that they haven't been using any canary deployments. The bug that caused all this was present in their kernel driver for a long time.

For being such a large cybersecurity player and deploying updates to 8.5 million devices, their quality control practices are embarrassingly lacking.

michaelt · a year ago
Anyone in the industry could have a bug get through testing.

Some companies could have a severe and readily reproducible bug get through testing.

A few of those companies have a hand-rolled update mechanism, and can accidentally break their ability to roll back a bad release.

A few of those companies are in a position to push a release that breaks not only their own software, but the entire OS.

Very few companies in that position would roll out to 100% of client machines in a single worldwide deployment.
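
The alternative to a single worldwide deployment can be sketched as a staged (canary) rollout. This is a minimal sketch under stated assumptions: the stage fractions, the crash-rate threshold, and the function names are all hypothetical, and nothing here reflects how CrowdStrike or Microsoft actually ship updates.

```python
import hashlib

# Hypothetical ring sizes: cumulative fraction of the fleet reached per stage.
STAGES = [0.01, 0.10, 0.50, 1.00]

def bucket(device_id: str) -> float:
    """Map a device ID to a stable value in [0, 1)."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact in a float and never reaches 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def should_receive(device_id: str, stage: int) -> bool:
    """A device gets the update once the rollout reaches its bucket."""
    return bucket(device_id) < STAGES[stage]

def next_stage(stage: int, crash_rate: float, max_crash_rate: float = 0.001) -> int:
    """Advance one stage if the current ring looks healthy; -1 means halt."""
    if crash_rate > max_crash_rate:
        return -1
    return min(stage + 1, len(STAGES) - 1)
```

Because the bucket is derived from a hash of the device ID, a device that received the update at an early stage stays included at every later stage, and a bad release is stopped after reaching roughly the first ring.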

freehorse · a year ago
If "it could have been them", then I would like to read such professionals write exactly about how to avoid having a global outage like this again, rather than "showing empathy" with a corporation. Or do we just leave it up to luck, and if "it happens to them too" in a month or year, oopsies? What about which practices could be improved?
gjsman-1000 · a year ago
Microsoft should be sued, for literally having blood on their hands. There was an easily mitigated design flaw in Windows that would have greatly blunted the impact.

https://news.ycombinator.com/item?id=41095788

dmattia · a year ago
I suppose I was expecting something more authoritative here. They confirm that there was an attempted read-out-of-bounds, as CrowdStrike said, but that's not really new information at this point. I suppose we'll need to wait for more detailed analysis from CrowdStrike at some point.

This post explains why security software has historically run in kernel-mode, and really seems to be pushing new technology that Microsoft has that would push security vendors into user-mode (with APIs that attempt to assist with many of the reasons why they have historically used kernel-mode).

Crowdstrike already runs in user-mode on both Mac and Linux (from what I can tell), and it seems like running in user-mode on Windows would significantly lessen the risk of catastrophic failures like a blue-screen-of-death. I know the bulk of the failures here belong to CrowdStrike, but I can't help but think about the fact that Apple kicked security vendors out of kernel-mode a ways back, and that if Windows had done similarly, an issue like this probably wouldn't have been possible. By even offering kernel-mode options to external vendors, I believe Microsoft is creating risk for themselves.

Rinzler89 · a year ago
> I can't help but think about the fact that Apple kicked security vendors out of kernel-mode a ways back, and that if Windows had done similarly, an issue like this probably wouldn't have been possible

Like others already said, Microsoft tried to do that with PatchGuard in 2006 with the launch of Windows Vista, and the likes of Symantec and McAfee complained to the EU that this would harm the sales of their products, so the EU told Microsoft not to do it in 2009[1].

Apple has the luxury of a small market share in the desktop PC space, so it doesn't attract the attention of the regulators, plus a user base that's used to Apple constantly rewriting the OS, deprecating APIs, switching CPU architectures, etc. without giving a fuck about breaking backwards compatibility or cutting off developers' access to OS features their products use, and getting away with it. Those are luxuries that Microsoft doesn't have.

IMHO, sticking with Windows' default security and not using third-party anti-malware has made Windows vastly more secure and reliable than it was in the days when you'd be looking at installing the likes of Symantec or McAfee for your "protection", which ended up acting like malware after a while, throwing dark patterns at you to milk more subscription fees. So as much as it hurts their sales, it's important for the regulators to understand that security is far more important than the regulations they put on Windows for Internet Explorer and Media Player. Just like with Apple's App Store, it's sometimes better to let the original product maker handle security and not leave the product open at all points just so some of these bandits can make a living selling security for it. It's like foxes complaining to regulators that chicken wire is a threat to their existence.

[1] https://stratechery.com/2024/crashes-and-competition/

Corrado · a year ago
I work in a heavily regulated industry (healthcare) and I can tell you that if anti-virus products weren't required to pass audits we wouldn't be using them. I'm not super familiar with Windows built-in security anymore but macOS (our platform of choice) is pretty secure without any additional products. In fact, I'm pretty sure that adding A/V "solutions" makes us more vulnerable, not less.
nopcode · a year ago
Microsoft sells endpoint security products and it would be unfair if third party solutions couldn't leverage the same APIs, it makes a lot of sense that a regulator steps in. I'm not aware of Apple selling security products or competing with third party security products.

michaelt · a year ago
> Crowdstrike already runs in user-mode on both Mac and Linux (from what I can tell),

Crowdstrike provides a Linux kernel module, and expects users to manually install an extra Secure Boot key for it, as part of their corporate laptop setup procedure.

This has always seemed inadvisable to me, but checkbox checkers gotta check checkboxes I guess.

wazzaps · a year ago
They also support (and recommend I think?) an eBPF-based sensor
__MatrixMan__ · a year ago
I agree. Microsoft's core competency has traditionally been backwards compatibility, but if each security vendor can tamper with Windows at the deepest level and is allowed to continue to explore all of the ways that they can leverage that... What you end up with is a fleet of different windowses, each diverging further with time. It dilutes the benefits brought by investment into the stability of the system because whatever fights are won in one fragment must be refought in others before you can have confidence in the stability of all fragments.

It seems like madness to me.

TillE · a year ago
> pushing new technology that Microsoft has that would push security vendors into user-mode

This doesn't exist. It's briefly hinted at in their conclusion, but right now it's simply not there.

There is no userspace equivalent of filesystem minifilters, ObRegisterCallbacks, etc.

dmattia · a year ago
This is fascinating, thank you for the info! If I am understanding, it would have then been difficult/impossible for CrowdStrike to create a user-mode only sensor without these equivalent APIs.

So I guess I'm not sure I see validity in the claims of those blaming the EU here. It seems as though the EU would have allowed Microsoft to kick users out of kernel-space if they had APIs that allowed making security products in user-space. Like Linux/Mac already appear to have.

whimsicalism · a year ago
The EU requires MS to provide kernel-level access to security vendors due to their crazy anti-compete provisions
dmattia · a year ago
This seems to be only partially true when I read into it. The EU said that Microsoft would need to move their security tools into user-space (or at least to use the same APIs as are available in user-space). If they did that (like Apple has done), they could kick everyone out of kernel-space if they wanted.
GordonS · a year ago
For one thing, being difficult to kill is a huge selling point for EDR - move it to user space and it's a lot easier to kill.
pas · a year ago
A kernel-space watchdog (that checks integrity of the image) would be much easier than a filter that updates from the internet.

Sure, the whole thing is definitely a hard problem, but CS fucking up even the most basic QA **and** error handling ... it just shows how ridiculous their whole claim to having super fancy technology is.
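
The watchdog idea above can be sketched in a few lines. This is a minimal sketch, assuming the watchdog only compares the sensor image against a known-good digest and never parses downloaded content; `image_digest` and `image_is_intact` are hypothetical names, and a real kernel component would not be written in Python.

```python
import hashlib
import hmac

def image_digest(image: bytes) -> str:
    """SHA-256 of the sensor image, as a hex string."""
    return hashlib.sha256(image).hexdigest()

def image_is_intact(image: bytes, expected_digest: str) -> bool:
    """Compare the current image hash against a known-good digest."""
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(image_digest(image), expected_digest)
```

The point of the comparison is that the privileged part has a fixed, tiny input surface, while anything that consumes updates from the internet lives elsewhere.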

Animats · a year ago
So how did this kernel level driver get through WHQL verification? The Static Driver Verifier should have caught this.[1] Do some security vendors get to bypass that? Microsoft is very quiet about that.

That's the sort of thing a negligence lawyer focuses on. Partner at Brown Rudnick: "The most likely legal theory will be one of negligence. [Congress] will drag the guy over the coals, they'll maybe implicate him and his company and put in place a negligence action. There'll maybe be a couple of plaintiffs lawyers who dig up some exceptional theory on negligence, and get some class action lawsuits going. Again, we still don't know all the facts in this case, and there are other dimensions which have not yet been fully explored, including how CrowdStrike had access to kernel level updates on the Microsoft operating system? How come Microsoft didn't have any control over these updates being pushed on their kernel?"

The first two class actions are already starting.

[1] https://learn.microsoft.com/en-us/windows-hardware/drivers/d...

[2] https://www.channele2e.com/analysis/crowdstrike-legal-and-li...

meowkit · a year ago
Because it wasn't an updated driver, it was a malformed blob config.

https://www.youtube.com/watch?v=ZHrayP-Y71Q

https://www.youtube.com/watch?v=wAzEJxOo1ts

That verification is for interactions with the OS. It's not going to catch driver-specific exceptions.
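
To illustrate what validating a content blob before use might look like, here is a minimal sketch. The binary layout (a uint16 count followed by that many uint32 entries) is invented for the example and does not reflect the real channel file format; `MalformedContent`, `parse_entries`, and `get_entry` are hypothetical names.

```python
import struct

class MalformedContent(ValueError):
    """Raised when a content blob does not match the expected layout."""

def parse_entries(blob: bytes) -> list[int]:
    """Hypothetical format: uint16 count, then `count` little-endian uint32 entries."""
    if len(blob) < 2:
        raise MalformedContent("missing header")
    (count,) = struct.unpack_from("<H", blob, 0)
    expected = 2 + 4 * count
    if len(blob) != expected:
        raise MalformedContent(f"expected {expected} bytes, got {len(blob)}")
    return list(struct.unpack_from(f"<{count}I", blob, 2))

def get_entry(entries: list[int], index: int) -> int:
    """Bounds-checked lookup: reject the index instead of reading past the end."""
    if not 0 <= index < len(entries):
        raise MalformedContent(f"index {index} out of range for {len(entries)} entries")
    return entries[index]
```

A caller that catches `MalformedContent` can discard the update and keep the previous configuration, which is the behavior a driver-level check would be aiming for.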

Animats · a year ago
If the driver can dereference nil, it shouldn't pass the Static Driver Verifier.[1]

[1] https://learn.microsoft.com/en-us/windows-hardware/drivers/d...

akira2501 · a year ago
> where security and availability are non-negotiable.

Yep. You just have to pretend that everyone who deployed Windows had an actual competitive choice available to them.

> A second benefit of loading into kernel mode is tamper resistance.

I guess availability is negotiable after all.

qsdf38100 · a year ago
> Yep. You just have to pretend that everyone who deployed Windows had an actual competitive choice available to them.

Could you elaborate? How is that related to security and availability being non negotiable?

akira2501 · a year ago
Microsoft's statement implies that people choose Windows because of its security and availability. Whereas most people end up with Windows because the software they want to run only operates on that single platform.

The security and availability, to the extent they even exist, are clearly not part of the market's decision making process.

squirrel · a year ago
Telling that there’s no mention of eBPF, which is standard on Linux and available on Windows, but hasn’t been brought into the main Windows OS. Static analysis might or might not have caught the Blue Friday bug, but it certainly increases the protection level over the current do-as-you-wish model for kernel modules.
EasyMark · a year ago
Oh I like this breakdown a lot. Fairly technical, links to resources used, flow of debug process, didn’t get lost in the weeds of details and how clever they were. I wish more debug retrospectives were like this. It seems like you end up with 100 pages of analysis or a couple of vague paragraphs.
userbinator · a year ago
I'm going to be the controversial one here and say that, as bad as CrowdStrike was, the alternative of having only Microsoft be able to decide what people can do is far worse. I've already seen many others trying to use this incident to advocate for digital totalitarianism.
scarface_74 · a year ago
Microsoft as the OS vendor will always be a potential source of updates that crash computers. Now with a third party, you’re adding another level of risk.
superposeur · a year ago
I’m surprised no one has yet noted that Microsoft itself is a chief CrowdStrike competitor.
tonymet · a year ago
I thought CrowdStrike provided features that go beyond Windows Defender. Is there another MS product that competes?
superposeur · a year ago
FWIW, here is CrowdStrike’s own comparison of features:

https://www.crowdstrike.com/compare/crowdstrike-vs-microsoft...

abhinavk · a year ago
There is a paid version called Microsoft Defender for Endpoint.