I know many people suffered with this, and in some cases the suffering was real. For most people it was just an inconvenience.
And I am a bit ashamed to say, the panic from users who could not get to Office 365 or whose Windows PCs would not boot still brings a bit of a smile to my face. I think there is a German word for this :)
A relative worked from my house for those days because their internet was also broken. We work for 2 different companies: me on a Linux workstation, him on Windows 11, spending his time with his help desk while I "translated".
One lesson learned: Help Desk people should be trained to avoid tech words when dealing with people whose workflow is just Email and Excel. Both the Help Desk and my relative had nothing but a high level of frustration dealing with each other.
If the frustration or suffering of other people brings a smile to your face just because their employer has different IT policies from yours, that's not really cool or something to brag about; rather the contrary.
No real insights here, just a high-level explanation and empty quotes from folks.
> The root cause analysis (RCA) means that a CrowdStrike programmer(s) did not check their inputs before pushing an update to the CrowdStrike Falcon Windows Sensor in production.
How is it that this isn’t just automated, or that the update isn’t automatically run in a VM or something, with the rollout blocked when it crashes?
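To make the idea concrete, here is a minimal sketch of that kind of gate. Everything in it is hypothetical: `staged_rollout`, `smoke_test` and `deploy` are made-up names, and it assumes nothing about how CrowdStrike's actual pipeline works.

```python
from typing import Callable, Iterable, List


def staged_rollout(
    update: bytes,
    rings: Iterable[List[str]],
    smoke_test: Callable[[bytes], bool],
    deploy: Callable[[str, bytes], bool],
) -> List[str]:
    """Push `update` ring by ring and halt at the first sign of trouble.

    All names are hypothetical. `smoke_test` stands in for "load the update
    in a throwaway VM and see whether it survives"; `deploy` stands in for
    pushing to one host and reporting whether that host stayed healthy.
    Returns the hosts that took the update and stayed healthy.
    """
    if not smoke_test(update):
        return []  # crashed in the VM: nothing ships

    deployed: List[str] = []
    for ring in rings:
        for host in ring:
            if not deploy(host, update):
                return deployed  # a host went unhealthy: stop the rollout
            deployed.append(host)
    return deployed
```

The only point is that a crash in the VM step, or in a small first ring, stops everything after it.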
These sorts of things happen occasionally despite all the possible safeguards you can set up. Next time it's just gonna be something else you didn't predict, and frankly it's the wrong thing to focus on.
The real problem this exposed is that somehow it's apparently the law for corporations to introduce a single point of failure into their entire IT system with absolutely zero fallback capacity or any workarounds at all. It's just entropy meeting stupidity.
It's like it was mandated for planes to have a built in off switch that can be triggered remotely and people then blame the company for accidentally triggering it, crashing every single one and killing everyone, instead of asking why the fuck do all planes have an off switch?!
Not everything; there are companies that do not own or use a single Windows machine, precisely because of incidents like this waiting to happen. Was this worse than MSBlaster? I remember that one; it was painful.
The article didn't even touch on the social factor of insurers and other types of middle managers uniformly pushing everyone to install an unnecessary RCE-based piece of software to check off their finely crafted bullet points that demand centralized legibility at the expense of everything else. When Microsoft does something that knocks a large number of systems out, it's at least understandable why such a monoculture exists. But this state of affairs was entirely self-inflicted. And an article in CACM should really be addressing these factors, because everybody already knows Crowdstroke itself was supremely incompetent. The question is not how Crowdstroke can prevent this type of software bug from happening again, but rather how we as a society can prevent the creation of more centralizing companies like Crowdstroke, especially ones that leverage the regulatory apparatus to drive adoption of their top-down version of "security".
That would be ‘schadenfreude.’
It’s not a new concept…