As you pointed out, the reliance on complex, mission-critical systems is only increasing, and cascading failures are an inherent risk we must address proactively. By learning from organizations like AWS that have successfully integrated Operational Safety into their practices, we can work towards a more resilient and reliable software ecosystem. Let's continue to advocate for making Operational Safety a foundational element in software operations across the industry.
"As of Linkerd 2.15.0, the open source project no longer publishes stable releases. Instead, the vendor community around Linkerd is responsible for supported, stable releases."
It's just sooo hard to reconstruct the "minimum relevant context" from the state of running infrastructure and match it against a moving target (i.e., the next version you should be upgrading to).