Readit News
modernpacifist · 2 years ago
I don't know about others, but I can't help but smile when I read the detailed series of events in aviation postmortems. To be able to zero in on what turned out to be a single faulty part and then trace the entire provenance and environment that led to that defective part entering service speaks to the robustness of the industry. I say that sincerely since mistakes are going to happen and in my view robustness has less to do with the number of mistakes but how one responds to them.

Being an SRE at a FAANG and generally spending a lot of my life dealing with reliability, I am consistently in awe of the aviation industry. I can only hope (and do my small contribution) that the software/tech industry can one day be an equal in this regard.

And finally, the biggest of kudos to Kyra Dempsey, the writer. What an approachable article despite being (necessarily) heavy on the engineering content.

WalterBright · 2 years ago
As a former Boeing engineer, other industries can learn a great deal from how airplanes are designed. The Fukushima and Deepwater Horizon disasters were both "zipper" failures that showed little thought was given to "when X fails, then what?"

Note I wrote when X fails, not if X fails. It's a different way of thinking.

lloeki · 2 years ago
When I worked in an industrial context, some coding tasks would seem trivial to today's Joe Random software dev, but we had to be constantly thinking about failure modes: from degraded modes that would keep a plant 100% operative 100% of the time in spite of some component being down, to driving a 10m-high oven that can break airborne water molecules from mere ambient humidity into hydrogen, whose buildup could be dangerously explosive if some parameters were not kept in check. That meant the code/system had to have a number of contingency plans. "Sane default" suddenly has a very tangible meaning.
f1shy · 2 years ago
As an engineer I think a lot about tradeoffs of cost vs other criteria. There is little I can learn from the nuclear or aviation industry, as the cost structure is so completely different. I’m very happy that the costs of safety in aviation are so well accepted, but I understand that few people are willing to pay similar costs for other things like, say, cars.
arendtio · 2 years ago
In the context of disasters that happened due to software failures (e.g. Ariane 5 [1]), one of my professors used to tell us that software doesn't break at some point but is broken from the beginning.

I like the idea of thinking 'when' instead of 'if', but the verdict should be even harsher when it comes to software engineering because it has this rare material at its disposal, which doesn't degrade over time.

[1] https://en.wikipedia.org/wiki/Ariane_5#Notable_launches

WalterBright · 2 years ago
An example of zipper failure in the Airbus incident is when a wire bundle gets cut, all the functions of all the wires in that bundle are lost. Having two or more smaller bundles physically separated would greatly reduce that risk. Certainly, having the primary and the backup system in the same bundle is a bad idea.

On the 757, one set of control cables runs under the floor. The backup set runs in the ceiling.

laydn · 2 years ago
What's fascinating about airplane design for me is not the huge technical complexity, but rather, the way it is designed such that a lot of its subsystems are serviceable by technicians so quickly and reliably, not just in a fully controlled environment like a maintenance hangar, but right on the tarmac, waiting for takeoff.
cedivad · 2 years ago
> When my AoA sensor fails, then what?

crickets, let's just randomise which sensor we use during boot, that ought to do it!

asystole · 2 years ago
I agree in principle, but I don't think industries should be looking at current-day Boeing's engineering practices except for an example of how a proud company's culture can rot from the inside out with fatal consequences.
oefnak · 2 years ago
Are you serious in saying that other industries could learn from Boeing?
sylens · 2 years ago
I think many of us are so used to working with software, with its constant need for adaptation and modification in order to meet an ever growing list of integration requirements, that we forget the benefits of working with a finalized spec with known constants like melting points, air pressure, and gravity.
abid786 · 2 years ago
Completely agree - I think it can go one of two ways. Software is more malleable than airplanes are and that also comes with downsides (like how much time and effort it takes to bring a new plane to the market)
WalterBright · 2 years ago
Airliners face constantly changing specifications. No two airliners are built the same.
mzi · 2 years ago
It took hundreds of subject experts from ten organizations in seven countries almost three years to reach that conclusion.

Here at HN we want a post mortem for a cloud failure in a matter of hours.

modernpacifist · 2 years ago
> Here at HN we want a post mortem for a cloud failure in a matter of hours.

I'll go one further - I've yet to finish writing a postmortem on one incident before the next one happens. I also have my doubts that folks wanting a PM in O(hours) actually care about its contents/findings/remediations - it's just a tick box in the process of day-to-day ops.

thaumasiotes · 2 years ago
Something similar that struck me was that, in early February, Russia invaded Ukraine.

And then, I saw an endless stream of aggrieved comments from people who were personally outraged that the outcome, whatever it might be, hadn't been finalized yet at the late, late date of... late February.

mlrtime · 2 years ago
I work at a mid-tier FAANG; our SLA for post mortems is in the 7-14 day range. Nobody seriously wants a full PM in hours.

They may want a mitigation or RCA in hours, but even AWS gives us NDA restricted PMs in > 24 hours.

bitcharmer · 2 years ago
Apples to oranges
crabmusket · 2 years ago
> To be able to zero in on what turned out to be a single faulty part and then trace the entire provenance and environment that led to that defective part entering service speaks to the robustness of the industry.

And to be able to reconstruct the chain of events after the components in question have exploded and been scattered throughout south-east Asia is incredible.

Gare · 2 years ago
My impression was that the defective part was still inside the engine when it landed.
nextos · 2 years ago
Aviation is great because the industry learns so much after incidents and accidents. There is a culture of trying to improve, rather than merely seeking culprits.

However, I have been told by an insider that supply chain integrity is an underappreciated issue. Someone has been caught selling fake plane parts through an elaborate scheme, and there are other suspicious suppliers, which is a bit unsettling:

"Safran confirmed the fraudulent documentation, launching an investigation that found thousands of parts across at least 126 CFM56 engines were sold without a legitimate airworthiness certificate."

https://www.businessinsider.com/scammer-fooled-us-airlines-b...

EdwardDiego · 2 years ago
Admiral Cloudberg has covered a case where counterfeit or EOL-but-with-new-paperwork components were involved in a crash.

https://admiralcloudberg.medium.com/riven-by-deceit-the-cras...

inglor_cz · 2 years ago
I suspect this is precisely what is happening in Russian civil aviation now. No legit parts supplied, so there will be a lot of fake/problematic parts imported through black channels.
bambax · 2 years ago
The Checklist Manifesto (2009) is a great short book that shows how using simple checklists would help immensely in many different industries, esp. in medical (the author is a surgeon).

Checklists of course are not the same as detailed post-mortems but they belong to the same way of thinking. And they would cost pretty much nothing to implement.

Also CRM: it's very important to have a culture where underlings feel they can speak up when something doesn't look right -- or when a checklist item is overlooked, for that matter.

sgarland · 2 years ago
Yes, but they do have one critical failure mode: that the checklist failed to account for something (or that an expected reaction to a step being performed didn’t occur).

I was a submarine nuclear reactor operator, and one of my Commanding Officers once ordered that we stop using checklists during routine operations for precisely this reason. Instead, we had to fully read and parse the source documentation for every step. Before, while we of course had them open, they served as more of a backstop.

His argument – which I to some extent agree with – was that by reading the source documentation every time, we would better engage our critical thinking and assess plant conditions, rather than skimming a simplified version. To be clear, the checklists had been generated and approved by our Engineering Officer, but they were still simplifications.

jacquesm · 2 years ago
Checklists are great if you use them properly: to make sure you remember. Checklists are dangerous when they are used improperly: to replace or shut-down critical thinking.
Simon_ORourke · 2 years ago
A colleague of mine came from a major aviation design company before joining tech and said they were in a state of culture shock at how critical systems were designed and monitored. Even if there are no hard real time requirements for a billing system, this guy was surprised at just how lax tech design patterns tended to be.
Horffupolde · 2 years ago
If 200 people died after a db instance crashed, software would be equal in that regard.
girvo · 2 years ago
To prove this, software that deals with medical stuff is somewhat more like aviation.
mlrtime · 2 years ago
Likewise, in "aviation" when the entertainment system completely fails in a 4 hour flight, there is most likely no post mortem at all. They turn it off/on again just like most of us.
mewpmewp2 · 2 years ago
Some people who think this is ideal for any sort of software tech sound like they would also want a 3 hour post mortem with whoever designed the rooms, after slightly stubbing a toe.
blauditore · 2 years ago
This kind of makes sense, but it is only possible because of public pressure/interest. Many people are irrationally emotional about flying (fear, excitement etc.), that's why articles and documentaries like this post are so popular.

On a side note, that's also why there's all the nonsense security theater at airports.

jstanley · 2 years ago
> robustness has less to do with the number of mistakes but how one responds to them

It must have something to do with the number of mistakes, otherwise it's all a waste of time!

It's all well and good responding to mistakes as thoroughly as possible, but if it's not reducing the number of mistakes, what's it all for?

krisoft · 2 years ago
> It must have something to do with the number of mistakes, otherwise it's all a waste of time!

Not really. Imagine two systems with the same amount of mistakes. (Here the mistakes can be either bugs, or operator mistakes.)

One is designed such that every mistake brings the whole system down for a day with millions of dollars of lost revenue each time.

The other is designed such that when a mistake happens it is caught early, and when it is not caught it only impacts some limited parts of the system and recovering from the mistake is fast and reliable.

They both have the same amount of mistakes, yet one of these two systems is vastly more reliable.

> if it's not reducing the number of mistakes, what's it all for

For reducing their impact.
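
A rough way to put numbers on that comparison. This is only a sketch: the mistake count, outage durations and catch rate are invented for illustration and are not taken from any real system.

```python
from dataclasses import dataclass


@dataclass
class System:
    """Hypothetical model of how a system responds to mistakes."""
    name: str
    mistakes_per_year: int        # identical for both systems
    caught_early_fraction: float  # share of mistakes caught before any impact
    outage_hours: float           # downtime caused by each uncaught mistake
    affected_fraction: float      # share of the system hit by an uncaught mistake


def lost_system_hours(s: System) -> float:
    """Expected whole-system-equivalent hours lost per year."""
    uncaught = s.mistakes_per_year * (1.0 - s.caught_early_fraction)
    return uncaught * s.outage_hours * s.affected_fraction


# Same number of mistakes, very different response to them (assumed values).
fragile = System("fragile", 20, 0.0, 24.0, 1.0)
robust = System("robust", 20, 0.5, 1.0, 0.25)

for s in (fragile, robust):
    print(f"{s.name}: {lost_system_hours(s):.1f} system-hours lost per year")
```

With these made-up inputs the mistake count never changes; only detection, blast radius and recovery time do, and those alone account for the gap between the two results.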

colechristensen · 2 years ago
Aerospace things have to be like this or they just wouldn’t work at all. There are just too many points of failure and redundancy is capped by physics. When there’s a million things which if they went wrong could cause catastrophic failure, you have to be really good at learning how to not make mistakes.
WalterBright · 2 years ago
> you have to be really good at learning how to not make mistakes.

Not exactly. The idea is not not making mistakes, it's whatcha gonna do about X when (not if) it fails.

mewpmewp2 · 2 years ago
> Being an SRE at a FAANG and generally spending a lot of my life dealing with reliability, I am consistently in awe of the aviation industry. I can only hope (and do my small contribution) that the software/tech industry can one day be an equal in this regard.

There's a slight difference in terms of what kind of damage an airplane malfunctioning causes compared to a button on an e-commerce shop rendering improperly for one of the browsers. My point is that the level of investment in reliability and process should be proportional to the potential damage of any incidents.

solids · 2 years ago
I agree, and also I enjoy the attitude. While in my profession the postmortem's goal is finding who to blame, here the attitude is towards preventing it from happening again, no matter what. Or at least that’s how I feel.
mewpmewp2 · 2 years ago
Your profession? Or you mean your company? Unless it's a very specific profession I would not know, it would usually imply that the company is dysfunctional.
bomewish · 2 years ago
Richard Hipp talks a lot about how SQLite adopted testing procedures directly from aviation.
switch007 · 2 years ago
> I can only hope that the software/tech industry can one day be an equal in this regard

I’d love to be an engineer with unlimited time budget to worry about “when, not if, X happens” (to quote a sibling comment).

But people don’t tend to die when we mess up, so we don’t get that budget.

akarve · 2 years ago
Hard agree. Civil & mechanical engineering have a culture and history of blameless analysis of failure. Software engineering could learn from them.

See the excellent To Engineer is Human on just this topic of analyzed failures in civil engineering.

jdietrich · 2 years ago
To a half-competent machinist or manufacturing metrologist, half a millimetre of concentricity error on a part of that size might as well be half a mile. It's a huge, grievous error that can be seen with the naked eye. You don't get an error of that scale through normal variation, it's a clear sign of a serious problem with your setup.

This part of the article really leapt out at me:

The tolerance for this bore was supposed to be Ø 0.05 mm according to the design drawings, but was changed to Ø 0.5 mm in the manufacturing drawings without explanation. Even so, the non-conformance on the accident hub was between Ø 0.90 and Ø 0.98 (an offset of 0.45–0.49 mm), which should have been flagged by the machine. The CMM records from the accident hub were not retained, so it was not possible for investigators to confirm that the error was actually registered.

The meaning might not be obvious if you've never worked in a machine shop, but it's crystal clear if you have. Many people at that plant knew that they were delivering out-of-spec parts. Everyone who handled that part could have told you at a glance that the counterbore was badly off-centre. Rather than going back to remake the parts, rather than figuring out why the parts were bad, they just went through the motions of QC, shipped them anyway, falsified documentation and discarded evidence. For all the complexity of the analysis, the root cause is blindingly simple - flagrant negligence, concealed by flagrant deceit.
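
For anyone who wants the arithmetic from the quoted passage spelled out, here is a minimal sketch. It assumes the Ø figures are the diameter of a cylindrical tolerance zone, so an axis that sits `offset` away from its true position needs a zone of diameter 2 × offset. That matches the quoted pairs (0.45 mm and Ø 0.90, 0.49 mm and Ø 0.98). The function names are made up for illustration.

```python
DESIGN_TOL_MM = 0.05         # Ø tolerance from the design drawings (quoted above)
MANUFACTURING_TOL_MM = 0.5   # Ø tolerance from the manufacturing drawings (quoted above)


def zone_diameter(offset_mm: float) -> float:
    """Diameter of the smallest zone containing an axis offset_mm off true position."""
    return 2.0 * offset_mm


def conforms(offset_mm: float, tolerance_diameter_mm: float) -> bool:
    """True if the offset fits inside the given Ø tolerance zone."""
    return zone_diameter(offset_mm) <= tolerance_diameter_mm


# Offsets reported for the accident hub in the quoted passage.
for offset in (0.45, 0.49):
    d = zone_diameter(offset)
    print(
        f"offset {offset:.2f} mm -> Ø {d:.2f} mm, "
        f"{d / DESIGN_TOL_MM:.0f}x the design tolerance, "
        f"{d / MANUFACTURING_TOL_MM:.1f}x the manufacturing tolerance, "
        f"passes manufacturing check: {conforms(offset, MANUFACTURING_TOL_MM)}"
    )
```

Under that assumption the part fails even the relaxed manufacturing tolerance by close to a factor of two, and the original design tolerance by more than an order of magnitude.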

ambyra · 2 years ago
The article said it wasn’t visible because that stub was machined after it was placed in the hub. Which begs the question “why would you weld a tube in place and then finish machining it after?” Maybe it was easier/faster to machine it while it was on a hub. Also, wasn’t there an oil filter that had to go in there? Wouldn’t the oil filter experience interference if the counterbore was offset?

Closing comment: damn I thought people paid more attention when building turbines.

spacecadet · 2 years ago
Yes, but what the poster meant is that it would be and that is confirmed in the images.
gumby · 2 years ago
30 years ago I was in an emergency landing due to engine failure situation (flight attendants take away your shoes, practice crash position, rearrange the passengers etc) and the thing that stuck out the most for me was that everybody did as they were told. No self righteous people; it was clear to everyone why there are flight attendants aboard and that they were key to your survival. The evacuation was orderly, though the follow up was lengthy (e.g. everybody’s passport was still on board).

More recently I’ve seen pictures of people evacuating down the slides with their luggage! Seems incredibly dangerous, not just for the slide experience but in slowing down evacuation. We had no fire in the cabin but what if we had?

Oh yeah, you know the stereotype of the press sticking their camera in your face to see how freaked out you are? It does happen in real life.

MBCook · 2 years ago
You’re not supposed to take anything on the slides. No luggage. No shoes. Just you.

But it is ignored. Which is sad, people could really get hurt.

You're right though, the fact that as many people comply as they do is kind of incredible given how people act in other situations.

jshier · 2 years ago
Yeah, according to the linked article 5 - 10% of people are injured using the escape slides, which is why they waited for the stairs in this case.
gumby · 2 years ago
They took our shoes away so that was that. According to a parallel reply, they no longer do that.
dataflow · 2 years ago
Why in the world do you have to take your shoes off before going down the slides? I could understand jackets or jewelry, but shoes?

qingcharles · 2 years ago
I was in a hotel fire evacuation once and the stairwells were all blocked because everyone brought every piece of their luggage with them.
gumby · 2 years ago
Disgraceful.
andrewaylett · 2 years ago
People evacuate with their luggage because in times of high stress, we fall back on habit. What do we do when it's time to leave an aircraft? We make sure we have all our belongings with us!

That's just one reason why it's important to listen to the safety briefing, even if you've heard it before. The repeated drill helps us to remember what to do, even when there's added stress.

smdyc1 · 2 years ago
I don't really think the phenomenon is anything other than people selfishly wanting their belongings to be saved over another passenger's life.
woutr_be · 2 years ago
I’ve always wondered what happens after an emergency landing. Do you just kinda sit there and wait for bags and personal belongings to be offloaded? And then wait for another flight out?

abrookewood · 2 years ago
Honestly, each and every one of those people should either be charged with reckless endangerment, put on a no-fly list or both. It really pisses me off when I see that. F**ing entitled idiots.
woutr_be · 2 years ago
I remember there was this video of a plane in Russia that was on fire, multiple people died. And you see people walking away with their luggage, can’t help but think people would still be alive if it wasn’t for those who so urgently needed their suitcases.
aunty_helen · 2 years ago
My first job was working at an MRO that overhauled engines a bit smaller than the Trent 900s but same principles apply.

I built QA software to digitize the forms and signature process like what’s mentioned in the article as having not correctly been signed off on.

I ate lunch with repair engineers that had dark wells of knowledge about the engines they worked on. They could talk so deep on a subject that lunch break was over and we’d resume conversation over weeks.

There’s a paragraph in this post that hits a few points that are very subtle. The missing sign offs and engineers not knowing the process and and and. I think the criticism of RR is valid here. The QA manager at the MRO I worked at was a force of nature. He was feared and uncompromising. He was also the signature that could cause an engine shutdown in flight. I admired this person and still do.

There’s small issues like this that go on every day on every engine model all over the world. There’s thousands of engines flying right now that have little defects that could cause a shutdown. There’s issues that have been identified, signed off as low risk and will be checked next time the engine comes in for overhaul.

There’s engineers out there that see the same fault, a premature cracked pipe, carbon buildup, abnormal corrosion, after a while of seeing this problem, they’ll raise the paperwork which will go up the chain and sit. It may be ignored, taken for information for future designs, identified as something that should be fixed or monitored or the frequency of monitoring increased. Maybe the part life will be reduced or you will be forced to NDT the part at each overhaul.

The cheese wheel concept is great as these systems are so complex there’s always going to be some issues.

As for Qantas, near the end it mentions the plane was repaired at great cost. It’s a source of company pride that they’ve never lost an airframe. They repair planes which are BER (beyond economic repair) just to keep this record.

grecy · 2 years ago
> As for Qantas, near the end it mentions the plane was repaired at great cost

Indeed. Qantas has been ranked the safest airline in the world almost every year since forever [1]

I clearly remember when QF32 happened and everyone was utterly shocked. That simply DOES NOT happen to Qantas.

[1] https://www.forbes.com/sites/laurabegleybloom/2023/01/03/ran...

dabiged · 2 years ago
QANTAS has, for the last 10+ years, had a CEO who was not part of this culture and did everything he could to drive costs down. He laid off huge swaths of engineers, outsourced key maintenance contracts to the lowest bidder and left the airline with an aging fleet that needs billions spent to replenish. He was recently fired by the board for essentially destroying the reputation of the airline within Australia, with their practice of cancelling flights at short notice, illegally sacking thousands of staff during COVID and taking 100's of millions of dollars from the Australian government to keep staff employed during the airline's grounding during COVID and handing it all to shareholders.

It is a situation very similar to the downfall of Boeing.

radiowave · 2 years ago
Agreed. I've worked in a company that was AS9001 certified, and pretty much the first things a quality auditor would have wanted to look at would be non-conformances and concessions. With that number of missing signatures we'd have been skinned alive, and it would likely have prompted the auditor to then turn the place upside down looking for more problems.

That would then have produced major failings in the audit, if not the outright revocation of the quality accreditation, which I would then expect to be followed up on by an audit from the customer (which in the case of TFA would be Rolls Royce), asking some rather uncomfortable questions of the management, examining whether the inter-company concession process was being adhered to, and perhaps reflecting internally (i.e. within RR) - "Do we think these folks are the right people to be making these parts for us?"

From what I've read here it seems to me that Rolls Royce were astonishingly lax in not riding their subcontractors nearly hard enough, quality wise.

jnsaff2 · 2 years ago
I had a small experience with RR as a company through a contract. Including some time spent in Derby.

The things I saw left me questioning how any innovation could happen at all in there or why we did not have a much higher rate of fuck-you-shima per year or how the hell plane engines are not exploding daily.

IIRC the B777 engine controllers are still m68k. Discontinued in 1995.

masklinn · 2 years ago
> IIRC the B777 engine controllers are still m68k. Discontinued in 1995.

That seems sensible? You’d need a really compelling reason to rewrite the entire control software and recertify the engine to match. Especially for an engine which has seen no order in 15 years.

ulfw · 2 years ago
I was on the flight and took the picture referenced as "A passenger took this photo in flight, showing turbine fragment exit holes in the upper surface of the wing. (ATSB)" Forced myself on another A380 flight shortly after so I won't lose faith in its engineering safety.
gumby · 2 years ago
Wow. I was (long ago!) in an engine fire emergency landing situation and though I did take a connecting flight to get home I didn’t fly for a while afterwards. Psychologically, your choice was probably the smarter one.
ghaff · 2 years ago
I've been in a couple situations.

- The main one was that I had a flight from Vancouver to Victoria and the weather was too bad for the helicopter to fly. So we took a prop. On takeoff, some cross-wind hit the plane and we tipped over. My colleague and I who were sitting across from each other thought that was it.

- The other one was my plane was reported crashed when I was visiting my parents for some holiday or other. I got a panicked call on the drive back from the airport.

saagarjha · 2 years ago
Hopefully without incident that time?
ulfw · 2 years ago
Thankfully yes! I lived in Singapore at the time and thought... my goodness. It's a small island. If you end up afraid of flying, what do you do!?

Kudos to the Qantas crew on board as well as Captain de Crespigny and his co-pilots and two check captains. We happened to have a lot of experienced pilot power on board.

A video from that time: https://youtu.be/U8Un2boLZD8

yukkuri · 2 years ago
Good on you!
ogurechny · 2 years ago
The article is complex and well written, but I am a bit perplexed by the victorious tone and never-ending praise of safety. It resembles a sales pitch a bit too much, even though no one is selling anything. Maybe it's unintentional, and being around salesmen just does that to people.

If you are like me, you've probably said “hmm…” to yourself multiple times when certain things were mentioned, because those were things that actually didn't work (that they were left intact really boosts the credibility of the author). From calculation software that had never ever been tested with out-of-ordinary data to the computer keeping the broken engine running. From pure luck with fuel tanks being almost full and unable to explode to absence of any physical kill switch to stop the engine. An hour being generously available to go through ALL the checklists to clear the notifications. An hour of passengers and crew staying on top of the puddle of fuel hoping that nothing would ignite it. Finally, pure randomness in debris flying the way it did. It's not a story of “layers of safety” overlapping, it's a story of “layers of randomness” overlapping.

What would be really interesting is a distribution of outcomes for all possible trajectories of debris, i. e., how (un)lucky they actually were. I guess corporations don't release models like those to the public.

Also, that special chamber for oil filter requiring precise drilling of a perfectly fine pipe seems “ewww” to me. It is not serviceable anyway without reinstalling everything from scratch, as far as I understand, why not make it a single piece?

Game_Ender · 2 years ago
The author is positive because of all the safety layers that existed and stayed intact, despite how flawed humans and companies are. The culture of looking at previous accidents like the UA232, where they lost an engine and ALL controls with it, meant the A380 control system was engineered to take even more damage and it worked.

I do agree though it did not spend enough effort focusing on the areas to improve:

- A computer controlled engine that runs for 60 seconds while on fire, and lets a dangerous part spin too fast. It seems like something that should have been covered ahead of time.

- An engine manufacturing process that is so complex it’s almost impossible to validate.

- A fault management system that only shows you 1 or 2 at a time when you have 40.

genocidicbunny · 2 years ago
> - A fault management system that only shows you 1 or 2 at a time when you have 40.

As long as the system prioritizes the warnings/cautions with the most pressing ones shown first, this is a very good thing. In a high-stress situation, you don't want the pilots to have to deal with figuring out which of the 40 warnings need to be taken care of first.
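
A toy illustration of that idea. To be clear, this is not how ECAM is actually implemented; the severity scale, alert names and two-slot limit are all assumptions made up for the example.

```python
import heapq
from typing import NamedTuple


class Alert(NamedTuple):
    severity: int  # hypothetical scale: 0 = warning, 1 = caution, 2 = advisory
    sequence: int  # arrival order, used to break ties within a severity level
    message: str   # invented alert text, not real ECAM wording


def visible_alerts(alerts: list[Alert], slots: int = 2) -> list[Alert]:
    """Return only the most urgent alerts, most pressing first."""
    # NamedTuples compare field by field, so this orders by severity,
    # then by arrival order.
    return heapq.nsmallest(slots, alerts)


pending = [
    Alert(2, 0, "CABIN LIGHTING FAULT"),
    Alert(0, 1, "ENG 2 FIRE"),
    Alert(1, 2, "HYD GREEN LO PR"),
    Alert(0, 3, "ENG 2 OVERSPEED"),
]

for alert in visible_alerts(pending):
    print(alert.message)
```

The crew still has to work through the whole list eventually, but the ordering decision is made once, by the system, instead of by people under stress.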

mixdup · 2 years ago
I suspect the ECAM only showing a couple of failures at a time is a design feature, not a flaw, to prevent overwhelming the crew as they work through them
benhurmarcel · 2 years ago
> the computer keeping the broken engine running

That’s on purpose, you don’t want automation deciding on such a drastic move as shutting down an engine. That’s the pilot’s decision.

> absence of any physical kill switch to stop the engine

There is, you shut down the fuel flow with a valve. But that “kill switch” was damaged.

> An hour being generously available to go through ALL the checklists to clear the notifications

Again, pilot decision to do it if time is available. Isn’t it safer that way?

> pure randomness in debris flying the way it did

Well that’s the nature of the failure. It’s like complaining that which HDD fails in a datacenter is random.

> outcomes for all possible trajectories of debris,

Yes it’s not public data, but all possible trajectories are analyzed at the design stage, and structural and systems components are kept segregated accordingly.

ogurechny · 2 years ago
I'm not an idiot (citation needed). I can see that a storm unplugging some imaginary tiny heartbeat cable, which in turn shuts down all the engines instantly, is not how planes should operate. What I don't understand is the approach to defend status quo, and pretend that “randomness is now conquered”.

It seems to me that fixing one complex problem creates 10 other complex problems. They can be rare, but it's ignorant to shift focus from them.

otherme123 · 2 years ago
I've read dozens of Admiral Cloudberg articles, and when you do so you notice a pattern: in old aviation crashes, a single error or a single part failure usually took down a plane with tens of dead bodies. Also the story of how and why the sterile flight deck started in response to some crashes where the pilots were distracted talking. In modern aviation accidents, it seems very unlikely. Even with an engine exploding, the pieces ripping half the cables, a wing, the fuel reservoir, hydraulics, and the airplane is still almost perfectly flyable and landable. Do the same to any car, where nothing is redundant, and let's see how well it performs.

The beauty of it is that everyone in aviation seems eager to learn and build on errors. This event prompted new actions that make future flying even safer, despite having no victims.

ogurechny · 2 years ago
That's the problem. Even if there were victims, one could've written the exact same article about “flying even safer”.
jnsaff2 · 2 years ago
The victorious tone comes in my opinion (though I'm projecting a bit) from this graph[0].

There has been very systematic and deliberate effort to better aviation safety DESPITE commercial pressures.

The swiss cheese means that there are many more layers of randomness that have to line up. Many of those layers came from previous accidents. Those layers are not random at all. Also none of those layers are hole free.

If that disk had disintegrated differently a potentially different set of layers would have applied. Would it have meant fatalities? Possibly. Would it have instantly blown up the plane? We don't know.

But it is pretty obvious that had many of those layers not existed then the chances of a much more disastrous outcome would have been much higher.
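
A back-of-the-envelope version of the swiss cheese argument, as a sketch only. The per-layer probabilities are invented, and the layers are assumed to fail independently, which is a strong assumption: a single event such as flying debris can punch through several layers at once.

```python
from math import prod


def p_all_layers_fail(hole_probabilities: list[float]) -> float:
    """Probability that a hazard passes every layer, assuming independent layers."""
    return prod(hole_probabilities)


# Hypothetical chance that each safety layer has a hole lined up with the hazard.
with_all_layers = [0.1, 0.1, 0.1, 0.1]
with_two_layers_removed = with_all_layers[:2]

print(f"four layers: {p_all_layers_fail(with_all_layers):.0e}")
print(f"two layers:  {p_all_layers_fail(with_two_layers_removed):.0e}")
```

No layer here is hole free, yet each extra layer multiplies the odds of a catastrophic outcome down, which is the sense in which removing layers would have made a much worse result much more likely.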

[0] https://upload.wikimedia.org/wikipedia/commons/e/ef/Fataliti...

angry_octet · 2 years ago
And on other aviation systems we do examine multiple failure modes. For example, a round going through the fuselage of an Apache, tumbling and smashing and causing spalling, thousands of simulated trials. Then coupled physics models that look at dozens of unintended interactions, avgas squirting out onto electronics, hot manifolds, etc.

There's a whole field of Fault Tree Analysis that looks at how adjacent faults can propagate into unrelated components, then Event Tree Analysis to determine what will happen next. Models that assess robustness against failures even when we have no idea how the failure will occur.

Reliability of cyber-physical systems is a constantly evolving field, with lots of recent work on concepts like probabilistic model checking, ML for anomaly detection, resistance to cyber attacks, and so on.
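
For anyone who hasn't seen a fault tree, here is a toy sketch of the basic arithmetic behind AND/OR gates. It assumes independent basic events; the component names and probabilities are hypothetical, and real analyses also have to model common-cause failures, which this does not.

```python
def and_gate(probs):
    """Output event occurs only if ALL inputs occur (independent events)."""
    p = 1.0
    for x in probs:
        p *= x
    return p

def or_gate(probs):
    """Output event occurs if ANY input occurs (independent events)."""
    none_occur = 1.0
    for x in probs:
        none_occur *= (1.0 - x)
    return 1.0 - none_occur

# Hypothetical basic-event probabilities per flight, for illustration only.
pump_fails = 1e-4
line_severed = 1e-5

# One hydraulic system is lost if its pump fails OR its line is severed.
one_system_lost = or_gate([pump_fails, line_severed])

# Top event: all three redundant systems are lost at the same time.
all_systems_lost = and_gate([one_system_lost] * 3)

print(one_system_lost, all_systems_lost)
```

The independence assumption is exactly what a spray of engine debris breaks: one event can sever several "redundant" lines at once, which is why adjacent-fault propagation gets modeled separately.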

ogurechny · 2 years ago
There is more than one way to interpret this history of “triumph of technology and human mind”, yada yada.

This flight can be seen as an expensive (thrilling, entertaining, newsworthy, etc.) experiment on live subjects whose outcome was not controlled by existing tools and procedures.

The same goes for everything that came before, to which it is compared so lightheartedly.

Please don't forget that your image shows a giant graveyard.

matheusmoreira · 2 years ago
That this plane was maneuverable despite a massive engine explosion that took out 65% of its roll control surfaces is absolutely a victory of the engineers of that aircraft. I was shocked when I read that.

Sheer dumb luck was certainly involved. Those discs could have cleaved the plane in half, to say nothing of the humans in their way, but somehow missed most of the plane entirely. We definitely need to count every single one of those blessings. It's hard not to be positive when such an episode ended with zero fatalities, zero injuries even.

nojs · 2 years ago
To me it’s impressive because presumably shards of debris cutting through so many distinct parts of the plane at the same time like this is a rare thing compared to more localized failures which the plane would be designed for. Yet all the different failsafes still worked enough to get the plane safely to the ground.
mlrtime · 2 years ago
It is very common and encouraged to add a "What went well" in post mortems. This is not a pat yourself on the back moment. It is to reflect on what failed and what didn't.
Neil44 · 2 years ago
I guess it's a glass half full type situation. There's a lot of universes where that plane did not make it back and a lot of decisions aligned to ensure that it did.
caf · 2 years ago
They do have multiple kill switches to stop the engines, up to dumping a bunch of flame retardant into it which makes it impossible to restart. The problem was that all these systems for the #1 engine were rendered inoperable by the damage caused by the failure of the #2 engine.

Certainly there was a fair bit of luck involved as well.

Stratoscope · 2 years ago
It may be a cliché to call someone a "national treasure", but I would take it a step further for Admiral Cloudberg: she is a world treasure.

Kyra has written so many great articles under her nom de cloud. Trust me, just pick any of them and you will learn something.

https://news.ycombinator.com/from?site=admiralcloudberg.medi...

genewitch · 2 years ago
there's a video podcast, too, which they should put on TV instead of whatever is on there now, overdramatized claptrap
ren_engineer · 2 years ago
there are some crazy talented pilots out there who are able to perform under massive amounts of pressure, United Flight 232 is a more extreme version of this article

https://en.wikipedia.org/wiki/United_Airlines_Flight_232

>Despite the fatalities, the accident is considered a good example of successful crew resource management. A majority of those aboard survived; experienced test pilots in simulators were unable to reproduce a survivable landing. It has been termed "The Impossible Landing" as it is considered one of the most impressive landings ever performed in the history of aviation

plane lost all hydraulics and had to be steered and crash landed using only the engines

mopsi · 2 years ago
Errol Morris made an exceptional documentary about UA232. One of the pilots just looks into the camera and tells the story. https://www.youtube.com/watch?v=nf33RDu_D6M
oh_sigh · 2 years ago
Not just any camera - an Interrotron!
macintux · 2 years ago
That is an amazing story, thanks for sharing it. This part leapt out at me:

> Rescuers did not identify the debris that was the remains of the cockpit, with the four crew members alive inside, until 35 minutes after the crash.

I can't imagine spending a half hour waiting to be rescued, not knowing whether any of your passengers had survived.

Sebguer · 2 years ago
Article by the same author as the submitted one on this: https://admiralcloudberg.medium.com/fields-of-fortune-the-cr...
feerceKitteh · 2 years ago
I’m only aware of one other incident of an aircraft landing after loss of hydraulics.

https://en.m.wikipedia.org/wiki/2003_Baghdad_DHL_attempted_s...