I posted this as a comment the other day in another Boeing and MCAS discussion.
In my research into the topic the saddest bit of information I've seen is the image of the black box data for the flight (the first crash): https://i.imgur.com/WJuhjlO.png
You can see from the graph that in the final minutes and seconds, the pilot put insane amounts of force on the control column (aka the yoke) to try to pull the plane out of the dive, to save the 189 people on board. But MCAS overpowered him, and there was no documentation that would have told the pilot to try anything else.
Also interesting is the number of times the pilots bring the nose up, only for MCAS to kick in and force it back down: 26 times.
If the pilots were putting that much force on the yoke, it is pretty apparent that they really, truly wanted the plane to do what they asked of it.
I am curious what scenarios drove the designers of the plane to assume that the human in the seat has no idea what they are doing. Was ignoring the wishes of the pilot an attempt to prevent a crazed, irresponsible, unlicensed idiot from doing something? These are trained humans; why does the computer totally ignore their efforts?
If only it were that simple. Airbus in particular has a lot of systems which PREVENT the human from doing things.
One plane actually crashed because the prevention system disabled itself and the pilots believed it was still there to protect them from bad actions on their part:
> caused the autopilot to disconnect, after which the crew reacted incorrectly and ultimately caused the aircraft to enter an aerodynamic stall, from which it did not recover
Interesting that there are two angle of attack sensor readings in the black box image when the major issue with MCAS seems to be that it was relying on only a single sensor?
I really see this as a failure of the Systems Engineering process. With so many people unaware of the impacts of the changes, it’s up to the systems types to have the big picture view and make sure these sorts of things are taken into account.
Especially if, as the article says, a failure of the AOA sensor on the system would be Hazardous (looks like it was Catastrophic when paired with MCAS, in retrospect): that would have made the functional Design Assurance Level for this system DAL B, which adds significant rigour not only to the software development process but to so much before you even get there, in terms of Safety Assessments and ESPECIALLY change impact analyses when the function changes.
For sure there may have been pressure from management to keep MCAS out of the manual, but it's not really up to the regulatory agency to be experts on the aircraft design. If things were being hidden by the company, then I'd consider this bordering on professional misconduct on the part of the engineers overseeing this work.
I say this as a Professional Engineer working as an aerospace systems engineer.
Thinking about this a little more, I also see a failure here in terms of naming things, which I have noticed in my career can be scoffed at but is so important.
As the article says, the function of MCAS changed and its operational envelope was greatly expanded. What if, internally at least, the new system was referred to as MCAS2?
This is somewhere things can get political, as was exactly the case with the Max, where they did not want anything to be considered a change to the aircraft type, let alone to acknowledge that the MCAS system existed in two majorly differing versions.
This is so true. Naming to me is one of the most important decisions. When I see someone who frets over naming decisions for hours, or will sit down with me and really think through what we're going to name something - I know they get the gravity of the situation. I trust those kinds of people much more for big picture architectural or project lead decisions.
Naming things and communicating changes are all core parts of the Systems Engineering process.
Failing to clearly communicate precisely how big this change was, and not making it extremely clear to all stakeholders what was happening and doing analyses of how these failure modes have changed, is really awful.
Admittedly I have zero familiarity with the internals of a corporate electrical engineering environment, but this is a logically sound idea that is standard in many other industries: OS 10.02, Apache License v2, Archer router xyz Rev 02.
What's the limiting factor here in the case of MCAS?
As a former Systems Engineer who worked on avionics, I would tend to agree; a major change to a system should have at least triggered the integration-level engineers (called the Platform team at my former job) to think hard about potential broad impacts.
It should have resulted in that team actively seeking buy-in, and clear communication that all subsystems comprehended precisely what was changing.
It likely should have triggered some kind of additional scrutiny from the safety organization.
That it didn’t is heartbreaking. It seems like some common practices were either not followed or rushed.
I can’t see Boeing keeping their CMMI certification level after this news breaks. Certainly some major steps were skipped in the Systems Engineering process.
>Especially if, as the article says, a failure of the AOA sensor on the system would be Hazardous (looks like it was Catastrophic when paired with MCAS, in retrospect)
From my understanding, this was an intentional decision, as the only way they could certify the airframe without simulator training being required was to feed the MCAS with only 1 sensor.
The only way they could do that was to keep the system rated as Hazardous overall, since a Catastrophic rating would require multiple redundancy plus the training.
This can be corroborated from the Australian 60 minutes expose.
>For sure there may have been pressure from management to keep MCAS out of the manual, but it's not really up to the regulatory agency to be experts on the aircraft design. If things were being hidden by the company, then I'd consider this bordering on professional misconduct on the part of the engineers overseeing this work.
Given all the environments I've worked in, including midsize construction projects, I'm willing to bet my hand that some engineers or third-party planning bureaus actually did spot these flaws and report them, and were then turned down and ignored, but not before being told that it's not their decision/problem.
That's always the moment when I'm happy to have documented the decisions.
> Here's the thing: nobody wants to pay engineers on par with managers
I used to think like that but I learned in time that the core issue is that engineers rarely know how to present their job in terms of monetary benefit during salary negotiations.
For example: as an automation engineer I led an initiative that saved a previous employer more than 60k€ yearly through automation and optimizing other validation workflows. As principal engineer at my current job I saved them more than 30k€ yearly by replacing a licensed component with an open source one, filling the feature gaps myself outside of work hours.
These things get noticed: not just the work done, but the ability to think in terms of money. And that unlocks the full engineering potential in salary negotiations.
I don't know if one could call it a failure of the Systems Engineering process. One could also say it is a situation where management, regulatory and market constraints became so unwieldy that there was simply no way Systems Engineering could satisfy the requirements. Sure, it's a failure not to push back against this, but that seems a bit different.
This whole plane sounds like an ugly hack. They slapped very different engines on an existing airframe. Then, when it inevitably exhibited undesirable behaviour, they tried to paper over the cracks. Then they hid this information from their customers, regulatory agencies and the pilots.
It makes me wonder if there are other issues with the Max that the public doesn't know about yet.
I hope a thorough review of Boeing's internal communications is already underway. If there is proof that these decisions were made for financial gain, they should face criminal charges.
IMO, whether it was greed or just general incompetence, Boeing has demonstrated that they are not responsible enough to self-certify their aircraft.
The sad thing is that we'll probably see this plane fly again by the end of the year, because the millions of dollars in retrofits will still be cheaper than having to scrap all existing planes.
We have no idea what other potentially lethal corners have been cut. What if they go back into service after several months of retrofitting at Boeing maintenance hangars, and then the following year there are two more deadly crashes from some other overlooked hack?
Really, these planes need to be scrapped. The engines, equipment, seats, etc can all be stripped and used in other planes, but the air frames will need to be recycled and this line of planes should end here.
Even if it doesn't (it probably won't), I highly doubt we'll see another generation of 737s. They did survive the rudder problems way back in the... 80s? or 90s? ...so their reputation might recover, but they still can't make the types of planes airlines want and keep that name/certification.
> Perhaps the single most complex, insidious, and long-lasting mechanical problem in the history of commercial aviation was the mysterious rudder issue that plagued the Boeing 737 throughout the 1990s. Although it had long been rumoured to exist, the defect was suddenly thrust into the spotlight when United Airlines flight 585 crashed on approach to Colorado Springs on the third of March, 1991, killing all 25 people on board. The crash resulted in the longest investigation in NTSB history, years of arduous litigation, and a battle with Boeing over the safety of its most popular plane.
In what world is scrapping the airframes due to a (serious) software fault the best and most sensible solution? Do you believe there could be undiagnosed problems with the wings, fuselage, tail, hydraulics, electrics, fueling system, gear, etc.?
>>> The sad thing is that we'll probably see this plane fly again by the end of the year, because the millions of dollars in retrofits will still be cheaper than having to scrap all existing planes.
In the US maybe. In Europe and Asia, I don't think so.
The planes have been grounded pending investigation. Given the speed of investigations and the political ramifications of all of this, I bet they won't be ungrounded anytime soon.
It will fly, only to be voted down with customers' feet. I know a lot of people who looked up the plane type for the first time during their holiday bookings. This plane is finished.
I think this is false thinking. The 737 is by now a very well tested airframe. Yes, you change some things and there are unexpected results, but the core is sound. Starting over on a completely new design, everything has to be debugged from scratch.
This reasoning reminds me of the tendency to argue "this code needs to be thrown out and rewritten from scratch" among software engineers. It's easy to see the flaws, but not so easy to see all the things that have been fixed. See https://www.joelonsoftware.com/2000/04/06/things-you-should-...
This is like taking your MySQL app and running it against MongoDB with a bunch of hacks to translate the SQL into mongo calls.
And then claiming you did all this so you could "avoid a rewrite".
Some architectural decisions are far-reaching and leaky. That is not ideal, but it is often the best trade-off. The alternative is decoupling to the nth degree, resulting in a hunk of junk that is impossible to change and won't get off the ground.
Engines were mounted in a way that made a stable configuration inherently unstable in not-so-uncommon circumstances. Stabilizer trim was used to control pitch, the most critical axis of the plane, something normally done with the elevator. This gave an automated system the authority to override the pilot by a wide margin. The behavior of the plane differs wildly from the old 737 as soon as the surprisingly narrow corridor of normal flight situations is exceeded.
"Other issues" is what worries me most about this as well. We now know that MCAS is safety-critical, so it's going to be reviewed, but what else got missed/minimized during the design and review process? The fact that Boeing is still trying to minimize the danger of MCAS does not fill me with confidence in their willingness to raise and fix other problems.
They don't seem to have learned from this, which means it's likely to happen again.
> This whole plane sounds like an ugly hack. They slapped very different engines on an existing airframe. Then, when it inevitably exhibited undesirable behaviour, they tried to paper over the cracks. Then they hid this information from their customers, regulatory agencies and the pilots.
Don’t forget, the system they used to paper over the cracks had a single point of failure.
Indeed. The article makes it sound like a foul-up late in the design process, but this plane was corrupt from the very beginning when Boeing set out to dodge the requirement for re-certifying the airframe.
To be fair, it's not like this isn't standard industry practice. Airframes evolve and change, with incremental changes typically only affecting limited areas of impact.
OTOH, seeking a non-aerodynamic solution to a significantly stability-degrading airframe modification was IMHO a bridge too far. If the pitch stability (not the pitch feel) problems couldn't be dealt with aerodynamically without busting type certification, then perhaps the whole concept was just too much of a stretch for the venerable old 737.
It surprises me, though, that this couldn't have been engineered out with enhancements to the horizontal stabilizer, such as tip fences or a span increase to offset the lift from the engine nacelles.
If it could have been, but software was cheaper, then that's an even darker indictment of Boeing's engineering incompetence.
The FAA needs to grow a pair and declare this type certification and all other certifications older than ~30 years EOLed. That doesn't mean you can't continue operating those planes, it just means you can't retrofit some changes onto a deeply legacy model and still call it the same type.
It's an ugly hack because the design tries to stay within what The Regulator considers the same airplane type.
This in turn is because getting a new airplane type to market would cost some (I assume, facts are welcome!) unholy amount of money and time to get approved.
Whether you consider that decision "greed" or "a rational response to perverse regulatory incentives" is, I suppose, a personality test as good as any :)
Those regulatory incentives are there for a reason. They are written in blood for the most part.
It's one thing to scoff and believe "Oh, it's nothing! You're just holding us all back!"...
...Right up until a plane load of freight or people plunges out of the sky.
Free market economics optimizes for one thing, and one thing only, as a first-order optimization. That's why we regulate: to ensure that all those nuisance secondary facets are accounted for by everyone equally, so that the market's natural race to the bottom doesn't compromise the central tenet of air safety: that everyone and everything that goes up comes back down safely, controlled, and alive.
The business part, if you think about it, is secondary to the capability to make and safely deploy a new plane. A nice bonus.
Sacrificing the quality of the final product for the sake of looking better on the balance sheets is a cardinal sin. Plain and simple. Based on testimony from inside, that sin seems to have been SOP at Boeing for the better part of the last decade.
I think this sort of excuse is always superficially applicable and therefore meaningless. There is no malfeasance that can't be described as "a rational response to perverse regulatory incentives", but "rational" falsely implies a singular possibility chosen objectively. It's incorrect to defend a specific failure as being compelled, because one hasn't explained why this failure and not the near-infinite number of other possible ones.
The day someone very high up the corporate ladder truly gets held responsible for this type of greed and negligence and is put away for a long prison sentence will be a good day for society. But I am not holding my breath...
But I hope that CEO Dennis Muilenburg deep down understands he seriously fuxxed up real bad, and every now and then has a hard time falling asleep in his $10M mansion, knowing that he is ultimately responsible for hundreds of people's unnecessary deaths due to his failed values as a leader.
Their credibility outside the US is shattered. Their word and opinion carries zero weight anymore, since they have proven themselves to be morally corrupt.
His response to the question of "should you step down?" was "no, absolutely not, people's lives depend on me leading this company." No one is that important, and 346 people died unnecessarily on his watch from negligent business processes. He should be forced out.
Your comment makes it sound like you believe there was a single person with nefarious intentions or criminal negligence who chose to put lives at stake in exchange for profits.
That is almost certainly not what happened. It is more likely a system of procedures and policies which failed. The company should take the hit, but unless an investigation reveals otherwise, I see no reason a single individual should take the blame for all of this.
> That is almost certainly not what happened. It is more likely a system of procedures and policies which failed. The company should take the hit, but unless an investigation reveals otherwise, I see no reason a single individual should take the blame for all of this.
Aren't the insane compensations of executives justified by their "great responsibility"? So I think it makes them eventually responsible for what their companies do.
How MCAS slipped through certification process was not the main issue (mistakes in complex products can happen). The main issue was Boeing not caring that MCAS was dangerous even after discovering it.
After the Lion Air crash, it was very apparent to Boeing that MCAS was not safe. This whole article focuses on how MCAS slipped through development and certification - but even after Boeing knew the dangers of MCAS, the MAX was still allowed to fly.
It was hidden and dangerous. Then it was open and dangerous but was still defended by Boeing. Damning.
Great article. But for me there's a huge question being left unanswered, like the elephant in the room:
Why exactly did the engineers/test pilots feel the need to "enhance" the original MCAS with the new, more powerful version that worked at lower speeds? What did they know? I doubt they did it for the hell of it. And therefore, what has changed such that that enhanced functionality is now no longer necessary, and it's fine for MCAS to be returned to its original, more subtle implementation?
These things just don't add up for me and Boeing's constant pronouncements that they did nothing wrong, everything was fine, and now they're fixing it so everything will be even more fine ring very hollow indeed. I would almost like to see everyone involved in this subpoenaed so the public can learn the truth of what, exactly, took place.
Until we have some answers, especially to my main one - what was so bad about the airframe's handling that it was necessary to massively increase the power of the MCAS system, but is now apparently not necessary anymore and it's fine for them to nerf it - I don't think I'll be flying on a MAX.
The answer to why they wanted to “enhance” MCAS is that they wanted it to be certified as a 737 like all previous versions, which means pilots need to be able to fly it exactly like previous 737s without additional training, and a technical hack which “corrects” pilots' actions facilitates that.
Is it only me, or does all this one-vs-two AoA sensor talk seem like some kind of diversion from the real problem with this plane?
I mean, if single-sensor MCAS failed twice so early in the life span of the plane model, what is the probability that a two-sensor version will fail pretty soon as well? The math should be simple; we have all the data needed: combined hours flown by all planes of the type and the number of failures (at least two known), which lets us estimate the MTBF of the sensor.
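The back-of-the-envelope estimate suggested here takes only a few lines. Note the fleet-hours figure below is a made-up placeholder, not real data; only the count of two known failures comes from the discussion:

```python
# Rough MTBF estimate for the AoA sensor from fleet exposure.
fleet_hours = 500_000    # assumed combined MAX fleet flight hours (hypothetical)
failures = 2             # the two known MCAS-triggering AoA failures

mtbf_hours = fleet_hours / failures
failure_rate_per_hour = 1 / mtbf_hours

print(f"Estimated MTBF: {mtbf_hours:,.0f} flight hours")
print(f"Failure rate:   {failure_rate_per_hour:.1e} per flight hour")
```

With these placeholder numbers the implied rate is orders of magnitude worse than one event per 10 million flight hours, which is exactly the kind of sanity check the comment is asking for.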
If the sensor had just stopped responding, there wouldn’t have been any problem. The planes would keep flying, the sensors would get replaced, and everyone would be fine.
What happened was that the sensor gave erroneous readings. The MCAS system reacted to those erroneous readings and crashed the plane.
With two sensors, you can detect failure. It’s very unlikely that both would fail simultaneously. If they did, it’s very unlikely that both would provide the same erroneous readings.
> It’s very unlikely that both would fail simultaneously. If they did, it’s very unlikely that both would provide the same erroneous readings.
They don't have to fail simultaneously in a flight. And they don't have to fail by internal sensor problems.
There are many cases in which they can fail simultaneously and give the same readings; the article even mentions such events:
>> That probability may have underestimated the risk of so-called external events that have damaged sensors in the past, such as collisions with birds, bumps from ramp stairs or mechanics’ stepping on them.
And AF447 gives an example of what such erroneous readings, combined with pilot error, can lead to.
Increased redundancy in airborne systems is very unintuitive. Double redundancy can be more dangerous than having a single system (at least when you're talking engines or sensors anyway).
Triple redundancy is the norm for the specific reason that it's highly likely for symmetrically placed sensors to be prone to failing in the same way not long after each other, but having a third differently placed can keep you flying.
Although there's at least one instance where an Airbus plane had two AoA sensors malfunction at the same time and outvote the last remaining sensor.
This is why critical systems are built with higher degrees of redundancy and graceful degradation of operational envelope in mind.
Training on how to handle unaided flight is also absolutely essential. There have been several Airbus accidents where pilots were caught off guard when the automation that kept them from breaking out of the operating envelope failed.
Long story short: Boeing has put themselves in the unenviable position of having delivered a product in ways that are not only illegal but deadly. Short of pilots accepting a significant burden, in the form of being as good as or better than the MCAS system, a lot of man hours and capital have been expended to end up in a situation where every MAX is at a not inconsiderable risk of being scrapped.
> " it’s very unlikely that both would provide the same erroneous readings. "
You're assuming that faulty sensors will tend to have random output. But since we're talking about a real-life mechanism, it seems likely it has some erroneous states that are more likely to occur than others. For instance, if the mechanism often fails up against one of its mechanical limits, the sensor might erroneously read out the limit position every time.
You can't actually say anything about the distribution of failure states for a sensor without evaluating that particular sensor.
> It’s very unlikely that both would fail simultaneously.
Ice, insects, birds, and volcanic ash are all things that tend to cause the pitot and the static tubes to become blocked. When you encounter ice, insects, birds, and volcanic ash, it is often the case that you get multiple simultaneous blockages. Blockages of the various tubes are not statistically independent events in practice.
Well, the other issue is that the system used that bad sensor data to automatically correct the pilots, with no indication of why or how to even turn it off. The pilots knew the system was wrong and couldn't force the plane to correct.
You get a reading of 20 on one sensor and a reading of 34 on the second; which one is correct? To achieve reliability, a minimum of five sensors needs to be used: four primary and one backup. If three primary agree, then the system is normal. If two primary disagree, then switch to the backup.
That's why you need 5 sensors or so on something this mission-critical. Enough that you can have a clear democratic majority if one or two go on the fritz.
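The majority-voting idea being described can be sketched in a few lines. This is a toy illustration, not any real avionics algorithm; the tolerance value and clustering scheme are invented for the example:

```python
from statistics import median

def vote(readings, tolerance=2.0):
    """Majority-vote a reading from redundant sensors.

    Hypothetical sketch: returns the median of the largest cluster of
    mutually agreeing sensors, or None when no strict majority agrees.
    """
    best = []
    for r in readings:
        cluster = [x for x in readings if abs(x - r) <= tolerance]
        if len(cluster) > len(best):
            best = cluster
    if 2 * len(best) > len(readings):  # strict majority required
        return median(best)
    return None  # sensors disagree -> fail safe (e.g. disable the function)

print(vote([20.1, 20.3, 34.0]))  # outlier rejected, agreement near 20.2
print(vote([20.0, 34.0]))        # 1-vs-1 disagreement -> None
```

The 1-vs-1 case is the crux: with only two sensors you can detect disagreement but cannot pick a winner, which is why odd (or larger) sensor counts are used.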
> what is the probability that a two-sensor model will fail pretty soon as well
It's actually higher than the probability that a one-sensor version will fail. With two sensors, you have an effective failure if either sensor fails, and the probability of that happening is roughly twice the probability that a single sensor will fail (assuming failures are independent, which is not necessarily a valid assumption).
However, with two sensors you can tell when one has failed (even though you may not know which one it was) and so the consequences of the failure might be less severe.
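The arithmetic behind "roughly twice" is easy to sketch. The per-sensor failure probability below is purely illustrative, not a real figure:

```python
p = 1e-4  # illustrative per-sensor failure probability per flight (made up)

single_undetected = p               # one sensor: a failure goes unnoticed
either_fails = 1 - (1 - p) ** 2     # ~2p: at least one fails (but detectable)
both_fail = p ** 2                  # simultaneous failure, if independent

print(f"either fails: {either_fails:.4e}")   # just under 2e-4
print(f"both fail:    {both_fail:.1e}")      # 1e-8, but only if independent
```

The `both_fail` line is where the independence assumption does all the work; as other commenters note, bird strikes, icing and ground damage make the failures correlated in practice.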
The problem is: now pilots need to be prepared to fly the plane with a failure sensor, which is to say, without MCAS. To do that, they will need additional training. Avoiding that was the whole point of MCAS in the first place. That's the reason it's taking so long to sort this out. Technically, it's an easy problem to solve. It's the economics that are daunting.
If MCAS is disabled for some reason because of sensor failure, how does that factor into the common type rating? Same goes for if they significantly lower how much input it provides.
The AOA sensors are effectively a consumable and would undergo regular replacement over the life of the aircraft; the odds of BOTH of them failing at the same moment in the same flight are very, very small.
Ok, that makes sense. But is the replacement interval an order of magnitude (or several) shorter than the expected time to a two-sensor failure? I hope it is calculated.
I have the impression that people are overlooking the sensors. They are supposed to be very, very reliable. Two different planes got a wrong reading from the sensor on the same side; this seems like a red flag to me. I wonder on which side of the sensor cable the problem is.
They’re not expected to be that reliable. They’re small vanes sticking out the side of the nose, vulnerable to bird strikes. The article mentions hundreds of reported failures over the years. The way to make the system reliable is redundancy.
I want to know why the Boeing flight computer needs pitot tube input at all. Modern ublox GPSes can easily obtain 3D lock on multiple satellite constellations within a minute of booting. Several of these in parallel for redundancy if you are paranoid. Flight controllers on fixed wings don't even need a magnetometer to stabilize. Just GPS path heading. If all else fails, solid state accelerometers are very reliable. Accelerometer only based dead reckoning works great. If all else fails, a single accelerometer should be sufficient to get the plane relatively stable. A barometer can help too, but doesn't seem necessary. These systems can be easily combined with fallback logic to keep the plane in the sky. I just don't understand what is so hard about this for Boeing. I understand airspeed is not the same as ground speed, but this should provide enough information to the flight computer to keep the plane in the air or at least stable.
If all you are using is ground-based position/speed, then you are ignoring the very real possibility that the air you are flying through is not stationary relative to the ground. In actual fact, especially at high altitudes, the air can be moving very fast, and the difference between ground speed and airspeed can be the difference between flying and stalling.
Also, your GPS measurements give you position, direction, and speed, but they don't give you orientation. You would have to have another instrument to feed that into the system (such systems exist).
Unless you limit yourself to flying very near the ground and very near sea level, the speed of an aircraft is more complex than a single number. In fact four different speed numbers are commonly used: indicated airspeed, calibrated airspeed, true airspeed, and ground speed.
* IAS is the raw airspeed reading from the pitot tube.
* CAS is IAS corrected for instrument errors, e.g. if the plane is at an angle that disrupts air flow around the pitot tube.
* TAS is basically CAS adjusted for altitude and air pressure. It’s the aircraft’s speed relative to the air around it.
* Ground speed (or speed over the ground) is TAS adjusted for the wind. This is the number that GPS is going to give you.
IAS and CAS are particularly important for describing performance characteristics - if an aircraft stalls at 100 knots CAS, then it always stalls at that CAS. If you try to describe the stall speed in terms of TAS you go from a single data point to a graph of speed and altitude.
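To make the CAS/TAS distinction concrete, here is a rough conversion using the ISA density ratio. This is a low-Mach rule of thumb only; real air-data computers also correct for compressibility and the actual measured outside air temperature:

```python
import math

def tas_from_cas(cas_knots, altitude_ft):
    """Approximate true airspeed from calibrated airspeed (TAS = CAS / sqrt(sigma))."""
    h_m = altitude_ft * 0.3048
    temp_ratio = 1 - 0.0065 * h_m / 288.15  # ISA troposphere temperature lapse
    sigma = temp_ratio ** 4.2559            # density ratio rho/rho0
    return cas_knots / math.sqrt(sigma)

print(round(tas_from_cas(250, 35_000)))  # 250 KCAS at FL350 is roughly 450 KTAS
```

This is why a stall speed quoted in CAS stays a single number while the equivalent TAS climbs with altitude.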
"Why (does) the Boeing flight computer needs pitot tube input at all ?" If there is a strong tailwind, the plane needs a much higher ground speed to avoid stalls.
If these accidents prove anything, it's that we need a computer that takes many different inputs (GPS from the tail and the nose, pitot, barometer, AoA indicator, input from the pilot, engine RPM, etc) and put them into a mathematical model of the airplane before overriding the pilot.
Additionally the AOA sensor - which is basically a weather vane - does not output usable data before the airflow around the airplane has reached certain velocity (it needs air flowing around it). Which is reported... by the pitot tubes.
> Boeing engineers did consider [MCAS activation due to failed sensor] in their safety analysis of the original MCAS. They classified the event as “hazardous,” ... could trigger erroneously less often than once in 10 million flight hours.
The incuriosity of all parties to an event categorized as hazardous is astonishing. Boeing says it's a system that's completely transparent to the pilot, and therefore there is no need to describe a failure that they say would be hazardous. What part of that passes a reasonable smell test? It's safe unless it fails, which would be rare, but if it fails people could die? But meh, it's rare so let's not even find out what would happen if it happened?
Boeing must be compelled to show their work for this probability computation, because it is clearly wrong. And both Boeing and the FAA have to answer why there's no mandatory testing of hazardous events. At least what does a simulator think will happen in various states of perturbed sensor data, and how does a pilot react when not expecting such an event?
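One way to see the tension in that probability claim: under a rate of one event per 10 million flight hours, two events in the fleet's accumulated hours would be wildly unlikely. The fleet-hours figure below is a hypothetical placeholder, not a sourced number:

```python
import math

claimed_rate = 1e-7     # erroneous activations per flight hour (the claimed bound)
fleet_hours = 500_000   # hypothetical accumulated MAX fleet hours (made up)

lam = claimed_rate * fleet_hours                # Poisson mean: expected events
p_two_or_more = 1 - math.exp(-lam) * (1 + lam)  # P(at least 2 events)

print(f"expected events: {lam:.3f}")
print(f"P(>=2 events):   {p_two_or_more:.1e}")  # ~1e-3 under these assumptions
```

Even with generous fleet-hour assumptions, observing two such events makes the claimed rate look implausible, which is exactly why the computation behind it deserves scrutiny.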
Oh, and the part about depending on a single sensor is not, per Boeing, a single point of failure because human pilots are part of the system? That's a gem. The pilots are the backup? This poisonous form of logic is perverse.
If the pilots had received training, then they could be a backup. So probably whoever did that safety analysis was assuming pilots would know how and when to turn off the system, but the pilots in fact didn't know the system existed at all.
Administrative mitigations, like relying on pilots, are usually the least preferred way of mitigating hazards. Humans are often the least consistent, most fallible part of a system. If there were engineering solutions available, I would hope Boeing would implement them.
In my research into the topic the saddest bit of information I've seen is the image of the black box data for the flight (the first crash): https://i.imgur.com/WJuhjlO.png You can see from the graph that in the final minutes and seconds, the pilot put insane amounts of force on the control column (aka the yoke) to try to pull the plane out of the dive - to save the 189 people on board. But no, MCAS was overpowering and lacked the documentation for the pilot to try anything else.
Also interesting to see is the number of times the pilots bring the nose up, only for MCAS to kick in and force the nose back down. 26 times.
All data from this Seattle Times article, which was written before the second crash occurred: [1] https://www.seattletimes.com/business/boeing-aerospace/black...
I am curious what scenarios drove the designers of the plane to not trust the human in the seat. Was ignoring the wishes of the pilot an attempt to prevent a crazed, irresponsible, unlicensed idiot from doing something? These are trained humans; why does the computer totally ignore their efforts?
[0] https://en.wikipedia.org/wiki/Aeroflot_Flight_593#Accident
One plane actually crashed because the prevention system disabled itself and the pilots believed it was still there to protect them from bad actions on their part:
> caused the autopilot to disconnect, after which the crew reacted incorrectly and ultimately caused the aircraft to enter an aerodynamic stall, from which it did not recover
https://en.wikipedia.org/wiki/Air_France_Flight_447
So why did it crash?
Especially if, as the article says, a failure of the AoA sensor on this system would be Hazardous (in retrospect it looks like it was Catastrophic when paired with MCAS), that would have made the functional Design Assurance Level for this system DAL B. That adds enough rigour not only to the software development process but to so much before you even get there, in terms of Safety Assessments and ESPECIALLY change impact analyses when the function changes.
For sure there may have been pressure from management to keep MCAS out of the manual, but it's not really up to the regulatory agency to be experts on the aircraft design. If things are being hidden by the company, then I'd consider this bordering on professional misconduct on the part of the engineers overseeing this work.
I say this as a Professional Engineer working as an aerospace systems engineer.
As the article says, the function of MCAS changed and its operational envelope was greatly expanded. What if, internally at least, the new system was referred to as MCAS2?
This is somewhere that things can get political, as was exactly the case with the Max, where they did not want anything to be considered a change to the aircraft type, let alone acknowledge that the MCAS system existed in two majorly differing versions.
I believe changing the aircraft type would trigger regulatory events carrying rather gargantuan costs.
Avoiding those seem to be the entire reason for the existence of the 737 MAX.
Failing to clearly communicate precisely how big this change was, and not making it extremely clear to all stakeholders what was happening and doing analyses of how these failure modes have changed, is really awful.
It should have resulted in that team actively seeking buy-in, and clear communication that all subsystems comprehended precisely what was changing.
It likely should have triggered some kind of additional scrutiny from the safety organization.
That it didn’t is heartbreaking. It seems like either some common practices were not followed or were rushed.
I can’t see Boeing keeping their CMMI certification level after this news breaks. Certainly some major steps were skipped in the Systems Engineering process.
From my understanding, this was an intentional decision, as the only way they could certify the airframe without simulator training being required was to feed the MCAS with only 1 sensor.
The only way they could do that is keeping the system overall rated as hazardous, as the Catastrophic rating would require multiple redundancy plus the training.
This can be corroborated from the Australian 60 minutes expose.
> For sure there may have been pressure from management to keep MCAS out of the manual but it's not really up to the regulatory agency to be experts on the aircraft design, if things are being hidden by the company then I'd consider this bordering on professional misconduct on the parts of the engineers overseeing this work.
That is my conclusion as well.
Here's the thing: nobody wants to pay engineers on par with managers. Maybe you can't manage something you can't understand.
That's always the moment when I'm happy to have documented the decisions.
I used to think like that but I learned in time that the core issue is that engineers rarely know how to present their job in terms of monetary benefit during salary negotiations.
E.g., as an automation engineer I led an initiative that saved a previous employer more than €60k yearly through automation and by optimizing other validation workflows. As principal engineer at my current job I saved them more than €30k yearly by replacing a licensed component with an open-source one, filling the feature gaps myself outside of work hours.
These things get noticed: not just the work done, but the ability to think in terms of money. And that unlocks your full engineering potential in salary negotiations.
It makes me wonder if there are other issues with the Max that the public doesn't know about yet.
I hope a thorough review of Boeing's internal communications is already underway. If there is proof that these decisions were made for financial gain, they should face criminal charges.
IMO, whether it was greed or just general incompetence, Boeing has demonstrated that they are not responsible enough to self-certify their aircraft.
We have no idea what other potentially lethal corners have been cut. What if they go back into service, after several months of retrofitting all of them at Boeing maintenance hangars, and then the following year there are two more deadly crashes from some other overlooked hack?
Really, these planes need to be scrapped. The engines, equipment, seats, etc can all be stripped and used in other planes, but the air frames will need to be recycled and this line of planes should end here.
Even if it doesn't (it probably won't), I highly doubt we'll see another generation of 737s. They did survive the rudder problems back in the 90s, so their reputation might recover, but they still can't make the types of planes airlines want and keep that name/certification.
> Perhaps the single most complex, insidious, and long-lasting mechanical problem in the history of commercial aviation was the mysterious rudder issue that plagued the Boeing 737 throughout the 1990s. Although it had long been rumoured to exist, the defect was suddenly thrust into the spotlight when United Airlines flight 585 crashed on approach to Colorado Springs on the third of March, 1991, killing all 25 people on board. The crash resulted in the longest investigation in NTSB history, years of arduous litigation, and a battle with Boeing over the safety of its most popular plane.
In the US maybe. In Europe and Asia, I don't think so.
The planes have been grounded pending investigation. Given the speed of investigations and the political ramifications of all of this, I bet they won't be ungrounded anytime soon.
This reasoning reminds me of the tendency among software engineers to argue "this code needs to be thrown out and rewritten from scratch." It's easy to see the flaws, but not so easy to see all the things that have been fixed. See https://www.joelonsoftware.com/2000/04/06/things-you-should-...
This is like taking your MySQL app and running it against MongoDB with a bunch of hacks to translate the SQL into mongo calls.
And then claiming you did all this so you could "avoid a rewrite".
Some architectural decisions are far reaching and leaky. This is not the ideal but it often is the ideal trade off. The alternative is decoupling to the nth degree resulting in a hunk of junk impossible to change that won't get off the ground.
They don't seem to have learned from this, which means it's likely to happen again.
Don’t forget, the system they used to paper over the cracks had a single point of failure.
OTOH, seeking a non-aerodynamic solution to a significantly stability-degrading airframe modification was IMHO a bridge too far. If the pitch stability (not the pitch feel) problems couldn't be dealt with aerodynamically without busting type certification, then perhaps the whole concept was just too much of a stretch for the venerable old 737.
It surprises me, though, that this couldn't have been engineered out with enhancements to the horizontal stabilizer, such as tip fences or a span increase to offset the lift from the engine nacelles.
If it could have, but software was cheaper, then that's even a darker indictment of Boeing's engineering incompetence.
This in turn is because getting a new airplane type to market would cost some (I assume, facts are welcome!) unholy amount of money and time to get approved.
If you consider that decision "greed" or "a rational response to perverse regulatory incentives" is I suppose a personality test as good as any :)
It's one thing to scoff and believe "Oh, it's nothing! You're just holding us all back!"...
...Right up until a plane load of freight or people plunges out of the sky.
Free market economics optimizes for one thing, and one thing only, as a first-order optimization. That's why we regulate: to ensure that all those nuisance secondary facets are accounted for by everyone equally, so that market forces' natural race to the bottom doesn't compromise the central tenet of air safety - that everyone and everything that goes up comes back down safely, controlled, and alive.
The business part, if you think about it, is secondary to the capability to make and safely deploy a new plane. A nice bonus.
Sacrificing the quality of the final product for the sake of looking better on the balance sheets is a cardinal sin. Plain and simple. Based on testimony from inside, that sin seems to have been SOP at Boeing for the better part of the last decade.
But I hope that CEO Dennis Muilenburg deep down understands he seriously screwed up, and every now and then has a hard time falling asleep in his $10M mansion knowing that he is ultimately responsible for hundreds of people's unnecessary deaths due to his failed values as a leader.
Which is extremely sad, because it was really hard won.
A classic story of gutting a government organization, and of regulatory capture.
I know some really good people at the FAA, and the situation makes their blood boil. Mine too.
That is almost certainly not what happened. It is more likely a system of procedures and policies which failed. The company should take the hit, but unless an investigation reveals otherwise, I see no reason a single individual should take the blame for all of this.
Aren't the insane compensations of executives justified by their "great responsibility"? So I think it makes them eventually responsible for what their companies do.
Deleted Comment
After the Lion Air crash, it was very apparent to Boeing that MCAS was not safe. This whole article focuses on how MCAS slipped through development and certification - but even after Boeing knew the dangers of MCAS, the MAX was still allowed to fly.
It was hidden and dangerous. Then it was open and dangerous but was still defended by Boeing. Damning.
I think it's a huge issue, but perhaps not criminal. The hiding/lying/etc is a criminal issue in my view.
Why exactly did the engineers/test pilots feel the need to "enhance" the original MCAS with the new, more powerful version that worked at lower speeds? What did they know? I doubt they did it for the hell of it. And therefore, what has changed such that that enhanced functionality is now no longer necessary, and it's fine that MCAS is being returned to its original, more subtle implementation?
These things just don't add up for me and Boeing's constant pronouncements that they did nothing wrong, everything was fine, and now they're fixing it so everything will be even more fine ring very hollow indeed. I would almost like to see everyone involved in this subpoenaed so the public can learn the truth of what, exactly, took place.
Until we have some answers, especially to my main one - what was so bad about the airframe's handling that it was necessary to massively increase the power of the MCAS system, but is now apparently not necessary anymore and it's fine for them to nerf it - I don't think I'll be flying on a MAX.
I mean, if one-sensor-based MCAS failed twice so early in the life span of the plane model, what is the probability that a two-sensor model will fail pretty soon as well? The math should be simple; we have all the data needed: combined hours flown by all planes of the type and the number of failures (at least two known), which lets us estimate an MTBF for the sensor.
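The back-of-the-envelope estimate the comment asks for is a one-liner. The fleet-hour figure below is a made-up placeholder, not real data; only the two known failures come from the discussion:

```python
# Rough point estimate of the AoA-sensor failure rate, as the comment
# suggests. fleet_hours is an illustrative assumption, not real data.
fleet_hours = 500_000   # assumed combined MAX fleet flight hours
failures = 2            # the two known fatal MCAS-related sensor failures

# Maximum-likelihood estimate of the failure rate (per flight hour)
rate = failures / fleet_hours
mtbf = 1 / rate

print(f"Estimated failure rate: {rate:.2e} per flight hour")
print(f"Estimated MTBF: {mtbf:,.0f} flight hours")
```

With these placeholder numbers, the estimate comes out to one failure every 250,000 flight hours - orders of magnitude worse than the 10⁻⁷-per-hour class of probability usually claimed for a Hazardous failure condition.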
If the sensor had just stopped responding, there wouldn’t have been any problem. The planes would keep flying, the sensors would get replaced, and everyone would be fine.
What happened was that the sensor gave erroneous readings. The MCAS system reacted to those erroneous readings and crashed the plane.
With two sensors, you can detect failure. It’s very unlikely that both would fail simultaneously. If they did, it’s very unlikely that both would provide the same erroneous readings.
Birgenair 301 crashed into the Atlantic because mud dauber wasps built nests in both pitot tubes while the plane was on the ground. It happens.
They don't have to fail simultaneously in a flight, and they don't have to fail because of internal sensor problems. There are many cases in which they can simultaneously fail and give the same readings; the article even mentions such events:
>> That probability may have underestimated the risk of so-called external events that have damaged sensors in the past, such as collisions with birds, bumps from ramp stairs or mechanics’ stepping on them.
And AF447 gives an example of what such erroneous readings, combined with pilot errors, may lead to.
Triple redundancy is the norm for the specific reason that symmetrically placed sensors are highly likely to fail in the same way not long after each other, but a third, differently placed sensor can keep you flying.
Although there's at least one instance where an Airbus plane had two AoA sensors malfunction at the same time and outvote the last remaining sensor.
This is why critical systems are built with higher degrees of redundancy and graceful degradation of operational envelope in mind.
Training on how to deal with unaided flight is also absolutely essential. There have been many Airbus accidents where pilots were caught off guard when the automation that kept them from breaking out of the operating envelope failed.
Long story short: Boeing has put itself in the unenviable position of having delivered a product in ways that are not only illegal but deadly. Short of pilots accepting a significant burden (being as good as or better than the MCAS system), a lot of man-hours and capital has been expended only to end up in a situation where every MAX is at no inconsiderable risk of being scrapped.
The problem is that, in order to save a tiny amount of money, Boeing made the plane rely on unreliable sensors.
You're assuming that faulty sensors will tend to have random output. But since we're talking about a real-life mechanism, it seems likely it has some erroneous states that are more likely to occur than others. For instance, if the mechanism often fails up against one of its mechanical limits, the sensor might erroneously read out the limit position every time.
You can't actually say anything about the distribution of failure states for a sensor without evaluating that particular sensor.
Ice, insects, birds, and volcanic ash are all things that tend to cause the pitot and the static tubes to become blocked. When you encounter ice, insects, birds, and volcanic ash, it is often the case that you get multiple simultaneous blockages. Blockages of the various tubes are not statistically independent events in practice.
You get a reading of 20 on one sensor and a reading of 34 on the second - which one is correct? To achieve reliability, a minimum of five sensors should be used: four primary and one backup. If three primary sensors agree, the system is normal. If two primary sensors disagree, switch to the backup.
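The voting scheme described above can be sketched in a few lines. This is a toy illustration, not any real avionics logic, and the disagreement threshold is an arbitrary assumption:

```python
import statistics

# Sketch of a redundant-sensor voter of the kind the comment describes.
# The threshold and structure are illustrative assumptions, not taken
# from any real avionics specification.
DISAGREE_LIMIT = 5.0  # degrees of allowed spread before flagging a fault

def vote(readings):
    """Return (voted_value, healthy). With three or more sensors the
    median rejects a single wild outlier; with only two you can detect
    disagreement but cannot tell which sensor is wrong."""
    if len(readings) >= 3:
        voted = statistics.median(readings)
        healthy = all(abs(r - voted) <= DISAGREE_LIMIT for r in readings)
        return voted, healthy
    if len(readings) == 2:
        if abs(readings[0] - readings[1]) <= DISAGREE_LIMIT:
            return statistics.mean(readings), True
        return None, False  # disagreement detected, cannot arbitrate
    return readings[0], False  # single sensor: no cross-check possible

# A stuck-high third sensor is outvoted, and the fault is still flagged:
print(vote([2.1, 2.3, 34.0]))  # (2.3, False)
```

This makes the comment's core point concrete: with two sensors you only get `(None, False)` on disagreement; it takes at least three to keep a usable value while flagging the fault.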
It's actually higher than the probability that a one-sensor version will fail. With two sensors, you have an effective failure if either sensor fails, and the probability of that happening is roughly twice the probability that a single sensor will fail (assuming failures are independent, which is not necessarily a valid assumption).
However, with two sensors you can tell when one has failed (even though you may not know which one it was) and so the consequences of the failure might be less severe.
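The arithmetic in the two comments above is easy to check. A minimal sketch, with an assumed (made-up) per-flight single-sensor failure probability p:

```python
# Probability that a redundant pair sees at least one sensor failure,
# versus a single sensor, assuming independent failures. The value of
# p is an illustrative assumption, not real sensor data.
p = 1e-4                       # assumed single-sensor failure probability

p_single = p                   # one sensor: failure is also undetectable
p_any_of_two = 1 - (1 - p)**2  # ~2p, but the failure is now detectable

print(p_single, p_any_of_two)
```

So the pair is indeed roughly twice as likely to experience *a* failure, but (given the assumed independence) the chance of both failing in the same flight is about p², i.e. vastly smaller.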
The problem is: now pilots need to be prepared to fly the plane with a failed sensor, which is to say, without MCAS. To do that, they will need additional training. Avoiding that was the whole point of MCAS in the first place. That's the reason it's taking so long to sort this out. Technically, it's an easy problem to solve. It's the economics that are daunting.
Also, your GPS measurements give you position, direction, and speed, but they don't give you orientation. You would have to have another instrument to feed that into the system (such systems exist).
But yes, it would be a sanity check.
* IAS is the raw airspeed reading from the pitot tube.
* CAS is IAS corrected for instrument errors, e.g. if the plane is at an angle that disrupts air flow around the pitot tube.
* TAS is basically CAS adjusted for altitude and air pressure. It’s the aircraft’s speed relative to the air around it.
* Ground speed (or speed over the ground) is TAS adjusted for the wind. This is the number that GPS is going to give you.
IAS and CAS are particularly important for describing performance characteristics - if an aircraft stalls at 100 knots CAS, then it always stalls at that CAS. If you try to describe the stall speed in terms of TAS you go from a single data point to a graph of speed and altitude.
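As a rough illustration of that last point, the common rule of thumb that TAS exceeds CAS by about 2% per 1,000 ft of altitude (an approximation only; the real conversion uses air density and temperature) shows how a fixed CAS maps to very different true airspeeds:

```python
# Back-of-the-envelope TAS from CAS using the ~2%-per-1,000-ft rule of
# thumb. This is an approximation for illustration, not a substitute
# for the proper density-altitude conversion.
def tas_from_cas(cas_knots, altitude_ft):
    return cas_knots * (1 + 0.02 * altitude_ft / 1000)

# The same 100-knot CAS stall speed...
print(tas_from_cas(100, 0))       # ~100 knots TAS at sea level
print(tas_from_cas(100, 30000))   # ~160 knots TAS at 30,000 ft
```

Which is exactly why stall speed is quoted as a single CAS number rather than as a TAS-versus-altitude graph.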
If these accidents prove anything, it's that we need a computer that takes many different inputs (GPS from the tail and the nose, pitot, barometer, AoA indicator, input from the pilot, engine RPM, etc.) and puts them into a mathematical model of the airplane before overriding the pilot.
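A toy version of the kind of cross-check being proposed, with entirely hypothetical names, values, and thresholds: in roughly steady flight, AoA is approximately pitch attitude minus flight-path angle (the latter derivable from GPS/inertial data), so a sensed AoA far from that estimate is suspect.

```python
# Hypothetical plausibility check: before an automated system trusts an
# AoA reading enough to override the pilot, compare it against an AoA
# estimate derived from independent measurements. All names, values,
# and the tolerance are illustrative assumptions.
def aoa_plausible(aoa_sensed_deg, pitch_deg, flight_path_deg, tolerance=5.0):
    """In steady flight, AoA ~= pitch attitude - flight-path angle."""
    aoa_estimated = pitch_deg - flight_path_deg
    return abs(aoa_sensed_deg - aoa_estimated) <= tolerance

print(aoa_plausible(4.0, 5.0, 2.0))    # sensed close to estimate: True
print(aoa_plausible(34.0, 5.0, 2.0))   # stuck-high sensor: False
```

Even this crude a check would have flagged the Lion Air AoA vane, which was reading tens of degrees off.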