The faulty AOA sensor on Ethiopian flight 302
The NTSB report submitted in response to the EAIB's third draft covers two areas: Airframe/Systems and Operational and Human Factors. Today, I’m going to focus on the airframe and systems. The BEA report will give us more insight into the Operational and Human Factors, as it is based on their analysis of the Cockpit Voice Recorder.
The aircraft was given take-off clearance from Addis Ababa Bole International Airport at 08:37 local time (05:37 UTC). The captain was the Pilot Flying. The take-off roll seemed normal and the aircraft lifted off.
The Boeing 737 MAX8 has two independent Angle of Attack (AOA) sensors, one on each side, manufactured by Collins Aerospace.
Forty-five seconds after take-off, with the aircraft just 50 feet above the ground, the flight data recorder shows that the AOA values coming from the left AOA sensor were completely wrong. Just as in the crash of Lion Air flight 610, the left-side AOA sensor reported a much higher AOA value than the right-side sensor, in this case about 59°.
The left stick shaker activated. The airspeed and altitude values generated from the air data on the left side began to deviate from the values on the right side. On the flight deck, IAS DISAGREE and ALT DISAGREE alerts appeared alongside a Master Caution alert.
The radar controller identified the flight on radar and instructed the flight crew to climb to FL340 (34,000 feet) and, when able, to turn right and fly directly to the RUDOL waypoint.
As the Boeing 737 MAX banked to the right for the turn, the flaps were fully retracted. This was when the Maneuvering Characteristics Augmentation System (MCAS) activated, trimming the nose down for nine seconds. The captain, as Pilot Flying, attempted to counteract this by pulling back on the control column with a force of over 90 pounds. He then used the electric trim to stabilise the aircraft for the climb. Five seconds later, the MCAS activated again, pitching the nose down.
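To make that sequence concrete, here is a deliberately simplified sketch of the gating conditions described above: MCAS acting on a single AOA input, only once the flaps are fully retracted, and pausing for about five seconds after pilot electric trim before acting again. It additionally assumes, as has been widely reported, that MCAS only operated in manual flight (autopilot off). The AOA threshold and all names are hypothetical placeholders; this is not Boeing's actual control law.

```python
from dataclasses import dataclass

# Hypothetical placeholder, NOT the real MCAS activation threshold.
HYPOTHETICAL_AOA_THRESHOLD_DEG = 15.0
# The FDR narrative shows MCAS re-activating about 5 s after pilot electric trim.
REACTIVATION_DELAY_S = 5.0


@dataclass
class FlightState:
    aoa_deg: float                   # value from the ONE AOA sensor MCAS reads
    flaps_retracted: bool
    autopilot_engaged: bool
    seconds_since_pilot_trim: float  # time since pilot last used electric trim


def mcas_commands_nose_down(state: FlightState) -> bool:
    """Simplified, hypothetical model of when the original MCAS trims nose down."""
    if not state.flaps_retracted or state.autopilot_engaged:
        return False  # inactive with flaps extended or autopilot on (assumed)
    if state.seconds_since_pilot_trim < REACTIVATION_DELAY_S:
        return False  # pilot electric trim interrupts MCAS for a few seconds
    # A single faulty sensor is enough to trigger it: there is no cross-check.
    return state.aoa_deg > HYPOTHETICAL_AOA_THRESHOLD_DEG


# Erroneous left-sensor value from the flight, with the flaps just retracted.
faulty_left = FlightState(aoa_deg=59.0, flaps_retracted=True,
                          autopilot_engaged=False, seconds_since_pilot_trim=10.0)
print(mcas_commands_nose_down(faulty_left))  # True
```

The point the sketch illustrates is structural: because the decision rests on one sensor, a vane failure alone is enough to produce repeated nose-down trim once the flaps are up.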
This was the beginning of the fatal accident sequence. It is clear that the left-side AOA sensor failed, sending erroneous AOA data which triggered the MCAS activation.
As a part of the investigation, the NTSB asked Collins Aerospace to investigate how and why the left AOA sensor failed in the first place.
Collins Aerospace conducted various tests to determine what might have caused the failure. They considered multiple scenarios of AOA sensor failures, including manufacturing defects, internal component failures, heater failures, non-impact structural failures and AOA vane impact failures. They used vibration, acceleration and flight simulation tests with physics-based performance modelling. The data from the Flight Data Recorder (FDR) could not be reconciled with an internal failure of the sensor; however, it was consistent with previous bird strike incidents where the impact had caused the AOA vane to separate.
The data from the left AOA sensor matched the right sensor until 44 seconds into the flight, when it suddenly diverged. At the same time, the heat panel showed a LEFT ALPHA VANE failure, which Collins Aerospace concluded was consistent with the heater circuit failing after the vane had broken off.
The most likely scenario was that a bird of at least half a pound (226 grams) had struck the vane at 170 knots (the estimated speed of the aircraft), snapping the vane off at the hub of the sensor. Collins recreated the scenario in wind tunnels, which showed the same near-instantaneous change in the left AOA value and resulting data dynamics.
The draft report from the Ethiopian Aircraft Accident Investigation Bureau (EAIB) confirmed that the left-hand AOA sensor failed but did not probe the cause. Similarly, the draft acknowledged the Collins report but did not give any details. There was no mention of the fault tree analysis which indicated a partial AOA vane separation. Instead, the draft and the final report stated that the EAIB could not comment on the analysis by Collins Aerospace as the EAIB were not present for the testing.
Collins Aerospace insisted that they had extended repeated invitations for the EAIB to participate in the simulation testing and that they offered live demonstrations of their tests. When the EAIB declined, representatives from Collins Aerospace travelled with the NTSB to Addis Ababa to present their analysis to the EAIB, offering the investigators the chance to question their methods and findings. Finally, they submitted a complete report detailing their methodology and results.
It is difficult to comprehend why the EAIB have dismissed this report, given the level of detail that Collins Aerospace was able to provide to show that a bird strike was the probable cause for the left-side AOA sensor failure.
On the 26th of November 2018, four months before the crash, a Boeing 767-300 suffered an engine failure on take-off from Addis Ababa Bole International Airport. The probable cause was a large bird, either a Steppe or Tawny Eagle, which had been ingested into the engine. These eagles are common in the area around the airport and weigh between 2 and 3.4 kilos. The EAIB investigation into the incident concluded that there was an ongoing risk of bird strikes at the airport and recommended that the airport authority “take practical measures to minimize/eliminate bird hazards around the airport so that arriving and departing flights are conducted safely without any human and material loss.”
Nevertheless, the EAIB dismissed a bird strike as the most likely cause of the sudden erroneous data from the left-hand AOA sensor. The draft report states that the runway area was searched, but there was no wreckage from the AOA vane and no evidence of a bird. This, the investigators concluded, indicates that there was no bird strike.
The NTSB argued at the time that this conclusion was questionable. The search did not take place until eight days after the crash, during which time the evidence could have been lost. More importantly, the FDR data shows that the Boeing 737 MAX was positioned above Taxiway D when the bird strike would have occurred. However, according to the draft report, the search area did not include Taxiway D and its surrounds.
The NTSB recommended that an additional contributing factor be added to reflect the inciting incident of the crash:
the airplane’s impact with a foreign object, which damaged the AOA sensor and caused the erroneous AOA values
However, in the final report, the EAIB dismissed the NTSB commentary again. Instead, it concluded that “unexplained electrical and electronic faults” caused the sensor heater to fail, which in turn caused the erroneous data from the AOA sensor.
On the day of the accident, the weather data is clear: there was no risk of icing, so a failure of the heater could not have caused the AOA sensor to generate faulty data. In addition, Collins Aerospace found no electrical fault that was consistent with the actual circumstances of the crash. The NTSB has since reiterated that the evidence shows that the failure was caused by “separation of the AOA sensor vane due to impact with a foreign object, which was most likely a bird.”
The NTSB contended that there were three further areas in which the EAIB draft misrepresented the situation.
The third draft suggests that there may have been problems with the manual electric trim system but, the NTSB wrote in their commentary, the analysis does not offer any evidence for this. Boeing and the NTSB maintain that the trim data from the FDR is consistent with flight crew input. Boeing assessed possible trim system failures but was unable to find any scenario which matched the FDR data.
The EAIB draft also highlighted that the design changes to the 737-8 MAX were not official and were not approved by the FAA. The NTSB dispute this point, although it is perhaps a case of semantics.
Boeing’s changes to the MCAS design were made official in March 2016 and were communicated to the FAA in July 2016, as described in the NTSB System Safety and Certification Specialist’s Report, section H, Certification of the MCAS Implementation and Function.
Boeing applied for and, in March 2017, was granted an amended type certificate for the 737-8 MAX. For further information, see the NTSB System Safety and Certification Specialist’s Report.
It is hard to defend Boeing’s and the FAA’s behaviour in this regard, even if we accept the argument that the EAIB should have described the fiasco more clearly. I believe that the NTSB’s apparent point-scoring here is meant to lay the groundwork for their next argument.
At the time, Ethiopian Airlines faced criticism for the flight crew’s reaction to the MCAS intervention. After the crash of Lion Air 610, information about the existence of the MCAS and the correct method for handling this type of failure had been distributed as a matter of urgency. It seemed that the Ethiopian Airlines flight crew should have known how to deal with the situation, given the high degree of attention MCAS was receiving at the time.
The EAIB draft report defended Ethiopian Airlines, saying that they had requested more information about the MCAS after the Lion Air accident but that Boeing did not respond appropriately.
However, this doesn’t appear to be true. The NTSB argue that all 737 MAX operators received several bulletins, as well as an Emergency Airworthiness Directive from the FAA, in November 2018.
Then in December, Boeing responded directly to Ethiopian Airlines’ request for more information, with specific guidance on the operations manual bulletin and the checklist. This response underscored the key information for dealing with MCAS interference.
As is stated in the OMB, ‘If uncommanded stabilizer trim movement is experienced in conjunction with the erroneous AOA flight deck effects, the instructed course of action is to use the Stabilizer Cutout switches per the existing [runaway stabilizer] procedure.’
The NTSB noted this in its comments on the draft, stating that the EAIB’s defense of Ethiopian Airlines, in this instance, was misleading at best, as the airline had clearly received detailed information from Boeing.
In the final report, the EAIB were more specific, explaining that the MCAS would not have activated if the flaps were extended; however, the Flight Crew Operating Manual bulletin and the airworthiness directive from the FAA did not mention this detail.
This is correct, although extending the flaps to counteract the MCAS had already been seen on Lion Air 610: there, the first officer retracted the flaps without comment, which led to the MCAS activating again.
However, the NTSB argue that the multi-operator message of the 10th of November 2018 clearly stated that extending the flaps would stop the MCAS from activating, and that this message was included in the appendix of the final report.
These three items do not directly affect the conclusion of the EAIB report or alter the probable cause and contributing factors. However, I think the NTSB are highlighting these statements in the report as a part of the groundwork for their operational analysis. Their point is that although the activation of the MCAS on Ethiopian flight 302 caused the accident, the aircraft would have been recoverable if the crew had responded to the emergency using the information distributed in November and December 2018.
We will focus on the operational and human factors issues in more detail in the coming weeks.
References:
- Aircraft Accident Investigation Report B737-MAX 8, ET-AVJ (hosted on the BEA website for reference)
- US Comments on Draft Aircraft Accident Investigation Report Ethiopian Airlines Flight 302
- National Transportation Safety Board Response to Final Aircraft Accident Investigation Report Ethiopian Airlines Flight 302
Whew, that is a very sorry state of affairs. Great writeup, as always. Thank you.
I had not known that this aircraft had two AoA sensors. ISTR discussion of the fact that one sensor was standard, but a second was an extra-cost item — but not that the default software couldn’t ignore one sensor if it suddenly started reporting wildly unusual data. Wikipedia’s MCAS article mentions software that was an additional option, beyond the 2nd AoA sensor, that would at least have flashed a light to warn that the two sensors disagreed; the discussion of fixes says there’s now non-optional software to disconnect MCAS if the readings differ by more than a few degrees, which is the way the system should have been designed originally.
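The cross-check the comment describes (raise a disagree warning and inhibit MCAS when the two vanes differ by more than a few degrees) can be sketched in a few lines. This is a hypothetical illustration: the 5.0° default and the names are assumptions standing in for "a few degrees", not the certified values or Boeing's implementation.

```python
from typing import NamedTuple


class CrossCheck(NamedTuple):
    aoa_disagree_alert: bool  # would drive a flight deck AOA DISAGREE indication
    mcas_allowed: bool        # False means MCAS is inhibited for this cycle


def cross_check(left_aoa_deg: float, right_aoa_deg: float,
                limit_deg: float = 5.0) -> CrossCheck:
    """Compare both AOA sensors; limit_deg is an illustrative assumption."""
    disagree = abs(left_aoa_deg - right_aoa_deg) > limit_deg
    return CrossCheck(aoa_disagree_alert=disagree, mcas_allowed=not disagree)


# Left value taken from the article; the right-side value is made up.
print(cross_check(59.0, 5.0))
# CrossCheck(aoa_disagree_alert=True, mcas_allowed=False)
```

With only two sensors, a disagreement can be detected but not resolved: the logic cannot tell which vane is wrong, so the safe choice is to switch MCAS off rather than keep it running.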
It would be tempting to slam the people who wrote the MCAS code if I hadn’t also had to deal with ridiculous demands from management back in my coding days.
This article suggests to me that Ethiopian can’t be a safe airline because the government that supervises it can’t or won’t make it fix deficiencies. I can’t judge whether the refusal to acknowledge local problems relates to the government’s refusal to acknowledge its failings in the war in Tigray; an administration that is trying to defend the indefensible might be less inclined to listen to criticism in any area.
No, the system should’ve been designed with 3 sensors, and if one is out of whack, MCAS should discard its data, use the remaining two sensors, and stay active. Remember MCAS is a safety-critical system in certain flight states, which is why it was introduced in the first place.
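A three-sensor scheme like the one proposed here is usually built around a mid-value (median) vote. Below is a minimal sketch, assuming a hypothetical 5.0° agreement tolerance: readings that stray too far from the median are discarded, the agreeing sensors are averaged, and if no two sensors agree the function returns None so the system can be inhibited. Real redundant air-data voting is considerably more elaborate, and all names here are illustrative.

```python
from statistics import median
from typing import Optional, Sequence


def voted_aoa(readings: Sequence[float],
              tolerance_deg: float = 5.0) -> Optional[float]:
    """Two-out-of-three vote over AOA readings (illustrative sketch).

    A single outlier is dropped and the agreeing sensors are averaged.
    Returns None when no two sensors agree, i.e. the system should be
    inhibited rather than act on untrustworthy data.
    """
    if len(readings) != 3:
        raise ValueError("expected exactly three AOA readings")
    mid = median(readings)
    # The median reading always agrees with itself; keep those close to it.
    agreeing = [r for r in readings if abs(r - mid) <= tolerance_deg]
    if len(agreeing) < 2:
        return None
    return sum(agreeing) / len(agreeing)


# One failed vane (59°) is outvoted by two healthy ones (values made up).
print(voted_aoa([59.0, 5.0, 5.4]))  # about 5.2
```

The median is used as the reference because a single wild value can never become the median of three, so one failed vane cannot drag the voted value with it.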
The problem originates with Boeing hiding this from the FAA because their sales department didn’t want the FAA to set more thorough training requirements for pilots who switch from an older 737 model to the MAX. If the engineers had added a third AoA sensor to the aircraft, someone at the FAA would have wondered why. It was money above safety: Boeing engineers are perfectly capable of designing safe aircraft, but weren’t allowed to.
This incident, as well as the similar Lion Air crash, illustrates the extent to which modern commercial aviation has become dependent on, I would almost write “addicted to”, computerization and automation.
The hard fact remains that Boeing considered it okay to obscure certain design features for commercial reasons. The 737 Max has certain features that make it a very different aircraft from previous 737 versions. One of them is the much bigger engines, which have been moved forward on the wings, affecting the behaviour of the aircraft under certain circumstances.
In order to compensate, a system called MCAS was installed, making totally automated corrections in the flight control systems.
The AOA sensors are a critical component. And, as Sylvia shows here, a failure can have catastrophic consequences.
Pilots in both accidents, and in many other cases where no aircraft was involved in an accident, were effectively kept out of the loop. As a result, they were not trained to cope with an AOA failure.
All this in order to save millions of $$$ by ensuring type commonality with preceding B737 versions.
The way I read it, a pilot who has previously qualified on a 737-100 or -200 series only needs a differences course to transfer to any of the subsequent versions of the Boeing 737.
What strikes me is that flying (large) commercial airliners nowadays requires very different operational skills.
Very high emphasis is now put on training pilots to deal with an increasingly complex piece of machinery, and with it comes a deterioration in the basic skill set. The old adage “Aviate, navigate, communicate” is now a bit dubious. Aviating, the ability to control the aircraft in an emergency situation, is not really an option when the crew are trained to rely on whatever remains operational and to select the correct path to regain control, all in a situation where all sorts of warnings are flashing on the panel and the automation changes from one “law” to another, not infrequently changing the behaviour of the aircraft with it. So, TOGA may under certain circumstances not engage the autothrottle. A pilot under high pressure to react in a life-or-death situation, in an aircraft travelling at perhaps 200 mph, may forget that in case “A” the autothrottle will move to full power, but in case “B” it will remain at flight idle. Working out how to “aviate” may take all of the available time, or even more. That realization in itself will certainly raise the pressure and further degrade cockpit CRM and workflow.
In many cases, the cabin crew remained highly professional and saved many lives while the cockpit crew were overwhelmed.
In my day, KLM pilots were mainly trained by a government-run flying academy, the RLS. Training included aerobatics on the North American Harvard, the Saab Safir and, I believe, later the Cessna 150 Aerobat (correct me if I am wrong). This school still trains pilots as the KLM Flying Academy, but I am not sure if aerobatics are still on the menu.
My son, who has a PhD in computer science, is amazed at the very poor design of modern cockpits. In his opinion, the extreme complexity is accompanied by a very poor ergonomic design.
There is a lot of superimposing of one system, and procedure, on another. This is caused not so much by any attempt to arrive at a logically integrated design as by the wish to make the whole thing work without imposing too many changes that would require a different way of crew training, and with it certification of new aircraft. Plus, crews trained on an (imagined) new cockpit, based on a totally different design philosophy, won’t come “on stream” for at least another year, and will not be able to operate older (current-generation) aircraft.
This is, of course, a line of thought that may not be feasible for different reasons, but it is interesting to hear the view of a computer scientist.
I’ve commented extensively on past posts about the multitude of system and software design issues, and the flawed business/safety decisions.
However, even with the fatally flawed implementation, clear “MCAS applying Trim” warnings on both pilots’ main displays, and a pair of failsafe “MCAS Off” switches (with an “MCAS Disabled” indication on the main displays so you know it is disabled), could have saved the situation.
However, that would have required documentation and retraining, so was apparently viewed as not acceptable…