Alitalia flight 404: The result of a dozen small decisions

15 Apr 22 8 Comments

Last week, we looked at the key events that led to the Alitalia flight 404 crash. A few points from the previous post:

  • Maintenance replaced both radio receivers in the aircraft and the crew were aware they needed testing.
  • On approach, the two receivers gave conflicting results. At least one of the receivers appears to have had a fault.
  • The captain decided to trust VHF-NAV 1 and did not do any troubleshooting to verify that the receiver was working.
  • The first officer was not confident; the relationship between the two pilots seemed to be one of professor and student rather than a team working together.
  • The captain chose to override the first officer’s decision to go around. Instead, he encouraged the first officer to “catch the glide”, that is, to level out and intercept the glideslope again.
Alitalia AZ404 approach sequence as reconstructed in the accident report

None of this clearly explains why they crashed into the side of Stadlerberg mountain, perfectly lined up with the runway but 1,250 feet too low.

The air traffic controller suddenly realised that Alitalia flight 404 had disappeared from radar. When the flight crew did not respond to his call, the air traffic controller contacted the next inbound flight.

“Do you have an aircraft in sight about two miles ahead of you?”

The pilot’s response must have chilled the controller to the bone. “There is a fire on the ground but there is no traffic in sight.”

The Swiss Federal Aircraft Accidents Inquiry Board (now part of the Swiss Transportation Safety Investigation Board) investigated the crash with a team of over 80 Swiss investigators with support from an Italian team and representatives from the NTSB and FAA in the US.

They could not explore the wreckage as the fire, kickstarted by over five thousand kilograms of fuel, continued to burn until the evening of the following day. Rumours abounded: some witnesses were sure that the aircraft was on fire before it crashed into Stadlerberg; others reported that it had exploded in the air.

However, it soon became clear that the aircraft was much too low and slightly off course on its approach. The investigators initially focused on the ILS for runway 14. They quickly established that all of the airport navigational aids were working as intended. Yet, on the cockpit voice recorder, the captain, a highly experienced pilot, could be clearly heard saying “glide path capture”.

The aircraft had a King receiver KNR 6030 in the No. 1 navigation system and a Collins 51RV-2B in the number 2 navigation system, both of which had passed all self-checks. The technicians at Milan Linate had tested them both on the local VOR and the localiser. However, the position of the parked aircraft had made it impossible to test the ILS glideslope signals.

The recordings from the cockpit made it clear that both pilots used the same navigation receiver, Nav 1, after noticing that the two receivers did not match. The receiver and connectors were salvaged from the wreckage. Of course, everything was severely damaged, but there was no clear sign of anything wrong with the receiver.

Investigators reconstructed the flight using another Alitalia DC-9 with the same equipment, following the flight path of flight 404 to 4,000 feet above mean sea level. They found that until the glideslope was intercepted, the glideslope needles all showed in the fully UP position, that is, they were out of sight. No one could look at that display and believe they had intercepted the glideslope.

The AAIB considered the idea that a mobile phone, at the time bulky and blocky, could have caused the interference, but they could not recreate the theoretical interference. Besides, no portable telephones were found in the wreckage; the only similar device was an electronic calculator.

In another reconstruction, an IFR-equipped helicopter repeatedly followed the approach path of the accident aircraft. The helicopter could follow the doomed flight path to the crash site safely. Again, every approach led to glideslope indications in the fully UP position. One of the instruments showed a warning flag from 6.8 nautical miles out; the other did not show a warning flag.

And yet, the captain had clearly reported that he had the glideslope much further out than could possibly be true. Investigators were baffled.

Six months later, there was an unexpected breakthrough. An Alitalia pilot reported seeing a glideslope indication where there was none. While the flight was on approach, the instrument showed a centred glideslope indication and there was no warning flag. The King KNR 6030 receiver was removed and tested. The technicians quickly verified the centred glideslope indication and took apart the receiver to find the cause. They traced the frozen indication, a failure mode that should not be possible in modern receivers, to a cold solder joint in the deviation driver circuit.

Alitalia contacted the investigators immediately. There was no way to prove that this happened to Alitalia flight 404, but it was the first sensible explanation for why the captain thought he saw the glideslope before there was one.

However, it was not the first time that a frozen indication with no warning flag had been reported. In 1984, six years before the accident, Douglas Aircraft Company had issued an “All Operator Letter” warning that frozen glideslope or localiser indications without a warning flag had been reported in two navigation receiver groups. The Collins 51RV-2B, the other unit in this aircraft, should have been included. However, Collins had recommended in 1975 that all Collins 51RV-2B units be modified to monitoring status. By the time of the letter, almost ten years later, Douglas assumed that all existing units had been modified and thus that Alitalia was not affected and did not need the warning. The investigators concluded that, at the time of the accident, “[Alitalia operating crews] including the accident flight crew, were unaware of the possible false indications.”

Both VHF navigation units installed in Milan Linate had the same flaw: if the receiver produced no output signal, the indication was shown as “on course”. This means that if there were a short circuit or a signal break between the receiver and the indicator, it would show as “on the glideslope” rather than that the connection had been broken.
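The failure mode described above can be sketched as a toy model. This is only an illustration, not the actual King or Collins circuitry: the function names and the use of None to stand for "no output signal reaches the indicator" are assumptions made for the example.

```python
from typing import Optional, Tuple


def unmonitored_indicator(deviation: Optional[float]) -> Tuple[float, bool]:
    """Toy model of an indicator fed by a receiver without output monitoring.

    deviation: glideslope deviation signal (0.0 = needle centred),
               or None if no output signal reaches the indicator.
    Returns (needle_position, warning_flag_shown).
    """
    if deviation is None:
        # Nothing drives the needle, so it rests at centre and no flag
        # appears: the display is indistinguishable from "on the glideslope".
        return 0.0, False
    return deviation, False


def monitored_indicator(deviation: Optional[float]) -> Tuple[float, bool]:
    """Same toy model, but a missing output signal raises the warning flag."""
    if deviation is None:
        return 0.0, True
    return deviation, False


# A broken connection looks like a perfect approach on the unmonitored unit.
print(unmonitored_indicator(None))
print(monitored_indicator(None))
```

In this sketch the needle position is the same in both cases; the only thing that tells the pilot the reading is meaningless is the flag, which is why its absence was so misleading.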

This explains the captain’s statement. The Nav 1 instruments were centred as if the DC-9 was on the glideslope and there was no warning flag; thus, he believed they were on the glideslope. Even so, there seems to have been plenty of opportunities for him to revisit that assumption. The question then became: why did he continue to believe that they were on the glideslope?

To start, the recordings showed that the Ground Proximity Warning System (GPWS) never sounded in the cockpit. The reconstructions established that the sink rate and terrain closure rate were never excessive for a final approach, so those warnings would not have been triggered. However, the GPWS also had an alert to signal that the aircraft was below the glideslope. It’s not clear why that alert didn’t sound; it’s possible that the same short circuit in the receiver disabled that function. As the GPWS unit was never found in the wreckage, it is impossible to be sure.

The investigation concluded that the captain was looking at a “frozen” centred glideslope indication.

Notably, after Collins had recommended the receivers be upgraded in 1975, the work still hadn’t been done by November 1990. Investigators also reported that it was difficult to transcribe the conversation in the cockpit, for two reasons. The first was that the pilots were not wearing their headsets (which had microphones attached), so the CVR recording was dependent on the area microphone, which also picked up general noise and radio chatter. The second was what they described as the “inferior technical quality” of the recording equipment.

The DC-9 was equipped with “drum-pointer” altimeters, an old-fashioned altimeter (even then) that shows the altitude in 1,000-foot steps on the drum with a round scale that shows the 100-foot steps, numbered one through nine.

Drum-pointer altimeters are challenging to read. For some altitudes, the pointer can actually obscure the number on the drum. Drum-pointer altimeters are notorious for being misread, with multiple studies showing that the thousand indicator, in particular, takes more time to perceive and that pilots often need to look at the scale more than once to process the information properly.
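To see why a misread drum matters so much, here is a minimal sketch of how the two parts of the display combine. It is a simplification that assumes the drum contributes whole thousands of feet and the pointer contributes the remainder; the function name is invented for the example.

```python
def drum_pointer_reading(drum_thousands: int, pointer_feet: int) -> int:
    """Combine the drum (thousands of feet) with the pointer (0-999 feet)."""
    if not 0 <= pointer_feet < 1000:
        raise ValueError("pointer covers a single 1,000-foot revolution")
    return drum_thousands * 1000 + pointer_feet


actual = drum_pointer_reading(0, 250)   # drum shows 0, pointer shows 250
misread = drum_pointer_reading(1, 250)  # same pointer, drum taken as 1

print(actual, misread, misread - actual)
```

The pointer, which is the easy part to read, is identical in both cases. An error on the drum alone shifts the result by a full 1,000 feet, which is the difference between 250 and 1,250 feet.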

A picture starts to emerge of an airline that was wholly compliant but nevertheless slow to upgrade or renew equipment, slowly building up a collection of outdated technology, which over time increased the chances of pilot error.

Similarly, the captain and the first officer were generally compliant during the flight but nevertheless showed multiple indications of poor CRM. This was not a sign of the times. Crew resource management (CRM) had received attention in aviation since the late 1970s after David Beaty wrote “The Human Factor in Aircraft Accidents”, and by 1990 had been widely adopted as a part of pilot training. Although the crew only discussed operational matters while in the cockpit, it is quite clear from the transcript that the captain saw the first officer as a student rather than as a team member. He held forth on issues that were not relevant to their flight, like the possibility of circling to land on runway 28 after being informed that runway 14 was in use. At the same time, he missed the fact that the first officer was looking at the chart for runway 16 instead of runway 14 and thus was checking the wrong approach information. The first officer showed a lack of confidence on several occasions. Instead of dealing with it, the captain continued to feed more and more information to the first officer, including testing him on radio failure procedures on final approach.

This lack of CRM came to a head when the first officer decided to go around. At the time, the captain had the discretion to overrule that decision, which would no longer be acceptable in modern aviation. However, by then, it was crystal clear that they should go around. The investigators believed that the captain might have misread the drum altimeter to say that they were 1,250 feet above ground level, rather than 250 feet. Even if that were true, both pilots had expressed uncertainty about their location and they knew that the navigational aids needed testing and might not be functioning correctly. They did not have the runway in sight. A few seconds after the captain stopped the go-around, the radio altimeter warning sounded: final confirmation that they were much too close to the ground. Neither pilot reacted, completely focused on “catching the glide”.

The approach controller’s final instruction had been to descend to 4,000 feet on a heading of 110°, along with clearance for the approach. The published procedure stated that they should not descend below 4,000 feet until they were established on the localiser and eight nautical miles out.

The DC-9 turned to heading 150°, but the controller didn’t question this. Once the flight crew were established on the ILS, the controller’s radar vectoring was complete. The controller didn’t ask them to call established and the flight crew never made that call. Controllers at the facility admitted that this was common; it was busy enough that they rarely chased if the flight crew did not report.

Meanwhile, the captain believed that the flight had “intercepted” at 11.5 nautical miles out. The DC-9 entered its final descent, flying below the glideslope until impact.

Although the Minimum Safe Altitude Warning (MSAW) was standard in ground-based radar in the US in 1990, Zürich airport did not have this function, so there was no alert on the radar that the aircraft was flying below the glide path. The approach controller never thought to offer position information to the flight; he presumed they knew where they were. This is the third “almost”: ATC was compliant and every call was correct. However, a single position report or a simple request for the crew to report established on the ILS could have been enough for the controller to interrupt the captain’s complacency.

That same night, other inbound aircraft reported that they had the approach and runway lighting in sight throughout their final descent. However, the accident crew saw nothing and didn’t consider that seeing nothing was a warning signal.

During the descent, a Precision Approach Path Indicator (PAPI) would have been useful to the crew: a set of red and white lights used to confirm the glide slope as the aircraft approaches the runway.

By 1990, most airports had PAPI lights installed at the runway threshold which would be visible from a few miles out. But at the time, runway 14 didn’t have PAPI or even the older VASI (visual approach slope indicator) lighting.

If the airport had installed a precision approach path indicator at the threshold, the crew might have been concerned that the PAPI lights were not in sight. If it had installed a minimum safe altitude warning system as a part of its radar, the controller would have been alerted that the aircraft was too low.

Instrument Approach Chart for Zurich

Even a simple light at the top of Stadlerberg would have helped orient the pilots. However, lights were not required on obstructions that were over 5.5 kilometres from the airport, so no light had been installed.

The report took over two years to complete. I do not have a copy of the initial conclusions of the draft report, but we can intuit a lot from the NTSB commentary.

The NTSB highlights the report’s conclusion that the aircraft would have cleared the mountain if the first officer had been allowed to continue his go-around. However, the cancelled go-around was then not mentioned as a cause.

The NTSB also argues against a cause that named the specific navigation receiver. Instead, the response says, the conclusion should emphasise that the crew did not troubleshoot the issue, instead simply picking a receiver to trust. Further, the NTSB was not convinced by the conclusion that the pilots were not aware of the possibility of incorrect indications in the navigation equipment in use.

It appears to us that the crew of I-ATJA was aware of the possibility of incorrect indications in front of them on the accident flight. What we believe is that Alitalia pilots in general were not aware of the fact that the instrument indications could be incorrect with no ‘off’ flags showing.

Finally, the NTSB argues that the final report overemphasises the drum-pointer altimeter within the context of this accident sequence.

NTSB did not believe that a captain with more than 10,000 hours of flying time (most of which probably involved drum-pointer altimeters) would overlook the distinctive crosshatched scale only visible below 1,000 feet on his altimeter.

Instead, the NTSB argued that both pilots were aware of their altitude throughout the approach, just very confused about their distance from the airport. The repeated observations regarding distances and whether or not they’d passed the outer marker seem to bear this out.

The final report has revised conclusions which take these arguments into account, at least to an extent.

The accident was caused by:

  • False indication of VHF NAV unit No 1 in the aircraft.
  • Probable altimeter misreading by the PIC.
  • No GPWS warning in the cockpit.
  • Pilots not aware of the possibility of incorrect indications in the NAV equipment in use (without flag-alarm).
  • Inadequate failure analysis by the pilots.
  • Non-compliance by the pilots with basic procedural instructions during the approach.
  • Unsuitable cooperation between the pilots during the approach.
  • COPI’s initiated go-around procédure aborted by the PIC.
  • The Approach Controller not observing the leaving of the cleared altitude of 4000 ft QNH before the FAP.

The recommendations are a bit easier to read. The list starts with banning specific equipment (NAV receivers that do not have monitoring, drum-pointer altimeters and GPWS that cannot detect the NAV receiver failure). The recommendations continue with two crucial changes to flight procedures which are now a part of everyday aviation. First, the crew should not switch all navigation instruments to one receiver without any troubleshooting, which avoids the confirmation bias that we saw in the cockpit on this flight. But the broader change is that once a pilot, either pilot, initiates a go-around, it cannot be overridden.

For more detail, you can read the scan of the accident report or Flight Safety’s breakdown of the key points of the report from 1994.

The captain made two grave mistakes: he blindly trusted a single receiver to give him the correct information and he overruled the first officer’s decision to go around. However, we can also see the impact of years of small decisions by the airline, the air traffic control unit and the airport to save money rather than upgrade systems and invest in modern safety features.

Category: Accident Reports

8 Comments

  • So why did the Captain blithely assume that NAV1 was correct and NAV2 was in error?

    The captain himself had reported NAV1 as malfunctioning on the prior leg. It’s true that the technicians replaced both NAV1 and NAV2, but wouldn’t you be suspicious when you had a disagreement?

    I don’t understand on what basis he chose one display over the other with apparently no consideration of how to determine which was correct.

    • We see what we want to see. If I expect to capture the glide slope, and one instrument shows me I have, and the other one shows me I haven’t, I want the one that shows me what I expect to see to be correct, because that affirms my world view.
      It’s a variant of the plan completion bias (also called get-there-itis), and it can trip anyone up.

      Same thing with ATC: pilot navigates as if on the glide slope, so the controller doesn’t bother to cross-check the altitude, because everything the controller did notice was as expected.

      It’s hard to train for this.

  • Certainly the PIC believed that he was on the glideslope and if one of the NAV showed what he expected, that fed into confirmation bias.

    But what information led him to believe he was on the glideslope? Was the alleged “frozen” centred glideslope indication sufficient, even though other indicators pointed to a location inconsistent with being on the glideslope? Apparently, yes.

    Interesting human factors case.

    • Part of the problem, which I’m not sure I managed to make clear, is that there was a general belief that the warning flag would appear if the receiver wasn’t working. This was proven to be untrue but the captain (and pilots in general) still believed that if the flag didn’t appear, the receiver must be working. The primary instruments showed that he was on the glideslope and he was happy to believe them.

  • Can someone explain to me how my phone can show latitude, longitude, and altitude, but planes don’t have a GPS sensor that uses those 3 pieces of information to analyze where the plane is, and whether it’s in danger, using a database that holds all pertinent information.
    If Google Driving maps map ridiculously small roads, why don’t we have a global airport database for all non-secret airports that includes all of the information anyone would need to land, VFR or IFR.
    It seems ridiculous to me that a simple system doesn’t monitor the flight in real time using GPS data, it could literally say, “You are approaching a mountain, at your current speed you will impact the mountain in 37 seconds if you do not pull hard right immediately.” It could be programmed to deliver status reports every time something significant changed. We’re working on quantum computers, but airlines are offline and on their own? That just doesn’t make sense to me.
    Can someone please explain why this hasn’t happened yet? Does someone like Elon Musk have to come along and make flying significantly safer with a simple gadget?

    • As was pointed out in the previous article, this accident happened 3 decades ago; avionics now are different.

      However, the path of improvement is very slow, and for good reason. Given the recent 737-MAX mess, how confident are you that any manufacturer’s claims about the suitability of a GPS unit should be believed? It’s all programming — and as a software engineer of 20 years experience, I know how crufty a lot of programming is. (And based on a course I took, we’re unlikely to have provably correct programming; all the developers can do is be as careful as possible, and hope that QA can imagine and develop tests for situations we haven’t thought of.) Putting it another way: people don’t die when a quantum computer crashes, because they aren’t in critical paths — and won’t be for a long while. And they USUALLY don’t die when a phone misbehaves — but there has been at least one local case where someone did, because all of the pretty tech in your phone is seen as bells and whistles rather than mission-critical.

      And don’t get me started on Google; their maps are crowdsourced, with all the random errors one would expect from such a method. Local example: a dangerous connection between two major highways near Boston was replaced by a safer connection (separating a merge and a split/crossover); for months, Google said the only way to get between these two highways was back roads connecting other exits, because they’d deleted the old connection from the database and not added the new one. Do you want to rely on somebody noticing there’s a new tower near a flight path and submitting a personal report (and hoping the Google people actually understand the report and enter it correctly)? Factoids are easy; reliable data is hard. Oh yes, and note that Google has 2-dimensional data; adding topography would require far more work than making the existing maps did.

      Gadgets are like solutions: “For every problem there is a solution that is simple, easy, — and wrong.” Considering Musk’s track record, he’s one of the last people I’d trust to come up with something usable; have you noticed how many auto-piloted Teslas have been crashing? (Not to mention how little progress he’s making; Wikipedia says he promised in 2016 that there would be fully-autonomous cars in 2017. Notice: there aren’t any yet.) And the economics are also hard: somebody has to decide to spend the money on developing and testing something that may or may not be usable, and then get the airlines to pay for it. (cf, again the MAX mess, which was made worse because the default system had a single sensor — extra sensors cost extra money.)

      And add to all this that once a new device has been perfected, pilots have to be trained to pay attention to it, at a time when they’re already dealing with a heavy attention load. Human factors are even harder to fix than hardware.

      I hope somebody more current on instrumentation can comment on just what does exist now — particularly whether current instruments (and the training to use them) should have prevented this crash; just as interesting would be how long developing such instruments took.

    • In 1990, your phone couldn’t do any of this. In 1990, GPS was military technology that was artificially degraded for civilian use and might have stopped working in an armed conflict.

      Your “planes don’t have” and “why don’t we have” really means “why don’t I know about”, because airliners do have GPS, an inertial navigation system using accelerometers, and even optical gyroscopes so sensitive they can determine the aircraft’s latitude by measuring the spin of the Earth. Do a web search on TAWS and EGPWS for your database of airports and mountains.

      Outrage rooted in the lack of knowledge is really at the core of every conspiracy theory, from Flat Earth to QAnon; it’s a mental habit that you should really strive to break.

  • Sylvia notes “Part of the problem which I’m not sure I managed to make clear is there was a general belief that the warning flag would appear if the receiver wasn’t working. This was proven to be untrue…”

    Collins had specified, way back in 1975!, that the receivers be modified due to the possibility of false glideslope without a flag when there was an open circuit. It is fascinating that the regulatory bodies seem to have no prior interest in mandating this fix. Alitalia was not mandated to make this fix prior to the accident flight, and of course did not.

    Makes you wonder how many aircraft problems are known to the manufacturer, not mandated, and not likely implemented even though it is suggested by the manufacturer. The trick of course for the regulators is deciding which of these problems would be mission critical in a failure mode, and which are less critical.
