We’re Still Talking about the 737 Max 8
The entire aviation world is focused on the Boeing 737 Max 8, especially as the details from the Lion Air cockpit voice recorder files have been leaked. The truth is, I don’t have a lot to say about it at this stage (unlike me to keep quiet, I know!) and so instead, I’ve put together a collection of some of the clearest and best pieces written about the current situation.
Preliminary Report on Lion Air flight 610 on Fear of Landing
Starting with my own piece: right now, the preliminary report released in November is the best official information we have on what happened. This is a summary of the key information and includes a copy of the report.
Exclusive: Lion Air pilots scoured handbook in minutes before crash in Reuters
The cockpit voice recorder tape/transcription has not been released but some details have leaked to the press. Reuters spoke to three people who claimed to have heard the recording. The flight crew spoke of a problem with the airspeed and that the captain’s display showed a problem but not the first officer’s. They focused completely on the airspeed and altitude, said one source, never mentioning the trim.
Doomed Boeing Jets Lacked 2 Safety Features That Company Sold Only as Extras in the New York Times
The angle of attack indicator which would signal an AOA disagree when it received two different values from the two different sensors was an optional upgrade on the Boeing 737 Max. The disagree warning will become standard although the indicator itself will remain an optional upgrade. (Thanks to Kathleen for sending me the link.)
Capt. Sullenberger on the FAA and Boeing: ‘Our credibility as leaders in aviation is being damaged’ in Market Watch
Capt. Sullenberger goes public with his opinion, explaining that the situation between Boeing and the FAA has been building up for quite some time.
FBI joining criminal investigation into certification of Boeing 737 MAX by Steve Miletich for the Seattle Times
Miletich gives the details of the criminal investigation into the certification process.
Pontifications: Fluid, dynamic events upend MAX story by Scott Hamilton on Leeham News and Analysis
Hamilton helps us to understand what this means, with a list of the unprecedented actions against the FAA taking place now, putting them into a historical context within the US.
Ethiopian ET302 similarities to Lion Air JT610 on Satcom Guru
This piece gives an in-depth look at the AoA malfunction modes, how they relate to the MCAS logic and what Boeing need to do to prevent the situation from arising again.
Analysis Of What Really Happened To The Boeing 737 Max From A Pilot & Software Engineer by Trevor Sumner and his brother-in-law
I retweeted this on Twitter because, although there are a few assumptions buried in there, the explanation does a good job of showing the complexity of the situation. Threadreader has collected the tweets into a single page for easy reading. (Thanks to Mike and hat_eater for the link.)
How the American fervor for deregulation contributed to the 737 Max crashes by Jeff Wise for Slate
Wise looks at the government influence on the airline industry and the issues of short-term profitability when weighed against long-term cost.
The situation remains very fluid and I’d beware of any aviation expert or journalist trying to predict what is going to happen. There are a number of different aspects coming into play, ranging from politics to stature to economics to industry. Focusing on a single slice can be interesting for broadening our knowledge but shouldn’t be considered exclusively. There are no easy answers.
Finally, there’s some great conversation happening on the previous post with people from a good variety of backgrounds shedding light on what they can see in this mess. I hope that it will continue!
At a quick glance: where did Slate get the information that pilots need to get a medical every two years instead of three?
Certainly in Europe, as a commercial (airline) pilot I always had to get my medical EVERY SINGLE YEAR. Together with my licence renewal which included a “prof check”. Usually with a tough simulator session.
I will comment if and when I find more to comment on; I am far from finished reading.
Huh. A third class medical for a PPL is 24 months … do you think they have someone there who is a private pilot and just assumed they were all the same? Commercial flying in the US is every 12 months.
I have to say I’m stunned to learn that the AoA disagree light and the AoA meter are optional extras. With graphical displays it would cost nothing to include a warning and gauge as standard.
Without that information, and without apparent description of MCAS in the flight manuals, it is sadly much more understandable how pilots might have failed to spot that a poorly designed system was actively trying to fly the plane into the ground.
Having worked in safety critical production teams, the thing I really fear coming out is that, as with the Challenger disaster, the engineering team at Boeing might well have realised the dangers in the MCAS design, suggested major rework to make the plane safe, but then been overruled by management for commercial reasons.
If that turns out to be the case (and there is currently _no_ public evidence either way) then management will have the deaths of 346 people on their shoulders, and reputational damage that may indirectly cost the jobs of thousands of people.
That would be an extremely serious criminal matter, but equally one with major political and commercial implications, which is sadly not the best scenario for learning the truth…
Spot-on. I suspect that this may be one reason why Garuda cancelled a big order. What else did Boeing hold back?
Rereading the linked articles, I think (as a non-pilot) these two lines are central to understanding the situation:
“United Airlines, which ordered 137 of the planes and has received 14, did not select the indicators or the disagree light. A United spokesman said the airline does not include the features because its pilots use other data to fly the plane.”
Whilst their pilots might not use the AoA sensors to fly the plane, sadly the MCAS system does, which means that otherwise supplemental data, which didn’t need to be displayed, suddenly becomes central to safe operation of the aircraft.
The implications of this change of purpose/importance of the AoA sensors simply don’t appear to have been widely recognized, probably in large part because Boeing seems to have done almost nothing (deliberately?) to highlight the change.
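To make that change of role concrete, here is a minimal sketch of the difference between an automatic system that acts on one AoA reading and one that cross-checks two. It is not Boeing’s actual logic: the function names, the disagree threshold and the activation angle are all hypothetical values chosen purely for illustration.

```python
# Hypothetical values for illustration only; not taken from any Boeing document.
DISAGREE_THRESHOLD_DEG = 5.5
ACTIVATION_AOA_DEG = 10.0


def aoa_disagree(left_deg: float, right_deg: float) -> bool:
    """Return True when the two vanes differ by more than the threshold."""
    return abs(left_deg - right_deg) > DISAGREE_THRESHOLD_DEG


def single_sensor_trigger(left_deg: float) -> bool:
    """Act on one vane alone: a single bad reading is enough to trigger."""
    return left_deg > ACTIVATION_AOA_DEG


def cross_checked_trigger(left_deg: float, right_deg: float) -> bool:
    """Act only when both vanes agree and both show a high angle of attack."""
    if aoa_disagree(left_deg, right_deg):
        return False  # inhibit the automatic action and leave it to the crew
    return min(left_deg, right_deg) > ACTIVATION_AOA_DEG
```

With a faulty left vane reading 22 degrees and a healthy right vane reading 2 degrees, the single-sensor version would trigger while the cross-checked version would not. Once an automatic system consumes the reading, the second sensor and the disagree check stop being supplemental.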
Thank you Colin, this is a really good explanation of how the AoA sensors became critical (rather than the ‘Boeing were selling an important safety feature as an add-on for profit’ narrative that’s currently prevalent).
Boeing had good reason not to highlight the change, because the point was to keep the aircraft under the 737 rating, which meant that 737 pilots would not need additional training to get the type rating. From an airline point of view, this made upgrading a 737 fleet to the MAX a no-brainer, because they wouldn’t have to pay for type-rating training on the new plane. Mendel has talked about the chain of events in a comment in the previous post.
I find Trevor Sumner’s it’s-not-my-profession’s-fault explanation less than convincing; as colintd pointed out last week, most of the work of programming is figuring out and coding edge cases. Pardon my foot-stamping and cane-waving, but when I learned the first language I really worked with (39 years ago), it was all about edge cases because there was no protection against trying to put more data into memory than there was room for; coders were expected to be … cautious … about code that handed them a memory address and said “put whatever you want in this.” The spiffy new languages provide a lot of protection against such crude cases, but they may not teach people to be cautious about limits — especially real-world limits.
What Sumner’s analysis really points out is what Sylvia’s articles have shown us over and over: actually breaking an airplane in flight requires a long chain of failures that could have been prevented if any one factor had not been wrong. (This is in marked contrast to, e.g., buildings, where a single-point failure can overload neighboring points leading to a chain of failures that becomes a disaster. See Levi and Salvatori, _Why Buildings Fall Down_, for some fascinating discussion of specific cases.)
Following the spec is no excuse; a good software engineer uses their imagination. (At least in the US; I’ve read that the custom in Japanese code shops is that coders are told exactly what inputs to expect and what outputs to generate from them — a sort of assembly-line approach that IMO has issues.) I would also point to a line from Napoleon, quoted in _Up the Organization_, to the effect that following bad orders from a superior is no excuse for failure.
I agree that software is not the only issue; lack of documentation, failure to include warning lights on the video display, and other factors can be pointed to. But the advantage of software is that it weighs nothing, so it can cover a wider range of possibilities without affecting resulting performance. Making sure it does this will take more time, and it’s possible Boeing’s culture treats software as being a manufactured good rather than acknowledging the 80/20 rule, but IMO that doesn’t make the software blameless.
Jeff Wise’s article is no surprise; I wonder whether anyone will hear it.
Sylvia — ISTM that the narrative you put down is correct; Boeing treated a critical warning as a profit source. The fact that the warning’s criticality comes from a chain of prior decisions doesn’t ameliorate Boeing’s failure to make the warning standard.
Here I must agree, my own actual flying times are too far in the past to be of any use here.
But as a former professional pilot I am in total agreement with the opinions expressed by Colin and Chip.
It certainly seems that the “max 8” was very far removed from the specs of the original B737, far enough to require a separate type rating.
Were pilots new to the type given proper training or only a “differences course”?
And, for the sake of some nickel-and-dime software upgrades, it appears that indeed many lives were lost. Sacrificed, a very harsh word and perhaps a bit unfair but that is what would appear. The 737 has been the most successful aircraft type in the history of aviation. Complacency and commercial considerations have marred its record. Perhaps even damaged it irrevocably.
This is a very sad story.
A story of cutting corners, where a bit more thought should, and probably could, have prevented the disaster.
The fall-out for Boeing could be substantial. Not catastrophic because the international dominance of Boeing is such that it is unlikely that the company will be brought down. There are other types in the Boeing line that will prevent it sliding into close down.
But nevertheless, a large number of jobs may be in jeopardy.
This apart from the grief of the families for the loss of loved ones, caused by the two disasters.
Even so, we must not lose sight of the fact that aviation still is the safest way to travel.
If the description in this Seattle Times piece https://www.seattletimes.com/business/boeing-aerospace/failed-certification-faa-missed-safety-issues-in-the-737-max-system-implicated-in-the-lion-air-crash/ is accurate, the errors in the development & certification process behind the MCAS mod directly mirror the Challenger explosion, with commercial and reputational pressures causing management to override safety concerns from the engineers.
Moreover, having looked into the actual nature of the AoA sensor, tying the fate of an aircraft and all its passengers to the correct operation of a single, pretty delicate, externally mounted vane represents a major systems design failure, probably to the extent of being criminal, both from a design and certification perspective.
There’s a key factor not seen here that was blatant in the Challenger case: crude politics — Reagan wanted that launch for prestige. The commercial pressures here (as described by the story) may have been worse — Boeing could claim they’d lose thousands of sales and jobs if they were later to market — but they’re not the same. This is an appalling story — and AFAICT the worst of the failures happened under the Obama administration; sounds like the FAA may have been rotting for a long while.
PS: my cite above was incorrect; it’s LevY and SalvaDori, not Levi and Salvatori. Nothing to do with flying per se other than a discussion (written long before 9/11) of why the Empire State Building didn’t fall when hit by a WWII bomber, but a lot of discussion of how both design failures and gradual erosion can lead to catastrophe. (The latter is less common in aviation but does happen.)
I would love to believe that national aviation authorities are not under political pressure to “help” approval of their own nation’s major aircraft producers (be that Boeing or Airbus). I’m just not sure that this is reflected in reality.
I absolutely agree on the building book though, and would recommend it to anyone who has even a passing interest in engineering and/or safety. https://www.amazon.co.uk/Why-Buildings-Fall-Down-Structures/dp/039331152X/ref=asap_bc?ie=UTF8
It is one of two books I would recommend to anyone interested in safety critical/reliable systems, be that as an engineer, user or observer.
The other book is this Nancy Leveson text https://www.amazon.co.uk/Safeware-System-Safety-Computers-19/dp/0201119722/ref=sr_1_fkmrnull_1?keywords=SAFEWARE%3A+SYSTEM+SAFETY+AND+COMPUTERS+Nancy+G.+Leveson&qid=1553513001&s=gateway&sr=8-1-fkmrnull .
Its primary focus is on safe software and system design, but it has detailed examinations of a wide range of problems including those with the DC-10, Apollo 13, Challenger, Three Mile Island, Chernobyl, Windscale, Therac-25 and Flixborough.
The breadth of examples (many with minimal or zero software elements) is part of the central theme that “safe” system design isn’t just about software or engineering, but involves an end-to-end approach to risk, from initial project conception, through implementation, all the way to employee hiring practices and operational oversight.
At heart, computerization gives you a wonderful building box of parts including almost infinitely long levers and pieces of rope. You can use this toolkit to do amazing things which simply aren’t possible with traditional manual or mechanical systems, but you can also get awfully tangled and/or produce a lot of nooses and pitfalls…
(There is also a newer book by Nancy Leveson which can be read online here https://mitpress.mit.edu/books/engineering-safer-world which extends the core ideas with more recent examples, but I still refer back to the older text as I prefer the more extensive examples given.)
Finally (I will try and stop hogging the comments section!) if you want another painful example of the risk of “enhanced” models of earlier safe systems, which in reality generated fatal risks, have a look here for a very good summary of the Therac 25 fatalities http://sunnyday.mit.edu/papers/therac.pdf .
Every time I read this I still feel almost physically sick about how relatively simple code errors and naive design killed people, but it is important this risk is understood in the increasingly complex/automated modern world.
Your comment got thrown into moderation; maybe the blog has a deep-seated suspicion of multiple links. Anyway, sorry for the delay and you are welcome to post as many comments as you like; it’s not like I have to pay for the ink!
There are a lot of angles, twists and turns.
I cannot disagree with the parallel between the Challenger and this accident (actually two accidents, and possibly a few incidents where the crew managed to solve the problem and which therefore did not make the headlines). Yes, in the case of the Challenger there was the overriding factor of prestige and commercial pressure.
BUT: Anyone who volunteers to climb into a space craft and agrees to be blasted into space also accepts that the risks associated with space flight are manifold compared to the risk that passengers can reasonably expect when boarding a public transport jet aircraft.
No, I do not exonerate the management who chose to override or ignore concerns from the engineers, but there is a VAST difference.
Regarding the twin towers: I think that the comparison is flawed here. The builders and architect were not really expected to design the building “terrorist-proof”. The 9/11 attacks were deliberate. The flights chosen by the terrorists were selected for their long-range destinations. This ensured that a large quantity of aviation fuel would be on board, and that seems to have made a real, terrible difference. The difference between being able to escape or being trapped above an inferno. An inferno caused by thousands of gallons of burning jet fuel.
Of course, it would have been nice if the architect had designed the buildings differently but that is the wisdom of hindsight.
Ah Sylvia, that was a lot of ink indeed! My fingers are black.
Maybe I am overlooking something, but it seems that MCAS requires a special kind of angle of attack sensor.
Many aircraft types that I have flown, especially jets, had them fitted as standard, rarely as optional extras.
The Learjet 25D that I flew had a vane type. Essentially a vane that would move with the airflow. It was very reliable.
Later versions worked on the basis of a rotating conical cylinder with slits. The airflow caused pressure, the difference between the two slits, upper and lower, was measured and the resulting AoA presented in the cockpit.
The Citation? Ditto. I have only had one failure, when the heating element failed and we had to rely again on the computed Vref on an approach into Stockholm Bromma in winter.
To prevent a stall, after the prototype BAC 1-11 crashed during stall tests, the stick shaker and pusher were introduced. Close to the stall the stick would literally vibrate, accompanied by a rattling sound. It would wake up any sleepy flight crew, but if they still did not get it the pusher would literally push the control column forward in order to recover from an incipient stall. Crude? Yes but effective.
The Citation I flew first did not have a “glass cockpit”, but the angle of attack indicator was positioned in the left side of the flight director. Little red dot in the central position meant Vref. Flap settings were incorporated, so even with a failed airspeed indicator it was quite possible to fly an accurate speed on approach.
Okay, the two “737 Max” accidents took place during climb, not approach, but this old system would at least provide the pilots with an indication that related to a safe speed.
What is the reason, the logic behind the decision that Boeing needed to develop a more sophisticated = more complex = (obviously) less reliable system and install a warning only when the client, the airline, is prepared to pay for it as an optional extra? The more I read this, the more it “sucks”.
Has the 737 been the victim of having been overdeveloped? It has happened before that some kind of “fix” was needed. The King Air, I seem to remember, was stretched to become the 1900. I believe it needed some canard winglets at the front. The Metroliner that I flew had a “SAS” or Stability Augmentation System. It malfunctioned and was never repaired, it did not seem to make any difference. Just like the nosewheel steering which did not work either, but we, the crews, had no real need for it.
Old-fashioned seat-of-the-pants flying was good enough in those days!
More discussion of the training the pilots didn’t get: “The head of the UK’s Flight Safety Committee says that airlines and plane manufacturers are keeping safety training to an ‘absolute minimum’ under pressure to keep their costs low.” https://www.bbc.com/news/business-47655115
For those who haven’t come across it, this website http://www.b737.org.uk/mcas.htm gives some excellent detail on the actual AoA setup, including location of the sensors, controls and warnings, along with details of the proposed “fix” (see the end of the article).
The Ops bulletin TBC-19 is particularly interesting as it clearly indicates that Boeing knew back in Nov 18 that the failure of a _single_, relatively delicate, unreliable AoA sensor could cause MCAS to trim to the “nose down limit”.
This is in contrast to the cause of a “normal” “runaway stabiliser” which requires simultaneous failure of redundant, typically very reliable, mechanical switches (as I understand it, the electronic trim “inputs”/“switches” actually have paired NC/NO switches, both of which have to operate at the same time for input to be generated).
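As a rough illustration of why a single point of failure is so different from a redundant pair, the arithmetic can be sketched as below. The per-flight failure probability is an invented number, and the calculation assumes the two channels fail independently, which real systems with common-cause failures often do not.

```python
def single_channel_failure(p: float) -> float:
    """Probability that a system fed by one channel sees a bad input."""
    return p


def redundant_pair_failure(p: float) -> float:
    """Probability that two independent channels both fail at the same time.

    Assumes independence; a common cause (icing, a bird strike, a shared
    wiring fault) would make the real figure worse than p * p.
    """
    return p * p


# Hypothetical per-flight failure probability, for illustration only.
p = 1e-4
ratio = single_channel_failure(p) / redundant_pair_failure(p)
print(f"single channel: {single_channel_failure(p):.1e}")
print(f"redundant pair: {redundant_pair_failure(p):.1e}")
print(f"the single channel is about {ratio:.0f} times more likely to fail")
```

The exact figures are invented, but the shape of the result holds: requiring two simultaneous failures multiplies small probabilities together, while relying on one sensor does not.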
I believe this means that Boeing must also have realised that the original certification analysis which classified the severity of AoA failure as “Major” was incorrect, with the correct level being at least “Hazardous” and potentially “Catastrophic” (see here for meaning of the classification https://en.wikipedia.org/wiki/DO-178B).
It seems their response bets on the crew always spotting the AoA malfunction as the underlying cause of repeated nose-down (in the absence of any visual indication of trim position, AoA input, or MCAS input), then disabling the system, _and_ manually (not electronically) dialing in enough up trim correction to keep the plane in the air.
This last point is critical to realize. Once the trim down limit is reached, even if you activate the cutout, the plane is going to hit the ground unless you manually correct the trim down which has been applied by the MCAS.
I cannot conceive that this is a defensible position to have taken, and I think it entirely justifies the grounding of the entire fleet until the behaviour is radically changed.
The bulletin, which is supposed to inform pilots of how to cope with a potentially catastrophic failure, is also rather misleading when it states:
“Note: Initially, higher control forces may be needed to overcome any stabilizer nose down trim already applied”
A more accurate statement might be:
“Note: It may be _impossible_ to overcome automatically applied stabilizer nose down trim via force on the control column. Unless this situation is corrected, the plane will be left in an uncontrolled dive.
It is therefore critical that either:
a) the trim down is corrected electrically followed by the cutouts being activated less than 5 seconds after release of the electronic trim switches
b) the cutouts are activated and then the trim is immediately corrected using the manual wheels”
If Colin’s penultimate paragraph is correct, it’s hard to see how anyone can defend Boeing’s actions.
Beyond this, I can’t see how anyone can justify the use of small, delicate, revolving, easily damaged sensors whose correct function can’t be confirmed on the ground, with no redundancy, and use the output from that sensor to interfere with the pilot’s control of primary surfaces!
Adding AoA indicators and disagree lights is surely just papering over the cracks!
Made this comment before but it doesn’t seem to have been published!
Sorry, Steve, it got caught up in moderation and I was out. Future comments by you using this email address will get straight through!
If some of the comments on other forums about the preliminary report are right, it is even worse than I was suggesting. The suggestion seems to be that at high speed (such as after the plane starts diving) it is physically difficult/impossible to correct trim using manual controls if they are more than a degree or so out of neutral.
This fits with the bulletin comment about correcting the trim electronically before activating the cutout, but the bulletin does nothing to indicate how critical this step might be.
Irrespective of the Lion Air flight, if Boeing knew how critical it was to electronically correct the trim, but didn’t explain why, then responsibility for the ET302 crash lies entirely with them.
I think the FDR logs will be very insightful, and hope they are released in the preliminary report.
Colin is making me aware of the level – and rate – at which aviation has changed since I retired.
I am still a bit bewildered by an aircraft that needs very complicated, and as has been proven incomplete, electronic gizmos to prevent a large jet airliner from stalling. Warning systems that could prevent an accident were offered as an ‘optional extra’. Which, of course, prevented the operators from becoming fully aware of the underlying problem that was swept under the carpet. As I pointed out, in my days there was the stick shaker and if the pilot still was asleep a pusher. It just quite literally shoved the controls forward but that was sufficient. It did not lead to an uncontrollable nosedive.
The 737 in its original form was a very good aircraft. Not as strongly built perhaps as the 1-11, but that aircraft left the scene a long time ago. The 737 was further developed to keep up with modern technology. More powerful, more efficient engines. Modern electronics, bigger – more than twice the seating capacity from the original, basic -100 in fact.
But the question arises now: did Boeing go too far?
Was the FAA in collusion to approve a designation as an improved 737, rather than insisting that it required a separate rating? With, of course, a longer time to get pilots on-line and resulting cost for operators.
So some suspicion remains:
Was there also a commercial decision to allow flaws in the design, flaws that may have been the result of over-developing the type, not to get the attention they should? And why did the authorities rubber stamp approval of, as we now know critical, safety features to be labeled as ‘optional extras’? Which of course meant that the training that might have enabled the crew to solve the problem before flying into the ground, was not recognised by operators.
Aviation has always benefited from lessons learned and became the safest mode of transport. No doubt that process will continue.
I am not too sure that I like the virtual monopoly of Boeing, but nevertheless the company will be here to stay. The problem will be solved, but this lesson was learned the hard way.
News stories have made clear that “collusion” wasn’t necessary — the FAA’s policy has become to allow Boeing personnel to make decisions that should be made by the FAA, if it were adequately staffed. I wonder how much certification of Airbus planes is done by Airbus personnel rather than governments.
NPR reports that Boeing has committed to several of the fixes we discussed: https://www.npr.org/2019/03/28/707641876/boeing-scrambles-to-restore-faith-in-its-737-max-airplane-after-crashes
Lots of the changes sound sensible, but there still seems to be a strong sense of denial about the seriousness of the problem (at least in public) amongst senior management:
“The 737 is a safe airplane,” Boeing’s Vice President of Product Strategy and Development Mike Sinnett said at a media briefing Wednesday. “The 737 family is a safe airplane family, and the 737 Max builds on that history of safety that we have seen for almost 50 years.”
It is very hard to reconcile this statement with the current MCAS behaviour, and the fact that two planes and all the passengers have been lost. In my experience it is generally better to admit faults, own the problem and do your best to honestly put them right. Continued denial of the seriousness of the problems will not help restore confidence…
The preliminary report in all its tragic detail is here.
The inability to use manual trim at speed when badly out of trim, combined with an inability to turn off MCAS whilst keeping electronic trim enabled (the cutouts disabling both), meant the incorrect MCAS trim couldn’t be corrected without risking reactivating electronic trim (and MCAS). This appears to have doomed the plane.
Worse, rereading the TBC-19 bulletin issued after the Lion Air crash, in light of ET302, there has to be a strong suspicion that Boeing knew that manual trim might not work unless electric trim was used to rebalance before the cutout was activated, but didn’t want to highlight this given the implications for the Max 8’s flight safety certification.
If they had only said that even a small degree of flap would disable the MCAS system…
I’ll be posting about this today with a summary and a link, so you probably will want to move discussion there. Just give me a few hours :)
This video gives a wider perspective on the industry pressures that incentivized Boeing to rush out a new model of plane with a nose up problem.
Been reading these above comments and have to agree with many of the above points. The FCC, the FAA, the US health system, EPA, and others are beginning to be like the saying “the inmates are running the asylum”. The country’s infrastructure, in particular its highway and bridge system are showing their age. This should have been foreseen and planned for. This has all been to the detriment of the public at large, who do not employ lobbyists. Now there’s a curse upon us.
I think the notion of “deregulation” being a large part of it is correct. The airline industry was getting along just fine until the latter 70’s as I recall (go ahead and correct this as needed). Then I think I remember American Airlines of all people came up with the $29 fare, I’d suppose to fill empty seats and steal the march on the competition. I was aghast, as no sane business would deliberately fail to recover costs. Nor could I understand why it was necessary to undercut the costs of bus or train fare. Does not the speed of aerial transport have a value of its own? As an airline employee I could see the writing on the wall, not simply because of the burden on us “non-rev” passengers, which was to us bad enough. But also the beginning of the long downward slide of service in general, the different culture of the customer (I remember being paged by 2-way once to help break up a brawl amongst a soccer team. On board the airplane). Thus commenced the slide into mass-transit. Nowadays I much prefer “fly-by-Toyota”.
But that’s just my industry/agency. It seems the overseeing government agencies can not resist the urge to understaff. This combined with generally lowered wages is not any help for the industries nor country.
I’m of the opinion that, as mentioned above, there is too much emphasis on “short term” vs “long term” thinking. Many of our mistakes lately appear to be the result of cutting a corner now and disregarding the costs later.
It’s been my observation that most do not realise the profit margins of most businesses are actually rather low, probably a few percent or less. From what I’ve read, many of new management don’t seem to realise this either. So in their competitive drive to wax fat at the trough, they (hail the “corporate raider”) cut corners, manipulate stocks, underpay the short staffs (I’m looking at you especially, Walmart and any and all restaurants), and hire lobbyists to rescind/change the rules. So there is money left for bonuses. Running a sound business is secondary. The few I’ve known of have been driven out of business by the above practices. Judging by the posts I’ve read elsewhere on the web there are a lot of people who shouldn’t be in charge, but are. I’m not sure how we’d get rid of this metastasized problem; it seems to have attained critical mass some time ago.
p.s. You’ve no idea how glad I am I’m a retiree.