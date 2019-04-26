Shoddy Software Is Eating The World, And People Are Dying As A Result
Two recent crashes involving Boeing 737 Max jets are still being investigated. But there is a growing view that anti-stall software used on the plane may have caused a "repetitive uncommanded nose-down", as a preliminary report into the crash of the Ethiopian Airlines plane puts it. Gregory Travis has been a pilot for 30 years, and a software developer for more than 40 years. Drawing on that double expertise, he has written an illuminating article for the IEEE Spectrum site, entitled "How the Boeing 737 Max Disaster Looks to a Software Developer" (free account required). It provides an extremely clear explanation of the particular challenges of designing the Boeing 737 Max, and what they tell us about modern software development.
Airline companies want jets to be as cost-effective as possible. That means using engines that are as efficient as possible in converting fuel into thrust, which turns out to mean engines that are as big as possible. But that was a problem for the hugely-popular Boeing 737 series of planes. There wasn't enough room under the wing simply to replace the existing jet engines with bigger, more fuel-efficient versions. Here's how Boeing resolved that issue -- and encountered a new challenge:
The solution was to extend the engine up and well in front of the wing. However, doing so also meant that the centerline of the engine's thrust changed. Now, when the pilots applied power to the engine, the aircraft would have a significant propensity to "pitch up," or raise its nose.
The solution to that problem was the "Maneuvering Characteristics Augmentation System," or MCAS. Its job was simply to stop the human pilots from putting the plane in a situation where the nose might go up too far, causing the plane to stall -- and crash. According to Travis, even though the Boeing 737 Max has two flight management computers, only one is active at a time. It bases its decisions purely on the sensors that are found on one side of the plane. Since it does not cross-check with sensors on the other side of the plane, it has no way of knowing if a sensor is producing wildly inaccurate information. It assumes that the data is correct, and responds accordingly:
In a pinch, a human pilot could just look out the windshield to confirm visually and directly that, no, the aircraft is not pitched up dangerously. That's the ultimate check and should go directly to the pilot's ultimate sovereignty. Unfortunately, the current implementation of MCAS denies that sovereignty. It denies the pilots the ability to respond to what's before their own eyes.
Like someone with narcissistic personality disorder, MCAS gaslights the pilots. And it turns out badly for everyone. "Raise the nose, HAL." "I’m sorry, Dave, I’m afraid I can’t do that."
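The missing cross-check Travis describes can be illustrated with a minimal sketch. This is not Boeing's code: the function names, the disagreement threshold and the stall threshold are all hypothetical, chosen only to show the idea of comparing the sensors on both sides of the plane and declining to act automatically when they disagree.

```python
from typing import Optional

# Hypothetical values for illustration only; not real certification numbers.
MAX_DISAGREEMENT_DEG = 5.0    # largest tolerated gap between the two sensors
STALL_WARNING_AOA_DEG = 14.0  # angle of attack treated as "approaching stall"


def cross_checked_aoa(left_deg: float, right_deg: float) -> Optional[float]:
    """Return the mean of the two readings, or None if they disagree."""
    if abs(left_deg - right_deg) > MAX_DISAGREEMENT_DEG:
        return None  # at least one sensor is wrong; we cannot tell which
    return (left_deg + right_deg) / 2.0


def should_command_nose_down(left_deg: float, right_deg: float) -> bool:
    """Only act automatically when both sensors agree that a stall is near."""
    aoa = cross_checked_aoa(left_deg, right_deg)
    if aoa is None:
        return False  # sensors disagree: leave the decision to the pilots
    return aoa > STALL_WARNING_AOA_DEG


# One wildly high reading no longer triggers an automatic nose-down command.
print(should_command_nose_down(74.5, 15.3))  # sensors disagree
print(should_command_nose_down(16.0, 15.0))  # sensors agree, high angle
```

With only two sensors the system can detect a disagreement but cannot tell which side is faulty, so the sketch simply stands down and defers to the pilots.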
The coders who wrote the MCAS software for the 737 Max don't seem to have worried about the risks of using sensors from just one side in the computer's determination of an impending stall. This major design blunder may have cost the lives of hundreds of people, and shows that "safety doesn’t come first -- money comes first, and safety's only utility in that regard is in helping to keep the money coming," according to Travis. But he points out that it also reveals something more general, and much deeper: the growing use of software code that is simply not good enough.
I believe the relative ease -- not to mention the lack of tangible cost -- of software updates has created a cultural laziness within the software engineering community. Moreover, because more and more of the hardware that we create is monitored and controlled by software, that cultural laziness is now creeping into hardware engineering -- like building airliners. Less thought is now given to getting a design correct and simple up front because it's so easy to fix what you didn’t get right later.
Every time a software update gets pushed to my Tesla, to the Garmin flight computers in my Cessna, to my Nest thermostat, and to the TVs in my house, I'm reminded that none of those things were complete when they left the factory -- because their builders realized they didn't have to be complete. The job could be done at any time in the future with a software update.
Back in August 2011, Netscape founder and VC Marc Andreessen wrote famously that "software is eating the world". He was almost right. It turns out that shoddy software is eating the world, sometimes with fatal consequences.
Video Games are a great example of this phenomenon
All you have to see to understand this debacle is major AAA video game releases of the last few years. Video games have been pushed out the door incomplete more often because day one patches can fix the issues, or even with the expectation that the issues can be fixed down the line. Anthem is a prime example: it's a buggy mess with threadbare content. The developers planned to fix it all in post.
For all the bugs you can find in older games when patches were not an option, AAA releases were more willing to delay release than put out a buggy mess.
Re: Video Games are a great example of this phenomenon
"Embedded" software such as that in cars and planes as well as just about any electronic device you can buy is held to a higher standard. Where there is no room for error the quality benchmark is much more stringent. Still, bugs get through the testing a lot more these days than they used to.
While it is true that there is no such thing as bug-free software, there is no excuse for shipping incomplete and poorly tested product, even video games. The backlash against companies shipping non-life-threatening software is ramping up. The backlash against companies shipping product that affects lives is far more violent. Hopefully we'll see this swelling wave of attention on software quality produce better results in the future. This trend desperately needs a reversal.
Don't they have pitch indicators?
First, NO, you can't judge pitch relative to ground even at low level. You must have an objective indicator. And what if it's cloudy or dark? -- That one sentence is enough to totally discount the rest. It's just simply trivially WRONG.
Now, my take on the crashes is
No evidence the coders didn't worry
I don't know of any evidence for this. Coders can't just run around making whatever major architecture changes they feel like. Such things have to be approved at higher levels, and for all we know the coders raised concerns and were told it wasn't important or would be fixed in a future update.
what's the problem & fix
hard to figure out the main point and purpose of this article:
Software Sucks ?
Software writers Suck ?
Boeing Sucks ?
The world now heavily depends upon computer software/firmware/hardware -- it's everywhere.
Much of it is defective/shoddy when initially used, but most is not critical to life or death issues. Important software is in a constant update evolution cycle.
Boeing screwed up and killed people. That is not rare. Airplanes have been crashing for over a century from design errors.
Human error and misjudgements killed people by the many millions throughout history.
What exactly do you want done ?
Re: what's the problem & fix
"What exactly do you want done ?"
Really?, I mean Really? are you effing serious?
When one builds a product with a SINGLE point of failure that will be FATAL, it is a fatal DESIGN ERROR!!!
yeah, I know I'm shouting, but people don't listen, like you noted in your comment.
Yes, this looks like it was a single point failure.
Failures happen; it is just the way things work. A good design takes this into account and mitigates the danger with backups that are polled at regular intervals, with some sort of comparison going on between them in order to ascertain their functional capability. This is, or at least it used to be, standard operating procedure for manned aircraft. When was this abandoned?
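The kind of comparison between redundant backups described here can be sketched as a simple median voter. This is only an illustration under stated assumptions: the function name `vote`, the three-channel layout and the tolerance value are hypothetical and not taken from any real avionics system.

```python
from statistics import median
from typing import List, Sequence, Tuple

TOLERANCE = 2.0  # hypothetical: allowed deviation from the voted value


def vote(readings: Sequence[float],
         tolerance: float = TOLERANCE) -> Tuple[float, List[int]]:
    """Return (voted value, indices of channels that deviate from it).

    With three or more channels, the median ignores a single wild reading,
    and the deviation check flags which channel looks faulty.
    """
    if len(readings) < 3:
        raise ValueError("need at least three channels to outvote one failure")
    voted = median(readings)
    suspects = [i for i, r in enumerate(readings)
                if abs(r - voted) > tolerance]
    return voted, suspects


# Called once per polling cycle with the latest reading from each channel.
value, faulty = vote([4.1, 4.3, 74.5])
print(value, faulty)
```

Unlike a two-sensor comparison, three channels let the system keep working after one failure, because the two healthy channels outvote the bad one.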
Being both a pilot and a software engineer myself, there has been a lot to absorb in both of these crashes. But as is the case with nearly all major aircraft accidents, there is more to it than just simply blaming poor software or hardware design.
The regulations involved in getting aircraft certified are voluminous, and getting approval requires a huge amount of work. Or at least that's the theory, and according to some reports the FAA delegated a lot of its oversight responsibilities back to Boeing itself. (A decent summary article is here.)
The design of the MCAS system feels inadequate to me, and it's getting a justifiable level of concern placed on it. But the natural follow-up question is how a flawed design like that was approved in the first place. If the FAA had been keeping closer tabs on things, it's possible that system would have never been put into production in its current form, and these accidents could have been avoided.
There is almost always a chain of failures that lead up to a crash, and "shoddy software", if you like, certainly played its part. But it wasn't the only link in that chain.
