Feb. 05, 2003 : "Will the Columbia catastrophe prove to have been an "accident" in the strict sense of the word?"

IEEE Spectrum // March 2003 // pp. 22-24

Commentary: The Shuttle Puzzle [written Feb 5, 2003]

One word I’ve never applied to the Challenger shuttle disaster in 1986 is "accident". I consistently call it a "consequence" and, when asked, detail the reasons why it should be considered a preventable result of a string of bad human choices. The wrongness of those choices ought to have been known when they were being made, since they clearly violated classic principles of sound engineering judgment. And some engineers of integrity did know and say these things. One was Roger Boisjoly, the Morton Thiokol booster engineer who tried to alert NASA to the O-ring problem and later became a whistleblower. They said it so convincingly that, in order to reach the fatal decision to launch, officials were ordered to "take off their engineering hats and put on their managers’ hats".

Challenger also highlighted a key principle of such technological disasters, one that has been overlooked throughout current media coverage in the rush for a ‘pat answer’ and the subsequent identification of the guilty. Catastrophes of this magnitude are rarely due to a single oversight or one stupid mistake. Instead, they are most often the result of a concatenation of circumstances, some of them true mistakes and others mere coincidences that made the mistakes far worse.

This synergism of ‘bad luck’ – it isn’t additive or even multiplicative, it’s often horribly nonlinear – has the awful consequence of making 100%-reliable preventive measures difficult, if not impossible in practice. At first glance it looks like it should have just the opposite effect: if a chain of conditions must conspire, the chain can be broken at any step. But because modern complex systems have interactions and permutations that make even the classic ‘Traveling Salesman Problem’ look like child’s play, it can sometimes be impossible to identify even one of the multiple links in ALL future ‘disaster chains’.

That’s why across-the-board standards need to be enforced at every point, because each point doesn’t come labeled with warnings like "If mixed with B and C, will kill you". We’re not that smart, and thinking we are is stupid – and sometimes fatal.

NASA Shuttle manager Ron Dittemore alluded to this issue in post-disaster press conferences when he said that the connection was still unclear between the observed fragment of tank insulation, which detached and struck the shuttle’s left wing during liftoff, and the signs of thermal effects on the left wing and its ultimate collapse. The chain of causation, he stated, was still hidden, and much more investigation – of debris, of telemetry, of videotapes of the descent, and of ground-processing records – was needed to find the still-missing pieces.

This was a valid point, and new clues soon appeared. Rainwater may have soaked through a surface break into the insulation foam, then turned into ice, increasing its weight (and impact force) tenfold. The actual structure of the wing may have been weakened through corrosion so that normally tolerable thermal stresses from the occasional lost tile became critical. The ‘mass properties’ (weight, center of gravity, moments) of this particular vehicle were reportedly unusually nose-heavy, and although this was still within verified safe ranges, it did add stresses to the elevons and wings. More clues are expected.

Now, this picture of the causation chain casts NASA’s past decisions regarding tiles in a harsher light. These include the agency’s rejection of proposals to better protect the tiles from being struck by dangerous objects, to give them stronger surfaces, and to allow in-flight inspection and repair. When each threat to the tiles is considered as a sole-source ‘cause’ of potential damage, it may well not have been severe enough to justify serious countermeasures. But if these threats are approached as potential links in causation chains that are inherently unpredictable, then different decisions might well have been justified.

On Challenger, it wasn’t just the ‘O-ring’. It was the first shuttle launch from a new pad, so onshore morning breezes blew across the shuttle’s base in a different pattern, carrying air chilled by passage along the cryogenic tank right across one lower section of one of the Solid Rocket Boosters. The ambient air was too cold to warm that section back up adequately. The O-rings lost flexibility but did roughly seat themselves at ignition.

But unusually strong high-altitude wind shear forced the shuttle to gimbal its engines back and forth to stay on course, rocking and flexing the vehicle until the seal failed and internal fire broke through – at the precise point on the circumference where the flame struck one of the attachment struts between the booster and the cryogenic tank. When that strut failed, the booster swiveled on its remaining struts and penetrated the upper section of the tank, leading to its rupture.

Similar sequences of small steps led to the Apollo 13 accident in April 1970 and to the loss of the unmanned Mars Climate Orbiter in 1999 [see ‘Why the Mars Probe Went Off Course’, IEEE Spectrum, December 1999, pp. 34-39].

The steps that led to the Columbia disaster – paved, like the proverbial road to hell, with good intentions and with judgments that each look acceptable in isolation – are still being worked out. Only when they are will we be able to determine whether this event deserves to be called an ‘accident’ of unforeseeable and unpreventable surprises, or is the grim consequence of incompetent engineering management.

 

Copyright 2010 James Oberg. All Rights Reserved