Pre-Columbia Criticism of NASA’s “Safety Culture” in the late 1990s:
Chapter 8 // The Mir Safety Debate, from “Star-Crossed Orbits: Inside the US-Russian Space Alliance,” James Oberg, 2002, McGraw-Hill, NY.
"...Quality must be considered as embracing all factors which contribute to reliable and safe operation. What is needed is an atmosphere, a subtle attitude , an uncompromising insistence on excellence, as well
as a healthy pessimism in technical matters, a pessimism which offsets the normal human tendency to expect that everything will come out right and that no accident can be foreseen— -- and forestalled— -- before it happens." Hyman Rickover
<snip>
In terms of discovering what could unexpectedly go wrong in space, Mir was certainly a gold mine. And as always, NASA did its best to put a positive spin on the sequence of shocks and unpleasant surprises.
"I learned far more in recovering from those things than if I had just had a steady constant program to execute," Mir veteran Michael Foale told a press conference after his flight. And White House science advisor John Gibbons, in the middle of listing the accidents, told reporters,: "My guess is that if everything had gone [as] we hoped, . . . we'd [have] come out on the short end of the stick in terms of relevant experience. . . . Each one of these mishaps has been absolutely invaluable in teaching us lessons that will come in very handy when we do further work in space like the space station."
More level-headed space veterans just shake their heads at such bold talk, since they know that emergencies in space are never good news and are never desirable. Further, noted one 20-year veteran of NASA's Mission Control, the claim that the crises taught NASA how to work out problems with the Russians merely shows that NASA approved the flights before it had adequately trained itself to work out such problems. "You learn to work with other control teams during rigorous pre-flight drills and emergency simulations," the veteran explained, "and this must be certified at the flight readiness review. Nobody had any right to allow those flights if this ability hadn't been learned before launch."
Charlie Harlan had been head of the safety office at the Johnson Space Center before retiring in 1997, after a career dating back to the days of Gemini and Apollo. On June 29, 1997, he wrote a long, calmly worded letter to the safety office at NASA Headquarters to argue against the notion that unpleasant surprises were a good thing.
“High NASA officials and other pundits are quoted in the media as attempting to characterize the dangerous and potentially life threatening situations as ‘an opportunity to learn for ISS and how to work effectively with the Russians,’” he wrote. “[But] in the past, we have relied on training and simulations for this kind of opportunity rather than real life emergencies of the survival category. Of course we have to deal with real life emergencies at times, and we do learn from them, but we shouldn’t ever view them as an opportunity.”
Harlan concluded that the lesson to be learned from the surprises was that NASA had failed in its obligation to perform adequate hazard assessments: “Those of us in the safety profession would view these events as a failure of the management system and our Safety and Mission Assurance process,” he wrote. A healthy process, he added, “should be based on disciplined risk assessment methods, hazard elimination, risk mitigation, and continuous risk reduction throughout the life of the program.”
In Harlan’s assessment, which is shared by many of the other engineering safety experts that I talked to, the most disquieting thing about most of the problems on Mir was that they kept taking both countries' space experts by surprise. NASA safety panels had received detailed Russian input when they certified, before Jerry Linenger and Michael Foale's tours of duty, that nothing was likely to endanger them or threaten their missions. The Russians, with 25 years of space station living behind them, also overlooked these risks, demonstrating that their own process of extracting the correct lessons was flawed.
NASA's reaction to the 1997 accidents was mainly focused on how to prevent a recurrence. "They've firmly locked the doors behind all of the stolen horses," one former NASA safety official remarked. He meant that the two major and the dozen minor accidents that actually occurred were only a small subset of all the potential accidents, so it was wrong to believe that preventing their recurrence would protect the ISS against all other possible misadventures. Viewed from this angle, focusing on the history of Mir in 1997 and what had been learned from it could divert attention from a much wider-ranging hazard assessment.
This concern is well founded. In August 1997, astronaut Wendy Lawrence, then still a candidate for a lengthy Mir mission, explained to a television reporter why she was not worried by the string of space calamities. "I figure that everything that can go wrong has already gone wrong," she quipped.
Viktor Blagov, interviewed in his office at the Mission Control Center in Moscow, said much the same thing to an American news reporter later that year. “We have learnt to repair every system on board—simple or complicated; we had so many malfunctions and in so many ways that, we can say, they can only be repeated,” Blagov boasted. “People are joking here—we've lived through everything in that station, nothing new can happen as we've had it all.”
The delusion that nothing new could go wrong in a system as complex as Mir was extremely dangerous. If the Russians had come to believe that they were invulnerable, they were in greater danger than ever. Real ‘rocket scientists’ know that it isn’t just a question of a single system failing. One system can cause another to fail, or a failure in one system can mask a failure in another. Those are exactly the kinds of interactions that make complex systems vulnerable.
Harlan’s letter continued: “When NASA originally began the Shuttle/Mir program, no rigorous safety analysis or risk analysis was accomplished.” He knew this because he had been there. “NASA decided based on the then understood historical performance of safe Mir operations to accept that record as a given. This was done by a subjective review process unlike the systematic safety and reliability analytical techniques utilized for U.S. human spaceflight. If you remember, at that time the Russians were not always forthright about their systems failures or some of the problems they had in the past.”
That was Charlie’s diplomatic way of describing the situation that I had also complained about in much more forceful terms. Where he said that the Russians were “not always forthright,” I said that they were lying. The consequences were the same: NASA accepted the false data, and neither soft chiding nor harsh criticism altered NASA’s policy in the slightest.
Harlan continued: “NASA publicly has the appearance of trying to characterize the recent dramatic events of Mir operations in a way to minimize the idea that there is any safety concern for the crew as a result of the current Mir status. I believe that this stretches the limits of credibility. . . . Should it become clear to NASA that the safety risks for operations with Mir are increasing, NASA management should have the guts to challenge the political basis for this specific activity and offer alternative programs for cooperation with the Russian Space Program to the Administrator.”
Within NASA, other safety experts were echoing Harlan’s concerns. One was Blaine Hammond, an astronaut assigned to be the safety representative for his office: “We have been extremely lucky so far,” he told his associates in July 1997. “We may not be so lucky next time and, in my personal opinion, there will be a next time, it’s just a matter of when and how bad.”
But when Hammond called the Mir “a disaster waiting to happen,” he was cold-shouldered. “You’d have thought I was preaching heresy, the way people reacted to that,” he told a reporter the following year, after he’d left NASA. “They would let me talk, but they didn’t act like they ever were going to take it forward. You’d see eyes rolling or you’d get the impression, ‘Geez, here he goes again.’”
On July 1, 1997, soon after the collision, chief astronaut Bob Cabana rebuked Hammond by e-mail for his negative statements about Mir. The message, Cabana explained, was to remind him of “the Flight Crew Operations Directorate position,” and it ended with the blunt request, “I would like to talk with you.”
“I was told that you stood up at a meeting and said, ‘it would be criminal for us to send Wendy to Mir,’” Cabana complained. He then criticized Hammond for making such a statement without clearance. “Our primary goal right now is to help the Russians fix Mir and ensure that it’s done correctly. Your job is to make sure the system is supporting [our decisions], doing all the right things to fly safely, not to express [emotional] personal opinions that may or may not coincide with policy.”
It was clear that NASA’s policy had already been set: Stick with Mir unless something overwhelmingly negative shows up in the studies. This was evident from an e-mail that Cabana sent to Wendy Lawrence, the astronaut then training to replace Foale on Mir. “I think there is always the possibility that Mike could be the last American on Mir,” Cabana wrote on June 30. “This is definitely not what the program wants.” He added that if Mir was found to be “safe and stable,” “I’m sure we will continue.”
“Not what the program wants...” is a telling phrase. NASA seemed to know the answer that it wanted in advance. Despite the public pretense of an agonizing, even-handed appraisal, the agency never seriously considered not sending the next astronaut to Mir. This is demonstrated by the curious fact that it wasn’t until September 24, the day before the shuttle launch to Mir, that shuttle operators even asked the mission design team what the dynamics issues (propelling and steering the shuttle into space) might be if the next American who was supposed to fly to Mir were to be taken off the shuttle before launch. The seat would have to be empty going “uphill,” because it would be occupied by the returning American crew member (already on Mir) on the way back.
James van Laak told the New Yorker’s Peter Maas the same thing: “To be perfectly honest,” van Laak is quoted as saying about this decision, “there are plenty of people within the political system and within NASA who are pushing us to go, go, go, go, while at the same time, they are distancing themselves from any blame.”
Why wasn’t the issue being studied seriously? “The problem is lack of communication up and down the line,” Hammond later explained. “Nobody wanted to hear me, I felt I was left out in the cold,” he told the New York Times. And it was more than just not wanting to hear things. Apparently there was also some effort expended to make sure that others didn’t hear things, either.
Hammond recalled a lively discussion of Mir dangers during a teleconference between Houston and NASA headquarters. The meetings, like all formal procedures, were taped. When a truly independent oversight team—NASA’s own Inspector General’s office—heard about these meetings, it requested the tapes. JSC officials reportedly replied that a person or persons unknown had broken into the vault where the tapes were kept and that unfortunately these 17 specific tapes were missing.
The suspicion of the skeptics—Hammond, Harlan, myself, and a number of other hold-outs—was that NASA was assuming that Mir was safe until proven otherwise because that was what the agency wanted to believe. This was the classic sin that led to the Challenger disaster in 1986, when a manager demanded that engineer Roger Boisjoly “prove it isn’t safe” after Boisjoly correctly voiced concern that the booster O-ring seals hadn’t been verified at the low temperatures they were facing on the day of the launch. It was the same philosophical flaw that appeared in 1999, when NASA managers, informed that their interplanetary navigation experts were worried that there was something funny about the flight path of a Mars-bound probe, demanded that the navigators prove that the probe was off course before they would allow the requested slight course change that would have carried the probe a little farther above the planet’s atmosphere. Without the prudent adjustment, the navigational problem, which turned out to be all too real, caused the probe to literally crash and burn.
Written proof of the same philosophical flaw was found in the briefing charts of a NASA safety panel that met on September 10, 1997, to consider the launch of the next astronaut to Mir. Pete Rutledge, the panel’s executive secretary, presented the chart that spelled out its position. “Despite concerns,” the chart said, “there is no hard evidence that Mir is currently unsafe.”
There it was in black and white. NASA’s view was that it wasn’t a question of showing that the agency understood the potential hazards of Mir and was taking trustworthy countermeasures. No, NASA’s arguments were based on willful ignorance. NASA felt that since it couldn’t find any dangers, it should assume that there weren’t any. This was the agency’s position even though the same review process had been used before Linenger’s flight and before Foale’s flight, and had foreseen neither the fire nor the collision.
Rutledge’s recommendations took diplomatic considerations one step further into what should have been a straightforward engineering assessment. “If and when Mir is deemed unsafe for a U.S. presence,” the chart said, “NASA should convince our Russian partners that Mir is unsafe for a continued human presence and press for abandoning the vehicle completely.” It was a macho thing: If the Russians wouldn’t quit, we wouldn’t either.
It’s worth making the point again. Normal space-flight safety standards, re-validated after the Challenger catastrophe in 1986, call for establishing a positive level of safety by cataloguing all of the potential hazards and estimating their cumulative probability in the face of countermeasures. In contrast, NASA’s approach to Mir, as illustrated in both internal briefing charts and explicit public statements, was to assume safety unless danger could be proved (the motto seemed to be, “We will proceed unless somebody proves there is a reason to stop”). But most of the dangers facing the astronauts on Mir were unknown. Either Russia withheld relevant information, or else no one checked up on Mir’s key features. A full year later, in June 1998, a senior Russian space engineer was brought up to Mir on the last shuttle docking with one main task: to check Mir’s soundness. Nobody had had any reliable information about it before.
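To see why a positive safety finding demands a complete hazard catalog, consider a deliberately simplified illustration (the numbers are hypothetical, chosen for easy arithmetic, and not drawn from any NASA analysis). Suppose a station presents 100 independent hazards, and countermeasures hold each one to a 1-in-1,000 chance per mission. The probability that at least one strikes is 1 - (0.999)^100, or roughly 9.5 percent, nearly one chance in ten. In general, P(mishap) = 1 - (1 - p1)(1 - p2)···(1 - pN). Any hazard that goes uncatalogued simply drops out of the product, so the computed risk can only understate the real one; “no hard evidence of danger” is not a risk estimate at all.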
NASA’s logic, as described by NASA spokeswoman Peggy Wilhide, was upside down, challenging doubters to find reasons not to go. “The bottom line was that the experts that we had asked, the majority of them, determined that there were no technical or safety reasons to discontinue the program,” she told the Associated Press.
But could the safety teams, in fact, have been expected to find reasons not to go? In order to perform his safety assessments, Stafford relied on U.S. consultants and on his opposite number in Russia, a space engineering veteran named Professor Utkin. Utkin arranged briefings for Stafford’s people in Moscow, and assured them that he could find nothing hazardous about Mir.
This concerned me because Utkin’s track record in assessing space hardware hazards didn’t strike me as encouraging. A year earlier, he had been in charge of the accident investigation for the Mars-96 probe, which had crashed to Earth shortly after launch in December 1996, scattering half a dozen plutonium-powered batteries across the Chilean-Bolivian border. Utkin’s panel was given full technical data on the probe and its booster, yet even knowing as they did that it had actually failed, they could find no technical reason for it to have done so. Had they been asked before launch to predict the outcome of the mission, they would have found no cause for concern, yet in the real world the launching turned out to be a disaster.
Nor, apparently, were many of NASA’s consultants given adequate technical information. Dr. Ronald Merrill’s report on the 1997 fire is a good example. Commenting on the fire, Merrill stated that “the smoke seen at the time of the fire proved to be water vapor and not a serious environmental issue.” I compared this view, based only on documents that Stafford’s group had selected to show him, with Jerry Linenger’s written report two days after the event: “Breathing not possible without gas mask,” he wrote. “In my judgment, survivability without gas mask—questionable; serious lung damage without gas mask—certain.” Concluded Linenger, “It was impossible to get even a single breath between masks due to smoke density. I have experienced similarly dense smoke only during Navy firefighting training.” Linenger couldn’t see the fingers of his outstretched hand through the smoke. I found it impossible to believe that Merrill had been shown Linenger’s full reports.
Other private analysts found different weaknesses in Stafford’s conclusions. “The full Stafford report . . . appears only to address past safety problems and not root causes of why safety issues occurred or how Russian flight safety assessments will be different in the future,” wrote Dennis Newkirk, author of the Almanac of Soviet Manned Space Flight and editor of the Internet’s ‘Russian Aerospace Guide’. “The summary of the report also neglects cascading failures in which one systems failure, such as cooling, can eliminate multiple other systems leaving Mir with no backup other than the Soyuz.”
At the hearing, Roberta Gross, NASA’s Inspector General, described concerns expressed to her office about the impartiality (or lack of it) of the review boards. She recounted that she had received numerous comments from space workers to the effect that the review boards, which consisted of people whose real jobs were dependent on NASA funding, could not be impartial. The U.S.-Russian panel headed by former astronaut Tom Stafford was specifically mentioned.
Ralph Hall, a Texas congressman, attacked Gross for relying on “anonymous hearsay.” He was clearly overlooking the fact that the people who spoke to Gross’s team gave their names, but on condition that their identities not be revealed to their management (that’s how an effective Inspector General works in any bureaucracy). “I find many aspects of this testimony very troubling,” he complained, citing the practice of repeating anonymous charges without Stafford being present to respond (he had been invited but had a scheduling conflict). “That’s an indictment of General Stafford and George Abbey,” Hall thundered. Such charges “smear the reputation of Tom Stafford,” he said, adding: “Making allegations that question his integrity—I think that’s disgraceful.”
But Gross’s reports clearly portrayed an atmosphere within NASA in which workers knew that disagreement was not tolerated. One non-Mir investigation examined a controversial decision in Houston to eliminate all Mac desktops and switch to an all-PC environment. “One employee stated that (s)he was told by the supervisor, ‘I would hate to lose a good engineer over this,’” Gross wrote. “Another employee was told their value to the organization would be seriously questioned if they continued to question the decision. In another instance, a spokesperson sent an email stating ‘Our management is getting very tired of people who always know better. I know who signs my paycheck, do you?’”
Further indications that this feeling was widespread came from a discussion forum on the private ‘NASA Watch’ Internet site. The topic was, “Do you feel free to openly express your thoughts at NASA without fear of retribution?” Some said they did, although few of those were from Houston. Otherwise, there was a lamentable consistency in the replies.
“In a healthy group, disagreement is not a threat,” wrote one. “Rather, disagreement is part of a process that results in the best decisions. Leadership assures things get done, even if disagreements persist after a decision. The problem in the agency is too many managers and no leaders to be found. Managers squelch disagreement. They take the easier route than a leader would. Somehow we got top heavy on managers and scarce on leaders.”
Another engineer expressed this concern: “I fear that the ‘yes’ men (and women) will say ‘yes’ one too many times, and there will be big problems that surface.”
“You know very well there’s no way to do that in this organization,” wrote a third. “We’ve become a parody of the worst of [Marshall Space Center, which supervised the shuttle propulsion system] just before Challenger—a bunch of ‘yes’ men without the guts to tell the emperor he isn’t wearing any clothes. Under this administration and this Administrator, NASA has become an agency of lies and half-truths, especially with regards to safety.”
Another wrote: “Do I feel free to speak out? Not a chance in hell! I’ve personally seen what happens to those that do. Around here we call such behavior ‘career limiting moves.’” And another: “Many NASA and contractor employees have been in positions to see serious management mistakes, waste, breaches of the public’s trust or just plain corruption, but have not had the courage to speak up.” And yet another: “Typically, it is the employees that are of retirement age and who have no further career aspirations that are less apprehensive about speaking their minds. If you don’t fall into this category then I would say you probably are a little more reserved in what you say. ‘Rocking the boat’ and not ‘towing the party line’ can and do end careers. I’ve seen this happen more than once. We are creating ‘yes-people.’ This is unfortunate.”
Goldin, the head of NASA, was aware of these concerns, and regularly made announcements stating that the problems should be fixed. In mid-1998, for example, the ‘Senior Staff Meeting Minutes’ for June 1, 1998, relate that “Mr. Goldin expressed his concerns that NASA does not have an environment that is conducive to openness in addressing safety issues raised by employees.” He was quoted directly: “Open communications and employee awareness and training are critical to produce an environment that permits information and concerns to flow and be appropriately addressed.” Somehow it never happened. Two years later, taking blame for the latest NASA failures with Mars, he confessed that “people were talking, but we weren’t listening.” He still didn’t get it: People weren’t talking because they were intimidated when they tried to pass bad news up the management chain.
<snip>
I had been invited to testify at the congressional hearings on September 18, to discuss my independent assessments of NASA’s safety review process.
I told the committee that while NASA officials kept insisting that “Mir was safe,” they had no engineering justification for making such an assertion. This was because the familiar process of ground-up safety assessment, which had worked so well in the past, had never been applied in this case. In order to objectively prove that something is safe, it is not enough to challenge others to “prove that it’s not safe,” all the while withholding pertinent information, and then triumphantly conclude that the absence of any proof of danger is equivalent to a proof of safety. But that’s how NASA had done its Mir analysis.
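The inversion is easy to state in hypothesis-testing terms (my framing, not language from any NASA chart): the agency treated “Mir is safe” as the null hypothesis and demanded evidence strong enough to reject it, whereas orthodox flight-safety practice treats “unsafe” as the default and requires a positive case, hazard by hazard, before that default is abandoned. A test of the first kind, run against withheld or unexamined data, has essentially no power: failing to reject “safe” measures only the absence of information, not the presence of safety.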
<snip>
Copies of this book, autographed as desired, are available from the author, James Oberg, Rt 2 Box 350, Dickinson, Texas 77539 for $30 postpaid. Visit www.jamesoberg.com for more details on other books.
A detailed report of the safety culture flaws that destroyed the Mars Climate Orbiter in 1999 can be found at http://www.jamesoberg.com/mars_probe_spectrum_1999.html, my prize-winning IEEE Spectrum article on ‘Why the Mars Probe Went off Course’. Here’s the key excerpt:
“Even if what ruined the Mars Climate Orbiter mission can be overcome, it should not be forgotten. The analogies with the Challenger disaster are illuminating, as several direct participants in the flight have independently told Spectrum.
“In that situation, managers chose to cling to assumptions of "goodness" even as engineers insisted the situation had strayed too far into untested conditions, too far "away from goodness." The engineers were challenged to "prove it ISN'T safe," when every dictum of sound flight safety teaches that safety is a quality that must be established—and reestablished under new conditions—by sound analysis of all hazards. "Take off your engineering hat and put on your management hat" was the advice given to one wavering worker, who eventually went along with the launch decision.
“Similarly, various versions of the trajectory debate in the final days of the flight indicate that in the face of uncertainty, decision-makers clung to the assumption of goodness; assertions of trajectory trouble had to be proved rigorously. Just the opposite attitude should have ruled the debate.”