Systemic Failure and Risk Management

“You don’t understand anything until you learn it more than one way.” – Marvin Minsky

By Glenn Mott

The other night, I happened to catch the documentary program “Retro Report” on PBS. Retro Report is a nonprofit news organization that produces mini-documentaries examining today’s news stories through the lens of historical context. Executive Producer Kyra Darnton describes the program’s mission as providing “context and perspective by going back and re-reporting and reanalyzing older stories, or stories that we think of as not relevant anymore.” In many ways, this is akin to what risk managers are capable of, and tasked with doing, after a catastrophic risk event.

Not every “Retro Report” is about risk management, of course, but many of them involve some aspect of risk, in retrospect and in context. The segment that caught my attention is titled “Risks after Challenger.” According to Diane Vaughan, author of the book The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA, in the period leading up to the Challenger launch, NASA was still riding high on clout earned during the Apollo missions, which “gave it an aura of invincibility. We were taken as the international leader in the space race. And no one was really expecting anything to go wrong.”

Rooted in History

Many of us recall that tragic 1986 episode vividly, along with the Presidential Commission charged with investigating the disaster and the public hearings that followed. The PBS documentary details how NASA managers not only knew about the risk of O-ring failure at temperatures similar to those on that fateful January launch day, but willingly ignored the advice of their contractor, Morton Thiokol, whose managers, in turn, capitulated to NASA over the objections of their own engineers.

As Vaughan relates the story, “At Thiokol the vice president was asking those engineers to stand up for what they said. Roger Boisjoly [a mechanical engineer] took the lead in the objections. He said, ‘I can’t prove it to you. All I know is that it’s away from goodness in our experience base.’ But the engineers at Thiokol didn’t have the data. So the vice president took the decision-making away from the engineers and asked the managers to decide.” Thiokol managers voted to reverse the recommendation and to launch the Challenger as planned. The subsequent Rogers Commission report contained the memorable finding that the Challenger disaster was “an accident rooted in history.”

Many of you may remember, as I do, the testimony of Nobel laureate and theoretical physicist Richard Feynman during the televised hearings. To prove his point, he did something so simple that it cut through all the complexity of data and analysis that had blinded NASA’s managers, making the problem immediately clear to anyone watching from home: he submerged a clamped sample of the O-ring material in a glass of ice water. The material lost its resilience at cold temperatures. As Feynman’s demonstration with a clamp and a glass of water made crystal clear, the cognitive, systemic, and practical disconnect between NASA’s engineers and executives, and between Thiokol’s managers and engineers, was far greater than anyone had supposed.

Seventeen years later, in 2003, NASA and the American space program suffered another catastrophe when Columbia was lost on reentry, after debris shed during launch damaged the thermal protection on the left wing. As Vaughan notes, “The similarity between Challenger and Columbia was the falling back on routine under uncertain circumstances.”

Normalization of Deviance

As I watched the documentary and the timeline of the shuttle disaster unfold, I recalled how Melanie Herman and Katharine Nesslage previewed this dilemma in a recent NRMC eNews column. In that column they write, “After many years of advising nonprofits across a colorful spectrum of missions, we’ve concluded that trying to find a completed set of risk management blueprints or best practices that can simply be followed like breadcrumbs is a costly and frustrating exercise that is fraught with risk.”

Again, Diane Vaughan: “This happens in many different kinds of organizations . . . their [NASA managers’] behavior was to a great deal determined by working in a very rule-oriented organization.” Vaughan continues, “We can never resolve the problem of complexity, but you have to be sensitive to your organization and how it works. While a lot of us work in complex organizations, we don’t really realize the way that the organizations we inhabit, completely inhabit us.”

As the above-mentioned NRMC column makes clear, risk managers must constantly monitor how the environment in which they operate is changing. They must be open to insights and ideas from different sectors, industries, and even different disciplines. To quote NRMC’s own best advice, “Keep in mind that while you may believe your quest is for best practices, a best practice in one organization could be a complete disaster in your environment. Be open to hearing about missteps as well as success stories.”

When reviewing the Challenger disaster, Diane Vaughan, a sociologist, coined the term “normalization of deviance,” and went on to apply the theory to organizations beyond the space agency. She says that it helps explain how, over time, organizations come to accept risky practices as normal. “It’s widespread, with Katrina, with British Petroleum [when] early warning signs [were] ignored [in the Deepwater Horizon disaster in the Gulf of Mexico], the 2008 financial failure. You have a lot of heavy technology, derivatives and formulas, and there is a fine line between what is devious and what is a good business decision.” In 2017, Admiral William Moran of the U.S. Navy cited Vaughan’s term to explain two deadly collisions involving Navy destroyers in the Western Pacific (the USS Fitzgerald and the USS John S. McCain) just over two months apart that summer.

Tearing down the risk function in your agency and rebuilding it differently may be the hardest thing you will need to do after a catastrophic risk event. Yet every risk manager needs to be prepared to answer the question: “What would I do differently?” The alternative is adhering to a blueprint, following the rules of the system while ignoring the environment, and dismissing any data to the contrary.

For more, see these resources:

The “Retro Report” documentary on PBS

“It’s Time to Banish Blueprints and Best Practices,” RISK eNews, Nonprofit Risk Management Center