Agile & Risk Management: The Role of Traditional Risk Management

This blog is the fourth in a series about agile and risk management. The first three blogs are:

  1. Three Key Agile Risk Management Activities
  2. Agile & Risk Management: The Mental Model of Uncertainty
  3. Agile & Risk Management: Antifragile

In this blog I discuss the assumptions that underlie traditional risk management and the role of traditional risk management during agile development.

Traditional Risk Management

Although definitions vary, for our purposes here I’ll define traditional risk management as the following process: identify risks, qualitatively analyze risks, quantitatively analyze risks, plan for how to deal with (or mitigate) risks, and control (or do something about) risks.

This process is expressed in the figure below.

Traditional Risk Management

Traditional risk management often yields artifacts such as:

  • Comprehensive risk management plan
  • Risk register—typically a table where each row identifies a risk, its probability of occurrence, the exposure, mitigation steps, and the costs of those steps
  • Risk exposure graph—identified risks are plotted on a graph where the horizontal dimension is the probability a risk might occur and the vertical dimension is the cost (exposure) if it does.

Traditional Risk Management Assumptions

When applying a traditional risk management approach, people make three important assumptions that frequently won’t be true:

  1. Early on we can identify many uncertain events (the risks)
  2. We can identify ALL uncertain events
  3. We can accurately calculate the probability of uncertain events

Let me address each assumption in turn.

We Can Identify Uncertain Events Early On

I hope you will agree that on the first day of a new project we will have the least (worst) knowledge we will ever have about the project! We start out with limited information and, over time, our cumulative product knowledge grows (see figure below).

Front-end Risk Identification is Risky!

So, the assumption that we will be able to identify many of the risks and their mitigation steps upfront, when we have the worst possible knowledge we will ever have about the project, seems a bit unrealistic. Obviously, we can and should identify some of the risks as early as is practical, but we don’t want to hinge our success on our ability to identify risks early. Said another way, it is RISKY to operate in the danger zone of the above figure!

We Can Identify All Uncertain Events

The second assumption is that we can identify all uncertain events. This assumption is simply not true; worse, the events we can’t identify are often the most harmful. In his book “The Black Swan: The Impact of the Highly Improbable,” Nassim Taleb defines large-scale, unpredictable (or very hard to predict) events of massive consequence as Black Swans. (Check out this Wikipedia description of Black Swans.)

Here are a couple of example Black Swans.

  • Prior to Sept 11, 2001, everyone knew that airplanes don’t fly into buildings, until they did.
  • Prior to 2008, everyone knew that housing prices around the world would never fall precipitously and cause a global banking meltdown, until they did.

Here is an example software-development Black Swan. A company developing software under the constraints of a regulatory agency was required to file periodic documents with the regulators. One person had the responsibility to complete and submit these forms. Unbeknownst to anyone, this person filled out the forms but chose to never submit them; instead he collected them in a box under his desk. The company was subsequently slapped with a large fine for failing to comply. Go ahead: predict that uncertain event (before it happens)!

We Can Accurately Predict Probabilities

A third assumption in traditional risk management is that we can accurately predict probabilities. The reality is that people are not very good at doing that. For example, we cannot say definitively that there is a 50% probability that a particular person will leave the company. To achieve some level of accuracy, people often quantify risk on a coarse scale of small, medium, and large. While this might actually prove more accurate, you are still forced to associate a numeric value with each risk bucket in order to do the math (e.g., Expected Monetary Value = probability times exposure). Accuracy gained at the expense of precision calls into question the validity of any resulting calculation.
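To make the bucket problem concrete, here is a minimal Python sketch of the Expected Monetary Value formula mentioned above. The two bucket-to-probability mappings and the $1m exposure are hypothetical values chosen only for illustration; the point is that two equally defensible mappings of the same qualitative bucket can yield very different numbers.

```python
# Expected Monetary Value (EMV) = probability times exposure.
def expected_monetary_value(probability, exposure):
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * exposure

# Two hypothetical ways of turning the same qualitative buckets into numbers.
# Neither is "right"; both are assumptions made for this illustration.
MAPPING_A = {"small": 0.10, "medium": 0.30, "large": 0.60}
MAPPING_B = {"small": 0.20, "medium": 0.50, "large": 0.80}

EXPOSURE = 1_000_000  # illustrative exposure in dollars


def emv_range(bucket, exposure):
    """Return the (low, high) EMV produced by the two mappings for one bucket."""
    a = expected_monetary_value(MAPPING_A[bucket], exposure)
    b = expected_monetary_value(MAPPING_B[bucket], exposure)
    return (min(a, b), max(a, b))


if __name__ == "__main__":
    for bucket in ("small", "medium", "large"):
        low, high = emv_range(bucket, EXPOSURE)
        print(f"{bucket:>6}: EMV between ${low:,.0f} and ${high:,.0f}")
```

Under these assumed mappings, a “medium” risk is worth anywhere from $300k to $500k, a spread that comes entirely from the choice of mapping and not from any knowledge about the risk itself.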

In other words, though we can predict certain probabilities with some degree of accuracy, we are not very good at it. And there are times when we really can’t predict them with any accuracy at all. In those cases, we need to take economically sensible actions to limit exposure. That is, a better (and more agile) approach to risk management is to modify your exposure to classes of problems and learn to get out of trouble fast when unpredictable things happen.

Let me give you an example where we can predict that an event of a given type might happen, yet have no real ability to predict a specific event with a specific probability. We can, however, deal with our exposure.

California Earthquake

We can predict that earthquakes will happen in California. However, we can’t predict the occurrence of a specific earthquake of a given magnitude. We also can’t change the probability of occurrence, even if we could predict a very specific earthquake. In other words, if Mother Nature is going to subject California to an earthquake on a particular day, in a particular location, of a particular magnitude, we will not be able to predict that event with any useful accuracy and there is nothing we can do to change the probability of it happening.

Why might we care about this? Because the data center that houses our critical servers is located in California and an earthquake could wipe out the servers and bring our business to a halt.

So what do we do? First, stop worrying about predicting probabilities and start focusing on our exposure and the actions we could take to limit it. In this case, we could make sure the data center is built in conformance with California earthquake codes (or stronger), or we could decide to not build the data center in California (if we are deciding whether or not to put it there), or we could establish a redundant data center outside of California.

More Sophisticated Risk Management Process Does Not Address These Problems

The big takeaway here is that more sophisticated process (or more elaborate risk-management machinery) will not address the problems inherent in these three assumptions.

The better approach, as I mentioned earlier, is to modify your exposure and learn to get out of trouble fast. Is that starting to sound agile-like to you?

So, What is the Role of Traditional Risk Management During Agile Development?

All of this is not to say that agile projects should throw all traditional risk management processes out the window. Instead, agile projects should embrace the minimum (dare I say, barely sufficient) amount of traditional risk-management process that is practical for dealing with risks in their particular environment. Of course, domains where human lives are at risk might choose to employ a more intense risk-management process than environments where the failure is not catastrophic.

Let’s use an example to explore the application of traditional risk management in agile. In the picture below, I take the mental model of uncertain events that I introduced in my blog post “Agile & Risk Management: The Mental Model of Uncertainty,” and apply it to the following uncertain event: Our vendor might fail to deliver a component on a promised date.

Uncertainty Model of Vendor Failing to Deliver

First, we can easily identify the uncertain event (this is not a Black Swan): vendors do frequently fail to meet a promised date. We might even be able to derive a not-too-horribly-wrong probability that the vendor will be late. The consequences of not getting the component when promised are likely well understood and reasonably calculable. For example, if we don’t get the component when promised, we can’t complete and ship our product; every month we are not in the marketplace has a cost of delay of $1m in lifecycle profits.

Now, what can we do about this uncertain event?

Candidate Action #1

One action we might take is to send one of our employees to the vendor to help expedite getting the component done. We could capture this action in a traditional risk register as shown here.

Risk                                | Prob | Exposure  | Mitigation Action                       | Cost
Vendor fails to deliver Component X | 50%  | $1m/month | Send Barbara to Vendor to help expedite | $15k
...                                 | ...  | ...       | ...                                     | ...

The entry in the table would be read as: “The risk is that the Vendor fails to deliver Component X when promised. We believe there is a 50% probability this will happen and if it does, our exposure is $1m/month in lifecycle profits. The action we will take to mitigate this risk is to send Barbara to the Vendor to help expedite the completion of Component X. The cost of that action is estimated to be $15k.”
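A register row like this maps naturally onto a small record type. The sketch below is one possible representation, not a prescribed format; the class name `RiskEntry` and its field names are hypothetical, and the values are taken from the row above.

```python
from dataclasses import dataclass


@dataclass
class RiskEntry:
    """One row of a lightweight risk register (hypothetical field names)."""
    risk: str
    probability: float         # 0.0 to 1.0
    exposure_per_month: float  # dollars of lifecycle profit lost per month
    mitigation: str
    mitigation_cost: float     # dollars

    def expected_monthly_exposure(self):
        # Probability times exposure, per month of delay.
        return self.probability * self.exposure_per_month

    def describe(self):
        # Render the row as a one-line summary.
        return (
            f"Risk: {self.risk}. Probability: {self.probability:.0%}. "
            f"Exposure: ${self.exposure_per_month:,.0f}/month. "
            f"Mitigation: {self.mitigation} (cost ${self.mitigation_cost:,.0f})."
        )


entry = RiskEntry(
    risk="Vendor fails to deliver Component X",
    probability=0.50,
    exposure_per_month=1_000_000,
    mitigation="Send Barbara to Vendor to help expedite",
    mitigation_cost=15_000,
)

if __name__ == "__main__":
    print(entry.describe())
    print(f"Expected monthly exposure: ${entry.expected_monthly_exposure():,.0f}")
```

Keeping the entry this small is deliberate: it captures just enough to communicate the risk and the planned action, in keeping with the barely sufficient spirit described earlier.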

This example candidate action can be handled using standard, lightweight risk management practices. And it should be, since this technique is a sensible, understandable, and low-overhead approach.

Candidate Action #2

A second action we might take is to pay an expedite charge to move the development of our Component X to the head of the Vendor’s development queue. Perhaps the underlying cause of the risk isn’t that the development of our component might take longer than expected, but instead that the 15 other jobs in the Vendor’s queue in front of us might take longer, thereby delaying the development and delivery of our Component X. So, paying for head-of-queue privileges might adequately address this risk.

We would capture this action in a similar way to Action #1.

Risk                                | Prob | Exposure  | Mitigation Action                              | Cost
Vendor fails to deliver Component X | 50%  | $1m/month | Pay more money to get head of queue privileges | $25k
...                                 | ...  | ...       | ...                                            | ...
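One rough way to sanity-check either action is to ask how much delay it would need to remove before it pays for itself. The sketch below is a simplification under stated assumptions: the action only shortens the delay if it occurs (it does not change the 50% probability), cost of delay accrues evenly, and a month has 30 days. The function name and the `DAYS_PER_MONTH` constant are hypothetical.

```python
DAYS_PER_MONTH = 30  # assumption used to convert months of delay into days


def break_even_days(cost, probability, exposure_per_month):
    """Days of delay an action must remove, if the delay occurs,
    for its expected benefit to equal its cost."""
    expected_monthly_exposure = probability * exposure_per_month
    return cost / expected_monthly_exposure * DAYS_PER_MONTH


# Costs, probability, and exposure taken from the two register rows.
PROBABILITY = 0.50
EXPOSURE_PER_MONTH = 1_000_000
ACTIONS = {
    "Send Barbara to Vendor to help expedite": 15_000,
    "Pay more money to get head of queue privileges": 25_000,
}

if __name__ == "__main__":
    for name, cost in ACTIONS.items():
        days = break_even_days(cost, PROBABILITY, EXPOSURE_PER_MONTH)
        print(f"{name}: pays for itself if it removes about {days:.1f} days of delay")
```

With the figures from the two rows, the thresholds come out to roughly 0.9 days for Action #1 and 1.5 days for Action #2. Even if the 50% estimate is well off, both actions are cheap relative to the exposure, which is why a lightweight register entry is enough to justify them.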

Now we have seen two examples where traditional risk management machinery would be helpful for addressing risk on an agile development project (or on any project!). Just because agile approaches don’t formally define risk management practices doesn’t mean we should ignore them. Scrum doesn’t define technical practices like test automation, continuous integration, refactoring, etc., but we’d better do them when appropriate!

Conclusion

We will use traditional risk management approaches during agile development when they are the lightweight, sensible way to capture and communicate the details surrounding particular uncertain events.

However, trying to overlay agile development with heavyweight traditional risk management (and all of its assumptions) is unnecessary and likely quite harmful. Doing so fails to improve a project’s risk profile and only gives the illusion that we have control of the uncertain events in the complex world in which we must operate.

We should use traditional risk management machinery to supplement the three principal approaches to risk management inherent in agile development:

  1. Manage risk via the product backlog
  2. Apply agile properly to avoid some uncertain situations (to eliminate the possibility of uncertain events)
  3. Apply core agile principles as a fundamental way to address uncertain events

In the next blog in this series, “Agile & Risk Management: Managing Risk Via the Product Backlog,” I will discuss the first of these three approaches: how we use the product backlog to manage risk.