Rehabilitating Risk Matrices
This is a risk matrix used for reporting “Enterprise Risks”. Risks — possible events that will have a negative impact on the (financial) goals of the company — are allocated to a cell in the matrix according to the “likelihood” the event will occur (vertical) and its impact on the financial goal of the company, say operating profit (horizontal).
We can manage risks by reducing the likelihood they occur or ameliorating the consequences if they do, so it makes a lot of sense to decompose the analysis of risks into whether or how often they occur and the consequence if they do.
But is the risk matrix the best way of doing this? Can we do better? In the following, I point out some of the shortcomings of the matrix approach and suggest an alternative, the risk plane.
Interpreting the matrix
We will interpret the “likelihood” as the probability that the event will happen at least once in a given time period (usually a year). In that case we can turn probability into expected frequency — i.e. how many times events occur a year (on average).
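One way to make this conversion concrete is to assume that events arrive as a Poisson process. That assumption is mine, not something the matrix prescribes. Under it, the probability p of at least one event in the period and the expected frequency λ are related by p = 1 − exp(−λ). A minimal sketch, with hypothetical function names:

```python
import math

def frequency_from_probability(p: float) -> float:
    """Expected events per period, ASSUMING Poisson arrivals.

    p is the probability of at least one event in the period.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    # Invert p = 1 - exp(-lam); log1p keeps precision for small p.
    return -math.log1p(-p)

def probability_from_frequency(lam: float) -> float:
    """Probability of at least one event, under the same Poisson assumption."""
    if lam < 0.0:
        raise ValueError("frequency must be non-negative")
    return -math.expm1(-lam)
```

For rare events the two numbers almost coincide (a 1% probability corresponds to a frequency of roughly 0.01 per year), but they diverge as events become frequent: a 50% probability corresponds to about 0.69 events per year.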
The “impact” is a little tricky and one of the big challenges with matrices. What do we do if events have ranges of outcomes that cross the thresholds between cells? One possibility is to take an average of the possible impacts and select the impact column according to the range into which it falls.
This is not the only way to interpret impact, but it has the advantage that multiplying the frequency by the impact is then mathematically meaningful. The product is a frequency-weighted expected impact: the amount this risk costs you per year on average (strictly speaking, “expected” in the mathematical sense).
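In symbols, and under the usual compound-loss assumptions (the number of events N in a year is independent of the individual impacts X_k, which share a common mean), the statement reads as follows. The symbols are mine, not part of the matrix.

```latex
\[
  \mathbb{E}\Bigl[\sum_{k=1}^{N} X_k\Bigr]
  = \mathbb{E}[N]\,\mathbb{E}[X]
  = \text{frequency} \times \text{expected conditional impact}
\]
```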
If we follow this line of argument, then we can calculate the range of expected impact represented by each cell in the matrix. This is shown here.
There’s an obvious and foreseeable problem in the outer rows and columns. Because the frequency of the top row can be arbitrarily low (and the bottom arbitrarily high), there’s no way of comparing “overall” risk — by which I mean a measure that captures both frequency and impact — across the columns in these rows. All cells in the first row accept arbitrarily low risks and all the risks in the bottom row may be arbitrarily high. A risk in the top right cell could be lower in terms of overall impact than a risk in the top left. Exactly the same considerations apply to the first and last columns.
Even more concerning, the ranges in all the central nine cells overlap. For example, the range of the yellow 4,2 cell is almost identical to that of the red 3,4 cell. Adjacent cells overlap substantially (compare 3,2 and 3,3 for example). And there is even range overlap between cells two and three cells apart (e.g. 3,2 and 3,5).
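The cell ranges and their overlaps can be reproduced with a few lines of code. The band boundaries below are hypothetical, chosen only for illustration, since the thresholds of the matrix in the figure are not given in the text. Rows run from rarest to most frequent and columns from smallest to largest impact.

```python
import math

# HYPOTHETICAL band edges, for illustration only.
FREQ_EDGES = [0.0, 0.05, 0.2, 0.5, 1.0, math.inf]   # events per year
IMPACT_EDGES = [0.0, 1e5, 5e5, 2e6, 1e7, math.inf]  # currency units per event

def cell_range(row: int, col: int) -> tuple[float, float]:
    """Range of expected annual impact (frequency x impact) for a 1-based cell."""
    f_lo, f_hi = FREQ_EDGES[row - 1], FREQ_EDGES[row]
    x_lo, x_hi = IMPACT_EDGES[col - 1], IMPACT_EDGES[col]
    return f_lo * x_lo, f_hi * x_hi

def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two open ranges share more than a boundary point."""
    return a[0] < b[1] and b[0] < a[1]

# Print the range covered by every cell of the 5 x 5 matrix.
for row in range(1, 6):
    print([cell_range(row, col) for col in range(1, 6)])
```

With these made-up edges, every cell in the first row and first column starts at zero, every cell in the last row and last column is unbounded above, and `overlaps(cell_range(2, 3), cell_range(3, 2))` is true.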
If, instead of simply accepting the prescribed colouring, we set the colours according to the range of expected overall impact, the matrix looks something like the following.
Part of the problem here is that we are multiplying frequency and impact, whilst trying to impose a linear colour scale. We can fix this by imposing strict logarithmic scales on both axes.
This cleans up the dog’s breakfast a little, but we still have arbitrarily low and high ranges in the outer cells and the cells still overlap (though somewhat more predictably and less pathologically than before).
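A short worked bound shows why the overlap persists. Assume, purely for illustration (no base is fixed in the text), that each band is exactly one decade wide, with row i covering frequencies f and column j covering impacts x as below, for some offsets a and b:

```latex
\[
  10^{a+i} \le f < 10^{a+i+1},
  \qquad
  10^{b+j} \le x < 10^{b+j+1}
  \quad\Longrightarrow\quad
  10^{a+b+i+j} \le f\,x < 10^{a+b+i+j+2}
\]
```

Every inner cell then spans exactly two decades of overall risk. Cells with the same i + j share the same range, cells whose index sums differ by one overlap by a full decade, and only cells whose index sums differ by two or more are cleanly separated.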
Vying with vagueness: a philosophical digression
Once we have explicitly identified likelihood with (expected) frequency, interpreted impact as conditional expected impact (i.e. the expected impact under the condition that the event has happened) and imposed strict logarithmic scales, most of our residual woe arises from the insistence on ranges.
I suspect this insistence is born of a belief that, because we rarely have the data to set a frequency with much confidence, quoting a single number would be at best delusive and at worst downright duplicitous, and that it is more honest, and more representative of our state of uncertainty, to quote only one of a set of prescribed ranges.
With all respect for the honest intention of this position, I argue it’s misguided. Our very real uncertainty is already captured in the uncertainty in the occurrences and severities of the events we describe. Rather than throwing in the towel on uncertainty and despairing of ever being able to set a parameter with more accuracy than to within an order of magnitude, we can treat parameter uncertainty in its rightful place in the discussion around how we use data and expert opinion to assay uncertainty. Of course, we must do all we can to validate and revise these parameters in the light of data, but this is only possible if we drop the insistence on ranges.
The manifold miseries of matrices
The miseries of matrices are manifold, even for house-trained matrices like the one above:
- Frequency and impact are captured to within an order of magnitude, their product to within two.
- Aggregating two risks leaves room for error of up to four orders of magnitude in the overall risk.
- Because of the overlaps and the semi-infinite ranges in cells, a matrix cannot be used to rank or prioritize risks.
- Having only five broad categories discourages discussion of data and encourages a subjective sloppiness in which risks are allocated according to intuition rather than data or expertise.
Shedding the shackles and removing the ranges
Removing the ranges solves all these issues at a stroke. The price is that we are now required to provide a frequency parameter for the occurrence of each risk and an expected impact for its consequence; the latter will often require some sense of the range of outcomes and how this range is weighted.
I would enthusiastically welcome these demands as decided advantages. No longer should we hide behind the thick wide columns of the matrix. The information we are required to provide drives exactly the analytical behaviour demanded by informed risk assessment: What have we seen before? Can we find categories that allow us to use data to understand exposure? How bad can it get? What’s the best we can hope to get off with? (For rarer events, we will need to synthesize potential causes and consequences and turn our scrutiny to them, but this is also good practice.)
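One simple way to turn a range of outcomes and its weighting into a single expected impact is a short list of scenarios, each with a conditional probability. The sketch below assumes such a discrete description; the scenarios and figures are hypothetical.

```python
# HYPOTHETICAL outcome scenarios for one risk, given that the event occurs:
# (conditional probability, impact in currency units).
scenarios = [
    (0.70, 50_000),     # mild outcome
    (0.25, 400_000),    # serious outcome
    (0.05, 3_000_000),  # severe outcome
]

def expected_conditional_impact(scenarios):
    """Probability-weighted mean impact, given that the event has happened."""
    total_weight = sum(p for p, _ in scenarios)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(p * impact for p, impact in scenarios)

print(expected_conditional_impact(scenarios))
```

Here the rare severe outcome contributes more to the expected impact than the common mild one, which is exactly the kind of discussion a single column label hides.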
Our assessments of frequency can be validated and revised, and instead of arguing whether an impact belongs as a 2 or a 3, when everyone knows it’s both, we can have an informed discussion about the range of possible impacts and their relative likelihood. Aggregation becomes trivial and prioritization is substantially simplified.
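With point values for frequency and expected impact, aggregation and ranking reduce to a sum and a sort. The sketch below uses hypothetical risks and figures, and ranks by annual expected loss, which is one possible criterion rather than the only one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Risk:
    name: str
    frequency: float        # expected events per year
    expected_impact: float  # expected loss per event, given that it occurs

    @property
    def annual_expected_loss(self) -> float:
        return self.frequency * self.expected_impact

# HYPOTHETICAL risks, for illustration only.
risks = [
    Risk("supplier failure", 0.2, 500_000),
    Risk("data breach", 0.05, 4_000_000),
    Risk("minor outage", 6.0, 10_000),
]

# Expected values add, whether or not the risks are independent.
total = sum(r.annual_expected_loss for r in risks)

# Prioritize by annual expected loss, largest first.
ranked = sorted(risks, key=lambda r: r.annual_expected_loss, reverse=True)

for r in ranked:
    print(f"{r.name}: {r.annual_expected_loss:,.0f} per year")
print(f"total: {total:,.0f} per year")
```

Note that the frequent, small "minor outage" and the rare, large "data breach" would sit in opposite corners of a matrix, yet they can be compared and added directly here.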
From matrix muddle to the perspicacious plane
Removing the ranges in no way compromises the fundamental accessibility of the risk matrix as a visual communication tool. Our risk matrix has become a risk plane and looks like this:
Risks are placed vertically according to their frequency and horizontally according to their expected conditional impact. As long as we stick to logarithmic scales, lines of constant overall risk are straight lines. Moreover, we need not limit ourselves to the expected impact; we can easily illustrate the full range of outcomes.
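The straight-line property follows directly from taking logarithms. Writing f for frequency, x for expected conditional impact and R for overall risk (symbols of my choosing):

```latex
\[
  R = f\,x
  \quad\Longrightarrow\quad
  \log_{10} f = \log_{10} R - \log_{10} x
\]
```

With log f on the vertical axis and log x on the horizontal axis, each fixed R gives a straight line of slope −1, and multiplying R by ten shifts that line by one decade.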
If a risk matrix is required for reporting, we can just put the lines back. There is more information than before, because the position within a cell now matters, but the information that was there before is still clear to all. The basic data we have gathered, and the process by which we are driven to gather it, are however fundamentally different and open the door to an enormous wealth of possibility.
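Putting the lines back amounts to binning each point on the plane. A minimal sketch, assuming hypothetical decade boundaries on both axes and the convention that a value exactly on a boundary belongs to the higher band:

```python
from bisect import bisect_right

# HYPOTHETICAL inner boundaries; four boundaries give five bands per axis.
FREQ_BOUNDS = [0.01, 0.1, 1.0, 10.0]  # events per year
IMPACT_BOUNDS = [1e4, 1e5, 1e6, 1e7]  # currency units per event

def matrix_cell(frequency: float, expected_impact: float) -> tuple[int, int]:
    """Return the 1-based (row, column) of the matrix cell containing a point.

    Row 1 is the rarest band and column 1 the smallest impact band.
    """
    row = bisect_right(FREQ_BOUNDS, frequency) + 1
    col = bisect_right(IMPACT_BOUNDS, expected_impact) + 1
    return row, col

print(matrix_cell(0.2, 500_000))  # a point in the middle of the plane
```

The mapping only goes one way: the cell can always be recovered from the point, but not the point from the cell.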