The perils of extreme percentiles

Graeme Keith
6 min read · Nov 11, 2020


Discussions around extreme percentiles abound. The oil and gas exploration community expends enormous effort agonizing over appropriate values for the 99th and 1st percentiles of the distributions that describe the uncertainty in the size of oil fields.

I argue here that these discussions are at best distracting and at worst directly value-eroding.

The core of my criticism

My primary protestations are twofold:

  • These perilous percentiles bear very little relation to the important parameters of their distributions.
  • Extreme percentiles are extremely difficult to assess with any kind of accuracy.

Relevance

Uncertain variables are often approximated or represented by normal or log-normal distributions. Variables that can be decomposed into additive (or multiplicative) factors will always look roughly normal (or log-normal) close to the mean.

Resource distributions, for example, are often taken to be approximately log-normal. Recoverable resource is the product of several uncertain factors. The logarithm of the recoverable resource is thus the sum of the logarithms of these factors and, because a sum of random variables tends to a normal distribution (the central limit theorem), the logarithm of the recoverable resource is approximately normally distributed.
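This mechanism is easy to reproduce. The sketch below is illustrative, not the author's model: the three factors and their distributions are assumptions, and none of them is log-normal itself, yet the log of their product is close to normal near the centre.

```python
# Illustrative sketch (assumed factors, not the author's model): recoverable
# resource as a product of three uncertain, non-log-normal factors.
import math
import random
import statistics

random.seed(1)
N = 100_000

def resource():
    area = random.triangular(1, 20, 5)        # e.g. km^2
    thickness = random.triangular(2, 40, 10)  # e.g. m
    recovery = random.uniform(0.1, 0.4)       # recovery factor
    return area * thickness * recovery

logs = [math.log(resource()) for _ in range(N)]

# Near the centre, log-resource behaves like a normal distribution:
# its mean and median nearly coincide relative to its spread.
mu, med = statistics.mean(logs), statistics.median(logs)
print(round(mu, 2), round(med, 2))
```

Swapping in other moderately skewed factor distributions gives the same picture in the middle of the distribution; the disagreement, as the next paragraph argues, lives in the tails.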

The problem is that although the convergence to a normal distribution is very fast in the centre of the distribution, it is much slower at the extremities. Real distributions are rarely infinite and their tails tend consequently to be thicker than predicted. This is particularly true if the component variables are themselves skewed or lumpy or, critically, if there is dependence between them.
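The tail effect can be made visible with a small experiment. Again the factor choices are illustrative assumptions: here a "lumpy" recovery factor (a 10% chance of a poor-recovery outcome) fattens the lower tail, so a normal fitted only to the centre of the log-resource distribution overshoots the true low-side percentile when extrapolated outward.

```python
# Sketch (assumed, illustrative factors): a lumpy component fattens the
# lower tail relative to a normal fitted to the centre of the distribution.
import math
import random

random.seed(3)
N = 200_000

def log_resource():
    a = math.log(random.triangular(1, 20, 5))
    t = math.log(random.triangular(2, 40, 10))
    # 10% chance of a poor-recovery outcome makes this factor 'lumpy'
    if random.random() < 0.1:
        r = math.log(random.uniform(0.02, 0.08))
    else:
        r = math.log(random.uniform(0.15, 0.35))
    return a + t + r

logs = sorted(log_resource() for _ in range(N))

def q(p):  # empirical quantile of log-resource
    return logs[int(p * N)]

# Fit a normal to the centre only (median and interquartile range) ...
mu = q(0.5)
sigma = (q(0.75) - q(0.25)) / 1.349   # IQR of a normal is 1.349 sigma
# ... then extrapolate it to the 1st percentile (the exceedance P99)
p99_extrapolated = mu - 2.3263 * sigma
p99_actual = q(0.01)
print(math.exp(p99_extrapolated), math.exp(p99_actual))
```

In this toy setup the extrapolated P99 lands well above the actual one, the same qualitative failure the log-probit plot below exposes.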

This phenomenon is often hard to see. The figures here show three representations of a typical resource distribution with the normal distribution to which it tends. The horizontal lines correspond to the P99, P90, P10 and P1.

The vertical scale of the probability density function is set by the middle of the distribution, so the deviations in the tails look small. Indeed, they are small in absolute terms, but in the relative terms relevant for parameterization, they are substantial.

The cumulative distribution function actually shows these deviations, but because our minds compare the shapes of the curves rather than point for point values, the deviations are not so clear.

It is only the log-probit scale that reveals the substantial deviations in the tails of the distribution. The P99 extrapolated from the central part of the distribution lands at about 2 mmboe, roughly four times the actual distribution's P99: the extrapolation is off by a factor of four.

If we try to parameterize our resource distribution with a P99, we are parameterizing the approximation far beyond the point at which it approximates the distribution at all.

Assessment

Humans are notoriously inept at assessing low-probability events. Even practitioners trained in establishing confidence intervals struggle disproportionately as those intervals widen: from 90% and 95%, to 98% (which is effectively what we are asking for when we elicit a P1 or P99), to 99% (when we say the interval between a minimum and a maximum value should capture 99% of outcomes).

When asked for even relatively prudent percentiles like P90 or P10, experts tend to grotesquely under-estimate their ignorance; statistical analysis of elicited estimates shows they tend to correspond better to P70 and P30 (or worse). This intuition can be trained, but the training sets must be large enough that the errors can be discovered, recognized and internalized, so that practitioners learn how outlandish a wide percentile should feel relative to their experience.

Analytically speaking, we need around 100 data points to fix a P90 to an acceptable level of accuracy (standard deviation of the estimate equal to the mean). If this seems high, consider that our best estimate of the P90 in a list of 100 data points is the tenth smallest in the list, but there is actually a very reasonable probability of the true P90 landing anywhere between the fifth and the fifteenth; how badly that goes wrong in relative terms depends, of course, on how steep the distribution is at that point.
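That fifth-to-fifteenth band can be checked directly. A minimal sketch, using a standard normal purely as an illustrative stand-in: in each trial, count how many of 100 samples fall below the true P90 (exceedance convention, so the 10th percentile by value). That count is Binomial(100, 0.1), and it lands between 5 and 15 most, but far from all, of the time.

```python
# Sketch: how far the true P90 wanders among the order statistics of a
# 100-point sample. The standard normal is an illustrative choice.
import random

random.seed(7)
P90_TRUE = -1.2816   # 10th percentile of the standard normal
trials = 20_000
in_band = 0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(100)]
    below = sum(x < P90_TRUE for x in sample)  # Binomial(100, 0.1)
    if 5 <= below <= 15:
        in_band += 1
print(in_band / trials)  # high, but the edges of the band are in real play
```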

To fix a P99 to the same order of accuracy requires 10,000 data points. Ten thousand. P99 estimates tend not to look so bad because they are all “small” in some sense: the difference between 1 and 9 million barrels is modest in absolute terms, but in the relative terms that set the parameters of the distribution it is an order of magnitude.
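The gulf between the two sample sizes is easy to demonstrate. In this sketch (a lognormal is an illustrative choice), the 1st-percentile estimate from 100 points is essentially the sample minimum and swings wildly from dataset to dataset; from 10,000 points it settles down.

```python
# Sketch: spread (coefficient of variation) of the estimated 1st percentile
# (the exceedance P99) from n = 100 versus n = 10,000 data points.
import random
import statistics

random.seed(11)

def estimate_p01(n):
    s = sorted(random.lognormvariate(0.0, 1.0) for _ in range(n))
    return s[max(0, n // 100 - 1)]  # ~1st-percentile order statistic

cvs = {}
for n in (100, 10_000):
    ests = [estimate_p01(n) for _ in range(300)]
    cvs[n] = statistics.stdev(ests) / statistics.mean(ests)
    print(n, round(cvs[n], 2))
```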

We can reasonably train our intuition to assess P90 values. We have no hope of training our intuition to assess P99s. And circumstances where we have data to justify an estimate are extremely rare.

Value erosion in oil and gas exploration

If P99s are required to be set to certain values, as some exploration workflows demand, then we use up a precious degree of freedom that is needed to characterize both the scale of the prospect and, crucially, the uncertainty around that scale.

Specifically, it becomes impossible to have a reasonably high mid-range value without entailing a high variance, and thus a large and often entirely spurious upside. Conversely, prospects with a modest mid-range value but realistic upside potential cannot easily be represented if the P99 is pinned to some specified value.

The figure here shows the relationship between a P50 and the P10:P90 ratio entailed by that P50 given that the P99 is fixed to some small value, here 1 and 10 mmboe. With the P99 set to 1 mmboe, the P10:P90 ratios of all but the smallest prospects are extremely high (much higher indeed than many field size distributions). The ratios for large prospects are more reasonable with the P99 fixed at 10, but then small prospects are forced into unrealistically small ratios, overlooking an upside or missing a downside. Note also just how significant those 9 mmboe are in terms of the assessment of uncertainty!
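The mechanics behind this figure can be reproduced with a few lines. This assumes a lognormal pinned at a given P50 and P99 (exceedance convention throughout, so the P99 is the small-tail value); the 1 and 10 mmboe values mirror the text.

```python
# P10:P90 ratio of a lognormal pinned at a given P50 and a fixed P99
# (exceedance convention: P99 is the small tail value, in mmboe).
import math

Z99 = 2.3263  # standard normal z-score at the 1st percentile
Z90 = 1.2816  # standard normal z-score at the 10th percentile

def p10_p90_ratio(p50, p99):
    sigma = (math.log(p50) - math.log(p99)) / Z99
    return math.exp(2 * Z90 * sigma)

for p99 in (1.0, 10.0):
    for p50 in (20.0, 50.0, 100.0):
        print(f"P99={p99:>4} P50={p50:>5}  P10:P90 = {p10_p90_ratio(p50, p99):.1f}")
```

With the P99 at 1 mmboe, even mid-size P50s force P10:P90 ratios into the tens; with the P99 at 10 mmboe, small prospects are squeezed down to ratios of two or three.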

The next figure shows the effect of this constraint on the expected value of the distributions. Here, again, the difference is substantial: the expected value is very sensitive to the value chosen for the P99.
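Under the same lognormal-pinned-at-P50-and-P99 assumption as above (illustrative, exceedance convention), the sensitivity of the expected value falls straight out of the lognormal mean formula, exp(mu + sigma²/2):

```python
# Expected value of a lognormal pinned at P50 and P99 (exceedance
# convention). Illustrative assumption, matching the sketch above.
import math

Z99 = 2.3263  # standard normal z-score at the 1st percentile

def lognormal_mean(p50, p99):
    sigma = (math.log(p50) - math.log(p99)) / Z99
    return p50 * math.exp(sigma ** 2 / 2)

print(round(lognormal_mean(50, 1)))   # ≈ 206 mmboe
print(round(lognormal_mean(50, 10)))  # ≈ 64 mmboe
```

For the same P50 of 50 mmboe, moving the pinned P99 from 10 to 1 mmboe more than triples the expected value, entirely on the strength of a tail parameter nobody can assess.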

In my presentation at the Rose & Associates Risk Coordinator Workshop in January, I presented a model for calculating the value erosion incurred by over and under-estimating both mid-range volumes, but also the uncertainty around those volumes.

Far and away the most toxic bias is volume vagueness: over-estimating the P10:P90 ratio. I have reproduced the figure from that presentation here. The confidence parameter on the horizontal axis measures how much the P10:P90 ratio is over-estimated. The vertical axis shows the realized value of the exploration investment as a proportion of the value that would have been realized had decisions been made under faithful assessments (for details, see the presentation). Poor probabilistic prediction is not just eroding your return; it is costing you money on top of what you are already spending.



Graeme Keith

Mathematical modelling for business and the business of mathematical modelling. See stochastic.dk/articles for a categorized list of all my articles on medium.