# Top ten probabilistic pitfalls in oil and gas exploration

# Summary list

1. **Lookback**: No lookback or, what is sometimes worse, lookbacks that are so naive you’d have been better off without them. See this article.
2. **P99 and P1**: These have no place in any volumetric methodology whatsoever. Ever. See this article.
3. **Weakest link and unwarranted independence**: The practice of taking the smallest probability in a chain of dependent probabilities. An unnecessary oversimplification, as an account of dependencies between elements is simple and enormously fruitful.
4. **The use of 50% probability in the absence of any information**: The problem with this is not that it’s wrong, but with the idea that there is no prior information. What? Have we never drilled an exploration well before?
5. **Naive use of historical statistics**: For example, directly applying historical percentiles to forward probability distributions. This is more challenging, because historical statistics are one of our best sources of information regarding uncertainty, but their correct utilization isn’t simple.
6. **Lognormal GRV**: Our knowledge of GRV very rarely leaves us with anything that looks remotely like a lognormal distribution. Particularly pernicious when we start specifying percentiles.
7. **Risk and uncertainty / “getting on the curve”**: Accounting for discrete uncertainties (e.g. down-dip fault seal) in probability of success, where they ought to be part of the volume distribution.
8. **Data quality and ambiguity**: It is possible (common even) to have good data quality and still be none the wiser about an outcome. The subsurface is extremely under-determined by data. It is true, though, that it is impossible to be much wiser about outcomes without high-quality data.
9. **Stretched beta and pert distributions**: These have far too little variance for resource distributions and encourage parameterization by extrema (see 2).
10. **Parameterization by mean or mode**: Modes, mathematically, have the greatest error in evaluation and require the largest number of samples. Means often converge quite slowly owing to the effect of relatively rare tail values. Neither corresponds well to our intuition regarding distribution centering. Use medians (but don’t add them).

# Introduction

The oil and gas exploration community is well aware of the challenges it has forecasting how many wells will successfully discover oil and how much they will discover when they do. Yet despite rigorous scrutiny, uncompromising introspection and substantial effort expended on improving exploration geoscience, we still struggle.

Without a clear understanding of the mathematics of uncertainty, we cannot hope to capture the insights of high-quality geoscience. Moreover, the mathematics of probability is inherently non-intuitive and the mathematics of how to incorporate historical data and new information is often deceptively subtle.

And yet, expertise in the quantification of uncertainty is often strikingly under-represented in the assessment of prospects and, critically, in defining the processes, including their quantitative components, by which prospects are to be assessed.

This want of scrutiny on the fundamental quantitative components of our processes has left dark corners into which has crawled a cabal of problematic probabilistic practice. Principles introduced in good faith and understanding have suffered subtle interpretative shifts that have rendered them misleading and confusing. Simple heuristics have morphed into normative practice and flexible rules of thumb have hardened into rigid dogma.

The following is a litany of the mathematical malpractice that has cropped up in my work advising companies on best practice in uncertainty quantification for oil and gas exploration. If one or several of these seems controversial, please treat it as an invitation to debate. You are warmly invited to contribute to the discussion in the comments.

# Lookback

The central challenge to establishing good quantitative practice is holding our probabilistic practice to account. There are organizational reasons why this is hard, but the fundamental feedback process, where predictions are compared with outcomes and deviations reported back with a view to improving predictions, is itself no straightforward matter.

I have written about lookback analysis and the elicitation of systematic bias at length here. Unsound, or just simply naive lookback analysis — e.g. neglecting ranges, relying on cumulative probability and volume with all its inherent shortcomings — is not directly damaging in itself, but it fundamentally undermines our ability to understand what is going wrong and how to fix it.

# P99

I have written at length, here, on why discussions on P99 and P1 are, at best, a distraction and, at worst, directly (and potentially enormously) value-eroding. In short, there are two reasons.

First, the probability distributions we use in exploration are approximations that work well in the centre of the distribution. These approximations break down toward the tails, long before the 1st or 99th percentiles.

Secondly, these statistics are effectively unknowable — at least they require so many data to establish with any reasonable confidence that we can never expect either to calculate or intuit sensible values for them.

So using P99 or P1 amounts to trying to control a distribution with a parameter you can’t know at a position where it has no relevance to the distribution you’re trying to control.
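The second point can be made concrete with a small simulation. The sketch below is illustrative only: the lognormal population, the 30 observations per sample and the helper names are my own assumptions, not a recipe. It draws repeated samples and compares how much the estimated median and the estimated 99th percentile hop about from one sample to the next. With 30 observations, the nearest-rank 99th percentile is simply the largest value seen.

```python
import random
import statistics

def percentile(sorted_xs, q):
    """Nearest-rank percentile of an already sorted list, with q in (0, 1)."""
    idx = min(len(sorted_xs) - 1, int(q * len(sorted_xs)))
    return sorted_xs[idx]

def relative_spread(q, n_obs=30, n_trials=2000, seed=1):
    """Standard deviation / mean of the q-th percentile estimate
    over repeated samples of size n_obs (hypothetical lognormal population)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        xs = sorted(rng.lognormvariate(0.0, 1.0) for _ in range(n_obs))
        estimates.append(percentile(xs, q))
    return statistics.stdev(estimates) / statistics.mean(estimates)

spread_median = relative_spread(0.50)
spread_extreme = relative_spread(0.99)
print(f"relative spread of the median estimate:          {spread_median:.2f}")
print(f"relative spread of the 99th percentile estimate: {spread_extreme:.2f}")
```

The extreme percentile is far less stable than the median at the same sample size, and that is before asking whether the assumed distribution is even valid that far out.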

# Weakest link and unwarranted independence

A discovery is the confluence of a great number of geological fortuities: source, migration, timing and so on. We model these, rightly, as a probabilistic chain, but there is then a widespread tendency to assume them to be either wholly independent (assess them independently and multiply their probabilities together) or entirely dependent. The latter results in the weakest link approach: choose the lowest probability component in the chain and ascribe its probability to the whole chain.

Unwarranted assumptions of complete dependence or independence are not usually that damaging, partly because both approximations are often used in a full workflow and their pernicious effects work against each other and partly because practitioners compensate intuitively for the error.

The pity of weakest link is that it’s really an unnecessary (vicious) simplification. (See here on *vicious* and *virtuous* simplification.) A coherent account of dependence is quite straightforward, opens the way to rich, insightful discussions about the relationship between risk elements and the data we have on them and, crucially, leaves you much the wiser about the risk elements that most impact your chance of finding oil.
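As a rough illustration of the two extremes and of what lies between them, the sketch below compares full independence (multiply), full dependence (weakest link) and an explicit chain of conditional probabilities. All the numbers and element names are hypothetical, chosen only so that the conditionals are consistent with the marginals. For positively dependent elements, the chained answer falls between the other two.

```python
from math import prod

# Hypothetical marginal probabilities for three risk elements.
marginals = {"source": 0.6, "migration": 0.5, "seal": 0.7}

# Full independence: multiply the marginals.
independent = prod(marginals.values())

# Full dependence (weakest link): take the smallest marginal.
weakest_link = min(marginals.values())

# Explicit dependence via the chain rule, with hypothetical conditionals:
# P(source) * P(migration | source) * P(seal | source, migration)
p_source = 0.6
p_migration_given_source = 0.75
p_seal_given_source_and_migration = 0.8
chained = p_source * p_migration_given_source * p_seal_given_source_and_migration

print(f"independent:  {independent:.2f}")
print(f"chained:      {chained:.2f}")
print(f"weakest link: {weakest_link:.2f}")
```

Writing down the conditionals forces the useful conversation: how much more likely is migration once we know there is a working source?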

# Non-informative prior

It is absolutely true that in the absence of any information whatsoever, a binary outcome must be ascribed a probability of 50% (though it’s not at all true that a probability of 50% implies a lack of information — only that any information is perfectly ambiguous). What is not true is that we have no prior information.

We are almost certainly drilling in an oil province or in an area we suspect might be an oil province. If the former, then we have the historical record for the province: outstanding data on which to build a prior, a probability from which to start. If the latter, then we have those areas in the world that led us to believe this might be a good place to look. A bigger set, slightly more challenging to work with, but nonetheless an excellent place to start.
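One simple way to turn a historical record into a starting probability is a beta-binomial update, sketched below. This is an illustration I have chosen, not a prescribed method, and the well counts are invented. Starting from a uniform prior, zero wells gives exactly 50%, and any drilling history moves the number away from it. It deliberately ignores the subtleties of using historical statistics discussed in the next section.

```python
def beta_mean(successes, failures, prior_a=1.0, prior_b=1.0):
    """Posterior mean success probability for a Beta(prior_a, prior_b) prior
    updated with observed successes and failures."""
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)

# No information at all: the uniform prior gives 50%.
no_information = beta_mean(0, 0)

# Hypothetical province record: 12 discoveries in 40 exploration wells.
province_prior = beta_mean(12, 28)

print(f"no information: {no_information:.2f}")
print(f"province prior: {province_prior:.2f}")
```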

# Historical statistics

Don’t get me wrong, historical statistics are the starting point for all sound probabilistic forecasting. They’re also the basis for some of the most reliable prediction methods.

But past performance is a subtle guide to the future. Failure to account for the deterioration of plays (our tendency to drill the biggest closures first, coupled with the fact that we don’t put prospects back in the sample bag when we’re finished with them) is common. Far more pernicious (and subtle) is the conflation of historical statistics (percentiles or means of field-size distributions) with the equivalent statistics of forward probability distributions.
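Play deterioration is easy to reproduce in a toy model. The sketch below assumes, purely for illustration, that the chance of a prospect being drilled next is proportional to its size and that drilled prospects never return to the pool. The play size distribution and all parameters are hypothetical.

```python
import random
import statistics

def discovery_sequence(sizes, rng):
    """Drill every prospect once, without replacement, picking the next one
    with probability proportional to its size (an assumed drilling bias)."""
    remaining = list(sizes)
    order = []
    while remaining:
        pick = rng.choices(range(len(remaining)), weights=remaining, k=1)[0]
        order.append(remaining.pop(pick))
    return order

rng = random.Random(42)
play = [rng.lognormvariate(3.0, 1.0) for _ in range(200)]  # hypothetical play
order = discovery_sequence(play, rng)

early = statistics.mean(order[:50])   # first 50 wells
late = statistics.mean(order[-50:])   # last 50 wells
print(f"mean size, first 50 wells: {early:.1f}")
print(f"mean size, last 50 wells:  {late:.1f}")
```

The statistics of the early wells say little about what is left, even though every well came from the same play.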

Historical statistics can be used, indeed must be used, but their use requires a little reflection.

# Lognormal GRV

Our knowledge of GRV very rarely leaves us with anything that looks remotely like a lognormal distribution, so when we start taking that approximation (more a representation, really) too seriously — specifying percentiles for example — we open the door to potentially substantial errors.

There’s a subtlety here though. A careful look at the mathematics around volume calculations shows that a lognormal distribution is the most appropriate leading-order approximation to the uncertainty around GRV, but only insofar as the logarithmic mean and variance (the mean and variance of the logarithm of the GRV) are predicted correctly. So methods that rely on these parameters are OK, but methods that hang too slavishly on the exact form of the distribution, especially out in the tails, are doomed to fail.
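A minimal sketch of what taking the lognormal form too seriously looks like. I assume, only for illustration, a GRV that is uniformly distributed between two bounds, match a lognormal to its logarithmic mean and variance, and then compare percentiles. The bounds, the units and the helper names are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(7)
# Hypothetical, deliberately non-lognormal GRV: uniform between 100 and 900.
grv = sorted(rng.uniform(100.0, 900.0) for _ in range(20000))

# Logarithmic mean and standard deviation of the GRV.
log_grv = [math.log(v) for v in grv]
mu = statistics.fmean(log_grv)
sigma = statistics.pstdev(log_grv)

def matched_lognormal_quantile(q):
    """Quantile of the lognormal with the same log mean and log variance."""
    return math.exp(mu + sigma * statistics.NormalDist().inv_cdf(q))

def sample_quantile(q):
    """Nearest-rank quantile of the simulated GRV."""
    return grv[min(len(grv) - 1, int(q * len(grv)))]

centre_error = abs(matched_lognormal_quantile(0.50) / sample_quantile(0.50) - 1)
tail_error = abs(matched_lognormal_quantile(0.99) / sample_quantile(0.99) - 1)
print(f"relative error at the 50th percentile: {centre_error:.0%}")
print(f"relative error at the 99th percentile: {tail_error:.0%}")
```

The matched lognormal carries the right logarithmic mean and variance by construction, yet its far tail extends well beyond anything the assumed GRV can produce.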

# Risk and uncertainty

We expend a great deal of energy discussing “What is it we are risking?” And there is a comparable and corresponding confusion around the relationship between volumetric distributions and the discrete outcomes — whether a fault seals for example — on which they depend.

The mathematics on these issues — at least seen from the perspective of the portfolio analysis that should be the basis of drilling decisions — is crystal clear. The issue of what to risk is simple. The relationship between risk and volume uncertainty is a little subtle, but failure to articulate it clearly leads in some cases to massive consistent over-prediction and in others to gross misrepresentations of prospects in the portfolio (throwing the volume baby out with the risk bathwater). If you ever hear yourself talking about “getting on the curve”, you should take a close look at how you define this relationship.
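A toy calculation shows the size of the effect. The sketch below is one reading of the issue, with invented numbers: a down-dip fault either seals (large accumulation) or leaks (small accumulation), and the three treatments differ only in where that discrete outcome is accounted for.

```python
# All values are hypothetical and for illustration only.
p_hydrocarbons = 0.4    # chance of finding any hydrocarbons at all
p_seal = 0.5            # chance the down-dip fault seals, given hydrocarbons
mean_if_seal = 100.0    # mean volume if the fault seals
mean_if_leak = 20.0     # mean volume if it leaks

# (a) Seal folded into the chance of success; the small case is discarded.
risked_seal_in_chance = p_hydrocarbons * p_seal * mean_if_seal

# (b) Seal ignored: full chance of success applied to the large case only.
risked_seal_ignored = p_hydrocarbons * mean_if_seal

# (c) Seal treated as part of the volume distribution (a two-case mixture).
mixture_mean = p_seal * mean_if_seal + (1 - p_seal) * mean_if_leak
risked_mixture = p_hydrocarbons * mixture_mean

print(f"(a) seal in chance of success: {risked_seal_in_chance:.1f}")
print(f"(b) seal ignored:              {risked_seal_ignored:.1f}")
print(f"(c) seal in volume mixture:    {risked_mixture:.1f}")
```

Under these assumptions, (b) over-predicts the risked mean and (a) throws away the value of the small-success case.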

# Precisely vague

Uncertainty in a continuous probability distribution is easy to understand and simply corresponds to the width of the distribution. But uncertainty in a discrete success / failure case is more subtle, as the single probability number must convey both what the data are telling you about the success or otherwise of the well and how clearly that message is being told.

A common confusion is to conflate ambiguity with data quality. Yet high-quality data can easily allow for two well-supported but conflicting hypotheses, and low-quality data can be completely unambiguous.
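One way to picture the distinction is Bayes' rule in odds form, where what matters is how much more likely the data are under success than under failure, not how good the data look. The sketch below uses hypothetical numbers and is only meant to illustrate that point.

```python
def update(prior, likelihood_ratio):
    """Posterior probability from a prior probability and a likelihood ratio
    P(data | success) / P(data | failure), using the odds form of Bayes' rule."""
    posterior_odds = prior / (1.0 - prior) * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

prior = 0.3  # hypothetical starting probability

# Excellent data that fit success and failure equally well: ratio of 1.
ambiguous = update(prior, 1.0)

# Data that are nine times more likely under success than under failure.
unambiguous = update(prior, 9.0)

print(f"perfectly ambiguous data: {ambiguous:.2f}")
print(f"unambiguous data:         {unambiguous:.2f}")
```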

# Stretched beta and pert distributions

For resource distributions, these generally have too little variance, but the main menace comes from parameterizing with maximum and minimum values. These can rarely be known from either experience or data with any accuracy and, as I argue in the article, the dynamics of the tails of a distribution are most often governed by something other than the dynamics that determine the all-important centre. And yet, key statistics of these distributions are very sensitive to the choice of these parameters.
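The sensitivity is easy to check for the standard pert distribution, whose mean is (min + 4 × mode + max) / 6 and whose variance is (mean − min)(max − mean) / 7. The sketch below holds the minimum and mode fixed and doubles the maximum; the input values are hypothetical.

```python
def pert_mean_sd(minimum, mode, maximum):
    """Mean and standard deviation of the standard pert distribution
    (shape parameter 4) with the given minimum, mode and maximum."""
    mean = (minimum + 4.0 * mode + maximum) / 6.0
    variance = (mean - minimum) * (maximum - mean) / 7.0
    return mean, variance ** 0.5

# Same minimum and mode; only the (rarely knowable) maximum changes.
mean_low_max, sd_low_max = pert_mean_sd(10.0, 50.0, 300.0)
mean_high_max, sd_high_max = pert_mean_sd(10.0, 50.0, 600.0)

print(f"max = 300: mean {mean_low_max:.1f}, sd {sd_low_max:.1f}")
print(f"max = 600: mean {mean_high_max:.1f}, sd {sd_high_max:.1f}")
```

In this example, a change in a number nobody can really know moves the mean from 85 to 135.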

# Parameterizing distributions with modes

Modes are surprisingly difficult to determine accurately. The fact that, by definition, the distribution is flat at the mode means that both sample and simulation modes hop about and settle only very slowly. The same problem plagues the development of intuition, which is exacerbated by the fact that we tend to recall medians rather than modes when we think back over outcomes.
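The hopping about can be seen in a few lines of simulation. The sketch below assumes a mildly skewed lognormal population and estimates the mode as the midpoint of the fullest histogram bin, which is only one of several possible mode estimators. The bin width, sample size and trial count are arbitrary choices of mine.

```python
import random
import statistics

BIN_WIDTH = 0.1  # fixed histogram bins of width 0.1 on [0, 5); arbitrary
N_BINS = 50

def histogram_mode(xs):
    """Midpoint of the fullest fixed-width bin (values above 5 go in the last bin)."""
    counts = [0] * N_BINS
    for x in xs:
        counts[min(N_BINS - 1, int(x / BIN_WIDTH))] += 1
    fullest = counts.index(max(counts))
    return (fullest + 0.5) * BIN_WIDTH

def estimator_spreads(n_obs=500, n_trials=300, seed=3):
    """Standard deviation of the mode and median estimates across repeated samples."""
    rng = random.Random(seed)
    modes, medians = [], []
    for _ in range(n_trials):
        xs = [rng.lognormvariate(0.0, 0.5) for _ in range(n_obs)]
        modes.append(histogram_mode(xs))
        medians.append(statistics.median(xs))
    return statistics.stdev(modes), statistics.stdev(medians)

mode_spread, median_spread = estimator_spreads()
print(f"spread of mode estimates:   {mode_spread:.3f}")
print(f"spread of median estimates: {median_spread:.3f}")
```

Even with 500 observations per sample, the mode estimate wanders across several bins while the median barely moves.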

Pert distributions, parameterized by modes then shoe-horned to a pre-specified mean-to-variance ratio, are particularly pernicious in these respects. They have their place, just not in exploration prospecting.