You couldn’t really do ecology if you didn’t know how to construct even the most basic mathematical model; even a simple regression is a model (the non-random relationship of one variable to another). The good thing about such simple models is that it is fairly straightforward to interpret the ‘strength’ of the relationship, in other words, how much of the variation in one thing can be explained by variation in another. Provided the relationship is real (not random), and provided there is at least some indirect causation implied (i.e., it is not just a spurious coincidence), there are many simple statistics that quantify this strength. In the case of our simple regression, the coefficient of determination (R²) is usually a good approximation of this.
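To make R² concrete, here is a minimal Python sketch. The function name `r_squared` and the toy data are hypothetical. It fits an ordinary least-squares line with a single predictor and returns the proportion of the variance in y explained by x, i.e. 1 - SS_res/SS_tot.

```python
def r_squared(x, y):
    """R² of an ordinary least-squares line y = a + b*x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Least-squares slope and intercept
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical example: a perfectly linear relationship explains all the variation
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))  # prints 1.0
```

A value near 1 means x accounts for almost all of the variation in y. A value near 0 means the fitted line does little better than the mean of y.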
For more complex multivariate correlative models, the coefficient of determination is sometimes insufficient. In that case you might need to rely on statistics such as the proportion of deviance explained, or the marginal and/or conditional variance explained (the latter for mixed-effects models).
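Written out, these statistics compare a fitted model against a baseline. The sketch below assumes one common formulation. SS are sums of squares and D are deviances. For the marginal and conditional versions it assumes a Gaussian linear mixed model with a single random effect, where σ²_f is the variance of the fixed-effect predictions, σ²_α the random-effect variance and σ²_ε the residual variance:

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}, \qquad
D^2 = 1 - \frac{D_{\mathrm{residual}}}{D_{\mathrm{null}}}

R^2_{\mathrm{marginal}} = \frac{\sigma^2_f}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon}, \qquad
R^2_{\mathrm{conditional}} = \frac{\sigma^2_f + \sigma^2_\alpha}{\sigma^2_f + \sigma^2_\alpha + \sigma^2_\varepsilon}
```

The marginal version credits only the fixed effects. The conditional version also credits the random effects.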
When you go beyond this correlative approach and start constructing more mechanistic models that emulate ecological phenomena from the bottom up, quantifying the strength of relationships gets a little more complicated. Perhaps the best-known category of such mechanistic models is the humble population viability analysis, abbreviated to PVA§.
Let’s take the simple case of a four-parameter population model we could use to project population size over the next 10 years for an endangered species that we’re introducing to a new habitat. We’ll assume that we have the following information:

- the size of the founding (introduced) population (n);
- the juvenile survival rate (Sj, the proportion of juveniles surviving from birth to their first year);
- the adult survival rate (Sa, the annual survival rate of adults from year 1 to maximum longevity);
- the fertility rate of mature females (m, the number of offspring born per female per reproductive cycle).

Each of these parameters has an associated uncertainty (ε) that combines both measurement error and environmental variation.
If we just took the mean value of each of the three demographic rates (the two survival rates and fertility) and projected a founding population of n = 10 individuals for 10 years into the future, we would have a single, deterministic estimate of the average outcome of introducing 10 individuals. As we already know, however, the variability, or stochasticity, is often more important than the average outcome. Uncertainty in the parameter values (ε) means that a non-negligible number of model iterations will result in the extinction of the introduced population. This is something that most conservationists will obviously want to minimise.
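The deterministic case can be sketched in a few lines of Python. The post does not spell out the model’s structure, so everything here is an assumption for illustration:

- a post-breeding census with two classes (newborn juveniles and adults);
- breeding from age 1;
- a 1:1 sex ratio;
- no maximum-longevity cap;
- hypothetical mean rates (Sj = 0.5, Sa = 0.8, m = 1.2).

```python
def project_deterministic(n0, s_j, s_a, m, years=10, female_fraction=0.5):
    """Deterministic two-class projection using only the mean rates.

    Hypothetical structure (not taken from the original study): the n0
    founders are all adults, juveniles reach adulthood with probability s_j,
    adults survive with probability s_a, and each adult female produces m
    offspring per year.
    """
    adults, juveniles = float(n0), 0.0
    trajectory = [adults + juveniles]
    for _ in range(years):
        # Survival uses last year's juveniles before this year's births are added
        adults = s_a * adults + s_j * juveniles
        juveniles = m * female_fraction * adults
        trajectory.append(adults + juveniles)
    return trajectory


# Hypothetical mean rates; the result is one "average" outcome after 10 years
print(round(project_deterministic(10, s_j=0.5, s_a=0.8, m=1.2)[-1], 1))
```

However many times you run this sketch, it returns the same number. With positive rates it can never reach exactly zero, so it cannot say anything about extinction risk.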
So each time we run an iteration of the model, and generally for each breeding interval (most often 1 year at a time), we choose (based on some random-sampling regime) a different value for each parameter. This gives us a distribution of outcomes after the 10-year projection. Let’s say we did 1000 iterations like this. The proportion of iterations in which the population went extinct would then estimate the population’s extinction probability over that interval.

Of course, we would probably also vary the size of the founding population (say, between 10 and 100). The aim is to find the point at which the extinction probability becomes acceptably low for managers (i.e., as close to zero as possible), without the founding population becoming so large that introducing that many individuals would be too laborious or expensive.
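The iteration scheme can be sketched in Python using only the standard library. The model structure (two classes, a 1:1 sex ratio, founders are adults), the (mean, sd) values and the helper names are all hypothetical. The sampling regime is also an assumption: each year, the survival rates are drawn from beta distributions and fertility from a gamma distribution, with moments matched to each (mean, sd) pair. Individuals then survive and breed independently.

```python
import math
import random


def beta_shapes(mean, sd):
    """Method-of-moments beta shape parameters (needs sd**2 < mean * (1 - mean))."""
    common = mean * (1.0 - mean) / sd ** 2 - 1.0
    return mean * common, (1.0 - mean) * common


def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for the small per-female means used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def binomial(count, prob, rng):
    """Number of successes in `count` independent Bernoulli(prob) trials."""
    return sum(rng.random() < prob for _ in range(count))


def extinction_probability(n0, s_j=(0.5, 0.1), s_a=(0.8, 0.05), m=(1.2, 0.3),
                           years=10, iterations=1000, seed=1):
    """Proportion of stochastic projections in which the population dies out.

    Each rate is a hypothetical (mean, sd) pair standing in for its uncertainty.
    """
    rng = random.Random(seed)
    a_j, b_j = beta_shapes(*s_j)
    a_a, b_a = beta_shapes(*s_a)
    m_shape, m_scale = (m[0] / m[1]) ** 2, m[1] ** 2 / m[0]
    extinctions = 0
    for _ in range(iterations):
        adults, juveniles = n0, 0
        for _ in range(years):
            # New parameter values for this breeding interval
            sj_t = rng.betavariate(a_j, b_j)
            sa_t = rng.betavariate(a_a, b_a)
            m_t = rng.gammavariate(m_shape, m_scale)
            adults = binomial(adults, sa_t, rng) + binomial(juveniles, sj_t, rng)
            females = binomial(adults, 0.5, rng)  # assumed 1:1 sex ratio
            juveniles = sum(poisson(m_t, rng) for _ in range(females))
            if adults == 0:  # no adults means no births, so the population is gone
                extinctions += 1
                break
    return extinctions / iterations


# Vary the founding population, as the managers in the example would
for n0 in (10, 25, 50, 100):
    print(n0, extinction_probability(n0))
```

The extinction probability here is simply the proportion of the 1000 iterations that hit zero. Changing the assumed distributions or model structure will change the numbers.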
So far so good: the outcome (probability of extinction) is a useful guide to maximising the probability of introduction success. But what if we want to determine how sensitive this probability of extinction is to changes in the model’s parameters? For example, even though we can most easily vary the size of the founding population, we might also be able to influence survival probability by, say, controlling predators at the introduction site. Or, we could try supplementary feeding to increase the number of offspring that the average female produces per breeding cycle. The question now is whether spending the time, money and effort to influence one parameter has a greater effect on the outcome (extinction probability) than influencing another. In our case, we can ask whether supplementary feeding is more important than predator control, or whether both interventions are negligible compared to introducing more individuals in the first place.
And so sensitivity analysis was born to solve just this sort of problem.
If you have never done a sensitivity analysis before, I wager you can imagine how it might proceed. The simplest approach is known as single-parameter perturbation analysis, in which we do just that. We vary the value of one parameter while keeping all the others fixed, and then relate (correlate) the variation in that parameter to variation in the outcome (extinction risk).
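A one-at-a-time perturbation might look like this Python sketch. It reuses the same kind of hypothetical two-class model (founders are adults, 1:1 sex ratio, breeding from age 1), but includes only demographic stochasticity, so each parameter stays fixed within a run. The baseline values, the ranges in `RANGES` and the helper names are hypothetical. As a simple stand-in for the correlation step, the sketch reports the change in extinction probability across each parameter’s range.

```python
import math
import random

# Hypothetical baseline (mean) values and plausible ranges for the four parameters
BASELINE = {"n0": 10, "s_j": 0.5, "s_a": 0.8, "m": 1.2}
RANGES = {"n0": (5, 50), "s_j": (0.3, 0.7), "s_a": (0.6, 0.95), "m": (0.6, 2.0)}


def poisson(lam, rng):
    """Knuth's Poisson sampler for small means."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def binomial(count, prob, rng):
    """Number of successes in `count` independent Bernoulli(prob) trials."""
    return sum(rng.random() < prob for _ in range(count))


def goes_extinct(n0, s_j, s_a, m, rng, years=10):
    """One stochastic run with fixed rates (demographic stochasticity only)."""
    adults, juveniles = n0, 0
    for _ in range(years):
        adults = binomial(adults, s_a, rng) + binomial(juveniles, s_j, rng)
        juveniles = sum(poisson(m, rng) for _ in range(binomial(adults, 0.5, rng)))
        if adults == 0:
            return True
    return False


def one_at_a_time(name, steps=5, iterations=500, seed=1):
    """Vary one parameter across its range while holding the others at baseline."""
    rng = random.Random(seed)
    lo, hi = RANGES[name]
    results = []
    for i in range(steps):
        value = lo + (hi - lo) * i / (steps - 1)
        params = dict(BASELINE)
        params[name] = round(value) if name == "n0" else value
        extinct = sum(goes_extinct(rng=rng, **params) for _ in range(iterations))
        results.append((params[name], extinct / iterations))
    return results


for name in BASELINE:
    results = one_at_a_time(name)
    # Crude sensitivity index: change in extinction probability across the range
    print(name, results[-1][1] - results[0][1])
```

Note that every perturbation is made around a single baseline. That is exactly the limitation discussed next.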
This might sound reasonable, but the problem is that the approach makes the complex universe represented by our admittedly simplistic model even less realistic. There are probably few cases where only one parameter varies while all the others keep more or less the same (fixed) values. In reality, parameters often co-vary in complex, sometimes non-linear ways. Varying one at a time can therefore give a misleading estimate of how much of the variation in the outcome is attributable to that parameter. Thus, global sensitivity analyses were created.
Put simply, a global sensitivity analysis varies all (or at least the main) parameters in a model simultaneously, according to the linkages defined explicitly between them in the model. This gets around the covariation and non-linearity issues. The variation in the output over all iterations can then be related to the iteration-specific values of each parameter within a multivariate correlative model (we call this step emulation, and the model used for it the emulator).
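A global version of the same exercise might look like the following Python sketch. Its assumptions:

- All four parameters are drawn at once, independently and uniformly. This ignores any explicit linkages between them.
- Each drawn set gets a single stochastic run.
- A deliberately crude emulator relates extinction (1) or persistence (0) to the drawn values. It is a linear probability model on standardized parameters, fitted by least squares.

The model, ranges and helper names are hypothetical. In practice, a more flexible emulator (e.g. regression trees) is likely preferable.

```python
import math
import random

# Hypothetical plausible ranges for the four parameters
RANGES = {"n0": (5, 50), "s_j": (0.3, 0.7), "s_a": (0.6, 0.95), "m": (0.6, 2.0)}


def poisson(lam, rng):
    """Knuth's Poisson sampler for small means."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def binomial(count, prob, rng):
    return sum(rng.random() < prob for _ in range(count))


def goes_extinct(n0, s_j, s_a, m, rng, years=10):
    """One stochastic run of the hypothetical two-class model."""
    adults, juveniles = n0, 0
    for _ in range(years):
        adults = binomial(adults, s_a, rng) + binomial(juveniles, s_j, rng)
        juveniles = sum(poisson(m, rng) for _ in range(binomial(adults, 0.5, rng)))
        if adults == 0:
            return True
    return False


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def global_sensitivity(n_samples=2000, seed=1):
    rng = random.Random(seed)
    names = list(RANGES)
    rows, outcomes = [], []
    for _ in range(n_samples):
        # Vary all parameters simultaneously, one model run per parameter set
        params = {k: rng.uniform(*RANGES[k]) for k in names}
        params["n0"] = round(params["n0"])
        rows.append([params[k] for k in names])
        outcomes.append(1.0 if goes_extinct(rng=rng, **params) else 0.0)
    # Standardize inputs so the coefficients are comparable across parameters
    means = [sum(col) / n_samples for col in zip(*rows)]
    sds = [math.sqrt(sum((v - mu) ** 2 for v in col) / (n_samples - 1))
           for col, mu in zip(zip(*rows), means)]
    design = [[1.0] + [(v - mu) / sd for v, mu, sd in zip(row, means, sds)] for row in rows]
    # Emulator: least-squares fit via the normal equations (X'X) b = X'y
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, outcomes)) for i in range(k)]
    coefs = solve(xtx, xty)
    return dict(zip(names, coefs[1:]))  # drop the intercept


effects = global_sensitivity()
for name, effect in sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name}: {effect:+.3f}")
```

Each coefficient is the approximate change in extinction probability per standard deviation of that parameter, so ranking them by absolute size ranks the parameters’ influence under these assumptions.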
The cleverest among you will now be thinking: “Hang on a minute, what if your model has many more than four parameters? Won’t there be a stupidly large number of parameter values and outcomes to test?” You’re correct. If your model had, say, 20 parameters or more, the number of possible combinations of parameter values to test grows exponentially and quickly becomes intractable.
The real question then is how to trade off the number of iterations per parameter combination against an adequate sampling of the parameter space (i.e., the full range of plausible values for each parameter).
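A little arithmetic shows the scale of the problem and the trade-off. This sketch makes two hypothetical assumptions: each parameter is tested at just 10 values, and the computing budget is 100,000 model runs.

```python
def full_grid_size(levels_per_parameter, n_parameters):
    """Number of combinations when every parameter takes the same number of values."""
    return levels_per_parameter ** n_parameters


def parameter_sets_for_budget(total_runs, iterations_per_set):
    """How many distinct parameter sets a fixed number of model runs can cover."""
    return total_runs // iterations_per_set


print(full_grid_size(10, 4))    # 10,000 combinations for four parameters
print(full_grid_size(10, 20))   # 10**20 combinations for twenty parameters
for reps in (1000, 100, 1):
    # Fewer iterations per set leaves room to sample many more parameter sets
    print(reps, parameter_sets_for_budget(100_000, reps))
```

With 1000 iterations per set, the budget covers only 100 parameter sets. With a single iteration per set, it covers 100,000.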
Because we have had to deal with this problem many times without an obvious solution, our even cleverer postdoctoral fellow, Thomas Prowse, has just published a paper in Ecosphere showing that the most important thing is to sample the parameter space adequately rather than to iterate each combination many times. In fact, you can usually get away with a single iteration per parameter-value set!
This counter-intuitive result means that with an efficient parameter-space sampler (such as a Latin hypercube algorithm), you can really streamline your global sensitivity analysis and find which parameters most influence your model predictions. We also provide some R code to help you along with your own analyses. With this nice validation of the approach, running efficient sensitivity analyses of ecological models has become a lot simpler.
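For readers curious about what a Latin hypercube sampler does, here is a minimal pure-Python sketch. It is not the R code from the paper, and the function name and parameter ranges are hypothetical. Each parameter’s range is cut into as many equal-width strata as there are samples. Each stratum is sampled exactly once, and the strata are paired across parameters at random. SciPy users could instead use `scipy.stats.qmc.LatinHypercube`.

```python
import random


def latin_hypercube(n_samples, ranges, seed=None):
    """Latin hypercube sample: each parameter's range is split into n_samples
    equal-width strata, and every stratum is sampled exactly once."""
    rng = random.Random(seed)
    columns = {}
    for name, (lo, hi) in ranges.items():
        # One uniform point inside each stratum of the unit interval
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle independently per parameter so strata are paired at random
        rng.shuffle(points)
        columns[name] = [lo + (hi - lo) * u for u in points]
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]


# Hypothetical ranges; each sampled set would then get a single model run
ranges = {"n0": (5, 50), "s_j": (0.3, 0.7), "s_a": (0.6, 0.95), "m": (0.6, 2.0)}
for params in latin_hypercube(5, ranges, seed=42):
    print({k: round(v, 3) for k, v in params.items()})
```

Compared with independent uniform draws, this guarantees that even a modest number of samples spreads across the full range of every parameter. Integer parameters such as the founding population would be rounded before use.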
§The popularity of PVA in conservation biology justifies my use of this initialism here.