A common objective of correlative species distribution model is to be able to project the potential distribution of the target species into a new environmental space. This can be a new geographic space (e.g., invasive species) or projected future conditions.

One should be very careful in interpreting results if extrapolating to areas with conditions that fall outside the range of reference environmental variation. There are several methods to visualize this uncertainty. On this blog I have for example mentioned the multi-environmental similarity tool in Maxent (also implemented in R in amongst others the dismo package and as an addon for GRASS GIS), which allows you to create maps that provides the similarity of each point to a set of reference points (Elith et al. 2010) and thus provide a quick overview of areas with ‘novel’ conditions.

A disadvantage of this and other methods is that they only consider the ranges of the individual predictors, and ignore the correlation structure of the covariates used to build the model. In reality, it is not unlikely that at a given locations values of each univariate factor fall within the original range of values, but the combination of environmental conditions is new.

This is one of the main points of the recently published article ‘*Here be dragons: a tool for quantifying novelty due to covariate range and correlation change when projecting species distribution models*‘ by Mesgaran et al. (2014) in Diversity and Distributions. The authors propose a new method to measure similarity between reference and projected domains by accounting for both the deviation from the mean and the correlation between variables, which includes a multivariate similarity index based on the mahalanobis distance between all covariates.

In short, they compute the mahalanobis distances of any point in the reference data to the centre of the environmental space (*r*) formed by all reference data points. The maximum distance (*Dmax*) represents the ‘boundary’ of the reference environmental space. Now, any new point that falls outside the boundary of this environmental space represents a novel combination of covariates. To determine if a points falls outside the reference environmental space one just needs to calculate the distance to *r* and compare this to the *Dmax*. A *Dnew > Dmax* thus represents a novel combination of covariate (Mesgaran et al propose to use the ratio of *Dnew* and *Dmax*, which they call the multivariate combination novelty index (NT2). NT2 > 1 represent novel multivariate conditions.

Anyway, you better check out the above cited article for the details. They provide a clear description and helpful graphs. What the authors also provide is the software to implement compute the NT2 (and a few other things), the ExDet tool. The software provides an user friendly GUI and runs really fast. But for me it would be even better if I could integrate it in my normal workflow, which means, running it from the command line and preferably in conjunction with GRASS GIS. The main part I would be interested in is to be able to calculate this NT2. And the main part of doing that is to be able to calculate the Mahalanobis distance for set of reference raster layers.

In GRASS GIS there is no such function as far as I am aware, but in R there is, with the perhaps unsurprising name mahalanobis. Below I’ll give an example how to use this to calculate the NT2 in R and create the map below:

I’ll use the sample data provided on their website. It consists of one set of layers describing the conditions in Austria (reference set) and another set describing the conditions in South Africa (projection data).

# Names of the reference (ref) and projection (pro) data ref <- c("AusBio13.asc", "AusBio14.asc", "AusBio5.asc", "AusBio6.asc") pro <- c("SaBio13.asc", "SaBio14.asc", "SaBio5.asc", "SaBio6.asc") # Import the data in R using the read.asciigrid function # of the sp package refdat <- prodat <- list() for(i in 1:4){ refdat[[i]] <- read.asciigrid(fname=ref[i])@data prodat[[i]] <- read.asciigrid(fname=pro[i])@data } refdat <- do.call(cbind, refdat) prodat <- do.call(cbind, prodat) # Calculate the average and covariance matrix of the variables # in the reference set ref.av <- colMeans(refdat, na.rm=TRUE) ref.cov <- var(refdat, na.rm=TRUE) # Calculate the mahalanobis distance of each raster # cell to the environmental center of the reference # set for both the reference and the projection data # set and calculate the ratio between the two. mah.ref <- mahalanobis(x=refdat, center=ref.av, cov=ref.cov) mah.pro <- mahalanobis(x=prodat, center=ref.av, cov=ref.cov) mah.max <- max(mah.ref[is.finite(mah.ref)]) nt2 <- as.data.frame(mah.pro / mah.max) # Create and plot the raster layer NT2 <- read.asciigrid(fname=pro[1]) NT2@data <- nt2 library(raster) NT2rast <- raster(NT2) plot(NT2rast, col=rainbow(100), ylim=c(-35,-20), xlim=c(15,35))

So, the ExDet tool is a very convenient tool, but if you rather do the calculations in R, the above example shows that that is fairly easy too.

**Citation**: Mesgaran, M.B., Cousens, R.D. & Webber, B.L. (2014) Here be dragons: a tool for quantifying novelty due to covariate range and correlation change when projecting species distribution models. Diversity & Distributions, 20: 1147–1159, DOI: 10.1111/ddi.12209

Great – this will be handy. You can simplify your refdat (and similarly prodat) creation with: library(raster); refdat <- as.matrix(stack(ref)). Not sure how efficiency compares with your approach, though.

Good idea

Hi this is probably not the most efficient coding, but for anyone else playing around with the tool here is some code for plotting the NT1 values (univariate extrapolation).

#——————————————————————–#

# NT1 – UNIVARIATE EXTRAPOLATION

#——————————————————————–#

# UD(ij) = min(

MaxMin <- matrix(NA, nrow=2, ncol=ncol(refdat))

colnames(MaxMin) <- colnames(refdat); rownames(MaxMin) = c("max", "min")

for(i in 1:ncol(refdat)){

MaxMin[1,i] <- max(refdat[,i], na.rm=TRUE) # max value of each variable in the reference matrix

MaxMin[2,i] <- min(refdat[,i], na.rm=TRUE) # min value of each variable in the reference matrix

}

UDs <- matrix(NA, nrow(prodat), ncol(prodat))

for(j in 1:ncol(prodat)){

for(i in 1:nrow(prodat)){

UDs[i,j] <- (min(c(

(prodat[i,j] – MaxMin[2,j]),(MaxMin[1,j] – prodat[i,j]), 0), na.rm=TRUE))/

(MaxMin[1,j] – MaxMin[2,j])

}

}

UDs <- data.frame(UDs)

NT1 <- rowSums(UDs, na.rm=TRUE)

# Create and plot the raster layer NT1

mah.proR <- read.asciigrid(fname=pro[1]) # coordinates from a single grid

NT1 <- data.frame(NT1)

mah.proR@data <- NT1 # assing data values from above

library(raster)

NT1rast <- raster(mah.proR)

plot(NT1rast, col=rainbow(100), ylim=c(-35,-20), xlim=c(15,35))