Jean-Pierre Rossi introduced a function to calculate the “Multivariate Environmental Similarity Surfaces” in R. Since then, the function has become part of the dismo package, which is a package maintained by Robert J. Hijmans and which offers a whole lot of functions for species distribution modelling.

The function calculates the MESS based on a point layer (the reference points) and a set of raster layers (the environmental data layers). It works great for smaller data sets, but you may run into trouble for very large data sets. So I needed to find a way to run this faster. And… I couldn’t. It is already a pretty neat piece of code, and I could not think of a better way of doing this in R. (~~I am working on~~ For an alternative to do this in GRASS, ~~but more about that later:~~ see here).

That is where I decided to try my luck on Stack Overflow. Stack Overflow is a programming Q & A site with a very active community, with questions often getting answered within hours. Or, in my case, within an hour! You can check out my question and the answer by Joshua Ulrich here. Below I will use the answer and expand on it to create an alternative way to run `messi`

. Obviously, due credit goes to Joshua Ulrich as well as to Jean-Pierre for the original code.

The key part of the `mess`

function in the dismo package is the `messi`

function, given below:

messi <- function(p, v) { niinf <- length(which(v <= p)) f <- 100 * niinf/length(v) if (f == 0) simi <- 100 * (p - min(v))/(max(v) - min(v)) if (0 < f & f <= 50) simi <- 2 * f if (50 <= f & f < 100) simi <- 2 * (100 - f) if (f == 100) simi <- 100 * (max(v) - p)/(max(v) - min(v)) return(simi) }

If you have an environmental layer `e`

, this is converted in a vector and the `messi`

function is run on every element of the vector to calculate the environmental dissimilarity. OK, I am not sure this is clear, check out the complete code otherwise. The point is that if you have a vector `e`

the `messi`

function is run `length(e)`

times. It doesn’t make a large difference for small vectors, but the overhead from R function calls really starts to add up with larger vectors.

The answer on Stack Overflow was to use the findInterval function to calculate `niinf`

(i.e., to replace the: `length(which(v <= p))`

in the function above. That also means the rest of the code needs to be adapted. One way is:

messi2 <- function(p,v){ f<-100*findInterval(p,sort(v))/length(v) a <- length(p) maxv <- max(v) minv <- min(v) opt1 <- 100*(p-minv)/(maxv-minv) opt2 <- 2*f opt3 <- 2 * (100-f) opt4 <- 100*(maxv-p)/(maxv-minv) simi <- ifelse(f==0, opt1, ifelse(f<=50, opt2, ifelse(f<100, opt3,opt4))) return(simi) }

So, how much faster is the new code? Let’s see:

set.seed(21) e <- rnorm(1e5) g <- rnorm(1e4) system.time(o1 <- sapply(X=e, FUN=messi, v=g)) user system elapsed 13.861 0.028 13.916 system.time(o2 <- messi2(e,g)) user system elapsed 0.084 0.000 0.087 all.equal(o1,o2) [1] TRUE

Not much further to say about those number…

**Update**: Robert Hijmans implemented the code above in the latest version of dismo.. and he managed to further improve the code, making it another 2-3 times faster!