The Multivariate Environmental Similarity Surfaces (MESS) index represents how similar a point is to a reference set of points with respect to a set of predictor variables (Elith et al. 2010). It was first implemented as part of the Maxent software package, but is now also available in R and GRASS GIS. Below, I compare how fast the different implementations are.
- In R, the function is implemented as the mess function in the dismo package.
- Also in R, I adapted the mess function to include some changes described in this post; the adapted version is called mess2 below. You can download the function. This code was used (and improved) by the above-mentioned mess function in the dismo package.
- In GRASS GIS, you can use the r.mess add-on. I created this add-on to be able to deal with larger data sets.
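To make concrete what all three implementations compute, here is a minimal base-R sketch of the index as defined by Elith et al. (2010): for each variable, a point is scored by where its value falls relative to the reference values, and the MESS value is the lowest score over all variables. The names `var_similarity` and `mess_sketch` and the toy data are made up for illustration; this is not the code used in dismo, mess2, or r.mess, and it assumes no missing values and no constant variables.

```r
# Similarity of values `p` for one variable, given reference values `r`.
# Assumes max(r) > min(r) and no NA values.
var_similarity <- function(p, r) {
  rmin <- min(r)
  rmax <- max(r)
  # f: percentage of reference values strictly smaller than each p
  f <- sapply(p, function(x) 100 * sum(r < x) / length(r))
  ifelse(f == 0,  100 * (p - rmin) / (rmax - rmin),
  ifelse(f <= 50, 2 * f,
  ifelse(f < 100, 2 * (100 - f),
                  100 * (rmax - p) / (rmax - rmin))))
}

# MESS: the minimum similarity over all variables (columns).
mess_sketch <- function(pts, ref) {
  sims <- sapply(seq_len(ncol(ref)),
                 function(j) var_similarity(pts[, j], ref[, j]))
  sims <- matrix(sims, nrow = nrow(pts))  # stay a matrix even for one point
  apply(sims, 1, min)
}

# Toy reference set and two points to evaluate (hypothetical values)
ref <- cbind(temp = c(10, 12, 14, 16, 18), rain = c(100, 200, 300, 400, 500))
pts <- cbind(temp = c(14, 20), rain = c(300, 300))

result <- mess_sketch(pts, ref)
print(result)
# 80 for the first point (inside the reference range),
# -25 for the second (temp = 20 exceeds the reference maximum of 18)
```

Negative values flag points where at least one variable lies outside the range of the reference set, which is the main reason to compute the index in the first place.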
In the test below, I use the Bradypus data hosted in the dismo package. In R, you can simply prepare the data as follows:
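    library(dismo)  # also attaches raster, which provides stack(), dropLayer() and extract()

    # Presence points: keep only columns 2 and 3 (the coordinates)
    filename <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='')
    bradypus <- read.table(filename, header=TRUE, sep=',')
    bradypus <- bradypus[,2:3]

    # Predictor rasters, dropping the ninth layer
    files <- list.files(path=paste(system.file(package="dismo"),'/ex', sep=''), pattern='grd', full.names=TRUE )
    predictors <- stack(files)
    predictors <- dropLayer(x=predictors,i=9)

    # Predictor values at the presence points, used as the reference set
    reference_points <- extract(predictors, bradypus)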
To test the r.mess function, I first had to import the raster layers and the point file into GRASS GIS. To capture the execution time of the mess and mess2 functions, I used the system.time function in R. For r.mess, I used bash's built-in time command.
    system.time(mess.out <- mess(x=predictors, v=reference_points, full=TRUE))
       user  system elapsed
     11.704   0.028  11.795

    system.time(mess.out2 <- mess2(x=predictors, v=reference_points, full=TRUE))
       user  system elapsed
      2.885   0.000   2.893
    time r.mess ref_vect=reference_points env_var=bio1,bio12,bio16,bio17,bio5,bio6,bio7,bio8 output=mess_out3 digits=0.001

    real    0m2.612s
    user    0m1.980s
    sys     0m5.336s
The mess2 function is clearly the winner over mess here: it runs about four times faster (2.9 s versus 11.8 s elapsed). The comparison with r.mess depends on the measure: r.mess used about 2.5 times more CPU time (user plus sys, 7.3 s), but its wall-clock time (2.6 s real) is on par with mess2. The data set I used was fairly small (192 x 186), so let's see how fast the different implementations are when using a much larger data set (3936 x 4772). I am only comparing mess2 and r.mess in the example below, simply because I wasn't patient enough to wait for mess to finish.
    system.time(mess.out <- mess2(x=bioclims, v=sample_points, full=TRUE))
       user  system elapsed
    736.971  45.899 788.125
    time r.mess ref_vect=sample_points env_var=bio_1,bio_2,bio_3 output=TEST digits=0.001

    real    1m15.484s
    user    1m38.146s
    sys     0m13.877s
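To put a number on the gap in this larger test, the two wall-clock figures can be compared directly. The sketch below converts bash's `1m15.484s` notation to seconds and divides the elapsed time reported by system.time by it. The helper `to_seconds` is a hypothetical name introduced only for this example; the timings are the ones reported above.

```r
# Convert bash `time` output such as "1m15.484s" to seconds.
to_seconds <- function(x) {
  parts <- regmatches(x, regexec("^([0-9]+)m([0-9.]+)s$", x))[[1]]
  as.numeric(parts[2]) * 60 + as.numeric(parts[3])
}

mess2_elapsed <- 788.125                 # elapsed seconds from system.time
rmess_real    <- to_seconds("1m15.484s") # wall-clock seconds from time

speedup <- round(mess2_elapsed / rmess_real, 1)
print(speedup)
# 10.4, i.e. r.mess finished roughly ten times sooner on this data set
```

Note that the user time of r.mess (1m38s) exceeds its real time, so part of that advantage appears to come from using more than one core.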
Apart from the difference in time, I noticed that RAM usage jumped to 9 GB when running mess2, while it topped out at 3.2 GB when running r.mess. In conclusion, this admittedly limited test suggests that for smaller data sets, you might want to go for mess2 in R (unless perhaps your data is already in a GRASS GIS database). For larger data sets, the r.mess function in GRASS is probably the better choice.
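Each timing above comes from a single run, so anyone repeating the comparison may want to time several runs and look at the spread. Here is a small sketch of how that could be done with system.time in base R. The helper `time_runs` is a hypothetical name, and the workload is a stand-in so that the example runs without dismo installed.

```r
# Time a function several times and summarise the elapsed seconds.
# `time_runs` is a made-up helper; system.time() is base R.
time_runs <- function(fun, n = 5) {
  elapsed <- vapply(seq_len(n), function(i) {
    unname(system.time(fun())["elapsed"])
  }, numeric(1))
  c(min = min(elapsed), median = median(elapsed), max = max(elapsed))
}

# Stand-in workload; for the comparison in this post it would be something like
# function() mess2(x = predictors, v = reference_points, full = TRUE)
res <- time_runs(function() sum(sqrt(seq_len(1e6))), n = 3)
print(res)
```

The median is less sensitive than a single measurement to one-off effects such as disk caching on the first run.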