Computing MESS in R and GRASS – a speed comparison

The Multivariate Environmental Similarity Surfaces (MESS) index represents how similar a point is to a reference set of points with respect to a set of predictor variables (Elith et al. 2010). It was first implemented as part of the Maxent software package, but is now also available in R and GRASS GIS. Below, I compare how fast the different implementations are.
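To make the definition concrete, here is a minimal sketch of the calculation for a single point, based on my reading of Elith et al. (2010). The function names (mess_one_variable, mess_point) and the toy reference data are made up for illustration; this is not the code used by dismo, mess2 or r.mess.

```r
# Similarity of one value p to the reference values of a single variable.
# Hypothetical helper, following my reading of Elith et al. (2010).
mess_one_variable <- function(p, ref) {
  ref_min <- min(ref)
  ref_max <- max(ref)
  f <- 100 * mean(ref < p)  # percentage of reference values smaller than p
  if (f == 0) {
    # p is at or below the reference minimum: zero or negative similarity
    100 * (p - ref_min) / (ref_max - ref_min)
  } else if (f <= 50) {
    2 * f
  } else if (f < 100) {
    2 * (100 - f)
  } else {
    # p is above the reference maximum: negative similarity
    100 * (ref_max - p) / (ref_max - ref_min)
  }
}

# MESS for one point is the minimum similarity over all variables.
# `point` is a named numeric vector; `reference` has matching column names.
mess_point <- function(point, reference) {
  sims <- vapply(names(point),
                 function(v) mess_one_variable(point[[v]], reference[, v]),
                 numeric(1))
  min(sims)
}

# Toy reference set (invented values, for illustration only)
reference <- data.frame(temp = c(10, 12, 14, 16, 18),
                        rain = c(100, 200, 300, 400, 500))

inside  <- mess_point(c(temp = 14, rain = 300), reference)  # within the reference range
outside <- mess_point(c(temp = 25, rain = 300), reference)  # temp exceeds the reference maximum
```

Negative values flag points where at least one variable falls outside the range covered by the reference points.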

  • In R, the function is implemented as the mess function in the dismo package.
  • Also in R, I adapted the mess function to include some changes described in this post. You can download the function (mess2) here. [This code was used (and improved) by the above-mentioned mess function in the dismo package.]
  • In GRASS GIS, you can use the r.mess add-on. I created this add-on to be able to deal with larger data sets.

In the test below, I use the Bradypus data hosted in the dismo package. In R, you can simply prepare the data as follows:

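library(dismo)  # also attaches raster, which provides stack, dropLayer and extract

# Occurrence records: keep only the two coordinate columns
filename <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='')
bradypus <- read.table(filename, header=TRUE, sep=',')
bradypus <- bradypus[,2:3]

# Predictor rasters shipped with dismo; drop layer 9, leaving the eight bioclim layers
files <- list.files(path=paste(system.file(package="dismo"),'/ex', sep=''), pattern='grd', full.names=TRUE )
predictors <- stack(files)
predictors <- dropLayer(x=predictors,i=9)

# Predictor values at the occurrence points, used as the reference set
reference_points <- extract(predictors, bradypus)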

To test r.mess, I first had to import the raster layers and the point file into GRASS GIS. To capture the execution time of the mess and mess2 functions, I used the system.time function in R. For r.mess, I used bash's built-in time command.
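A single system.time call can be noisy, so one option is to repeat the measurement a few times and look at the median. The helper below (time_runs, a name I made up) is only a sketch of that idea, with a stand-in workload in place of the MESS functions.

```r
# Run a function n times and return the elapsed (wall-clock) seconds of each run.
# Hypothetical helper, not part of dismo or base R.
time_runs <- function(fun, n = 3) {
  vapply(seq_len(n),
         function(i) system.time(fun())[["elapsed"]],
         numeric(1))
}

# Stand-in workload; replace with e.g. function() mess(x=predictors, v=reference_points)
elapsed <- time_runs(function() sum(sqrt(seq_len(1e6))), n = 3)
median_elapsed <- median(elapsed)
```

The timings reported in this post come from single runs, not from repeated runs like this.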

system.time(mess.out <- mess(x=predictors, v=reference_points, full=TRUE))
   user  system elapsed
 11.704   0.028  11.795

system.time(mess.out2 <- mess2(x=predictors, v=reference_points, full=TRUE))
  user  system elapsed
 2.885   0.000   2.893 

 

time r.mess ref_vect=reference_points env_var=bio1,bio12,bio16,bio17,bio5,bio6,bio7,bio8 output=mess_out3 digits=0.001

real	0m2.612s
user	0m1.980s
sys 	0m5.336s

The mess2 function comes out ahead of mess here: it runs about four times faster (2.9 versus 11.8 seconds elapsed). Compared with r.mess, the wall-clock times are close (2.6 seconds for r.mess versus 2.9 seconds for mess2), but r.mess used about 7.3 seconds of CPU time (user + sys), roughly 2.5 times as much as mess2. The data set I used was fairly small (192 x 186), so let's see how fast the different implementations are with a much larger data set (3936 x 4772). I am only comparing mess2 and r.mess in the example below, simply because I wasn't patient enough to wait for mess to finish.

system.time(mess.out <- mess2(x=bioclims, v=sample_points, full=TRUE))
   user  system elapsed
736.971  45.899 788.125 

 

time r.mess ref_vect=sample_points env_var=bio_1,bio_2,bio_3 output=TEST digits=0.001

real	1m15.484s
user	1m38.146s
sys 	0m13.877s

On the larger data set, r.mess finished in about 75 seconds, against roughly 13 minutes (788 seconds) for mess2, so it was about ten times faster. Apart from the difference in time, I noticed that RAM usage jumped up to 9 GB when running mess2, while it topped out at 3.2 GB when running r.mess. In conclusion, this admittedly limited test suggests that for smaller data sets, you might want to go for mess2 in R (unless perhaps your data is already in a GRASS GIS database). For larger data sets, the r.mess function in GRASS is probably the better choice.
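For reference, the relative speeds can be recomputed from the timings reported above. The snippet below is just that arithmetic; the data frame layout and column names are my own choice.

```r
# Timings copied from the output above (seconds).
# wall_s is elapsed/real time; cpu_s is user + sys.
timings <- data.frame(
  dataset = c("small", "small", "small", "large", "large"),
  method  = c("mess", "mess2", "r.mess", "mess2", "r.mess"),
  wall_s  = c(11.795, 2.893, 2.612, 788.125, 75.484),
  cpu_s   = c(11.704 + 0.028, 2.885 + 0.000, 1.980 + 5.336,
              736.971 + 45.899, 98.146 + 13.877)
)

# Wall-clock time relative to the fastest method within each data set
timings$wall_rel <- ave(timings$wall_s, timings$dataset,
                        FUN = function(x) x / min(x))

print(timings)
```

Note that the ranking on the small data set depends on whether wall-clock or CPU time is compared, since r.mess spends more CPU time than the elapsed time alone suggests.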

3 thoughts on “Computing MESS in R and GRASS – a speed comparison”

  1. Bipin

    Dear Jradinger ,
    I am new to R and species distribution modelling, but I have good experience in GIS and remote sensing. I am using MESS to compute a dengue risk map with the dismo package in R, but I don't know how to export the result from R, nor how to interpret the MESS index values and export them to Excel or ArcGIS.
