R has some great packages for species distribution modelling, and dismo is one of them.
Model objects created with one of the various distribution models available in dismo can be used to make predictions for any combination of values of the independent variables. To do this, you use the 'predict' function, which requires a model object and either a RasterStack or a data frame holding the independent variables.
So, which is faster as input to the predict function, a RasterStack or a data frame? To try this out, I created a model with the bioclim function (from the dismo package), based on 220 presence points and 8 independent variables.
bc1 is the model object, pred1 is a RasterStack with the raster layers of the independent variables, and pred2 is a data frame with the same independent variables.
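For readers who want to reproduce a setup of this kind, here is a minimal sketch of how the three objects could be built. It assumes the raster and dismo packages are installed. The layers, the layer names (v1 to v8), the grid size and the presence points are synthetic stand-ins, not the data used for the timings in this post.

```r
library(raster)
library(dismo)

set.seed(1)

# Hypothetical predictor layers: 8 small rasters filled with random values
layers <- lapply(1:8, function(i) {
  r <- raster(nrows = 50, ncols = 50)
  values(r) <- runif(ncell(r))
  r
})
pred1 <- stack(layers)
names(pred1) <- paste0("v", 1:8)

# 220 randomly sampled cell locations, used here as made-up presence points
pts <- sampleRandom(pred1, size = 220, xy = TRUE)[, c("x", "y")]

# Fit the bioclim model on the RasterStack and the presence coordinates
bc1 <- bioclim(pred1, pts)

# The same predictors as a data frame, one row per raster cell
pred2 <- as.data.frame(pred1)
```

With real data, pred1 would typically be read from raster files, and the data frame would have as many rows as the rasters have cells.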
system.time( p <- predict(pred1, bc1))
user system elapsed
41.055 0.008 41.136
system.time( p2 <- predict(bc1, pred2))
user system elapsed
126.908 0.020 127.151
There is a clear winner here: with the RasterStack, prediction took about 41 seconds, roughly a third of the 127 seconds needed with the data frame.
But what if I am using a model that is not part of the dismo package, glm for example? In the examples below, m1 is a glm model object; pred1 and pred2 are the same as above.
system.time( p3 <- predict(pred1, m1))
user system elapsed
1.656 0.048 1.698
system.time( p4 <- predict(m1, pred2))
user system elapsed
0.745 0.060 0.808
Here, using a data frame seems to be the better option: it took about half the time (0.8 versus 1.7 seconds). I am not sure what's behind these differences, but I do know that they can quickly become significant when running many models on large data sets.
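A single system.time call can be noisy, so it may be worth repeating a timing a few times before drawing conclusions. Below is a minimal sketch in base R for the data frame case. The data are synthetic, the variable names v1 to v8 are made up, and time_predict is a hypothetical helper, not a function from any package.

```r
set.seed(42)

# Synthetic presence/absence data with 8 predictors (not the data used above)
n <- 220
train <- as.data.frame(matrix(rnorm(n * 8), ncol = 8))
names(train) <- paste0("v", 1:8)
train$presence <- rbinom(n, 1, plogis(train$v1 - train$v2))

# Logistic regression on all 8 predictors
m1 <- glm(presence ~ ., data = train, family = binomial)

# A data frame of new predictor values, standing in for pred2
pred2 <- as.data.frame(matrix(rnorm(1e5 * 8), ncol = 8))
names(pred2) <- paste0("v", 1:8)

# Hypothetical helper: median elapsed time (in seconds) over repeated runs
time_predict <- function(f, reps = 5) {
  median(replicate(reps, system.time(f())[["elapsed"]]))
}

elapsed <- time_predict(function() predict(m1, pred2, type = "response"))
elapsed
```

The same helper could wrap the RasterStack call, so that both inputs are timed in the same way.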
