R has some great packages for species distribution modelling, and dismo is one of them.

Model objects created with one of the distribution models available in dismo can be used to make predictions for any combination of values of the independent variables. To do this, you use the *predict* function, which requires a model object and either a RasterStack or a data frame holding the independent variables.

So, which is faster as input to the predict function: a RasterStack or a data frame? To find out, I created a model with the bioclim function (from the dismo package), based on 220 presence points and 8 independent variables.

*bc1* is the model object, *pred1* is a RasterStack with the raster layers of the independent variables, and *pred2* is a data frame with the same independent variables.
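For readers who want to try this themselves, here is a minimal sketch of how objects like these could be built. The data are entirely synthetic (random layers and random presence points), so the grid size, the layer names `v1` to `v8` and the point coordinates are my assumptions for illustration, not the data used for the timings in this post.

```r
library(raster)
library(dismo)

set.seed(1)

# Eight synthetic predictor layers on a small grid
# (stand-ins for real environmental rasters)
layers <- lapply(1:8, function(i) {
  r <- raster(nrows = 50, ncols = 50, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
  setValues(r, runif(ncell(r)))
})
pred1 <- stack(layers)
names(pred1) <- paste0("v", 1:8)

# 220 random presence locations inside the grid extent
pres <- cbind(x = runif(220, 0, 10), y = runif(220, 0, 10))

# Fit the bioclim model from the layers and the presence points
bc1 <- bioclim(pred1, pres)

# The same predictors as a data frame: one row per cell, one column per layer
pred2 <- as.data.frame(pred1)
```

With real data, `pred1` would normally be a stack of raster files read from disk, and timings may well depend on whether the layers fit in memory.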

`system.time(p <- predict(pred1, bc1))`

`user system elapsed`

`41.055 0.008 41.136`

`system.time(p2 <- predict(bc1, pred2))`

`user system elapsed`

`126.908 0.020 127.151`

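There is a clear winner here: prediction on the RasterStack took about 41 seconds, roughly a third of the 127 seconds needed for the data frame.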

But what if I am using a model that is not part of the dismo package, for example one fitted with the glm function? In the examples below, *m1* is a glm model object, and *pred1* and *pred2* are the same as above.
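As a rough illustration of what such a glm model might look like, the sketch below fits a binomial glm on made-up presence/absence data and predicts on a data frame. The training data, the column names and the `type = "response"` argument (which returns probabilities rather than values on the link scale) are my assumptions; the timings in this post used the default call.

```r
set.seed(42)
vars <- paste0("v", 1:8)

# Hypothetical training data: 8 predictors plus a presence/absence response
train <- as.data.frame(matrix(runif(500 * 8), ncol = 8,
                              dimnames = list(NULL, vars)))
train$pa <- rbinom(500, 1, plogis(3 * train$v1 - 1.5))

# Logistic regression on all eight predictors
m1 <- glm(pa ~ ., data = train, family = binomial)

# Stand-in for pred2: a data frame with the same predictor columns
pred2 <- as.data.frame(matrix(runif(10000 * 8), ncol = 8,
                              dimnames = list(NULL, vars)))

# Predicted probability of presence for every row
p4 <- predict(m1, pred2, type = "response")
```

As far as I know, the raster version of predict passes extra arguments such as `type` on to the model's own predict method, so the same option should be usable with the RasterStack as well.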

`system.time(p3 <- predict(pred1, m1))`

`user system elapsed`

`1.656 0.048 1.698`

`system.time(p4 <- predict(m1, pred2))`

`user system elapsed`

`0.745 0.060 0.808`

Here, using a data frame is the better option: it was about twice as fast (0.8 versus 1.7 seconds). I am not sure what's behind these differences, but I do know that they can quickly become significant when running many models on large data sets.
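A single system.time call can be noisy, so before drawing conclusions for your own data it may be worth repeating each measurement a few times. The helper below is a small sketch of that idea; `time_repeated` is a name I made up, and the tiny glm is only a stand-in workload.

```r
# Hypothetical helper: run a function several times and summarise elapsed time
time_repeated <- function(f, times = 5) {
  elapsed <- vapply(seq_len(times),
                    function(i) system.time(f())[["elapsed"]],
                    numeric(1))
  c(min = min(elapsed), median = median(elapsed), max = max(elapsed))
}

# Stand-in workload: a small glm predicted on its own training data
set.seed(1)
d <- data.frame(x = runif(10000))
d$y <- rbinom(10000, 1, plogis(d$x))
m <- glm(y ~ x, data = d, family = binomial)

res <- time_repeated(function() predict(m, d))
print(res)
```

With the objects from this post, the two calls to compare would be `time_repeated(function() predict(pred1, bc1))` and `time_repeated(function() predict(bc1, pred2))`.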