A short note on the use of predict with the dismo or raster package

R has some great packages for species distribution modelling. One of these packages is the dismo package.

Model objects created with one of the various distribution models available in dismo can be used to make predictions for any combination of values of the independent variables. To do this, you use the 'predict' function, which requires a model object and a RasterStack or dataframe with the independent variables.

So which is faster as input to the predict function, a RasterStack or a dataframe? To try this out, I created a model with the bioclim function (from the dismo package), based on 220 presence points and 8 independent variables.

In the examples below, bc1 is the model object, pred1 is a RasterStack with the raster layers of the independent variables, and pred2 is a dataframe with the same independent variables.
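For readers who want to try this themselves, here is a minimal sketch of how objects like these could be built. The data are synthetic and the layer names (var1 to var8), grid size and random presence points are my own assumptions for illustration, not the data behind the timings in this post.

```r
library(raster)  # RasterStack and raster helpers
library(dismo)   # bioclim()

set.seed(1)

# Eight synthetic predictor layers on a small grid (stand-ins for real data)
layers <- lapply(1:8, function(i) {
  r <- raster(nrows = 50, ncols = 50, xmn = 0, xmx = 10, ymn = 0, ymx = 10)
  setValues(r, runif(ncell(r)))
})
pred1 <- stack(layers)
names(pred1) <- paste0("var", 1:8)

# The same predictors as a dataframe, one row per raster cell
pred2 <- as.data.frame(pred1)

# 220 random presence points inside the grid extent (hypothetical)
pts <- cbind(x = runif(220, 0, 10), y = runif(220, 0, 10))

# Fit the bioclim model from the predictor stack and the presence locations
bc1 <- bioclim(pred1, pts)
```

On a grid this small both prediction routes finish almost instantly, so a much larger grid is needed before any difference in timing shows up.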

system.time(p <- predict(pred1, bc1))

user system elapsed
41.055 0.008 41.136

system.time(p2 <- predict(bc1, pred2))

user system elapsed
126.908 0.020 127.151

There is a clear winner here: with the RasterStack as input, the prediction is roughly three times faster (41 versus 127 seconds elapsed).

But what if I am using a model that is not part of the dismo package, one fitted with the glm function for example? In the examples below, m1 is a glm model object, and pred1 and pred2 are the same as above.

system.time(p3 <- predict(pred1, m1))

user system elapsed
1.656 0.048 1.698

system.time(p4 <- predict(m1, pred2))

user system elapsed
0.745 0.060 0.808

Here, using a dataframe is the better option: it takes about half the time (0.8 versus 1.7 seconds elapsed). I am not sure what is behind these differences, but I do know that they can quickly become significant when running many models with large data sets.
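Before picking whichever route is faster, it is worth checking that both give the same predictions. The sketch below does this for a glm, again with synthetic data; the two layers, the simulated presence/absence response and the model formula are assumptions for illustration only.

```r
library(raster)

set.seed(1)

# Two synthetic predictor layers (stand-ins for the real independent variables)
r1 <- setValues(raster(nrows = 50, ncols = 50), runif(2500))
r2 <- setValues(raster(nrows = 50, ncols = 50), runif(2500))
pred1 <- stack(r1, r2)
names(pred1) <- c("var1", "var2")
pred2 <- as.data.frame(pred1)

# Hypothetical training data: simulated presence/absence at 500 random cells
train <- pred2[sample(nrow(pred2), 500), ]
train$presence <- rbinom(500, 1, plogis(-2 + 4 * train$var1))

m1 <- glm(presence ~ var1 + var2, family = binomial, data = train)

p3 <- predict(pred1, m1)  # RasterStack route: returns a RasterLayer
p4 <- predict(m1, pred2)  # dataframe route: returns a numeric vector

# Compare the two results cell by cell (both are on the link scale by default)
same <- isTRUE(all.equal(as.vector(values(p3)), unname(p4), tolerance = 1e-6))
print(same)
```

Note that the dataframe route returns a plain vector, so if a map is needed the values still have to be written back into a raster, for example with setValues.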

