R/GRASS connection: more than the sum of its parts

On March 25, 2011March 25, 2011 By ecodiv.earthIn Data handling, GIS, GRASS GIS, Modelling, R computing environment

GRASS GIS is a very powerfull GIS offering an extensive set of tools for geospatial data management and analysis, image processing, and spatial modeling. The possibility to directly interact with R further improves the geospatial analysis capabilities of GRASS (see R Spatial View).

One of the advantages less often mentioned is that GRASS GIS can handle very large spatial data layers (especially raster data). It can thus be used to circumvent one of the drawbacks of R, its inability to deal with (very) large data sets (both analyses and visualization).

Suppose you want to create a species distribution model based on x number of sample points and 10 environmental variables. For building the model you can simple sample the raster maps at the locations of the vector points in GRASS using the v.what.rast function in GRASS. Next, you use the readVECT6() function in the R spgrass6 package to import the vector point layer in R (you need to run R from within GRASS).

Predictions from the fitted species distribution model can be easily mapped in the GRASS-R interface using the predict function and a data frame of raster variables that were used in the model. But this is where you might run into trouble if dealing with very large datasets. For example, 10 environmental layers of 500 MB each means you will need a lot of RAM to run a model in R.

One solution is to break up your data in smaller bits. Luckily this is very easy in GRASS using the r.tileset function. To access this function, look in the Raster menu –> Develop Raster map –> Tiling. Or type in r.tileset on the command line. The function produces an user-defined number of tiles (in the same or other projection then your current mapset).

It doesn’t actually produces the raster tiles themselves, but a text (file or console) with the coordinates of the bounding boxes of each tile and the number of columns and rows. By running the r.tileset through R (using system calls, you can run any GRASS command in R, as long as you start R on the GRASS GIS command line), you can capture those tile parameters and use this in a loop to import the environmental data layers and create the species suitability distribution maps tile by tile.

For example, if you have created a model gam1 based on 3 climatic variables, you can create a species distribution layers using something like:

require(spgrass6)
tilecoord <- system("r.tileset -g sourceproj=+init=epsg:4326 maxrows=1000 maxcols=1000 sf= ", intern=TRUE)
tilecoord <- gsub(";", " ", tilecoord)
for(i in 1:length(a)){
    system(paste("g.region", tilecoord[i]))
    r1 <- readRAST6("clim1@climate")
    r2 <- readRAST6("clim2@climate")
    r3 <- readRAST6("clim3@climate")
    predict = as.numeric(predict(gam1,newdata=list(clim1=r1@data, clim2=r2@data, clim3=r3@data),type="response"))
    r1@data <- predict
    writeRAST6(r1, "specdistr")
}

14 thoughts on “R/GRASS connection: more than the sum of its parts”

Johannes

Very nice post, specially the r.tileset function, something I wasn’t aware of.

March 26, 2011 at 4:07 pm Reply
1. pvanb
  
  Thanks, glad you liked it. Actually, I wasn’t aware of this function up to recently. I actually wrote a script in R to do something similar before coming across this function. And after reading the help file initially I wasn’t sure this was what I needed as the focus seems to be on creating tiles in a different projection than your current one, with some possible side effects. It was only after seeing one of the examples I realized that when using the same target and source projection, this is simply providing you with the tile parameters, with or without overlap (the overlap can be important for any analyses that takes into account neighboring cells).
  
  March 26, 2011 at 4:17 pm Reply
Clovis

This is a really useful post! I used to struggle to make my own tile manually to overcome the memory size limit of R. Hope some day R will be able to handle more than hundreds Mo…

By the way, i had a error message using the function:
raster map/current region mismatch detected in components:
cols rows origin.northing origin.easting
FALSE FALSE FALSE FALSE
set plugin=TRUE to override; continuing with plugin=FALSE

So far i didn’t figured out what mean this message. The function did the job anyway.
Thanks again

April 5, 2011 at 6:13 pm Reply
1. pvanb
  
  I am not sure what that message means, but have a look at the options of readRAST6, and especially the items on ‘plugin’ and ‘useGDAL’.
  
  My interpretation is (but I might be wrong) that the message means the raster is not equal to the current region settings. The function will therefore use r.in.bin, and most important, respect the current region. The latter is obviously what makes the above described script work.
  
  But you could try out different settings for above-mentioned arguments to see what difference it makes. If you find out more, I would like to hear about it.
  
  April 5, 2011 at 9:22 pm Reply
Clovis

It was indeed related to the plugin…(i should have looked better to help by the way!). To my understanding, by default it check wether it should use the mapset region or the g.region definition (auto detection). By setting it to FALSE it used the current region and TRUE force to use the original raster definition.

But this is good to know because tuning plugin to FALSE is obviously less time consuming!

Another little difference between your code and my test: in the predict sentence, you change the variable name on the fly when creating a list containing the exploratory variables. I tried several things (quotes, list and data.frame) but in any case the column names don’t change wich make impossible to run the model. I got around this problem by creating the variable dataset and then changing the column names, but your way is by far the best.

Have you already encounter such problem?

Thanks

April 6, 2011 at 7:16 pm Reply
1. pvanb
  
  The idea to use a list instead of a dataframe (as mentioned in the help) came from an example I saw in a tutorial. I can’t remember where I got that tutorial from, but it seems to work for me. What if you try the example that comes in the gam help?
  
  data(gam.data) gam.object <- gam(y ~ s(x,6) + z, data=gam.data) predict(gam.object) data(gam.newdata) predict(gam.object, gam.newdata, type="terms")
  
  Now create two variables from the gam.newdata dataframe and use those in the predict function instead of the dataframe:
  
  A <- gam.newdata$x B <- gam.newdata$z predict(gam.object, list(x=A, z=B), type="terms")
  
  Does that work for you? Or am I overlooking something (very well possible)?
  
  April 6, 2011 at 9:01 pm Reply
Clovis

Ok, I fixed my problem.

Some explanations:
The original problem with column names is that when using the predict function in the sample code you provided I got this message:

> predict = as.numeric(predict(mod,newdata=list(Var1=r1@data,Var1=r2@data)))
Erreur dans eval(expr, envir, enclos) : objet ‘Var1’ introuvable

which suggest that it doesn’t find the name of the variable used to calibrate the model…

When checking the gam example I noted a subtle difference. The list created in the example is a list of vector, whereas on my case it was a list of dataframe..( class(r1@data) –> “data.frame”).

Thus everything gets perfect with this little adjustement:

> predict = as.numeric(predict(mod,newdata=list(Var1=r1@data[,1],Var1=r2@data[,1])))

April 7, 2011 at 9:45 am Reply
Jens

Hi, very nice! Do you know if it is possible to do the prediction from R on GRASS rasters directly within the LOCATION, i.e. without loading them into R? My rasters are each 1GB large (on a 30m resolution) and I need to downscale them, that´s a pity, since so many details get lost. However, loading all 10 rasters kills my system 🙂

August 25, 2011 at 5:26 pm Reply
1. pvanb
  
  I don’t think so. But as you mention I’m your next comment, r.tileset might help you out. Of course there is an overhead for having to read / write multiple times, but at least you can avoid memory problems
  
  August 25, 2011 at 6:24 pm Reply
Jens

ach….I forgot the r.tileset thing, which is nice, but I did not understand it fully. Tileset splits each one of my datasets into let´s say 10 tiles. Your example showed 3 climatic variables but not 3 tiles, did it? or does the loop glue the tiles together at the end? I will try…but could you answer me please a very easy question? How can I start R from the GRASS command line?? Just typing R in the shell does not work? Sorry for that trivial question but that would he me at the moment a lot. thanks for the great inspiring blog.

August 25, 2011 at 5:33 pm Reply
1. pvanb
  
  It is a while ago I wrote that example, but from what I see now, te code does not give unique names to the output raster tiles when writing to GRASS. It should, so next you can use r.patch to patch (glue) them together. But I will have another look and improve the code later when I find the time.
  
  As for running R from within GRASS, it should work without problems in Linux. It should on Windows too, but it is a long time I used GRASS on Windows so I am not really sure what could be wrong.
  
  August 25, 2011 at 7:06 pm Reply
  1. oldixl
    
    Thanks! If I find it out, I will post it on my blog 😉
    
    August 25, 2011 at 7:39 pm
Pingback: If you run out of memory, split up your data | Ecostudies
pvanb

If you are a Window user and are using or want to use R and GRASS in combination, this announcement will make you happy: http://gfoss.blogspot.nl/2012/09/nice-new-wingrass-7-with-r-integration.html

September 5, 2012 at 8:34 am Reply

Share this:

Related

14 thoughts on “R/GRASS connection: more than the sum of its parts”

Leave a comment Cancel reply