R/GRASS connection: more than the sum of its parts

GRASS GIS is a very powerfull GIS offering an extensive set of tools for geospatial data management and analysis, image processing, and spatial modeling. The possibility to directly interact with R further improves the geospatial analysis capabilities of GRASS (see R Spatial View).

One of the advantages less often mentioned is that GRASS GIS can handle very large spatial data layers (especially raster data). It can thus be used to circumvent one of the drawbacks of R, its inability to deal with (very) large data sets (both analyses and visualization).

Suppose you want to create a species distribution model based on x number of sample points and 10 environmental variables. For building the model you can simple sample the raster maps at the locations of the vector points in GRASS using the v.what.rast function in GRASS. Next, you use the readVECT6() function in the R spgrass6 package to import the vector point layer in R (you need to run R from within GRASS).

Predictions from the fitted species distribution model can be easily mapped in the GRASS-R interface using the predict function and a data frame of raster variables that were used in the model. But this is where you might run into trouble if dealing with very large datasets. For example, 10 environmental layers of 500 MB each means you will need a lot of RAM to run a model in R.

One solution is to break up your data in smaller bits. Luckily this is very easy in GRASS using the r.tileset function. To access this function, look in the Raster menu –> Develop Raster map –> Tiling. Or type in r.tileset on the command line. The function produces an user-defined number of tiles (in the same or other projection then your current mapset).

It doesn’t actually produces the raster tiles themselves, but a text (file or console) with the coordinates of the bounding boxes of each tile and the number of columns and rows. By running the r.tileset through R (using system calls, you can run any GRASS command in R, as long as you start R on the GRASS GIS command line), you can capture those tile parameters and use this in a loop to import the environmental data layers and create the species suitability distribution maps tile by tile.

For example, if you have created a model gam1 based on 3 climatic variables, you can create a species distribution layers using something like:

require(spgrass6)
tilecoord <- system("r.tileset -g sourceproj=+init=epsg:4326 maxrows=1000 maxcols=1000 sf= ", intern=TRUE)
tilecoord <- gsub(";", " ", tilecoord)
for(i in 1:length(a)){
    system(paste("g.region", tilecoord[i]))
    r1 <- readRAST6("clim1@climate")
    r2 <- readRAST6("clim2@climate")
    r3 <- readRAST6("clim3@climate")
    predict = as.numeric(predict(gam1,newdata=list(clim1=r1@data, clim2=r2@data, clim3=r3@data),type="response"))
    r1@data <- predict
    writeRAST6(r1, "specdistr")
}
About these ads

14 thoughts on “R/GRASS connection: more than the sum of its parts

    1. pvanb Post author

      Thanks, glad you liked it. Actually, I wasn’t aware of this function up to recently. I actually wrote a script in R to do something similar before coming across this function. And after reading the help file initially I wasn’t sure this was what I needed as the focus seems to be on creating tiles in a different projection than your current one, with some possible side effects. It was only after seeing one of the examples I realized that when using the same target and source projection, this is simply providing you with the tile parameters, with or without overlap (the overlap can be important for any analyses that takes into account neighboring cells).

      Reply
  1. Clovis

    This is a really useful post! I used to struggle to make my own tile manually to overcome the memory size limit of R. Hope some day R will be able to handle more than hundreds Mo…

    By the way, i had a error message using the function:
    raster map/current region mismatch detected in components:
    cols rows origin.northing origin.easting
    FALSE FALSE FALSE FALSE
    set plugin=TRUE to override; continuing with plugin=FALSE

    So far i didn’t figured out what mean this message. The function did the job anyway.
    Thanks again

    Reply
    1. pvanb Post author

      I am not sure what that message means, but have a look at the options of readRAST6, and especially the items on ‘plugin’ and ‘useGDAL’.

      My interpretation is (but I might be wrong) that the message means the raster is not equal to the current region settings. The function will therefore use r.in.bin, and most important, respect the current region. The latter is obviously what makes the above described script work.

      But you could try out different settings for above-mentioned arguments to see what difference it makes. If you find out more, I would like to hear about it.

      Reply
  2. Clovis

    It was indeed related to the plugin…(i should have looked better to help by the way!). To my understanding, by default it check wether it should use the mapset region or the g.region definition (auto detection). By setting it to FALSE it used the current region and TRUE force to use the original raster definition.

    But this is good to know because tuning plugin to FALSE is obviously less time consuming!

    Another little difference between your code and my test: in the predict sentence, you change the variable name on the fly when creating a list containing the exploratory variables. I tried several things (quotes, list and data.frame) but in any case the column names don’t change wich make impossible to run the model. I got around this problem by creating the variable dataset and then changing the column names, but your way is by far the best.

    Have you already encounter such problem?

    Thanks

    Reply
    1. pvanb Post author

      The idea to use a list instead of a dataframe (as mentioned in the help) came from an example I saw in a tutorial. I can’t remember where I got that tutorial from, but it seems to work for me. What if you try the example that comes in the gam help?

      data(gam.data)
      gam.object <- gam(y ~ s(x,6) + z, data=gam.data)
      predict(gam.object)
      data(gam.newdata)
      predict(gam.object, gam.newdata, type="terms")

      Now create two variables from the gam.newdata dataframe and use those in the predict function instead of the dataframe:

      A <- gam.newdata$x
      B <- gam.newdata$z
      predict(gam.object, list(x=A, z=B), type="terms")

      Does that work for you? Or am I overlooking something (very well possible)?

      Reply
  3. Clovis

    Ok, I fixed my problem.

    Some explanations:
    The original problem with column names is that when using the predict function in the sample code you provided I got this message:

    > predict = as.numeric(predict(mod,newdata=list(Var1=r1@data,Var1=r2@data)))
    Erreur dans eval(expr, envir, enclos) : objet ‘Var1′ introuvable

    which suggest that it doesn’t find the name of the variable used to calibrate the model…

    When checking the gam example I noted a subtle difference. The list created in the example is a list of vector, whereas on my case it was a list of dataframe..( class(r1@data) –> “data.frame”).

    Thus everything gets perfect with this little adjustement:

    > predict = as.numeric(predict(mod,newdata=list(Var1=r1@data[,1],Var1=r2@data[,1])))

    Reply
  4. Jens

    Hi, very nice! Do you know if it is possible to do the prediction from R on GRASS rasters directly within the LOCATION, i.e. without loading them into R? My rasters are each 1GB large (on a 30m resolution) and I need to downscale them, that´s a pity, since so many details get lost. However, loading all 10 rasters kills my system :)

    Reply
    1. pvanb Post author

      I don’t think so. But as you mention I’m your next comment, r.tileset might help you out. Of course there is an overhead for having to read / write multiple times, but at least you can avoid memory problems

      Reply
  5. Jens

    ach….I forgot the r.tileset thing, which is nice, but I did not understand it fully. Tileset splits each one of my datasets into let´s say 10 tiles. Your example showed 3 climatic variables but not 3 tiles, did it? or does the loop glue the tiles together at the end? I will try…but could you answer me please a very easy question? How can I start R from the GRASS command line?? Just typing R in the shell does not work? Sorry for that trivial question but that would he me at the moment a lot. thanks for the great inspiring blog.

    Reply
    1. pvanb Post author

      It is a while ago I wrote that example, but from what I see now, te code does not give unique names to the output raster tiles when writing to GRASS. It should, so next you can use r.patch to patch (glue) them together. But I will have another look and improve the code later when I find the time.

      As for running R from within GRASS, it should work without problems in Linux. It should on Windows too, but it is a long time I used GRASS on Windows so I am not really sure what could be wrong.

      Reply
  6. Pingback: If you run out of memory, split up your data | Ecostudies

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s