Terrain attribute selection in environmental studies

Terrain attribute selection

Exploring species-environment relationships is important for, among other things, habitat mapping, biogeographical classification, conservation, and management. It has become easier with (i) the development of a wide range of tools, including many open source tools, and (ii) the availability of more relevant data sources. For example, many tools make it relatively easy to create a wide range of derived terrain variables from digital elevation models (DEM) or digital bathymetric models (DBM). However, the ease of use of many of these tools, especially when used by non-experts, may lead to the selection of an arbitrary or sub-optimal set of variables. In addition, derived variables will often be highly correlated (Lecours et al. 2017).

The paper by Lecours et al. (2017) focuses on terrain attributes and provides a framework for the selection of the best sub-sets of terrain attributes. In addition, the paper aims to explore the relationship between the importance of these groups and terrain complexity. To this end, they compare a large number of terrain attributes that can be derived from digital terrain models using a range of different commercial and open source tools.

Based on their analysis, they recommend a set of six attributes: (1) relative difference to mean value, (2) local standard deviation, (3) easterness, (4) northerness, (5) local mean, and (6) slope. Together, these variables were found to account for most of the main terrain properties and the variation in these properties.

It is important to stress, however, that for any real-life application the selection of variables should first and foremost be based on their relevance for the intended targets. For example, depending on the ecology of a species, a measure such as the terrain wetness index could be more important for some species, while slope could be more important for others. Unfortunately, a lack of species-specific information often impedes an informed variable selection. In such cases, a framework such as the one proposed by Lecours et al. (2017) may at least help to avoid multicollinearity and redundancy when selecting sets of explanatory variables.

Different tools to compute terrain attributes

An additional objective of Lecours et al. (2017) was to explore which local terrain attributes can be computed with existing GIS software. They compared 11 different commercial and open source software packages. The open source tools they included are DIVA-GIS, SAGA GIS, uDig and QGIS. For QGIS, I suppose the authors looked at the available GDAL tools (QGIS also provides access to e.g., GRASS GIS, SAGA GIS, and the Orfeo Toolbox, as well as a large number of plugins). Missing from this list, to my surprise, is GRASS GIS. GRASS GIS is a well-established open source GIS, especially in the academic world, and it offers an interesting set of tools for the computation of the main topographic attributes. These include:

  • r.slope.aspect – slope, aspect, curvatures, first and second order partial derivatives
  • r.param.scale – elevation, slope, aspect, profile curvature, plan curvature, longitudinal curvature, cross-sectional curvature, maximum curvature, morphometric features
  • r.neighbors – average, median, mode, minimum, maximum, range, standard deviation, variance, diversity, interspersion
  • r.topidx – topographic wetness index

In addition, there are a number of addons that compute other topographic or terrain attributes.

In short, in GRASS GIS you can compute all the main terrain attributes, and some more. And if there is no tool to compute your favourite terrain attribute, there is always the versatile r.mapcalc module. With its powerful syntax, including a neighbourhood modifier, it offers a very flexible tool to define your own functions and neighbourhood filters.
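
As a minimal sketch of the neighbourhood modifier, the snippet below builds an r.mapcalc expression for the mean of a square moving window and uses it to derive the relative difference to the mean value. The helper name mean_expr and the raster name dem are assumptions for illustration, and the GRASS calls only run inside an active GRASS session. Note that r.neighbors already provides the local mean; the point here is only to show the map[row,col] syntax.

```shell
#!/bin/sh
# Build an r.mapcalc expression for the mean of a (2r+1) x (2r+1) window,
# using the neighbourhood modifier map[row_offset,col_offset].
# mean_expr is a hypothetical helper, not part of GRASS GIS.
mean_expr() {
    map=$1
    r=$2
    terms=""
    n=0
    row=$((0 - r))
    while [ "$row" -le "$r" ]; do
        col=$((0 - r))
        while [ "$col" -le "$r" ]; do
            # Append "map[row,col]", separated by " + " after the first term
            terms="${terms:+$terms + }${map}[${row},${col}]"
            n=$((n + 1))
            col=$((col + 1))
        done
        row=$((row + 1))
    done
    printf '(%s) / %s.0\n' "$terms" "$n"
}

# Only call GRASS when the script runs inside a GRASS session.
if [ -n "${GISRC:-}" ] && command -v r.mapcalc >/dev/null 2>&1; then
    # "dem" is an assumed name for an existing elevation raster.
    r.mapcalc expression="local_mean = $(mean_expr dem 1)"
    r.mapcalc expression="rel_diff = dem - local_mean"
fi
```

Calling mean_expr dem 1 yields the sum of the nine cells dem[-1,-1] to dem[1,1] divided by 9.0. Because r.mapcalc propagates NULL values, cells with a NULL neighbour (for example along the region edge) will be NULL in the result.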

References

Lecours, V., Devillers, R., Simms, A. E., Lucieer, V. L. & Brown, C. J. Towards a framework for terrain attribute selection in environmental studies. Environmental Modelling & Software 89, 19–30 (2017).

New release RQGIS

From an announcement on the QGIS mailing list: a new version of RQGIS has been released! RQGIS establishes an interface between R and QGIS, i.e. it allows the user to access the QGIS geoalgorithms from within R. With the new release, it is possible to run the most recent QGIS releases (>=2.18.2) with RQGIS.

RQGIS thus serves a similar goal as rgrass7, which provides an interface between R and GRASS GIS, and RSAGA, which provides access to the geocomputing and terrain analysis functions of SAGA GIS from within R.

RQGIS provides access to all functions available through QGIS’s processing framework, which includes, among others, GRASS GIS and SAGA GIS (others are GDAL, the Orfeo Toolbox, TauDEM and tools for LiDAR data). Thus, RQGIS offers a (partial) replacement for the rgrass7 and RSAGA packages. However, note that the latter two packages offer various other advantages.

I personally find rgrass7 and RSAGA great tools that allow easy access to GRASS and SAGA, respectively. I have less experience with RQGIS, but I can already tell that what I particularly like about it is that it makes it easier to combine different tools in a unified interface (which is of course the same reason I like the processing toolbox in QGIS). Take for example the find_algorithms function. This allows you to search for algorithms across all available toolboxes using keywords, similar to the search box in the processing toolbox in QGIS itself. I will certainly explore this package more!

You can find more information on the GitHub page https://github.com/jannes-m/RQGIS, including installation instructions and some usage examples. See also the blog post by one of the RQGIS authors.

Saving space on your HD – null file compression in GRASS GIS 7.2

The GRASS GIS development team recently released a new stable major release, GRASS GIS 7.2. The release brings more than 1900 fixes and improvements since the previous stable release 7.0.5. You’ll find a detailed overview of all the changes and improvements on this GRASS wiki page.

One important change in the GRASS libraries is support for NULL file compression, which can be applied with the r.null module. This may not sound terribly exciting to all of you, but for those who have GRASS databases with large raster layers, or a large number of them, this may save considerable space on the hard disk.

For now, compression of the NULL file is not enabled by default. Instead it must be explicitly turned on. You can do this by running on the command line:

export GRASS_COMPRESS_NULLS=1

This will only set the environment variable for the current session. If you want to turn it on permanently, you can add the line above to the bashrc file on Linux. On Windows, use the env.bat file instead; note that a batch file needs the set command rather than export, i.e., set GRASS_COMPRESS_NULLS=1:

$HOME/.grass7/bashrc # on Linux
%APPDATA%\GRASS7\env.bat # on Windows

For more information about GRASS GIS settings and environment variables, see this GRASS wiki page.

NULL file compression can be managed with r.null and the -z flag. Note that after the NULL file of a raster layer is compressed, it can only be opened with GRASS GIS 7.2.0 or later!

r.null -z map=your_raster_layer

To avoid having to do this for each layer individually, you can run r.null over all raster layers in a mapset or location using a loop. The example below compresses the NULL files of all raster layers in the location LOC of the GRASS database GDB. You can add an outer loop to include all locations in the database.

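LOC=latlon
GDB=/home/paulo/Data/GRASSdb
# Start in PERMANENT to get the list of all mapsets in the location
g.mapset mapset=PERMANENT location="$LOC" dbase="$GDB"
MPS=$(g.mapset -l)
for j in $MPS; do
    # Maps can only be modified in the current mapset, so switch first
    g.mapset mapset="$j" location="$LOC" dbase="$GDB"
    # Quote the pattern so the shell does not expand * to file names
    MAPS=$(g.list type=raster pattern="*" mapset="$j")
    for i in $MAPS; do
        r.null -z map="$i"
    done
done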

To give you an idea of how much space this can save: after running this on a database of 160 GB, the database was reduced to 116 GB (-27.5%). Another database of 118 GB was reduced to 90 GB (-24%). Of course, your mileage may vary considerably depending on the type of data contained in your database.
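
The percentages above follow from simple arithmetic on the database size before and after compression. A minimal sketch, assuming a hypothetical helper pct_saved and sizes expressed in the same unit:

```shell
#!/bin/sh
# Percentage of disk space saved, given the size before and after compression.
# pct_saved is a hypothetical helper; both sizes must use the same unit.
pct_saved() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

pct_saved 160 116   # first database from the text, sizes in GB
pct_saved 118 90    # second database from the text, sizes in GB
```

This prints 27.5 and 23.7, in line with the rounded figures quoted above. To check your own database, you could record the output of du -sk on the database directory before and after running the loop and pass both numbers to the function.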

Let me warn you again though: after compressing your NULL files, you will only be able to open the raster layers in GRASS GIS 7.2 or higher! And although I assume this is safe, backing up your data before running this is of course never a bad idea.

The first release candidate of GRASS GIS 7.2.0 is out

I have been using the development version for some time now, and all I can say is that you definitely should give the new GRASS GIS 7.2.0RC1 release a try. It is, in my experience, very stable, and it provides more than 1900 stability fixes and manual improvements compared to the stable releases 7.0.x.

It also features a number of new modules. A favourite of mine is the new g.gui.datacatalog, which makes it so much easier to browse, modify and manage GRASS maps across mapsets and locations. Also very welcome is the new d.legend.vect module, which can be used to display a vector legend in the active graphics frame. And for those who are into space-time analyses, there are also a number of new modules for the temporal framework.

For more information about all the improvements and changes, see the detailed announcement. And while you are at it, don’t forget to check out the add-ons; some great new ones have been added, and others updated, in the last few months.

And last but not least, a big thanks to the developers!

A GRASS GIS addon to upload raster values and labels to a point layer

In GRASS GIS you can upload raster values at the positions of vector points to the attribute table of that vector point layer using the module v.what.rast. If you are also interested in the raster category labels, you can have a look at r.what, which lets you query one or more raster maps for their category values and category labels.

However, the results of r.what are written to a text file. If you want to upload raster values and labels to the attribute table of a point vector map, you can use v.in.ascii to import the text file created with r.what as a point vector layer in GRASS GIS.

Fairly straightforward, but wouldn’t it be even more convenient if you had an option in v.what.rast to also upload the category labels? Well, that option isn’t there yet, so for the time being I have written a simple addon, v.what.rastlabel, that fills the gap, i.e., it lets you upload the values and labels of one or more raster layers at the positions of vector points to the attribute table of that vector point layer.

The addon is available from GitHub. If you are running GRASS GIS 7.2 or above, you can simply install it using g.extension:

g.extension v.what.rastlabel url=https://github.com/ecodiv/v.what.rastlabel

If you are still running GRASS GIS 7.0, see this page on the GRASS GIS wiki on how to install scripts. After installing the addon, you can run it by typing v.what.rastlabel on the command line.

Climate data sets, which one to select?

For species or vegetation modelling, one of the first choices to make is the selection of explanatory variables, which in most cases will include climatic or bioclimatic data sets. One of the most widely used global climate data sets in biogeographic and ecological research is from Worldclim (Hijmans et al., 2005). Alternative global rainfall data sets are from TAMSAT TARCAT (Maidment et al., 2014) and CHIRPS (Funk et al., 2014). The Worldclim data layers are based on an interpolation of average monthly climate data from weather stations. The other two data sets combine weather station data with satellite observations to improve accuracy where in situ rainfall measurements are sparse. All three data sets are available from the KITE resources website as part of the Africlim dataset (Platts et al. 2015).

Data sets based on interpolation of weather station data can be highly uncertain, especially in mountainous and poorly sampled areas (Hijmans et al., 2005). This is certainly an issue in eastern Africa, which is a topographically diverse region with a relatively poor coverage of weather stations. On the other hand, rainfall estimates based on satellite imagery have issues as well. I am not a climatologist and I don’t find it easy to determine which data set I should use. But I can of course start by comparing the data sets. Below, I compare the long-term average annual rainfall data. Note that the Worldclim data set is representative of the period 1950-2000, while the other two data sets are based on data from 1983-2012.

The maps of mean annual rainfall for the three data sets make it immediately evident that the rainfall distribution as estimated by the TAMSAT data set deviates considerably from the other two estimates. Especially the low rainfall estimates for three of the five so-called water towers of Kenya (Mount Kenya, the Aberdare Range and the Mau Forest range) and for Mount Kilimanjaro in Tanzania raise question marks.

In GRASS GIS it is easy to quickly compare two maps using the bivariate scatterplot tool in the Map display toolbar. Just select two raster layers and select the tool. You can further tweak the graph using the plot and text settings, and export it as a PNG image or print it. Note that if you print it to file, you’ll get a PS (PostScript) file, which you can further edit in e.g., Inkscape.

The scatterplots of Worldclim versus TAMSAT, Worldclim versus CHIRPS and TAMSAT versus CHIRPS illustrate that there are large discrepancies in the estimated mean annual rainfall, with R² values between 0.73 and 0.8.
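
For readers who prefer the command line, R² can also be computed from paired cell values. The sketch below is one possible approach, not necessarily how the values above were obtained: a hypothetical helper r_squared reads two whitespace-separated columns and prints the squared Pearson correlation. The raster names worldclim_rain and chirps_rain are assumptions, and the r.stats call only runs inside an active GRASS session.

```shell
#!/bin/sh
# Squared Pearson correlation (R2) of two whitespace-separated columns on stdin.
# r_squared is a hypothetical helper, not part of GRASS GIS.
r_squared() {
    LC_ALL=C awk '
        # Skip incomplete lines and lines with a NULL marker (*)
        NF >= 2 && $1 != "*" && $2 != "*" {
            n++
            sx += $1; sy += $2
            sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2
        }
        END {
            num = n * sxy - sx * sy
            den = (n * sxx - sx * sx) * (n * syy - sy * sy)
            if (n < 2 || den == 0) { print "NA"; exit }
            printf "%.4f\n", num * num / den
        }'
}

# Inside a GRASS session: print one line per cell with the values of both
# maps, leaving out NULL values (-n). The map names are assumed.
if [ -n "${GISRC:-}" ] && command -v r.stats >/dev/null 2>&1; then
    r.stats -1n input=worldclim_rain,chirps_rain | r_squared
fi
```

The helper can be tried without GRASS by piping in a few pairs of numbers, for example printf '1 2\n2 4\n3 6\n' | r_squared. For large rasters, r.stats writes one line per cell, so this can take a while.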

Another convenient tool, available from the Map display toolbar, is the profile analysis tool. With this tool you can display the values of one or more raster layers along a line that you draw on the map canvas. This is particularly handy to see how two or more maps differ.

Below you can see the rainfall values along a transect I drew across the Kenyan highlands. The peaks in the graph are where the transect crosses Mount Kenya, the Aberdares and the Mau forest complex. The blue, red and green lines give the values of the Worldclim, TAMSAT and CHIRPS data sets, respectively. The rainfall profile of TAMSAT suggests there is not much difference in annual rainfall between the mountain tops and the lowlands in between. The Worldclim and CHIRPS profiles are more alike, but with Worldclim providing considerably higher estimates for the mountain peaks than CHIRPS.

Mean annual rainfall values along a transect across the Kenyan highlands

It would be good to find out more about the differences between the Worldclim and CHIRPS estimates. For example, are these differences all due to data errors (in one or both data layers) or was the period 1983 – 2012 in fact drier than the 1950 – 2000 period? But that is a question I might get into later. For now it seems clear, to me at least, that the TAMSAT data has some issues, especially for the Kenyan highlands, suggesting it to be unsuitable for use in ecological or biogeographic studies in east Africa.

References

  • Funk, Chris, Pete Peterson, Martin Landsfeld, Diego Pedreros, James Verdin, Shraddhanand Shukla, Gregory Husak, James Rowland, Laura Harrison, Andrew Hoell & Joel Michaelsen. The climate hazards infrared precipitation with stations—a new environmental record for monitoring extremes. Scientific Data 2, 150066.
  • Hijmans, R.J., S.E. Cameron, J.L. Parra, P.G. Jones and A. Jarvis, 2005. Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25: 1965-1978.
  • Maidment, R., D. Grimes, R.P.Allan, E. Tarnavsky, M. Stringer, T. Hewison, R. Roebeling and E. Black (2014) The 30 year TAMSAT African Rainfall Climatology And Time series (TARCAT) data set. Journal of Geophysical Research 119 (18), 10,619–10,644.
  • Platts PJ, Omeny PA, Marchant R (2015). AFRICLIM: high-resolution climate projections for ecological applications in Africa. African Journal of Ecology 53, 103-108.

Use R to get gbif data into a GRASS database

Introduction

GBIF

The Global Biodiversity Information Facility (GBIF) is an international open data infrastructure that allows anyone, anywhere to access data about all types of life on Earth, shared across national boundaries via the Internet. GBIF provides a single point of access through http://www.gbif.org/ to species records shared freely by hundreds of institutions worldwide. The data accessible through GBIF relate to evidence about more than 1.6 million species, collected over three centuries of natural history exploration and including current observations from citizen scientists, researchers and automated monitoring programs.

There are various ways to import GBIF data, including directly from the website as a comma-delimited (CSV) file and using the v.in.gbif addon for GRASS (I’ll post an example using this addon at a later stage). Here, however, I’ll use the rgbif package for R to obtain the data. In the link section some tutorials are listed that illustrate the use of other R packages.