Data exploration in GRASS GIS – boxplots

I am currently working on some exercises for which I need data about municipalities in the Netherlands. A good place to look for such data is the CBS (Dutch Central Bureau of Statistics). Among the data sets it offers are vector layers of the Dutch municipalities and neighborhoods, which include demographic data.

One of the first things I normally do when exploring new data is to look at its distribution, for example by creating a histogram using the d.vect.colhist addon (see my earlier post). But what if I want to compare the distribution of different groups or samples? In such a case I find boxplots more convenient. However, there is no tool in GRASS GIS to create boxplots, so I had a look at the d.vect.colhist addon code and adapted it to create boxplots instead of histograms.

An example

Let’s for example look at the average population densities of the municipalities.

The average population density (number of inhabitants / km²) per municipality in 2017. Source: CBS.

What if I want to compare the distribution of the average population density across the Dutch provinces? You can install the addon (see the end of this post) and run d.vect.colbp on the command line or in the console. This will open a window with different tabs.

In the first tab, you can define the column in the attribute table to plot (here BEV_DICHTH, which is the column with the population density) and a column that will be used to group the data (here provincie, which gives the name of the province each municipality belongs to). As you can see in the screenshot above, you have a few options to change the layout of the plot. In this case, I chose to rotate the x-axis labels so they do not overlap. The resulting plot looks like:

The distribution of the average population densities of the Dutch municipalities per province.
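
To make concrete what each box summarises, here is a minimal Python sketch that groups the values of one column by a second column and computes the five numbers a boxplot is built from. This is not the add-on's code: the function name grouped_box_stats is hypothetical, the rows are made-up numbers (not CBS data), and only the column names BEV_DICHTH and provincie are borrowed from the example above. Plotting libraries may use a slightly different quartile definition and usually draw whiskers and outliers rather than the plain minimum and maximum.

```python
from collections import defaultdict
from statistics import quantiles


def grouped_box_stats(rows, value_key, group_key):
    """Return {group: (minimum, q1, median, q3, maximum)}.

    Hypothetical helper for illustration; every group needs at least
    two values for the quartiles to be defined.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(float(row[value_key]))

    stats = {}
    for name, values in groups.items():
        # "inclusive" treats the values as the full population,
        # so the quartiles never fall outside the observed range.
        q1, med, q3 = quantiles(values, n=4, method="inclusive")
        stats[name] = (min(values), q1, med, q3, max(values))
    return stats


# Made-up attribute rows: two fictional provinces "A" and "B".
rows = (
    [{"provincie": "A", "BEV_DICHTH": v} for v in (100, 200, 300, 400, 500)]
    + [{"provincie": "B", "BEV_DICHTH": v} for v in (10, 20, 30, 40)]
)

stats = grouped_box_stats(rows, "BEV_DICHTH", "provincie")
for name, summary in stats.items():
    print(name, summary)
```

Comparing the tuples side by side is, in essence, what the eye does when comparing the boxes in the plot.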

You can of course also use the command line. In this case I will plot the boxplots horizontally using the -h flag.

d.vect.colbp -h map=gemeenten@CBS column=BEV_DICHTH \
    where="AANT_INW > 1" plot_output=example_1.png \
    group_by=provincie order=ascending --overwrite

This will give you the plot below.

The distribution of the average population densities of the Dutch municipalities per province.
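
The command above combines a filter (where), a grouping column (group_by) and an ordering of the boxes (order). The short Python sketch below mimics those three steps on made-up rows. It assumes that the boxes are ordered by the median of each group, which I have not verified against the add-on code here; the function name ordered_groups and all values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def ordered_groups(rows, value_key, group_key, keep=lambda row: True,
                   descending=False):
    """Filter rows, group them, and return group names sorted by median.

    Hypothetical helper; sorting by median is an assumption about
    what the order option does.
    """
    groups = defaultdict(list)
    for row in rows:
        if keep(row):  # plays the role of the where clause
            groups[row[group_key]].append(row[value_key])
    return sorted(groups, key=lambda name: median(groups[name]),
                  reverse=descending)


# Made-up rows; P1..P3 are fictional provinces.
rows = [
    {"provincie": "P1", "BEV_DICHTH": 900, "AANT_INW": 50000},
    {"provincie": "P1", "BEV_DICHTH": 700, "AANT_INW": 30000},
    {"provincie": "P2", "BEV_DICHTH": 150, "AANT_INW": 12000},
    {"provincie": "P2", "BEV_DICHTH": 250, "AANT_INW": 8000},
    {"provincie": "P3", "BEV_DICHTH": 400, "AANT_INW": 20000},
    {"provincie": "P3", "BEV_DICHTH": 0, "AANT_INW": 0},  # removed by the filter
]

order = ordered_groups(rows, "BEV_DICHTH", "provincie",
                       keep=lambda row: row["AANT_INW"] > 1)
print(order)
```

Without the filter, the empty record would pull the median of P3 down, which illustrates why a condition like AANT_INW > 1 can be useful before comparing groups.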

The add-on does not provide further options to change the appearance of the plot, as the main idea is to use it for quick exploration of your data, similar to the other plotting tools in GRASS GIS. However, you can save the plot as an SVG file and edit it further in, e.g., Inkscape.

Testing the add-on

I will probably try to integrate the option to create boxplots into the existing d.vect.colhist add-on, but for now it is available as a separate addon on GitHub for testing. For GRASS GIS version 7.2 and above, you can use g.extension; simply type on the command line:

g.extension d.vect.colbp \
    url=https://github.com/ecodiv/d.vect.colbp

I am sure there is a lot to improve, as this was a rather quick hack. If you try it out and run into problems, please let me know; suggestions for improvements are of course also welcome.

Draw a histogram of vector attribute column in GRASS GIS

GRASS GIS has convenient tools to draw histograms of raster values. A similar tool to draw a histogram of the values in a vector attribute table is lacking. But you can easily add this functionality by installing the d.vect.colhist addon by Moritz Lennert. Read this short post on Ecodiv.earth tutorials.

Update r.vif add-on for GRASS GIS

I just updated the r.vif add-on. The add-on lets you do a step-wise variance inflation factor (VIF) procedure. As explained in more detail here, the VIF can be used to detect multicollinearity in a set of explanatory variables. The step-wise selection procedure provides a way to select a set of variables with sufficiently low multicollinearity.

The update should make the computation of VIF much faster. For very large raster layers it is possible to have the VIF computed based on a random subset of raster cells. There is also a low-memory option. This allows one to run this add-on with much larger data sets. But, as explained in the r.vif manual page, it also runs considerably slower.
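
As a rough illustration of the step-wise idea, the Python sketch below computes the VIF of each variable as the corresponding diagonal element of the inverse correlation matrix (which equals 1 / (1 - R²) when that variable is regressed on all the others) and repeatedly drops the variable with the highest VIF until all remaining values are at or below a threshold. This is a sketch of the general procedure under those assumptions, not the r.vif code: the function names, the threshold of 10 and the data are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("variables are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(variables):
    """Return {name: VIF} for a dict of name -> list of values."""
    names = list(variables)
    corr = [[1.0 if a == b else pearson(variables[a], variables[b])
             for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


def stepwise_vif(variables, threshold=10.0):
    """Drop the variable with the highest VIF until all are <= threshold."""
    remaining = dict(variables)
    while len(remaining) > 1:
        scores = vif(remaining)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del remaining[worst]
    return sorted(remaining)


# Made-up values: x2 is almost exactly twice x1, x3 is unrelated.
data = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 11.9],
    "x3": [5, 1, 4, 2, 6, 3],
}
kept = stepwise_vif(data)
print(kept)
```

Because x1 and x2 carry nearly the same information, one of them is removed in the first step, after which the remaining pair has VIF values close to 1.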

A GRASS GIS addon to upload raster values and labels to a point layer

In GRASS GIS you can upload raster values at the positions of vector points to the attribute table of that vector point layer using the function v.what.rast. If you are also interested in the raster category labels, you can have a look at r.what, which lets you query a raster map for its category values and category labels.

However, the results of r.what are written to a text file. If you want to upload raster values and labels to the attribute table of a point vector map, you can use v.in.ascii to import the text file created with r.what as a point vector layer in GRASS GIS.

Fairly straightforward, but wouldn’t it be even more convenient if you had an option in v.what.rast to also upload the category labels?

Update of the r.forestfrag addon for GRASS GIS

Some time ago I came across this post from Sylla Consult about a script to calculate the forest fragmentation index suggested by Riitters et al. (2000). Obviously, it can be used for any land cover type, so perhaps landscape fragmentation index would be a better name. Anyway, the script r.forestfrag.sh is available from the GRASS-addons page.

Unfortunately, it only worked with GRASS 6.4. Because I mostly work in GRASS 7.0, I adapted the script to make it work on GRASS 7.0. I also added some options and made a few other changes.