Creating polar diagrams with confidence intervals in R

The function r.polar in GRASS GIS allows you to create simple polar diagrams of e.g., the slope aspect or wind directions. These polar diagrams make it easy to spot directional bias, for example in the distribution of the slope aspect within vegetation Fb in Figure 1. The next step is to test whether the observed bias is significant.

Figure 1: Vegetation map of the mount Kenya - Aberdares region in SW Kenya.

From figure 2, it is clear that Fb occurs mostly on western and eastern slopes.

Figure 2: Distribution of slope aspect within the vegetation type Fb

In contrast, figure 3 shows that most slopes in the area are facing east, and to a lesser extent west. So does this mean that Fb occurs significantly more often on western slopes then what one would expect based on the overall aspect frequency distribution in the study area?

Figure 3: Distribution of slope aspect in the area mapped in figure 1

This can be tested using a Monte Carlo permutation test. In this test the observed aspect frequency distribution within Fb is compared with the aspect frequency distribution of randomly selected points from the whole study area, where the number of randomly selected points is equal to the number of points in Fb. This is repeated many times after which the lower and upper confidence interval boundaries are calculated based on the 0.05 and 0.95 percentiles of these random frequency distributions.

# Libraries
 library(spgrass6); library(plotrix);library(CircStats)

# Prepare data: get data from GRASS database and create separate data frame with cells falling within Fb. Note that all aspect values are rounded to the nearest integer, thus creating 360 bins.

mydata <- readRAST6(vname=c("vegetationmap","aspect"), useGDAL=F)
 names(mydata@data) <- c("vegetation", "aspect")
 mydata <- na.omit(
 mydata$aspect <- round(mydata$aspect)

ds <- subset(mydata, vegetation==4)
 ads <-$aspect)))

# Create confidence intervals: 1000 distributions based on random selection of raster cells. From these the 0.05 and 0.95 percentiles are calculated
 AS <- data.frame()
 for(i in 1:1000){
 ASD <- mydata[sample(1:nrow(mydata), dim(ds)[1], replace=FALSE),]
 ASD <-$aspect))
 AS <- rbind(AS,ASD)

ads2 <- aggregate(AS$Freq, by=list(AS$Var1), FUN=function(x){t(quantile(x, probs=0.05))})
 ads3 <- aggregate(AS$Freq, by=list(AS$Var1), FUN=function(x){t(quantile(x, probs=0.95))})

# Plot the confidence intervals: Note that the aspect categories as created by GRASS represent the number degrees of east and they increase counterclockwise: 90deg is North, 180 is West, 270 is South and 360 is East. This values are transformed to normal degrees below


polar.plot(ads3[,2], 450-as.numeric(as.character(ads3[,1])), rp.type="p", line.col="blue", poly.col="blue", start=90, clockwise=T, show.grid.labels=FALSE, lwd=1, radial.lim=c(0,8000))
polar.plot(ads2[,2], 450-as.numeric(as.character(ads2[,1])), rp.type="p", line.col="blue", poly.col="white", start=90, clockwise=T, show.grid.labels=FALSE, lwd=1, add=TRUE)

# Plot the observed distribution
 polar.plot(as.numeric(ads$Freq), 450-as.numeric(as.character(ads$Var1)), rp.type="p", line.col="red", lwd=0.5, start=90, clockwise=T, add=T, show.grid=TRUE)

Plotting the observed frequency distribution on top of the confidence intervals (figure 4) shows that the distribution of Fb is significantly biased towards western slopes, while it occurs significantly less on the southern slopes.

Observed frequency distribution of the slope aspect within vegetation Fb (red) and the 0.05 and 0.95 lower and upper confidence limits of the expected frequency distribution (blue).

I hope the example above demonstrates a few things which are relatively easy to do in R:

  • import data from GRASS in R
  • Use R to create polar plots
  • Use R to create a Monte Carlo permutation test.

Note that there are R packages dedicated to permutation and bootstrapping tests. I am not sure though how to do the above using functions from these packages. I am not saying it is not possible, just that I didn’t find out yet how. Also, the method described above isn’t terribly fast, so if you are dealing with very large data sets, use either smaller sub-samples, or be patient. And of course, if you know better ways of doing this, let me know in the comments below.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s