Sum up values of neighbouring polygons – a GRASS GIS approach

How to sum up values of neighbouring polygons in QGIS? This question was asked on gis.stackexchange, with two interesting answers. One answer explains how to use Spatialite and SQL to achieve this. The other answer, explained in more detail here, presents a script using pyqgis and shapely. In this post I am using an alternative approach, using GRASS GIS.

As an example, I’ll have a vector layer EA with the national boundaries of 7 countries in eastern Africa. In the attribute table, there is a column id, with the country ID and a column pop which gives the total human population for each country. I want to find out for each country the total population of all neighbouring countries.

GRASS GIS is a topological GIS, so finding all neighbouring polygons is easy, just use the following three commands:

v.category EA out=EAc layer=2 type=boundary option=add
v.db.addtable EAc layer=2 col="left integer,right integer"
v.to.db EAc option=sides col=left,right layer=2 type=boundary

You have now created a new vector map EAc with two layers using the v.category function  (layers are a characteristic of the vector feature geometries file and allow several parallel, but different groupings of features in a same map). The first layer is copied from the original layer and contains the unique ids of the country polygons. It links to the attribute table EAc with country ids and human population. The second layer contains the unique ids of the boundary lines. This links to the attribute table EAc_2 which you created with v.db.addtable and which you populated with the id’s of the polygons at the left and the right of each boundary line using the v.to.db command.

The attribute table is linked to the boundary lines (cat). In the column 'left' the ID of the polygon at the left of the boundary is given, in the column 'right' the ID of the polygon at the right.
Attribute table linked to the boundary lines. The ‘cat’ column contains the ID of the boundary lines. In the column ‘left’ the ID of the polygon at the left of the boundary is given, in the column ‘right’ the ID of the polygon at the right of the boundary (these polygon IDs are taken from the first attribute table EAc). When there is no polygon at the left or right, a -1 is given.

The next step is to combine the two attribute tables. This can be done with the SQL statement below (run from within GRASS using the db.execute command), which creates a new table tmp with a column with country IDs and a column with the sum of the human population in neighbouring countries.

db.execute sql="CREATE TABLE tmp AS
SELECT ID, sum(pop) as population FROM (

SELECT DISTINCT EAc_2.left as ID, EAc.pop as pop
FROM EAc_2
LEFT JOIN EAc ON EAc_2.right = EAc.cat
WHERE EAc_2.left > -1 AND EAc_2.right > -1

UNION

SELECT DISTINCT EAc_2.right as ID, EAc.pop as pop
FROM EAc_2
LEFT JOIN EAc ON EAc_2.left = EAc.cat
WHERE EAc_2.left > -1 AND EAc_2.right > -1

) GROUP BY ID"

To understand what this SQL statement let’s break it up:

The two SELECT DISTINCT statement join the population values of the countries to the country IDs in the column ‘left’ and ‘right’ respectively. Because a boundary between two countries can consist of multiple boundary lines, you may end up with duplicate rows. These are removed using the DISTINCT clause. The WHERE statement is to filter out all boundaries bordering one area only.

The results of the two SELECT DISTINCT statements are combined using the UNION statement. This is wrapped in another SELECT statement in which the sum() function in combination with the GROUP BY statement is used to sum for each country all population values of the neighbouring countries.

The last step is to join the table tmp to the attribute table of the original vector layer using v.db.join.

v.db.join map=EA@ConsStat layer=1 column=cat otable=tmp ocolumn=ID

To be honest, it wasn’t as straightforwards as I though it would be, but perhaps I am missing a more obvious / easy way to do this. Anyway, below the whole code in one go:

v.category EA out=EAc layer=2 type=boundary option=add
v.db.addtable EAc layer=2 col="left integer,right integer"
v.to.db EAc option=sides col=left,right layer=2 type=boundary
db.execute sql="CREATE TABLE tmp AS
SELECT ID, sum(pop) as population FROM (
SELECT DISTINCT EAc_2.left as ID, EAc.pop as pop
FROM EAc_2
LEFT JOIN EAc ON EAc_2.right = EAc.cat
WHERE EAc_2.left > -1 AND EAc_2.right > -1
UNION
SELECT DISTINCT EAc_2.right as ID, EAc.pop as pop
FROM EAc_2
LEFT JOIN EAc ON EAc_2.left = EAc.cat
WHERE EAc_2.left > -1 AND EAc_2.right > -1
) GROUP BY ID"
v.db.join map=EA@ConsStat layer=1 column=cat otable=tmp ocolumn=ID

One thought on “Sum up values of neighbouring polygons – a GRASS GIS approach

  1. Some points about this process.
    1. MySQL users will need to choose different “left” and “right” column names. These are reserved, so you will get an error in adding the column.

    2. The first three commands may behave differently depending on Grass-GIS version. In Grass-GIS 6.4, to achieve the desired outcome – a table containing all boundary categories with left-right information, the following revision was required:
    v.category mymap output=mymap2 type=boundary option=add cat=1 step=1
    v.db.droptable mymap2 -f
    v.db.addtable mymap2
    v.db.addcol mymap2 columns=”lleft int, lright int”
    v.to.db mymap2 option=sides columns=lleft,lright

Leave a comment