K-fold cross validation in GRASS GIS

A common technique to estimate the accuracy of a predictive model is k-fold cross-validation. In k-fold cross-validation, the original sample is randomly partitioned into k sub-samples (folds) with an approximately equal number of records. Of these sub-samples, a single sub-sample is retained as the validation data for testing the model, and the remaining k-1 sub-samples are combined and used as training data. This process is then repeated k times, with each of the sub-samples used exactly once as the validation data (Table 1).

Table 1. Illustration of data partitioning in a 4-fold cross-validation, with training data used to train the model, and test data to validate the model.

The k evaluation results can then be averaged (or otherwise combined) to produce a single estimate of the model's predictive performance. The advantage of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once.
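To make the procedure concrete, here is a minimal sketch in plain Python of the partitioning and averaging steps described above. It is an illustration only, not the routine from the tutorial: the function names `kfold_indices` and `cross_validate` and the `evaluate` callback are made up for this example, and records are represented simply by their index.

```python
import random


def kfold_indices(n_records, k, seed=42):
    """Randomly assign record indices 0..n_records-1 to k folds of near-equal size."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Slicing with stride k gives fold sizes that differ by at most one record.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_records, k, evaluate, seed=42):
    """Call evaluate(train, test) once per fold; return the mean score and all scores.

    `evaluate` is a hypothetical user-supplied function that fits a model on the
    training indices and returns an error or accuracy measure for the test indices.
    """
    folds = kfold_indices(n_records, k, seed)
    scores = []
    for i, test in enumerate(folds):
        # All folds except fold i are combined into the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        scores.append(evaluate(train, test))
    return sum(scores) / k, scores
```

With `k=4`, each record ends up in the test set in exactly one of the four runs and in the training set in the other three, which is the layout Table 1 illustrates.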

Functions for modelling and machine learning in, for example, R and Python's Scikit-learn often come with built-in cross-validation routines. But it is also fairly easy to build such a routine yourself. This tutorial shows how to build a k-fold cross-validation routine in GRASS GIS and use it to compare the predictive performance of two interpolation techniques: inverse distance weighting and bilinear spline interpolation.
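As a rough idea of what such a routine has to do, the sketch below runs a 4-fold cross-validation of a simple inverse distance weighting interpolator in plain Python. Everything in it is an assumption made for illustration: the helper names `idw_predict` and `kfold_rmse`, the power parameter of 2, the use of RMSE as the evaluation measure, and the synthetic "elevation" surface standing in for real sample points. It does not use GRASS GIS; there the interpolation itself would be done by modules such as `v.surf.idw` and `v.surf.bspline`.

```python
import math
import random


def idw_predict(train, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (x, y, z) training points."""
    num = den = 0.0
    for tx, ty, tz in train:
        d = math.hypot(x - tx, y - ty)
        if d == 0.0:
            return tz  # the location coincides with a sample point
        w = 1.0 / d ** power
        num += w * tz
        den += w
    return num / den


def kfold_rmse(points, k=4, power=2.0, seed=1):
    """Return the RMSE of each of the k folds for IDW interpolation of `points`."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    rmses = []
    for i, test in enumerate(folds):
        # Train on all folds except fold i, then predict the held-out points.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        sq_err = [(idw_predict(train, x, y, power) - z) ** 2 for x, y, z in test]
        rmses.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return rmses


# Synthetic stand-in for sampled elevations: 150 random points on a smooth surface.
rng = random.Random(0)
points = []
for _ in range(150):
    x, y = rng.uniform(0, 100), rng.uniform(0, 100)
    points.append((x, y, 100 + 0.5 * x + 10 * math.sin(y / 10)))

rmses = kfold_rmse(points, k=4)
mean_rmse = sum(rmses) / len(rmses)
print(f"RMSE per fold: {[round(r, 2) for r in rmses]}, mean: {mean_rmse:.2f}")
```

Running the same loop with a second interpolator in place of `idw_predict` gives a second mean RMSE, and the two means can then be compared on equal terms because both were computed from the same folds.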

Figure 1. A) Elevation map of North Carolina. B) Elevation estimate based on inverse distance weighting interpolation of the elevation at 150 random sample points. C) Residual map showing the differences between A and B. D) Relative differences between A and B, computed as (A-B)/A. Maps C and D are overlaid with the 150 sample locations.

The full tutorial is available at https://tutorials.ecodiv.earth.
