If you run out of memory, split up your data

Most GRASS functions can handle very large data sets, but with a few you might still run out of RAM. One of these is r.random.cells, a function that generates a random set of cells with spatial dependence, i.e., cells that are at least a user-defined distance apart.

When this process takes up too much RAM, the solution is to divide the raster layer into a set of tiles and run r.random.cells for each tile separately. Having done that, you can simply patch the tiles back together.

I wrote before about how you can use r.tileset to create the tiles. However, in order to maintain the minimum distance between points, the tiles need to be spaced that minimum distance apart from each other. I wasn't sure how to do this with r.tileset, so I created a small script in R to create the tiles. The key is the spgrass6 package for R, which can be used to read the region settings.

# Load required libraries
library(spgrass6)

# User-defined parameters (example values; adjust to your data)
opr <- "random_points"  # name of the output raster layer
ntr <- 4                # number of tile rows
ntc <- 4                # number of tile columns
mdp <- 100              # minimum distance between points (in map units)

# Determine tile parameters (starting at the lower left corner)

system("g.region save=region.backup")
rs <- system("g.region -p -g", intern=TRUE)
scoord <- as.numeric(strsplit(rs[2], split="=")[[1]][2]) # south boundary
wcoord <- as.numeric(strsplit(rs[3], split="=")[[1]][2]) # west boundary
nsres <- as.numeric(strsplit(rs[5], split="=")[[1]][2])  # y resolution
ewres <- as.numeric(strsplit(rs[6], split="=")[[1]][2])  # x resolution
nrw <- as.numeric(strsplit(rs[7], split="=")[[1]][2])    # number of rows
ncl <- as.numeric(strsplit(rs[8], split="=")[[1]][2])    # number of columns
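As an aside, the same key=value parsing step can be sketched in Python; the `g.region -g` output lines below are made-up example values, not real region settings:

```python
# Parse the key=value lines printed by `g.region -g` into a dictionary.
# The region values below are made-up example numbers.
region_output = [
    "n=228500",
    "s=215000",
    "w=630000",
    "e=645000",
    "nsres=10",
    "ewres=10",
    "rows=1350",
    "cols=1500",
]

region = {}
for line in region_output:
    key, value = line.split("=")
    region[key] = float(value)

print(region["s"], region["rows"])  # → 215000.0 1350.0
```

Parsing into a dictionary keyed by name is a bit more robust than indexing lines by position, since it does not depend on the order in which g.region prints its output.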

This gives you the raster resolution, the lower left corner coordinates, and the number of rows and columns, which can be used to calculate the tile sizes:

cpt <- ceiling(ncl/ntc)  # number of columns per tile
rpt <- ceiling(nrw/ntr)  # number of rows per tile
cptl <- ncl-cpt*(ntc-1)  # number of columns in the last tile
rptl <- nrw-rpt*(ntr-1)  # number of rows in the last tile

# Initialise the south and north coordinates one tile height below
# the region; the loop below starts by adding one tile height back
scoor <- scoord-rpt*nsres
ncoor <- scoord
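To see the tile arithmetic at work, here is a quick Python sketch with made-up numbers: 1500 columns and 1350 rows, split into 4 × 4 tiles. All tiles get the ceiling size except the last one, which takes whatever is left over:

```python
import math

ncl, nrw = 1500, 1350   # number of columns and rows (example values)
ntc, ntr = 4, 4         # number of tile columns and rows (example values)

cpt = math.ceil(ncl / ntc)    # columns per tile
rpt = math.ceil(nrw / ntr)    # rows per tile
cptl = ncl - cpt * (ntc - 1)  # columns in the last tile
rptl = nrw - rpt * (ntr - 1)  # rows in the last tile

# Every column and row is covered exactly once
assert cpt * (ntc - 1) + cptl == ncl
assert rpt * (ntr - 1) + rptl == nrw

print(cpt, cptl, rpt, rptl)  # → 375 375 338 336
```

Note that 1500 divides evenly by 4, so the last tile column is the same size as the others, while 1350 does not, so the last tile row is slightly smaller (336 instead of 338 rows).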

Now you can run a loop. In each round, the region extent is set to the extent of the current tile, shrunk on all sides by 0.5 × the required minimum distance between points. This ensures the minimum distance between points in this tile and points in the neighbouring tiles. Note that the tiles in the last row and column may be smaller.

for(i in 1:ntr){
  wcoor <- wcoord-cpt*ewres
  ecoor <- wcoord
  scoor <- scoor+rpt*nsres
  if(i != ntr){
    ncoor <- ncoor+rpt*nsres
  } else {
    ncoor <- ncoor+rptl*nsres
  }
  s <- scoor + 0.5*mdp
  n <- ncoor - 0.5*mdp

  for(j in 1:ntc){
    wcoor <- wcoor+cpt*ewres
    if(j != ntc){
      ecoor <- ecoor+cpt*ewres
    } else {
      ecoor <- ecoor+cptl*ewres
    }
    w <- wcoor + 0.5*mdp
    e <- ecoor - 0.5*mdp

    # Set the region to the (shrunken) extent of this tile
    system(paste("g.region -a n=", n, " s=", s, " e=", e, " w=", w, sep=""))

    # Create random points within this tile
    system(paste("r.random.cells output=tmp_99199_", i, "_", j, " distance=", mdp, " seed=26", sep=""))
  }
}
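The effect of shrinking every tile by half the minimum distance on each side can be illustrated with a short Python sketch (all values here are made up): the gap between the east edge of one shrunken tile and the west edge of the next works out to exactly the minimum distance.

```python
# Compute the (shrunken) west/east extents of one row of tiles and check
# the spacing between neighbouring tiles. All numbers are example values.
wcoord = 630000.0   # west boundary of the region
ewres = 10.0        # east-west resolution
cpt = 375           # columns per tile
ntc = 4             # number of tile columns
mdp = 100.0         # minimum distance between points

tiles = []
for j in range(ntc):
    w = wcoord + j * cpt * ewres + 0.5 * mdp
    e = wcoord + (j + 1) * cpt * ewres - 0.5 * mdp
    tiles.append((w, e))

# Gap between the east edge of one tile and the west edge of the next
gaps = [tiles[j + 1][0] - tiles[j][1] for j in range(ntc - 1)]
print(gaps)  # each gap equals mdp, i.e. [100.0, 100.0, 100.0]
```

Points inside a tile are kept apart by r.random.cells itself; points in different tiles are kept apart by this gap, so the minimum distance holds everywhere.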

Now you can patch the tiles together and discard them afterwards:

# Patch the tiles into one raster layer
system("g.region region=region.backup")
b <- paste("tmp_99199_", rep(1:ntr, each=ntc), "_", rep(1:ntc, times=ntr), sep="", collapse=",")
system(paste("r.patch input=", b, " output=", opr, sep=""))

# Remove the temporary tiles and the region backup
system("g.mremove -f rast='tmp_99199_*'")
system("g.remove region=region.backup")
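r.patch expects its input maps as one comma-separated string. Building the tile names tmp_99199_i_j can be sketched in Python like this (tile counts are example values):

```python
ntr, ntc = 2, 3   # number of tile rows and columns (example values)

# Join all tile names "tmp_99199_<row>_<col>" into one comma-separated string
names = ",".join(
    "tmp_99199_%d_%d" % (i, j)
    for i in range(1, ntr + 1)
    for j in range(1, ntc + 1)
)
print(names)
# → tmp_99199_1_1,tmp_99199_1_2,tmp_99199_1_3,tmp_99199_2_1,tmp_99199_2_2,tmp_99199_2_3
```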

Patching the tiles together still takes a long time, so I need to find a way to do this more efficiently. But at least my computer is not running out of memory anymore.


4 thoughts on “If you run out of memory, split up your data”

    1. pvanb

      Hi Markus, thanks for the feedback. The new Wiki pages have some great tips and information for whoever has to deal with (really) large data sets, thanks!

      I have actually been using GRASS 7 for a while already. The main (initial) reason was its much better handling of vector data, another thing that greatly improved in version 7. Of course, there are plenty of other improvements making it worthwhile. I am actually not sure whether I used grass64 or grass70 when I was working with r.random.cells, but I will certainly give it another try.

      By the way, I thought grass64 already had (partial) large file support (if configured with the --enable-largefile switch)? Does it mean that all modules in GRASS 7 are now LFS compliant?

      1. You are right that GRASS64 already has (partial) large file support, especially for raster data. GRASS 70 aims at full raster/vector support. It will not be complete, but all relevant modules should be fine. The others will be updated once remaining limitations are discovered (we are talking about 1.3 million lines of C to be checked…). So, if you find a limitation in GRASS 7, please report it!

