If you run out of memory, split up your data

Most GRASS GIS functions can handle very large data sets, but with a few of them you might still run out of RAM. One of these is r.random.cells, a module that generates a random set of cells with spatial dependence, i.e., cells that are at least a user-defined distance apart.
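
For reference, a minimal call (from R, as in the script below) looks like this; the output name and distance are example values, not taken from my actual setup:

# Generate random cells at least 500 map units apart within the current region
system("r.random.cells output=random_points distance=500 seed=26")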

When this process takes up too much RAM, the solution is to divide the raster layer into a set of tiles and run r.random.cells for each tile separately. Having done that, you can simply patch the tiles back together.
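
The patching step itself is a single r.patch call; the tile names here are placeholders:

# Patch the per-tile outputs back into one raster layer (placeholder names)
system("r.patch input=tile_1,tile_2 output=combined")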

I wrote before about how you can use r.tileset to create the tiles. However, to maintain the minimum distance between points across tile boundaries, the tiles need to be kept at least that distance apart from each other. I wasn't sure how to do this with r.tileset, so I wrote a small script in R to create the tiles instead. The key is the spgrass6 package for R, which can be used to read the region settings.

#-------------------------------------------------------------------------------
# Open required libraries
#-------------------------------------------------------------------------------
library(spgrass6)
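
The script below uses a few parameters that are not defined in the snippet as posted. A minimal set of definitions could look like this; the values are examples only and should be adapted to your data:

#-------------------------------------------------------------------------------
# User-defined parameters (example values)
#-------------------------------------------------------------------------------
ntr <- 4              # number of tile rows
ntc <- 4              # number of tile columns
mdr <- 500            # minimum distance between random cells (map units)
opr <- "random_cells" # name of the final patched output raster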

#-------------------------------------------------------------------------------
# Determine tile parameters (starting at the lower left corner)
#-------------------------------------------------------------------------------

system("g.region save=region.backup")
rs <- system("g.region -p -g", intern=T)
scoord <- as.numeric(strsplit(rs[2],split="=")[[1]][2]) # min south
wcoord <- as.numeric(strsplit(rs[3],split="=")[[1]][2]) # min west
nsres <- as.numeric(strsplit(rs[5],split="=")[[1]][2]) # y resolution
ewres <- as.numeric(strsplit(rs[6],split="=")[[1]][2]) # x resolution
nrw <- as.numeric(strsplit(rs[7],split="=")[[1]][2]) # number of rows
ncl <- as.numeric(strsplit(rs[8],split="=")[[1]][2]) # number of columns
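
As an aside, spgrass6 can also read the region settings directly, which avoids parsing the g.region output by hand. Something along these lines should work (in older versions of spgrass6 the function is called gmeta6() rather than gmeta()):

# Read the region settings through spgrass6 instead of parsing text output
g <- gmeta()
scoord <- g$s      # min south
wcoord <- g$w      # min west
nsres <- g$nsres   # y resolution
ewres <- g$ewres   # x resolution
nrw <- g$rows      # number of rows
ncl <- g$cols      # number of columns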

This gives you the raster resolution, the coordinates of the lower left corner, and the number of rows and columns, which can be used to calculate the tile sizes:

cpt <- ceiling(ncl/ntc) # number of columns per tile
rpt <- ceiling(nrw/ntr) # number of rows per tile
cptl <- ncl-cpt*(ntc-1) # number of columns last tile
rptl <- nrw-rpt*(ntr-1) # number of rows last tile
    
scoor <- scoord-rpt*nsres # initialize south edge (incremented at the start of each row)
ncoor <- scoord           # initialize north edge
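
To make the arithmetic concrete: with, say, ncl = 1000 columns and ntc = 3 tile columns, each regular tile gets cpt = ceiling(1000/3) = 334 columns, and the last tile gets the remaining cptl = 1000 - 2 × 334 = 332 columns.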

Now you can run a loop. In each round, you set the region extent based on the extent of the tile. The tile is shrunk on all sides by 0.5 × the required minimum distance between points; this ensures the minimum distance between points in this tile and points in the neighbouring tiles. Note that the tiles in the last row and column may be smaller.

for(i in 1:ntr){
  wcoor <- wcoord - cpt*ewres   # reset west edge at the start of each row
  ecoor <- wcoord               # reset east edge
  scoor <- scoor + rpt*nsres
  if(i != ntr){
    ncoor <- ncoor + rpt*nsres
  }else{
    ncoor <- ncoor + rptl*nsres # the last row may have fewer rows
  }
  s <- scoor + 0.5*mdr
  n <- ncoor - 0.5*mdr

  for(j in 1:ntc){
    wcoor <- wcoor + cpt*ewres
    if(j != ntc){
      ecoor <- ecoor + cpt*ewres
    }else{
      ecoor <- ecoor + cptl*ewres # the last column may have fewer columns
    }
    w <- wcoor + 0.5*mdr
    e <- ecoor - 0.5*mdr

    # Set the region to the (shrunken) extent of this tile
    system(paste("g.region -a n=", n, " s=", s, " e=", e, " w=", w, sep=""))

    # Create random cells within this tile (a fixed seed makes the run reproducible)
    system(paste("r.random.cells output=tmp_99199_", i, "_", j,
                 " distance=", mdr, " seed=26", sep=""))
  }
}
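
At this point you can quickly check that all tiles were created, for example with g.mlist (the pattern matches the temporary names used above):

# List the temporary tile rasters
system("g.mlist type=rast pattern=tmp_99199_*")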

Now you can patch the tiles together and discard the temporary layers afterwards.

# Restore the original region
system("g.region region=region.backup")

# Patch the tiles into one raster layer
b <- paste("tmp_99199_", rep(1:ntr, each=ntc), "_",
           rep(1:ntc, times=ntr), sep="", collapse=",")
system(paste("r.patch input=", b, " output=", opr, sep=""))

# Remove the temporary tiles and the region backup
system("g.mremove -f rast=tmp_99199_*")
system("g.remove region=region.backup")

You can check out the whole code here. Patching the tiles together still takes a long time, so I need to find a way to do this more efficiently. But at least my computer is not running out of memory anymore.



4 Responses to If you run out of memory, split up your data

  1. markusN says:

    You may want to try GRASS 7 at this point, since it is being made large file support (LFS) compliant. And if r.random.cells still does not “behave”, please submit a bug report.

    See also the new Wiki category: http://grass.osgeo.org/wiki/Category:Massive_data_analysis

    • pvanb says:

      Hi Markus, thanks for the feedback. The new Wiki pages have some great tips and information for anyone who has to deal with (really) large data sets!

      I have actually been using GRASS 7 for a while already. The main (initial) reason was its much better handling of vector data, another thing that was greatly improved in version 7. Of course, there are plenty of other improvements making it worthwhile. I am actually not sure whether I used grass64 or grass70 when I was working with r.random.cells, but I will certainly give it another try.

      By the way, I thought grass64 already had (partial) large file support (if configured with the --enable-largefile switch)? Does it mean that all modules in GRASS 7 are now LFS compliant?

      • markusN says:

        You are right that GRASS 6.4 already has (partial) large file support, especially for raster data. GRASS 7.0 aims at full raster/vector support. It will not be complete, but all relevant modules should be fine. The others will be updated once remaining limitations are discovered (we are talking about 1.3 million lines of C to be checked…). So, if you find a limitation in GRASS 7, please report it!

  2. Pingback: Massive data in GRASS GIS: try GRASS 7.0 | Ecostudies
