Alex M Chubaty
02 May 2019
Lots of great resources:
A note about DataCamp:
Notes available from https://github.com/achubaty/r-talks/tree/master/spatial-data-best-practices
install.packages(c("assertthat", "crayon", "raster", "rgdal", "sp"))
library(sp)
library(raster)
library(crayon)
Every project should be self-contained and use relative file paths.
myProject/
|_ data/ ## shared data could be symlinked
|_ outputs/
|_ presentations/
|_ publications/
|_ scripts/
|_ src/
Use RStudio projects!
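A minimal sketch of setting up this skeleton from R. The directory names follow the layout above; `root` is a tempdir() here only so the sketch is self-contained — in a real project you would run this from the project root (`root <- "."`).

```r
## Create the project skeleton using relative paths
dirs <- c("data", "outputs", "presentations", "publications", "scripts", "src")
root <- file.path(tempdir(), "myProject")  ## use "." in a real project
for (d in dirs) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}
```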
It's great if others can use your code, but it's more important that you can reuse it.
lintr package
df1 <- data.frame(A = 1:26, B = sample(letters))
head(df1)
assertthat::assert_that(is(df1$B, "character")) ## why?
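The assertion above fails on R versions before 4.0 because `data.frame()` defaulted to `stringsAsFactors = TRUE`, silently converting `B` to a factor. Requesting character columns explicitly (the default since R 4.0) avoids the surprise:

```r
## Explicitly keep character columns as characters
df1 <- data.frame(A = 1:26, B = sample(letters), stringsAsFactors = FALSE)
is.character(df1$B)  ## TRUE on any R version
```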
Especially on a shared system 😄
Remove intermediate objects to free RAM.
Save them to disk for retrieval later if you need them.
Have the script quit the R session when done to free RAM.
exit <- Q <- function(save = "no", status = 0, runLast = TRUE) {
  q(save = save, status = status, runLast = runLast)  ## pass the arguments through
}
To best use RAM, CPU, disk, and network when others aren't likely using them:
Sys.sleep()
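A hedged sketch of using `Sys.sleep()` to delay a heavy job until off-peak hours; `secondsUntil()` and the 02:00 target are illustrative choices, not from the talk.

```r
## Seconds until the next occurrence of `targetHour` (local time)
secondsUntil <- function(targetHour = 2) {
  now <- Sys.time()
  target <- as.POSIXct(format(now, "%Y-%m-%d")) + targetHour * 3600
  if (target <= now) target <- target + 86400  ## past today's slot; use tomorrow's
  as.numeric(difftime(target, now, units = "secs"))
}
## Sys.sleep(secondsUntil(2))  ## uncomment to pause the script until 02:00
```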
Identify bottlenecks + high resource use
startTime <- Sys.time()
reallyLongFunction()
elapsed <- Sys.time() - startTime
message(cyan("reallyLongFunction() took ", format(elapsed), " to run."))
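Base R's `system.time()` does the same bookkeeping for you, reporting user, system, and elapsed seconds for an expression; `Sys.sleep(0.5)` below is just a stand-in for a long-running function.

```r
## Time an expression; returns user/system/elapsed seconds
timing <- system.time({
  Sys.sleep(0.5)  ## stand-in for reallyLongFunction()
})
timing[["elapsed"]]  ## roughly 0.5 seconds
```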
f <- "~/data/LandCoverOfCanada2005_V1_4/LCC2005_V1_4a.tif"
r <- raster::raster(f)
object.size(r)
17112 bytes
raster::inMemory(r)
[1] FALSE
Also, use OS tools for monitoring memory use.
Garbage collection is handled by R, and your OS will try to reclaim RAM that is no longer in use.
However, you can make scripts more memory-efficient by removing transient/intermediate objects as soon as they are no longer needed:
reproducible::Cache()
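A small sketch of the pattern, using a plain matrix as a hypothetical intermediate object:

```r
## Drop intermediates as soon as their result is extracted
big <- matrix(runif(1e6), ncol = 1000)  ## hypothetical large intermediate
result <- colSums(big)
rm(big)          ## remove the transient object...
invisible(gc())  ## ...and prompt R to return the freed RAM to the OS
```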
Many raster operations use tempfiles behind the scenes, which will clutter your temp drive (a big problem when the machine isn't rebooted frequently).
## make a note of this location, restart R session, and rerun
raster::tmpDir()
Use rasterOptions() to set tmpdir manually, and be sure to clean up at the end of your script/session.
maxMemory <- 5e+7
scratchDir <- file.path("/tmp/scratch/MPB")
rasterOptions(default = TRUE)
options(rasterMaxMemory = maxMemory, rasterTmpDir = scratchDir)
raster::tmpDir()
...
unlink(raster::tmpDir(), recursive = TRUE)
> ?dataType
Data type | min value | max value |
---|---|---|
LOG1S | FALSE (0) | TRUE (1) |
INT1S | -127 | 127 |
INT1U | 0 | 255 |
INT2S | -32,767 | 32,767 |
INT2U | 0 | 65,534 |
INT4S | -2,147,483,647 | 2,147,483,647 |
INT4U | 0 | 4,294,967,296 |
FLT4S | -3.4e+38 | 3.4e+38 |
FLT8S | -1.7e+308 | 1.7e+308 |
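A quick sketch of why the data type matters on disk: an INT1U-style value takes 1 byte per cell versus 8 bytes for FLT8S, an 8x difference in file size. With raster you would pass the type directly, e.g. `writeRaster(r, f, datatype = "INT1U")` (not run here); the base-R binary writes below just illustrate the sizes.

```r
## 1 byte per value (like INT1U) vs 8 bytes per value (like FLT8S)
vals <- sample(0:255, 1e4, replace = TRUE)
f1 <- tempfile(); f8 <- tempfile()
writeBin(as.raw(vals), f1)     ## 1 byte each
writeBin(as.double(vals), f8)  ## 8 bytes each
file.size(f8) / file.size(f1)  ## 8
```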
Cluster operations on Windows use more RAM than they should because Windows can't fork.
Always close your clusters when done to ensure that RAM gets freed.
raster::beginCluster()
...
raster::endCluster() ## takes no arguments; in a function, put this in `on.exit()`
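The `on.exit()` pattern guarantees cleanup even if the work errors partway through. A self-contained sketch using base `parallel` (which raster's cluster functions build on); `withCluster()` is a hypothetical helper for illustration:

```r
library(parallel)

withCluster <- function(fun, n = 2) {
  cl <- makeCluster(n)
  on.exit(stopCluster(cl), add = TRUE)  ## workers are freed even on error
  fun(cl)
}

res <- withCluster(function(cl) parSapply(cl, 1:4, function(x) x^2))
res  ## 1 4 9 16
```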
RasterStack converting to RasterBrick
writeRaster() changes the projection
factorValues()
reproducible::fixErrors() within reproducible::prepInputs()
install.packages("sf") ## will supersede `sp`
install.packages("stars") ## will supersede `raster`
## fast shapefile operations
devtools::install_github("s-u/fastshp")
## fast raster operations
install.packages("fasterize")
install.packages("velox")
## more wrappers for GDAL in R
install.packages("gdalUtils")
meow::meow()