High Performance Computing Service – Part 2: Get your files ready

To use the HPC service I need:

  1. input files (e.g. data.rda)
  2. some routines (e.g. myroutine1.R, myroutine2.R, myroutine3.R)
  3. a batch script

The input files are:

  • data_C1.rda
  • data_C2.rda
  • data_C3.rda

Each of the above contains the following objects:

  • Classes of Topographic Index (topidxclasses),
  • Delay function (delay),
  • Precipitation time series (rain),
  • Evapotranspiration time series (ET0),
  • Observed streamflow discharge time series (Qobs).

For more info on the above input files, read help file of topmodel package (?topmodel)

Let’s prepare the R routines:

# myroutine1.R

# This routine runs an hydrological model called "topmodel" 1000 times using parameter sets
# randomly sampled from a uniform distribution, using data from catchment "C1". 
# The result is a vector containing Nash-Sutcliffe efficiencies.

library(Hmisc)
library(topmodel)
load("$WORK/input/data_C1.rda")

runs<-10000
vch   <- 1000
dt    <- 1
Srmax <- runif(runs, min=0, max=2)
td    <- runif(runs, min=0, max=3)
k0    <- runif(runs, min=0, max=0.01)
CD    <- runif(runs, min=0, max=5)
qs0   <- runif(runs, min=3e-5, max=5e-5)
lnTe  <- runif(runs, min=-4, max=0)
m     <- runif(runs, min=0, max=0.2)
Sr0   <- runif(runs, min=0, max=0.02)
vr    <- runif(runs, min=100, max=1500)

param.set<-cbind(qs0,lnTe,m,Sr0,Srmax,td,vch,vr,k0,CD,dt)
## returns an array of (runs) Nash Sutcliffe efficiencies; one for each parameter set:
eff1<-topmodel(param.set, topidxclasses, delay, rain, ET0, Qobs = Qobs)

# myroutine2.R

# This routine runs an hydrological model called "topmodel" 1000 times using parameter sets 
# randomly sampled from a uniform distribution, using data from catchment "C2". 
# The result is a vector containing Nash-Sutcliffe efficiencies.

library(Hmisc)
library(topmodel)
load("$WORK/input/data_C2.rda")

runs<-10000
vch   <- 1000
dt    <- 1
Srmax <- runif(runs, min=0, max=2)
td    <- runif(runs, min=0, max=3)
k0    <- runif(runs, min=0, max=0.01)
CD    <- runif(runs, min=0, max=5)
qs0   <- runif(runs, min=3e-5, max=5e-5)
lnTe  <- runif(runs, min=-4, max=0)
m     <- runif(runs, min=0, max=0.2)
Sr0   <- runif(runs, min=0, max=0.02)
vr    <- runif(runs, min=100, max=1500)

param.set<-cbind(qs0,lnTe,m,Sr0,Srmax,td,vch,vr,k0,CD,dt)
## returns an array of (runs) Nash Sutcliffe efficiencies; one for each parameter set:
eff2<-topmodel(param.set, topidxclasses, delay, rain, ET0, Qobs = Qobs)

# myroutine3.R

# This routine runs an hydrological model called "topmodel" 1000 times using parameter sets 
# randomly sampled from a uniform distribution, using data from catchment "C3". 
# The result is a vector containing Nash-Sutcliffe efficiencies.

library(Hmisc)
library(topmodel)
load("$WORK/input/data_C3.rda")

runs<-10000
vch   <- 1000
dt    <- 1
Srmax <- runif(runs, min=0, max=2)
td    <- runif(runs, min=0, max=3)
k0    <- runif(runs, min=0, max=0.01)
CD    <- runif(runs, min=0, max=5)
qs0   <- runif(runs, min=3e-5, max=5e-5)
lnTe  <- runif(runs, min=-4, max=0)
m     <- runif(runs, min=0, max=0.2)
Sr0   <- runif(runs, min=0, max=0.02)
vr    <- runif(runs, min=100, max=1500)

param.set<-cbind(qs0,lnTe,m,Sr0,Srmax,td,vch,vr,k0,CD,dt)
## returns an array of (runs) Nash Sutcliffe efficiencies; one for each parameter set:
eff3<-topmodel(param.set, topidxclasses, delay, rain, ET0, Qobs = Qobs)

Let’s prepare the batch file:

The batch file contains the following important information:

  • the maximum expected running time (walltime, on the second line)
  • the number of cpus to utilize for each job (cpus, on the third line)
  • the memory to allocate for the job (mem, on the third line)
  • the module to use (I’ll use the module that access the R version 2.15, fourth line)
  • the list of routines to call and files where to read any printed message (e.g. error messages)

# batch.sh

#!/bin/sh  
#PBS -l walltime=50:00:00 
#PBS -l select=1:ncpus=1:mem=600mb

module load R/2.15 intel-suite/2012
R CMD BATCH --slave $HOME/myroutine_C1.R $WORK/outputs/Rout/C1.Rout
R CMD BATCH --slave $HOME/myroutine_C2.R $WORK/outputs/Rout/C2.Rout
R CMD BATCH --slave $HOME/myroutine_C3.R $WORK/outputs/Rout/C3.Rout
Advertisements

3 Comments

  1. Hi Claudia,

    First of all, congratulations and thank you very much, you have an amazing and useful work in your blog/web.

    I understand that your work with the HPC in the Imperial College was long time ago (more than 2 years…), but currently I am working with R and the HPC and I have one doubt (for the moment…). I am trying to work with some scripts with R but I am not able to obtain the results. I do not know if I should finish the R script with save.image or save and the folder where I want to send the output. I have tried to use your syntax but in my file .Rout I have obtained the script and the time used.

    I hope you can help me.
    Thanks in advance,

    Oscar

    Reply

    1. Hi Oscar,
      thanks for your comment. If the result of your script is a table called “myTable”, then at the end of your R script you should simply save the table to a file, for instance:
      saveRDS(myTable, “/work/yourusername/myTable.RDS”).
      Remember to save the files you produce in the work directory, where you have more storage available.
      Let me know how that goes.
      Regards,
      Claudia

      Reply

      1. Hi Claudia,

        thanks for your answer and help!

        Yes, with save() is working fine. Now I am going to test with real data probably I will write you if I have any problem (if you do not mind…)

        Thanks again!!

        Oscar

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s