The new “hddtools”, an R package for Hydrological Data Discovery

The R package hddtools is an open source project designed to facilitate non programmatic access to on-line data sources. This typically implies the download of a metadata catalogue, selection of information needed, formal request for dataset(s), de-compression, conversion, manual filtering and parsing. All those operation are made more efficient by re-usable functions.

Depending on the data license, functions can provide offline and/or on-line modes. When redistribution is allowed, for instance, a copy of the dataset is cached within the package and updated twice a year. This is the fastest option and also allows offline use of package’s functions. When re-distribution is not allowed, only on-line mode is provided.

The package hddtools can be installed via devtools:

library(devtools)
install_github("r_hddtools", username = "cvitolo", subdir = "hddtools")
library(hddtools)

Data sources and Functions

 The Koppen Climate Classification map

The Koppen Climate Classification is the most widely used system for classifying the world’s climates. Its categories are based on the annual and monthly averages of temperature and precipitation. It was first updated by Rudolf Geiger in 1961, then by Kottek et al. (2006), Peel et al. (2007) and then by Rubel et al. (2010).

The package hddtools contains a function to identify the updated Koppen-Greiger climate zone, given a bounding box.

# Extract climate zones from Peel's map:
kgclimateclass(BBlonMin=-3.82,BBlonMax=-3.63,BBlatMin=52.41,BBlatMax=52.52,updatedBy="Peel")

# Extract climate zones from Kottek's map:
kgclimateclass(BBlonMin=-3.82,BBlonMax=-3.63,BBlatMin=52.41,BBlatMax=52.52,updatedBy="Kottek")

The Global Runoff Data Centre

The Global Runoff Data Centre (GRDC) is an international archive hosted by the Federal Institute of Hydrology (Bundesanstalt für Gewässerkunde or BfG) in Koblenz, Germany. The Centre operates under the auspices of the World Meteorological Organisation and retains services and datasets for all the major rivers in the world.

Catalogue, kml files and the product “Long-Term Mean Monthly Discharges” are open data and accessible via the hddtools.

# 1. GRDC full catalogue
grdcCatalogue() 

# 2. Filter GRDC catalogue based on a bounding box 
grdcCatalogue(BBlonMin = -3.82,
              BBlonMax = -3.63,
              BBlatMin = 52.43,
              BBlatMax = 52.52,
              mdDescription = TRUE) 

# 3. Monthly data extraction
grdcMonthlyTS(1107700,liveData=FALSE)

The Data60UK dataset

In the decade 2003-2012, the IAHS Predictions in Ungauged Basins (PUB) international Top-Down modelling Working Group (TDWG) collated daily datasets of areal precipitation and streamflow discharge across 61 gauging sites in England and Wales. The database was prepared from source databases for research purposes, with the intention to make it re-usable. This is now available in the public domain free of charge.

The hddtools contain two functions to interact with this database: one to retreive the catalogue and another to retrieve time series of areal precipitation and streamflow discharge.

# 1a. Data60UK full catalogue
data60UKCatalogue() 

# 1.b Filter Data60UK catalogue based on bounding box 
data60UKCatalogue(BBlonMin = -3.82, 
                  BBlonMax = -3.63,
                  BBlatMin = 52.43,
                  BBlatMax = 52.52) 

# 2. Extract time series 
data60UKDailyTS(62001)

NASA’s Tropical Rainfall Measuring Mission (TRMM)

The Tropical Rainfall Measuring Mission (TRMM) is a joint mission between NASA and the Japan Aerospace Exploration Agency (JAXA) that uses a research satellite to measure precipitation within the tropics in order to improve our understanding of climate and its variability.

The TRMM satellite records global historical rainfall estimation in a gridded format since 1998 with a daily temporal resolution and a spatial resolution of 0.25 degrees. This information is openly available for educational purposes and downloadable from an FTP server.

HDDTOOLS provides a function, called trmm, to download and convert a selected portion of the TRMM dataset into a raster-brick that can be opened in any GIS software. This function is a slight modification of the code published on Martin Brandt’s post (thanks Martin!).

# Generate multi-layer GeoTiff containing mean monthly precipitations from 3B43_V7 for 2012 (based on a bounding box)
trmm(fileLocation="~/",
     url="ftp://disc2.nascom.nasa.gov/data/TRMM/Gridded/",
     product="3B43",
     version=7,
     year=2012,
     BBlonMin=-3.82,
     BBlonMax=-3.63,
     BBlatMin=52.43,
     BBlatMax=52.52)

Please leave your feedback

I would greatly appreciate if you could leave your feedbacks either via email (cvitolodev@gmail.com) or taking a short survey (https://www.surveymonkey.com/s/QQ568FT).

Image credits to cilipmarketing: http://goo.gl/Mw2yhC

Advertisements

The new FUSE implementation is now 145 times faster!

Four of my previous posts were about the fuse implementation in RHydro. Since I published them I received many emails and requests for more info. It is clear the topic is of interest for many. I thought I would post a short note on a new FUSE implementation which is still now available as a separate package called “fuse” on GitHub.

# install/load dependent libraries
if(!require(zoo)) install.packages("zoo")
library(zoo)
if(!require(tgp)) install.packages("tgp")
library(tgp)
if(!require(qualV)) install.packages("qualV")
library(qualV)
if(!require(hydromad)) install.packages("hydromad",repos="http://hydromad.catchment.org")
library(hydromad)
if(!require(devtools)) install.packages("devtools")
library(devtools)

# install the fuse package directly from GitHub
install_github("ICHydro/r_fuse", subdir = "fuse")
library(fuse)

The functions are named as in RHydro, the only difference is that the list of model structures is now called internally and does not need to be passed as input. It is still compatible with hydromad and below you find few lines to run a test (also available as gist here).

# Load sample data
data(DATA)

# Set the parameter ranges
hydromad.options(fusesma=fusesma.ranges(),fuserouting=fuserouting.ranges())

# Set model
modspec <- hydromad(DATA, sma = "fusesma", routing = "fuserouting", mid = 1:1248, deltim = 1)

# Randomly generate 1 parameter set 
myNewParameterSet <- parameterSets( coef(modspec, warn=FALSE), 1, method="random")

# Run a simulation using the parameter set generated above
modx <-  update(modspec, newpars = myNewParameterSet)

# Generate a summary of the result
summary(modx) 

# Plot results 
hydromad:::xyplot.hydromad(modx, with.P=TRUE)

I thought a basic benchmark between RHydro and fuse packages would be interesting (here the gist).

The result is that fuse’s functions seem to run over 145 times faster than the corresponding functions in RHydro.

That’s a great news if you plan to do anything that requires hundreds/thousands of runs!
Plot of benchmark results
Here the detailed results of the benchmark:
> compare
Unit: seconds
                expr        min         lq     median        uq        max neval
 f(DATA, parameters) 423.230827 433.070465 446.845983 451.28512 461.818262    10
 g(DATA, parameters)   2.893856   2.988898   3.076531   3.59736   3.713473    10
My session info:

> sessionInfo() R version 3.1.1 (2014-07-10) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 [4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ggplot2_1.0.0 microbenchmark_1.3-0 tgp_2.4-9 fuse_1.1.0 RHydro_2014-04.1 [6] qualV_0.3 KernSmooth_2.23-12 XML_3.98-1.1 deSolve_1.10-9 lhs_0.10 [11] sp_1.0-15 xts_0.9-7 zoo_1.7-11 loaded via a namespace (and not attached): [1] colorspace_1.2-4 digest_0.6.4 grid_3.1.1 gtable_0.1.2 lattice_0.20-29 MASS_7.3-33 [7] munsell_0.4.2 plyr_1.8.1 proto_0.3-10 Rcpp_0.11.2 reshape2_1.4 scales_0.2.4 [13] stringr_0.6.2 tools_3.1.1

Image credits to Nick Chill: http://goo.gl/9vWVEb

FUSE model in RHydro package (part 4: HydroMAD compatibility)

This is the fourth of a series of tutorials on the FUSE implementation within the RHydro package. The script for this tutorial is available here. If you are interested in following the discussion related to this post and see how it evolves, join the R4Hydrology community on Google+!

If you want to know what FUSE is, how to prepare your data and run a simple simulation go to the first post of the series, for a basic calibration example (using 1 model structure) go to the second post, while for an example of multi-model calibration got to the third post.

RHydro-HydroMAD compatibility

HydroMAD is an excellent framework for hydrological modelling, optimization, sensitivity analysis and assessment of results. It contains a large set of soil moisture accounting modules and routing functions.

Thanks to Joseph Guillaume (hydromad’s maintainer), FUSE-RHydro is now compatible with HydroMAD, therefore using and calibrating FUSE becomes even easier! Joseph provided many of the examples below, many thanks for that too!

In this tutorial I will show how to:

  1. set up FUSE and its parameter ranges using the hydromad approach
  2. run a simulation
  3. calibrate FUSE using one of the hydromad’s algorithms

Recap from previous posts

Load the package and prepare your default list of models

library(hydromad) 
library(RHydro)
data(modlist)

Read sample data in

temp <- read.csv("dummyData.csv") 

Convert to date the first column and then convert to zoo object

temp[,1] <- as.Date(temp[,1],format="%Y-%m-%d") 
DATA <- read.zoo(temp)

Step 1:  set up FUSE and its parameter ranges using the hydromad approach

HydroMAD allows to specify a rainfall-runoff model of choice. This is achieved by using the hydromad() function and specifying the soil moisture accounting model and routing function to use.

Set the parameter ranges using hydromad.options

hydromad.options(fusesma fusesma.ranges(),
                 fuserouting fuserouting.ranges())
Set up the model
modspec <- hydromad(DATA,
                    sma = "fusesma"                    routing = "fuserouting", 
                    mid = 1:1248, 
                    modlist = modlist)
# Randomly generate 1 parameter set
myNewParameterSet <- parameterSetscoef(modspecwarn=FALSE), 
                                    1                                    method="random" )

Step 2: run a single simulation

Run a simulation using the parameter set generated above
modx <- update(modspec,
               newpars = myNewParameterSet)
Generate a summary of the result
summary(modx)
The instantaneous runoff is
U <- modx$U
The routed discharge is
Qrout <- modx$fitted.values
Plot the Observed vs Simulated value
hydromad:::xyplot.hydromad(modx)
Add the precipitation to the above plot
hydromad:::xyplot.hydromad(modx, with.P=TRUE)

Step 3: calibrate FUSE using one of hydromad’s algorithms

Hydromad provide the “fitBy” method to calibrate using a specified algorithm. As an example, the Shuffled Complex Evolution method can be used as shown below. Please note that the procedure is likely to take a LONG time.

modfit <- fitBySCE(modspec)

Get a summary of the result

summary(modfit)

If you want to use the latest version of fuse, the above steps can be adapted as follows:

library(hydromad) 
library(fuse)
data(modlist)

# Load data
temp <- read.csv("dummyData.csv") 
temp[,1] <- as.Date(temp[,1],format="%Y-%m-%d") 
DATA <- read.zoo(temp)

# Set the parameter ranges using hydromad.options
hydromad.options(fusesma = fusesma.ranges(),
                 fuserouting = fuserouting.ranges())

# Set up the model
modspec <- hydromad(DATA,
                    sma = "fusesma", 
                    routing = "fuserouting",
                    mid = 1:1248,
                    deltim = 1/24)

# Calibrate FUSE using one of hydromad’s algorithms
modfit <- fitBySCE(modspec)

# Get a summary of the result
summary(modfit)

What’s next?

This tutorials was just a brief introduction to a topic that can be explored in many different directions. From a technical point of view, it could be worth to invest some effort into code optimization and/or parallelisation. This would also have an impact on the scientific side facilitating experiments on sensitivity analysis, regionalisation of catchment characteristics and model structure variability.

Some of those ideas are already on their way, some others are just random thoughts. Therefore…watch this space!

FUSE model in RHydro package (part 3: ensemble)

This is the third of a series of tutorials on the FUSE implementation within the RHydro package.  The script for this tutorial is available here. If you are interested in following the discussion related to this post and see how it evolves, join the R4Hydrology community on Google+!

If you want to know what FUSE is, how to prepare your data and run a simple simulation go to the first post of the series, while for a basic calibration example (using 1 model structure) go to the second post.

 

Recap from previous posts

Load the package and prepare your data:

library(RHydro)
temp <- read.csv("dummyData.csv") 
DATA <- zooreg(temp[,2:4], order.by=temp[,1])
myDELTIM <- 1

 

FUSE ensemble

Very often hydrologists decide to use a particular hydrological model based on code availability, familiarity and experience rather than based on hydrological suitability. The real advantage of using FUSE is the possibility to work with an ensemble of multiple models so that uncertainties related to the model structure can be quantified.

The input that defines the model structure is called mid (model identification number) and its value ranges between 1 and 1248. When the most suitable model structure(s) is not known a priori,  the mid can be added to the list of parameters and calibrated.

Adding the full mid-range implies the need to increase significantly the sampling space. There are, however, 4 model structures (called parent models) from which all the other model combinations are derived. In this tutorial I will only consider those 4 model structures.

In this tutorial I will show how to:

A. define the update sampling space (parameter + mid ranges)

B. run a multi-model calibration

C. compare results

 

Step A: define the parameter ranges + mid range

The parent models are as follows:

60 = TOPMODEL,

230 = ARNOXVIC,

342 = PRMS

426 = SACRAMENTO

Therefore mid can be one of those 4 values:

mids <- c(60, 230, 342, 426)

The parameter ranges are defined as in the previous post.

require(tgp)
DefaultRanges <- data.frame(rbind(rferr_add = c(0,0),
                                  rferr_mlt = c(1,1), 
                                  maxwatr_1 = c(25,500), 
                                  maxwatr_2 = c(50,5000),
                                  fracten = c(0.05,0.95), 
                                  frchzne = c(0.05,0.95),
                                  fprimqb = c(0.05,0.95), 
                                  rtfrac1 = c(0.05,0.95), 
                                  percrte = c(0.01,1000), 
                                  percexp = c(1,20), 
                                  sacpmlt = c(1,250), 
                                  sacpexp = c(1,5), 
                                  percfrac = c(0.05,0.95), 
                                  iflwrte = c(0.01,1000), 
                                  baserte = c(0.001,1000), 
                                  qb_powr = c(1,10), 
                                  qb_prms = c(0.001,0.25), 
                                  qbrate_2a = c(0.001,0.25), 
                                  qbrate_2b = c(0.001,0.25), 
                                  sareamax = c(0.05,0.95), 
                                  axv_bexp = c(0.001,3), 
                                  loglamb = c(5,10), 
                                  tishape = c(2,5), 
                                  timedelay = c(0.01,5) )
)
names(DefaultRanges) <- c("Min","Max")
nRuns <- 100
parameters <- lhs( nRuns, as.matrix(DefaultRanges) )
parameters <- data.frame(parameters)
names(parameters) <- c("rferr_add","rferr_mlt","maxwatr_1","maxwatr_2","fracten","frchzne","fprimqb","rtfrac1","percrte","percexp","sacpmlt","sacpexp","percfrac","iflwrte","baserte","qb_powr","qb_prms","qbrate_2a","qbrate_2b","sareamax","axv_bexp","loglamb","tishape","timedelay")

Step B: run a multi-model calibration

Use the Nash-Sutcliffe efficiency as objective function and run the model 4*nRuns times (for each mid and  parameter set).

require(qualV)
indices <- rep(NA,4*nRuns)
discharges <- matrix(NA,ncol=4*nRuns,nrow=dim(DATA)[1])
kCounter <- 0
for (m in 1:4){
  myMID <- mids[m]
  for (pid in 1:nRuns){
  kCounter <- kCounter + 1
  ParameterSet <- as.list(parameters[pid,])
  # Run FUSE Soil Moisture Accounting module
  Qinst <-   fusesma.sim(DATA,
                         mid=myMID,
                         modlist,
                         deltim=myDELTIM,
                         states=FALSE, fluxes=FALSE, fracstate0=0.25,
                         ParameterSet$rferr_add, 
                         ParameterSet$rferr_mlt, 
                         ParameterSet$frchzne, 
                         ParameterSet$fracten, 
                         ParameterSet$maxwatr_1,
                         ParameterSet$percfrac,
                         ParameterSet$fprimqb, 
                         ParameterSet$qbrate_2a, 
                         ParameterSet$qbrate_2b, 
                         ParameterSet$qb_prms, 
                         ParameterSet$maxwatr_2, 
                         ParameterSet$baserte, 
                         ParameterSet$rtfrac1, 
                         ParameterSet$percrte, 
                         ParameterSet$percexp, 
                         ParameterSet$sacpmlt, 
                         ParameterSet$sacpexp, 
                         ParameterSet$iflwrte,  
                         ParameterSet$axv_bexp, 
                         ParameterSet$sareamax, 
                         ParameterSet$loglamb, 
                         ParameterSet$tishape,
                         ParameterSet$qb_powr)

    # Run FUSE Routing module
    Qrout <- fuserouting.sim(Qinst, mid=myMID, 
                             modlist=modlist, 
                             timedelay=ParameterSet$timedelay, 
                             deltim=myDELTIM)

    indices[kCounter] <- EF(DATA$Q,Qrout)
    discharges[,kCounter] <- Qrout
}
}

 

Step C: compare results

Deterministically, the best simulation according to the Nash-Sutcliffe efficiency, is the one with the maximum factor.

bestRun <- which(indices == max(indices))

This corresponds to the model ARNOXVIC, in fact:

bestModel <- function(runNumber){
 if (runNumber<(nRuns+1)) myBestModel <- "TOPMODEL"
 if (runNumber>(nRuns+1) & runNumber<(2*nRuns+1)) myBestModel <- "ARNOXVIC"
 if (runNumber>(2*nRuns+1) & runNumber<(3*nRuns+1)) myBestModel <- "PRMS"
 if (runNumber>(3*nRuns+1) & runNumber<(4*nRuns+1)) myBestModel <- "SACRAMENTO"
 return(myBestModel)
}
bestModel(bestRun)

plot(coredata(DATA$Q),type="l",xlab="",ylab="Streamflow [mm/day]", lwd=0.5)
for(pid in 1:(4*nRuns)){
 lines(discharges[,pid], col="gray", lwd=3)
}
lines(coredata(DATA$Q),col="black", lwd=1)
lines(discharges[,bestRun],col="red", lwd=1)

The plot below shows the observed streamflow in black, all the simulated results in grey and the “best” simulated streamflow in red.

As you can see, using multiple model structures inflates the uncertainty.

FUSE simulations (4 models)

FUSE simulations (4 models)

 

How the best simulation of each model structure compare to each other?

bestRun0060 <- which(indices[1:nRuns] == max(indices[1:nRuns]))
bestRun0230 <- nRuns + which(indices[(nRuns+1):(2*nRuns)] == max(indices[(nRuns+1):(2*nRuns)]))
bestRun0342 <- 2*nRuns + which(indices[(2*nRuns+1):(3*nRuns)] == max(indices[(2*nRuns+1):(3*nRuns)]))
bestRun0426 <- 3*nRuns + which(indices[(3*nRuns+1):(4*nRuns)] == max(indices[(3*nRuns+1):(4*nRuns)]))

plot(coredata(DATA$Q),type="l",xlab="",ylab="Streamflow [mm/day]", lwd=1)
lines(discharges[,bestRun0060], col="green", lwd=1)
lines(discharges[,bestRun0230], col="blue", lwd=1)
lines(discharges[,bestRun0342], col="pink", lwd=1)
lines(discharges[,bestRun0426], col="orange", lwd=1)

legend("top", 
        c("TOPMODEL", "ARNOXVIC", "PRMS","SACRAMENTO"), 
        col = c("green", "blue", "pink", "orange"),
        lty = c(1, 1, 1, 1))

Best simulation for each model structure

Best simulation for each model structure

The plot above shows that TOPMODEL seems the least affected by the initial conditions, while PRMS is the most affected.

What’s next?

FUSE-RHydro is also compatible with HydroMAD, therefore model calibration and assessment becomes even easier (see an example here)!

FUSE model in RHydro package (part 2: calibration)

This is the second of a series of tutorials on the FUSE implementation within the RHydro package. The script for this tutorial is available here. If you are interested in following the discussion related to this post and see how it evolves, join the R4Hydrology community on Google+!

If you want to know what FUSE is, how to prepare your data and run a simple simulation go to my previous post.

 

Recap from previous post

Load the package, prepare your data and select a model structure:

library(RHydro)
temp <- read.csv("dummyData.csv") # this file is available here: https://drive.google.com/file/d/0B8i_8AV-TwKTMXBnbmpUcUh6QVk/edit?usp=sharing
DATA <- zooreg(temp[,2:4], order.by=temp[,1])
myDELTIM <- 1
myMID <- 60

Sample parameter space using the Latin Hypercube Sampling method

The default parameter ranges are suggested in Clark et al. (2008) and are used to sample parameter sets.

Define sample domain (min and max for each parameter)

DefaultRanges <- data.frame(rbind(rferr_add = c(0,0), # additive rainfall error (mm)
                            rferr_mlt = c(1,1), # multiplicative rainfall error (-)
                            maxwatr_1 = c(25,500), # depth of the upper soil layer (mm)
                            maxwatr_2 = c(50,5000), # depth of the lower soil layer (mm)
                            fracten = c(0.05,0.95), # fraction total storage in tension storage (-)
                            frchzne = c(0.05,0.95), # fraction tension storage in recharge zone (-)
                            fprimqb = c(0.05,0.95), # fraction storage in 1st baseflow reservoir (-)
                            rtfrac1 = c(0.05,0.95), # fraction of roots in the upper layer (-)
                            percrte = c(0.01,1000), # percolation rate (mm day-1) 
                            percexp = c(1,20), # percolation exponent (-)
                            sacpmlt = c(1,250), # SAC model percltn mult for dry soil layer (-)
                            sacpexp = c(1,5), # SAC model percltn exp for dry soil layer (-)
                            percfrac = c(0.05,0.95), # fraction of percltn to tension storage (-)
                            iflwrte = c(0.01,1000), # interflow rate (mm day-1) 
                            baserte = c(0.001,1000), # baseflow rate (mm day-1) 
                            qb_powr = c(1,10), # baseflow exponent (-)
                            qb_prms = c(0.001,0.25), # baseflow depletion rate (day-1) 
                            qbrate_2a = c(0.001,0.25), # baseflow depletion rate 1st reservoir (day-1) 
                            qbrate_2b = c(0.001,0.25), # baseflow depletion rate 2nd reservoir (day-1) 
                            sareamax = c(0.05,0.95), # maximum saturated area (-)
                            axv_bexp = c(0.001,3), # ARNO/VIC b exponent (-)
                            loglamb = c(5,10), # mean value of the topographic index (m)
                            tishape = c(2,5), # shape param for the topo index Gamma dist (-) 
                            timedelay = c(0.01,5) )) # time delay in runoff (days)
names(DefaultRanges) <- c("Min","Max")

Sample from the above ranges using the Latin Hypercube Sampling method.

require(tgp)
nRuns <- 100 # this is the number of samples
parameters <- lhs( nRuns, as.matrix(DefaultRanges) )
parameters <- data.frame(parameters)
names(parameters) <- c("rferr_add","rferr_mlt","maxwatr_1","maxwatr_2","fracten","frchzne","fprimqb","rtfrac1","percrte","percexp","sacpmlt","sacpexp","percfrac","iflwrte","baserte","qb_powr","qb_prms","qbrate_2a","qbrate_2b","sareamax","axv_bexp","loglamb","tishape","timedelay")

Run simulations

Use the Nash-Sutcliffe efficiency as objective function and run the model nRuns times (1 for each parameter set).

require(qualV)
indices <- rep(NA,nRuns)
discharges <- matrix(NA,ncol=nRuns,nrow=dim(DATA)[1])
for (pid in 1:nRuns){
  ParameterSet <- as.list(parameters[pid,])
  # Run FUSE Soil Moisture Accounting module
  Qinst <- fusesma.sim(DATA,
  mid=myMID,
  modlist,
  deltim=myDELTIM,
  states=FALSE, fluxes=FALSE, fracstate0=0.25,
  ParameterSet$rferr_add, 
  ParameterSet$rferr_mlt, 
  ParameterSet$frchzne, 
  ParameterSet$fracten, 
  ParameterSet$maxwatr_1,
  ParameterSet$percfrac,
  ParameterSet$fprimqb, 
  ParameterSet$qbrate_2a, 
  ParameterSet$qbrate_2b, 
  ParameterSet$qb_prms, 
  ParameterSet$maxwatr_2, 
  ParameterSet$baserte, 
  ParameterSet$rtfrac1, 
  ParameterSet$percrte, 
  ParameterSet$percexp, 
  ParameterSet$sacpmlt, 
  ParameterSet$sacpexp, 
  ParameterSet$iflwrte,  
  ParameterSet$axv_bexp, 
  ParameterSet$sareamax, 
  ParameterSet$loglamb, 
  ParameterSet$tishape,
  ParameterSet$qb_powr)

  # Run FUSE Routing module
  Qrout <- fuserouting.sim(Qinst, mid=myMID, 
                           modlist=modlist, 
                           timedelay=ParameterSet$timedelay, 
                           deltim=myDELTIM)

  indices[pid] <- EF(DATA$Q,Qrout)
  discharges[,pid] <- Qrout
}

 

Plot the best simulation

Deterministically, the best simulation according to the Nash-Sutcliffe efficiency, is the one with the maximum factor.

bestRun <- which(indices == max(indices))[1] # this takes only the first simulation if more than one return the same efficiency value

plot(coredata(DATA$Q),type="l",xlab="",ylab="Streamflow [mm/day]", lwd=0.5)
for(pid in 1:nRuns){
 lines(discharges[,pid], col="gray", lwd=3)
}
lines(coredata(DATA$Q),col="black", lwd=1)
lines(discharges[,bestRun],col="red", lwd=1)

The plot below shows the observed streamflow in black, all the simulated results in grey and the “best” simulated streamflow in red.

Results FUSE basic calibration

Results FUSE basic calibration

 

What’s next?

The real advantage of using FUSE is the possibility to work with an ensemble of multiple models (see an example here). FUSE-RHydro is also compatible with HydroMAD, therefore model calibration and assessment becomes even easier (see an example here)!

 

FUSE model in RHydro package (part 1: simple simulation)

This is the first of a series of tutorials on the FUSE implementation within the RHydro package.  The script for this tutorial is available here. If you are interested in following the discussion related to this post and see how it evolves, join the R4Hydrology community on Google+!

What’s FUSE?

FUSE is an ensemble of numerous conceptual rainfall-runoff models developed by Clark et al. (2008) and used to simulate the streamflow discharge for a river catchment when areal averaged time series of precipitation (plus snowmelt) and evapotranspiration are available. The code was originally developed by Martyn Clark in Fortran, but a couple of years ago I implemented a version of the same code in R. The code is part of the RHydro package available from R-Forge. In this tutorial I’m going to show how to:

A. install RHydro,

B. prepare data for FUSE,

C. run a simulation.

 

Step A: install RHydro

RHydro is available from R-Forge and can beinstalled running the following command:

install.packages("RHydro"repos="http://R-Forge.R-project.org")

 

Step B: prepare your data

Prepare your data so that you have a zooreg object containing valid dates, P (precipitation+snowmelt time series), E (potential evapotranspiration time series) and Q (streamflow discharge time series). If you do not know how to do this, download my dummy data from here and adapt the content.

Screenshot of dummyData.csv

Screenshot of dummyData.csv

Save dummyData.csv in your working directory. Now, load important libraries and create the zooreg object.

# Load libraries
library(RHydro)
# Load data into a zooreg object
temp <- read.csv("dummyData.csv")
DATA <-  zooreg(temp[,2:4], order.by=temp[,1])

 

Step C: run a simulation

FUSE contains 1248 different model structures. If you want to use only one of them, you should specify the relative model identification number (mid). Let’s assume that:

  1. you want to run the model that corresponds to mid = 60,
  2. you want to use a specific set of parameters,
  3. you data is made of daily time series.

Then, you define the model inputs as in the example below:

# (1)
myMID <- 60
# (2)
myParameterSet <- list(rferr_add = 0, 
                       rferr_mlt = 1, 
                       frchzne = 0.5, 
                       fracten = 0.6828, 
                       maxwatr_1 = 106.6406, 
                       percfrac = 0.5, 
                       fprimqb = 0.5, 
                       qbrate_2a = 0.025, 
                       qbrate_2b = 0.01, 
                       qb_prms = 0.1294, 
                       maxwatr_2 = 3839.8, 
                       baserte = 50, 
                       rtfrac1 = 0.75, 
                       percrte = 703.128, 
                       percexp = 4.8594, 
                       sacpmlt = 10, 
                       sacpexp = 5, 
                       iflwrte = 500, 
                       axv_bexp = 0.7039, 
                       sareamax = 0.25, 
                       loglamb = 7.5, 
                       tishape = 3, 
                       qb_powr = 5, 
                       timedelay = 1.3355)
# (3)
myDELTIM <- 1

You can calculate the effective rainfall as follows:

Qinst <- fusesma.sim (DATA, 
                      mid=myMID, 
                      modlist, 
                      deltim=myDELTIM, 
                      states=FALSE, 
                      fluxes=FALSE, 
                      fracstate0=0.25, 
                      myParameterSet$rferr_add, 
                      myParameterSet$rferr_mlt, 
                      myParameterSet$frchzne, 
                      myParameterSet$fracten, 
                      myParameterSet$maxwatr_1,
                      myParameterSet$percfrac, 
                      myParameterSet$fprimqb, 
                      myParameterSet$qbrate_2a, 
                      myParameterSet$qbrate_2b, 
                      myParameterSet$qb_prms, 
                      myParameterSet$maxwatr_2, 
                      myParameterSet$baserte, 
                      myParameterSet$rtfrac1, 
                      myParameterSet$percrte, 
                      myParameterSet$percexp, 
                      myParameterSet$sacpmlt, 
                      myParameterSet$sacpexp, 
                      myParameterSet$iflwrte, 
                      myParameterSet$axv_bexp, 
                      myParameterSet$sareamax, 
                      myParameterSet$loglamb,
                      myParameterSet$tishape,
                      myParameterSet$qb_powr)

Now calculate the routed runoff:
Qrout <- fuserouting.sim(Qinst, 
                         mid=myMID, 
                         modlist=modlist, 
                         timedelay=myParameterSet$timedelay, 
                         deltim=myDELTIM)

Plot the routed discharge:

plot(Qrout,type="l")

If you are using my dummy data, the result of the last command should be as in the figure below.

Rplot

Routed streamflow discharge

The input fracstate0 (see fusesma function) refers to the initial conditions. The default value is 0.25 which means that at time = 0 the water storages already contain 25% of their maximum capacity. You can change this value to see how a variation of fracstate0 may affect the result of your simulation.

What’s next?

This tutorial assumes you want to use 1 model structure and 1 parameter set. which is a rather simplistic case. The majority of the time hydrologists might be interested in calibrating one specific model structure (see an example here). However, the real advantage of using FUSE is the possibility to work with an ensemble of multiple models (see an example here). FUSE-RHydro is also compatible with HydroMAD, therefore model calibration and assessment becomes even easier (see an example here)!