Seminar at King’s College London

On 12th February 2016 I gave a seminar on “why and how open data and open APIs can improve research” for the Environmental Dynamics research group seminar series at the Department of Geography of King’s College London (slides available here). The audience was made up of students, researchers and academics, all interested and participative, which made the experience rather enjoyable for me. More information on this series of seminars is available on the King’s geocomputation blog.

During my seminar I showed how to assemble data requests using the National River Flow Archive’s RESTful APIs and how to parse and convert the server responses using the R language and the rnrfa package. Here are links to the demos and to a web mapping/reporting application I built using the rnrfa package as a backend tool. If you are interested, the source code of the web app is available as an Rmd file on GitHub.
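To give a flavour of the demos, here is a minimal sketch (not the material presented at the seminar) that retrieves the NRFA station catalogue and one gauged daily flow series using the rnrfa functions catalogue() and gdf(); the station identifier is only an illustrative placeholder and the exact signatures may differ slightly between package versions.

    library(rnrfa)

    # Download and parse the NRFA station catalogue (metadata for all stations)
    allStations <- catalogue()

    # Retrieve the gauged daily flow for one station and plot it
    # (39001 is used here purely as an example identifier)
    flow <- gdf(id = 39001)
    plot(flow)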

I mentioned that the NRFA APIs are experimental and that periodic updates may temporarily break the package. An API update was deployed on 18th February 2016, the package was updated accordingly, and version 0.4.3 is now available on CRAN and GitHub.

Talk at the Royal Statistical Society

Today I presented part of the work done for my PhD and the PURE project at the event “Big Data and Spatial Analytics”, organised by the Business and Industrial Section of the Royal Statistical Society. It was a great opportunity to meet people interested in Big Data and geospatial analytics.

My slides are available on SlideShare.

I also presented a demo; the R Markdown file that I used to generate the dynamic report is available as a GitHub Gist.

You are very welcome to use it and share it with others!




Generate a Latin Hypercube of parameters for a given FUSE model

In the previous post I showed how to get information on the model building options and the parameters used by a given FUSE model. In this post I’ll show how to sample, for a given model, 100 parameter sets uniformly using the Latin Hypercube Sampling method.

Each FUSE model uses different parameters; therefore, in order to sample uniformly, we first need to remove the unused parameters and then sample. Here is how I achieve this.

Install/load the fuse package:

    install.packages("fuse", repos="")
    library(fuse)

Now load devtools and source the gist below (it contains a function called GenerateFUSEParameters):
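The gist itself is not reproduced here; as a rough idea of what GenerateFUSEParameters does, the sketch below (an assumption, not the actual gist) draws a uniform Latin Hypercube in the unit interval for the parameters actually used by the chosen model, via the lhs package. The real function presumably also rescales each column to the parameter’s physical range, a step omitted here because those ranges are defined in the gist.

    library(lhs)

    # The 24 FUSE parameters (same names as the columns returned by FUSEinfo)
    fuseParams <- c("rferr_add", "rferr_mlt", "maxwatr_1", "maxwatr_2", "fracten",
                    "frchzne", "fprimqb", "rtfrac1", "percrte", "percexp",
                    "sacpmlt", "sacpexp", "percfrac", "iflwrte", "baserte",
                    "qb_powr", "qb_prms", "qbrate_2a", "qbrate_2b", "sareamax",
                    "axv_bexp", "loglamb", "tishape", "timedelay")

    GenerateFUSEParameters <- function(NumberOfRuns, params2remove = NULL) {
      # Keep only the parameters used by the chosen model structure
      usedParams <- setdiff(fuseParams, params2remove)
      # Uniform Latin Hypercube sample in [0, 1], one column per parameter
      samples <- randomLHS(NumberOfRuns, length(usedParams))
      colnames(samples) <- usedParams
      as.data.frame(samples)
    }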


Choose one of FUSE’s models and get model/parameters info using the FUSEinfo function described in the previous post:

    mid <- 60 # This is TOPMODEL
    x <- FUSEinfo(mid)

Now a Latin Hypercube of 100 samples for the above model (mid=60) is defined as follows:

    parameters <- GenerateFUSEParameters(NumberOfRuns = 100, 
                                         params2remove = names(x)[which(x==FALSE)])
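As a quick sanity check (assuming the gist returns a table-like object with one row per sample and one column per used parameter):

    dim(parameters)  # expected: 100 rows, one column per parameter used by mid = 60
    head(parameters)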

FUSE model and parameters information

A quick post to show which model building decisions, options (name and ID number) and associated parameters correspond to a given FUSE model.

First of all, install/load the fuse package:

    install.packages("fuse", repos="")
    library(fuse)

Load devtools and source the gist below (it contains a function called FUSEinfo):


Choose one of FUSE’s models:

    mid <- 60 # This is TOPMODEL

Run the function FUSEinfo using mid as input:

    x <- FUSEinfo(mid)

The result of FUSEinfo is a data frame containing 32 columns:

      rferr arch1 arch2 qsurf qperc esoil qintf q_tdh rferr_add rferr_mlt maxwatr_1 maxwatr_2 fracten
    1    12    21    34    43    51    62    71    82     FALSE      TRUE      TRUE      TRUE    TRUE
      frchzne fprimqb rtfrac1 percrte percexp sacpmlt sacpexp percfrac iflwrte baserte qb_powr qb_prms
      qbrate_2a qbrate_2b sareamax axv_bexp loglamb tishape timedelay
    1     FALSE     FALSE    FALSE    FALSE    TRUE    TRUE      TRUE

The first 8 columns contain the model building decisions: rferr (rainfall error), arch1 (upper soil layer), arch2 (lower soil layer), qsurf (runoff mechanism), qperc (percolation), esoil (evaporation), qintf (interflow) and q_tdh (routing).

The remaining 24 columns list the parameters. If a parameter’s value is TRUE, that parameter is used by the model; FALSE means the parameter is not used.


The TextInFooter macro

This post is for my friend Sue and all the people who use Microsoft Word and want to add to the footer of their documents a reminder of the file location and of the time when the document was last saved. This can be achieved using a small VBA macro, and here is how to do it.

Copy the content of the box below:

Sub FileSaveAs()

    Dim i As Long
    Dim ThisPath As String
    Dim pName As String
    Dim TextInFooter As String
    Dim FullName As String

    ' Build the text to write in the footer: full file path plus current date/time
    ThisPath = ActiveDocument.Path
    pName = ActiveDocument.Name
    FullName = ThisPath & "\" & pName
    TextInFooter = "This file was saved in: " & FullName & " on the " & Now

    ' Write the text in the primary footer of every section of the document
    For i = 1 To ActiveDocument.Sections.Count
        With ActiveDocument.Sections(i)
            .Footers(wdHeaderFooterPrimary).Range.Text = TextInFooter
        End With
    Next i

    ' Show the built-in Save As dialog so the document is actually saved
    Dialogs(wdDialogFileSaveAs).Show

End Sub

To save the macro:

  1. Open an existing Word document, then press ALT+F11 to open the VBA editor.
  2. Right-click on Normal in the Project panel on the left-hand side. Click on INSERT, then click on MODULE.
  3. Paste the macro copied above into the empty window that opens in the top-right panel.
  4. Click on Save Normal.

If you want to save a document without changing the footer, press the Save button or CTRL+S: these do not run the macro.

If you want to save a document with the path and date/time in the footer, use FILE -> Save As or press F12: these run the macro.

Remember, you only have to save the macro once. After that, the macro will work on any new or existing document on your computer. However, if you change or reset your computer, you will need to save the macro again.

Split long time series into (hydrological) years in R

I have recently been working on a rather basic task: splitting long time series into years. Although this might sound trivial for calendar years, I had to think a bit to find a relatively elegant solution for hydrological years. Below is what I came up with; if you are aware of a better way, please leave a comment!

For this exercise, we need to load only one library:

# Load library
library(xts)

Let’s generate a dummy time series:

# Generate dummy time series
from <- as.Date("1950-01-01")
to <- as.Date("1990-12-31")
myDates <- seq.Date(from=from,to=to,by="day")
myTS <- as.xts(runif(length(myDates)), order.by=myDates)

When working with standard calendar years (from Jan to Dec), splitting a time series into years is not too much of a problem:

# Split the time series into calendar years
myList <- tapply(myTS, format(myDates, "%Y"), c)

The result is a list of 41 time series, each one year long.

Any time series can be accessed, as usual, via its index:

plot( myList[[1]] )
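Note that a similar result can be obtained with the split() method provided by xts, which additionally preserves the xts class of each yearly chunk; this is the approach used below for hydrological years.

# xts-native alternative, keeping the xts class of each yearly chunk
myList <- split(myTS, f="years")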


Things become more interesting with non-standard calendars, such as hydrological years (starting on the 1st October and ending on the following 30th September).

The first step is to calculate the number of hydrological years. This is the number of years in which we have records from January (month index = 0) to September (month index = 8), minus 1 (because the first year cannot be counted as a complete hydrological year).

# calculate the number of hydrological years
nHY <- length(split(myTS[.indexmon(myTS) %in% 0:8], f="years"))-1

Then we create an empty list and populate it by appending (or binding), for a generic year “counter”, the records from October to December of that year to the records from January to September of the year “counter + 1”.

# Create an empty list, to be populated by a loop
myList <- list()

for ( counter in 1:nHY ){
 oct2dec <- split(myTS[.indexmon(myTS) %in% 9:11], f="years")[[counter]]
 jan2sep <- split(myTS[.indexmon(myTS) %in% 0:8], f="years")[[counter + 1]]
 myList[[counter]] <- rbind(oct2dec, jan2sep)
}

Again, any time series can be accessed via its index:

plot( myList[[1]] )


That’s all! The code in this post is also available as a public gist.


The new “hddtools”, an R package for Hydrological Data Discovery

The R package hddtools is an open source project designed to facilitate access to online data sources. Accessing these sources typically implies downloading a metadata catalogue, selecting the information needed, formally requesting the dataset(s), then decompressing, converting, manually filtering and parsing the result. All those operations are made more efficient by re-usable functions.

Depending on the data license, functions can provide offline and/or online modes. When redistribution is allowed, for instance, a copy of the dataset is cached within the package and updated twice a year. This is the fastest option and also allows offline use of the package’s functions. When redistribution is not allowed, only the online mode is provided.

The package hddtools can be installed via devtools:

library(devtools)
install_github("r_hddtools", username = "cvitolo", subdir = "hddtools")
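Once installed, the package can be loaded in the usual way:

library(hddtools)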

Data sources and Functions

The Köppen Climate Classification map

The Köppen Climate Classification is the most widely used system for classifying the world’s climates. Its categories are based on the annual and monthly averages of temperature and precipitation. The classification was first updated by Rudolf Geiger in 1961, then by Kottek et al. (2006), Peel et al. (2007) and Rubel et al. (2010).

The package hddtools contains a function to identify the updated Köppen-Geiger climate zone, given a bounding box.

# Extract climate zones from Peel's map:

# Extract climate zones from Kottek's map:
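A minimal sketch of what these calls might look like, assuming the function is named KGClimateClass() and accepts a bounding box plus an updatedBy argument (the function name and the argument names are assumptions, so check the package documentation):

# Define a bounding box (the same area used in the examples further below)
areaBox <- list(lonMin = -3.82, lonMax = -3.63, latMin = 52.43, latMax = 52.52)

# Extract climate zones from Peel's map (argument names are assumptions)
KGClimateClass(bbox = areaBox, updatedBy = "Peel")

# Extract climate zones from Kottek's map
KGClimateClass(bbox = areaBox, updatedBy = "Kottek")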

The Global Runoff Data Centre

The Global Runoff Data Centre (GRDC) is an international archive hosted by the Federal Institute of Hydrology (Bundesanstalt für Gewässerkunde or BfG) in Koblenz, Germany. The Centre operates under the auspices of the World Meteorological Organisation and maintains services and datasets for all the major rivers in the world.

The catalogue, the kml files and the product “Long-Term Mean Monthly Discharges” are open data and accessible via hddtools.

# 1. GRDC full catalogue

# 2. Filter GRDC catalogue based on a bounding box 
grdcCatalogue(BBlonMin = -3.82,
              BBlonMax = -3.63,
              BBlatMin = 52.43,
              BBlatMax = 52.52,
              mdDescription = TRUE) 

# 3. Monthly data extraction
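As a minimal sketch, the full catalogue (step 1) can presumably be retrieved by calling grdcCatalogue() without any bounding box (an assumption about the function’s defaults):

# Full GRDC catalogue (assumption: omitting the bounding box returns all stations)
grdcAll <- grdcCatalogue()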

The Data60UK dataset

In the decade 2003-2012, the IAHS Predictions in Ungauged Basins (PUB) international Top-Down modelling Working Group (TDWG) collated daily datasets of areal precipitation and streamflow discharge across 61 gauging sites in England and Wales. The database was prepared from source databases for research purposes, with the intention of making it re-usable. It is now available in the public domain free of charge.

hddtools contains two functions to interact with this database: one to retrieve the catalogue and another to retrieve time series of areal precipitation and streamflow discharge.

# 1a. Data60UK full catalogue

# 1.b Filter Data60UK catalogue based on bounding box 
data60UKCatalogue(BBlonMin = -3.82, 
                  BBlonMax = -3.63,
                  BBlatMin = 52.43,
                  BBlatMax = 52.52) 

# 2. Extract time series 
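A minimal sketch of the remaining two calls; data60UKCatalogue() with no arguments returning the full catalogue is an assumption about its defaults, and tsData60UK() with a station id reflects the naming used in later versions of the package, so check the documentation of the version you install:

# 1a. Full Data60UK catalogue (assumption: omitting the bounding box returns all stations)
allStations <- data60UKCatalogue()

# 2. Areal precipitation and streamflow discharge for one station
#    (replace 55012 with an identifier taken from the catalogue)
stationTS <- tsData60UK(id = 55012)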

NASA’s Tropical Rainfall Measuring Mission (TRMM)

The Tropical Rainfall Measuring Mission (TRMM) is a joint mission between NASA and the Japan Aerospace Exploration Agency (JAXA) that uses a research satellite to measure precipitation within the tropics in order to improve our understanding of climate and its variability.

The TRMM satellite has recorded global rainfall estimates in a gridded format since 1998, with a daily temporal resolution and a spatial resolution of 0.25 degrees. This information is openly available for educational purposes and downloadable from an FTP server.

hddtools provides a function, called trmm, to download and convert a selected portion of the TRMM dataset into a raster brick that can be opened in any GIS software. This function is a slight modification of the code published in Martin Brandt’s post (thanks Martin!).

# Generate multi-layer GeoTiff containing mean monthly precipitations from 3B43_V7 for 2012 (based on a bounding box)
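A sketch of what the call might look like, assuming trmm() accepts the product name and version, the year of interest and a bounding box (these argument names are guesses rather than the documented interface, so check ?trmm before use):

# Sketch only: argument names are assumptions, not the documented interface
trmm(product = "3B43", version = 7, year = 2012,
     bbox = c(lonMin = -3.82, lonMax = -3.63, latMin = 52.43, latMax = 52.52))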

Please leave your feedback

I would greatly appreciate it if you could leave your feedback, either via email or by taking a short survey.

Image credits to cilipmarketing.