Talk at the Royal Statistical Society

Today I presented part of the work done for my PhD and the PURE project at the event “Big Data and Spatial Analytics”, organised by the Business and Industrial Section of the Royal Statistical Society. It was a great opportunity to meet people interested in Big Data and geospatial analytics.

My slides are available on SlideShare.

I also presented a demo. The R Markdown file that I used to generate the dynamic report is available as a GitHub Gist.

You are very welcome to use it and share it with others!




The new FUSE implementation is now 145 times faster!

Four of my previous posts were about the FUSE implementation in RHydro. Since I published them, I have received many emails and requests for more information, so the topic is clearly of interest to many. Here is a short note on a new FUSE implementation, which is now available as a separate package called “fuse” on GitHub.

# install/load dependent libraries
if(!require(zoo)) install.packages("zoo")
if(!require(tgp)) install.packages("tgp")
if(!require(qualV)) install.packages("qualV")
if(!require(hydromad)) install.packages("hydromad",repos="")
if(!require(devtools)) install.packages("devtools")

# install the fuse package directly from GitHub
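# (called with the devtools:: prefix so it also works when devtools was only just installed and is not attached)
devtools::install_github("ICHydro/r_fuse", subdir = "fuse")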

The functions are named as in RHydro; the only difference is that the list of model structures is now called internally and does not need to be passed as input. The package is still compatible with hydromad, and below you will find a few lines to run a test (also available as a gist here).

# Load sample data

# Set the parameter ranges

# Set model
modspec <- hydromad(DATA, sma = "fusesma", routing = "fuserouting", mid = 1:1248, deltim = 1)

# Randomly generate 1 parameter set 
myNewParameterSet <- parameterSets( coef(modspec, warn=FALSE), 1, method="random")

# Run a simulation using the parameter set generated above
modx <-  update(modspec, newpars = myNewParameterSet)

# Generate a summary of the result
summary(modx)

# Plot results 
hydromad:::xyplot.hydromad(modx, with.P=TRUE)

I thought a basic benchmark between the RHydro and fuse packages would be interesting (the gist is here).
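
For readers who want to set up a similar comparison, here is a minimal sketch of the general pattern using the microbenchmark package (which appears in the session info at the end of this post). It is not the code from the gist: `slow_cumsum` and `fast_cumsum` are hypothetical stand-ins for a slow and a fast implementation of the same calculation, chosen only so that the example runs without RHydro or fuse.

```r
library(microbenchmark)

# Hypothetical stand-ins for the two implementations being compared:
# a loop-based cumulative sum and its vectorised equivalent.
slow_cumsum <- function(x) {
  out <- numeric(length(x))
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
    out[i] <- total
  }
  out
}
fast_cumsum <- function(x) cumsum(x)

x <- runif(1e4)

# Check that both versions agree before timing them
stopifnot(isTRUE(all.equal(slow_cumsum(x), fast_cumsum(x))))

# Time each expression 10 times
compare <- microbenchmark(slow_cumsum(x), fast_cumsum(x), times = 10)
print(compare)

# Speed-up factor as the ratio of the median run times
# (summary() reports both rows in the same unit, in the order given above)
med <- summary(compare)$median
speedup <- med[1] / med[2]
```

Checking that the two functions return the same result before timing them is worth the extra line: a speed-up only means something if both implementations compute the same thing.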

The result is that fuse’s functions seem to run over 145 times faster than the corresponding functions in RHydro.

That’s great news if you plan to do anything that requires hundreds or thousands of runs!
Plot of benchmark results
Here are the detailed results of the benchmark:
> compare
Unit: seconds
                expr        min         lq     median        uq        max neval
 f(DATA, parameters) 423.230827 433.070465 446.845983 451.28512 461.818262    10
 g(DATA, parameters)   2.893856   2.988898   3.076531   3.59736   3.713473    10
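
The headline figure can be checked against the table above. The short calculation below assumes the speed-up is taken as the ratio of the two median run times, with `f` being the slower call and `g` the faster one; the variable names are only illustrative.

```r
# Median run times in seconds, copied from the benchmark table
median_f <- 446.845983
median_g <- 3.076531

# Speed-up factor as the ratio of the medians
speedup <- median_f / median_g
round(speedup, 1)
# [1] 145.2
```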
My session info:

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] ggplot2_1.0.0        microbenchmark_1.3-0 tgp_2.4-9            fuse_1.1.0           RHydro_2014-04.1
 [6] qualV_0.3            KernSmooth_2.23-12   XML_3.98-1.1         deSolve_1.10-9       lhs_0.10
[11] sp_1.0-15            xts_0.9-7            zoo_1.7-11

loaded via a namespace (and not attached):
 [1] colorspace_1.2-4 digest_0.6.4     grid_3.1.1       gtable_0.1.2     lattice_0.20-29  MASS_7.3-33
 [7] munsell_0.4.2    plyr_1.8.1       proto_0.3-10     Rcpp_0.11.2      reshape2_1.4     scales_0.2.4
[13] stringr_0.6.2    tools_3.1.1

Image credits to Nick Chill: