High Performance Computing Service – Part 1: Intro

Imperial College London, like many other universities, provides an excellent High Performance Computing (HPC) Service for its staff and students.

It’s like a private cloud with thousands of processors, which allows you to run highly demanding computational jobs. The HPC service is particularly suitable for code that can be parallelised. Many modules and libraries are already installed, and you can use your own routines if they are written in a common programming or scripting language. For instance, I used my R code.

Here is a short tutorial on how to use the HPC service.

I first contacted the HPC Service Manager to activate an account with login credentials matching the college’s ones (e.g. my username is “user”).

Once the account is active, the system automatically creates a key pair for easy remote access (for more info read my previous post here) and the following file stores become available:
$HOME = /home/user (10GB, intended for storing binaries, source and modest amounts of data)
$WORK = /work/user (at least 150GB, intended for staging files between jobs and for long term data storage)
$TMPDIR = /tmp (to be used only for temporary results)

Refer to them using these environment variables rather than the full paths.
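To show how the three variables might fit together in a job, here is a minimal shell sketch. The function name stage_and_run is my own placeholder, not something the HPC service provides. It assumes $WORK is set and falls back to /tmp when $TMPDIR is not.

```shell
# Sketch only: stage_and_run is a hypothetical helper, not part of the HPC service.
# It copies an input file from $WORK to $TMPDIR, runs a command there,
# and copies the result file back to $WORK.
stage_and_run() {
    input=$1       # file name inside $WORK
    output=$2      # file name the command is expected to create
    shift 2        # the remaining arguments are the command to run

    if [ -z "${WORK:-}" ]; then
        echo "WORK is not set" >&2
        return 1
    fi
    scratch=${TMPDIR:-/tmp}

    cp "$WORK/$input" "$scratch/" || return 1
    # Run the command from inside the scratch directory
    ( cd "$scratch" && "$@" ) || return 1
    cp "$scratch/$output" "$WORK/"
}
```

For example, stage_and_run input.csv output.csv Rscript analysis.R would run a (hypothetical) analysis.R in the scratch space and keep only output.csv in $WORK.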

Next steps:

  • prepare the scripts I’ll use to run my computations (more info in Part 2).
  • copy the necessary input files, scripts and R packages to the server, if they are not available online (see Part 3).

R CMD check tells me ‘no visible binding for global variable’

Today is definitely a lucky day!

Here is the solution to avoid the NOTE ‘no visible binding for global variable’ when running R CMD check.

Basically just add to the DESCRIPTION file of your package the following line (for instance after defining the license)

LazyData: yes

and, in the routine that uses the data.frame, add a line that refers to it by name before its first use

e.g.

DATA                           # new line
P <- DATA[,"P"]
E <- DATA[,"E"]

In my case, that made the package checker happy!
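The DESCRIPTION edit can also be scripted. This is only a sketch: enable_lazydata is a hypothetical helper name, and it assumes the field should simply be appended at the end of the file when no LazyData line exists yet.

```shell
# Hypothetical helper: add "LazyData: yes" to a package DESCRIPTION file
# unless a LazyData field is already present.
enable_lazydata() {
    description=$1
    if [ ! -f "$description" ]; then
        echo "no such file: $description" >&2
        return 1
    fi
    if grep -q '^LazyData:' "$description"; then
        return 0    # field already set, leave the file untouched
    fi
    # Make sure the file ends with a newline before appending
    if [ -n "$(tail -c 1 "$description")" ]; then
        printf '\n' >> "$description"
    fi
    printf 'LazyData: yes\n' >> "$description"
}
```

Running enable_lazydata packagename/DESCRIPTION twice leaves a single LazyData line, so it should be safe to call from a build script.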

Here is also an alternative solution.

Upgrade R

I’m using Ubuntu Oneiric and R 2.13 (check your Ubuntu release with the command lsb_release -a, and your R version with R --version).

Now I need to upgrade R to the latest version to be able to install the package lhs.

I opened the APT sources file:

sudo nano /etc/apt/sources.list

I added the line:

deb http://cran.ma.imperial.ac.uk/bin/linux/ubuntu oneiric/

(here is the complete list of CRAN mirrors)

and saved it.

Then in terminal:

sudo apt-get update
sudo apt-get install r-base
sudo apt-get install r-base-dev

The first time you install a package in R 2.15, it asks: “Would you like to use a personal library?”. Type y, then copy all the packages from ~/R/x86_64-pc-linux-gnu-library/2.13 to ~/R/x86_64-pc-linux-gnu-library/2.15
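The copy step can be sketched as a small shell function. copy_r_library is a name I made up for illustration. The sketch assumes each package is one directory inside the library folder, and it skips any package that already exists in the new library so that nothing installed by the new R is overwritten.

```shell
# Hypothetical helper: copy packages from an old personal library to a new one,
# skipping any package that already exists in the destination.
copy_r_library() {
    old_lib=$1
    new_lib=$2
    if [ ! -d "$old_lib" ]; then
        echo "no such directory: $old_lib" >&2
        return 1
    fi
    mkdir -p "$new_lib" || return 1
    for pkg in "$old_lib"/*; do
        [ -d "$pkg" ] || continue    # skip plain files and the empty-library case
        name=$(basename "$pkg")
        if [ -e "$new_lib/$name" ]; then
            continue                 # keep the copy that is already there
        fi
        cp -R "$pkg" "$new_lib/" || return 1
    done
    return 0
}
```

Usage would be copy_r_library ~/R/x86_64-pc-linux-gnu-library/2.13 ~/R/x86_64-pc-linux-gnu-library/2.15. Packages built under the old version may still need rebuilding. Running update.packages(checkBuilt = TRUE) inside R afterwards is one way to do that.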

R-Forge projects and the svn repository

Any R project available from R-Forge can be downloaded using Subversion (also called svn).

I assume svn is already installed on your computer; if not, just use the Ubuntu Software Centre (more info: https://help.ubuntu.com/community/Subversion).

Use the “Anonymous Subversion Access”, go to terminal and type:

svn checkout svn://svn.r-forge.r-project.org/svnroot/projectname/

Each project can contain many packages. Each package is contained in a folder.

To install a package in R, first check it. Go to terminal and type:

cd ~/projectname/pkg
R CMD check packagename

If there are no errors, you can install the package:

R CMD INSTALL packagename

To build the package tarball:

R CMD build packagename
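The check and install steps above can be chained so that the package is installed only when the check succeeds. This is a sketch under assumptions: check_and_install is a hypothetical function name, R_BIN is an override I added so the function can be tried without R, and it relies on R CMD check returning a non-zero exit status when it finds errors.

```shell
# Hypothetical helper: install a package only if R CMD check passes.
# R_BIN is an assumed override for dry runs; by default the real R is used.
check_and_install() {
    pkg=$1
    r_bin=${R_BIN:-R}
    if ! "$r_bin" CMD check "$pkg"; then
        echo "check failed, not installing $pkg" >&2
        return 1
    fi
    "$r_bin" CMD INSTALL "$pkg"
}
```

From ~/projectname/pkg you would call check_and_install packagename. Setting R_BIN=echo prints the two R commands instead of running them.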

If you change the code and want to contribute it back to the repository, contact one of the administrators of the package and ask to be added to the list of developers.

Once you have your own developer account you can commit your changes in this way:

  1. check out the latest copy of the package using the Developer Subversion Access via SSH: svn checkout svn+ssh://developername@svn.r-forge.r-project.org/svnroot/projectname/
  2. go to the directory containing the project (e.g. cd ~/projectname)
  3. type svn status (you get a list of changes in your working copy: ? marks new files that are not yet under version control, ! marks files that are missing because they were deleted outside svn)
  4. to apply the same changes to the repo, type svn add path/example.R for each new file or svn delete path/example.R for each deleted one
  5. once you have done this, or if you have simply modified existing files, you are ready to commit: svn commit
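Steps 3 and 4 can be sketched as a small filter that turns svn status output into the matching svn add and svn delete commands. svn_sync_commands is a hypothetical name. The sketch assumes the status code is the first character of each line, followed by spaces and the path, and that paths contain no single quotes. It only prints the commands, so you can review them before running anything.

```shell
# Hypothetical helper: read `svn status` output on stdin and print the
# svn add / svn delete commands for unversioned (?) and missing (!) files.
svn_sync_commands() {
    while IFS= read -r line; do
        rest=${line#?}                      # drop the status character
        path=${rest#"${rest%%[! ]*}"}       # strip the leading spaces
        case $line in
            '?'*) printf "svn add '%s'\n" "$path" ;;
            '!'*) printf "svn delete '%s'\n" "$path" ;;
        esac
    done
}
```

A typical use would be svn status | svn_sync_commands to inspect the list, then svn status | svn_sync_commands | sh to apply it, followed by svn commit.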

Writing tables into a PostgreSQL database using R

If you are using a PostgreSQL database to store your data and R to process it, then you may want to access and edit your DB directly from R.

This is possible using a package called “RPostgreSQL” available from CRAN.

# Start R

 > R

# Import the library (I assume the library is already installed)

 > library("RPostgreSQL")

# Choose the driver

 > drv <- dbDriver("PostgreSQL")

# Connect to your database (I assume the database already exists on the localhost)

 > con <- dbConnect(drv, host="localhost", user= "exampleuser", password="examplepassword", dbname="exampledb")

# Specify the schema you want to write to (optional, the public schema is the default one)

 > dbGetQuery(con, "SET search_path TO exampleschema")

# Load the table from a csv (1) or an existing R archive (2):
# 1

 > x <- read.csv("/home/user/table.csv")   # read.csv already returns a data.frame, with header=TRUE and sep="," by default

# or
# 2

 > load("/home/user/table.rda")
 > x <- data.frame(table)

# Delete any existing table with the same name:

 > if(dbExistsTable(con,"table1")) {dbRemoveTable(con,"table1")}

# Finally write a new table:

 > dbWriteTable(con,"table1", x)

# Close the connection when you are done:

 > dbDisconnect(con)

More details here.

Remote access to AWS instance

If you need help to set up an AWS instance, read my previous post.

From the navigation menu of your Amazon EC2 console, click on INSTANCES.

[Screenshot: AWS Console]

Tick the box to select the new instance and make a note of the web server’s address,

e.g. ec2-00-000-00-000.eu-west-1.compute.amazonaws.com.

[Screenshot: Choose the instance]

Now go to terminal and type (I assume you are familiar with Secure SHell):

 ssh -i ~/.ssh/webserverkey.pem ubuntu@ec2-00-000-00-000.eu-west-1.compute.amazonaws.com

(you should avoid accessing an instance as root; for this reason the user “ubuntu” is created automatically).

If your terminal looks like the one below… well done! You successfully accessed your instance!

[Screenshot: Access through terminal]

You can also send files from your computer to the instance using Secure CoPy (scp), whose syntax is very similar to that of ssh: it takes the same -i option for the key, followed by a source and a destination.

Imagine you want to transfer a text file called “test.txt” from your computer (user home folder, ~) to the Amazon instance (ubuntu home folder, /home/ubuntu).

Then go to terminal and type:

 scp -i ~/.ssh/webserverkey.pem ~/test.txt ubuntu@ec2-00-000-00-000.eu-west-1.compute.amazonaws.com:/home/ubuntu/
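If you upload files often, the command above can be wrapped in a small function. This is only a sketch: ec2_upload is a hypothetical name, SCP_BIN is an override I added for dry runs, and the default destination /home/ubuntu/ is taken from the example above. The function also restricts the key file to its owner first, because ssh and scp refuse private keys that other users can read.

```shell
# Hypothetical helper: upload a file to the "ubuntu" user on an EC2 instance.
# SCP_BIN is an assumed override for dry runs; by default the real scp is used.
ec2_upload() {
    key=$1                       # e.g. ~/.ssh/webserverkey.pem
    file=$2                      # local file to send
    host=$3                      # the instance's public address
    dest=${4:-/home/ubuntu/}     # remote folder, defaults to ubuntu's home
    if [ ! -f "$key" ]; then
        echo "key not found: $key" >&2
        return 1
    fi
    chmod 600 "$key" || return 1    # ssh ignores keys with open permissions
    "${SCP_BIN:-scp}" -i "$key" "$file" "ubuntu@$host:$dest"
}
```

With the example above this becomes ec2_upload ~/.ssh/webserverkey.pem ~/test.txt ec2-00-000-00-000.eu-west-1.compute.amazonaws.com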

That’s all!

More info here