Using R on Knot

Make sure the following is included in your .bashrc:

export PATH="/sw/bin/:$PATH"

This is for R version 3.0.2. For 3.2.3, use:

export PATH="/sw/csc/R-3.2.3/bin:$PATH"

Then you can launch R by typing R at the terminal. You can develop your R code on your local desktop or laptop using RStudio; to run on the cluster, however, you will need Rscript, as shown in the examples below.

Installing Packages

To install the packages you want to use, call the install.packages() function from an R session. You will be asked whether to create a folder in your home directory where the packages can be installed; answer yes. Then choose your favorite CRAN mirror and the installation will proceed.
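For example, to install the doMC package used later on this page (the exact personal-library path that R proposes depends on your R version):

```r
# Run inside an interactive R session on the cluster.
# R will ask whether to create a personal library in your home
# directory -- answer yes -- and then prompt for a CRAN mirror.
install.packages("doMC")

# Confirm the package loads afterwards
library(doMC)
```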

Installing your own version of R

If for some reason you want to install your own version of R in your home directory, you can download the source code from CRAN and compile it yourself:

wget https://cran.r-project.org/src/base/R-3/R-3.x.x.tar.gz
tar xvf R-3.x.x.tar.gz
cd R-3.x.x
./configure --prefix=$HOME/R
make && make install

After installation is complete, make sure you update your $PATH in .bashrc (or .cshrc) to match the --prefix you configured:

export PATH="$HOME/R/bin:$PATH"

Using multiple cores on R

There are several levels of parallelism available in R. The simplest is shared-memory parallelism, provided by the doMC package. Here is an example script (example.R) that uses doMC:

# Dopar Example
require(doMC)
registerDoMC(cores=12)

# Example random matrix
m <- matrix(rnorm(25), 5, 5)

# Normalize the rows of the matrix in parallel, combining the rows with rbind
m_norm <- foreach(i = 1:nrow(m), .combine = rbind) %dopar%
  (m[i, ] / mean(m[i, ]))

# Print the original and the normalized matrix
cat('Initial random matrix: \n')
print(m)
cat('Row normalized matrix: \n')
print(m_norm)

The %dopar% operator evaluates the iterations over the variable i in parallel. To run this script, use:

Rscript --vanilla example.R 
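Note that example.R hardcodes cores=12 to match the ppn request in the job script below. A sketch that instead detects the available cores at runtime, using the parallel package that ships with R:

```r
library(doMC)
library(parallel)

# Register one doMC worker per logical core visible to this session,
# instead of hardcoding the count
n_cores <- detectCores()
registerDoMC(cores = n_cores)
```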

For jobs that run longer than a few minutes, use the queuing system. An example job submission script for this example looks like:

#!/bin/bash
#PBS -l nodes=1:ppn=12
#PBS -l walltime=2:00:00

cd $PBS_O_WORKDIR
Rscript --vanilla example.R

Rmpi + SNOW

R on Knot has the SNOW package installed, which enables a simple network of workstations to be created on the cluster; read more about SNOW at http://www.sfu.ca/~sblay/R/snow.html. The Rmpi package is also installed; its main documentation is at http://cran.r-project.org/web/packages/Rmpi/index.html. In order to run R scripts that use Rmpi directives, your login shell must be able to see the right libraries for openmpi and intel (which openmpi was built against). In Bash (.bashrc), you'll want the following lines:

export LD_LIBRARY_PATH=/opt/openmpi/lib/:/opt/intel/lib/intel64/:$LD_LIBRARY_PATH
export PATH=$PATH:/opt/openmpi/bin

An example SNOW program, Rsnow.R, looks like this:

library(Rmpi)
library(snow)
# Initialize SNOW using MPI communication. The first line will get the
# number of MPI processes the scheduler assigned to us. Everything else
# is standard SNOW
np <- mpi.universe.size()
cluster <- makeMPIcluster(np)
# Print the hostname for each cluster member
sayhello <- function()
{
      info <- Sys.info()[c("nodename", "machine")]
      paste("Hello from", info[1], "with CPU type", info[2])
}
names <- clusterCall(cluster, sayhello)
print(unlist(names))
# Compute row sums in parallel using all processes,
# then a grand sum at the end on the master process
parallelSum <- function(m, n)
{
      A <- matrix(rnorm(m*n), nrow = m, ncol = n)
      row.sums <- parApply(cluster, A, 1, sum)
      print(sum(row.sums))
}
parallelSum(500, 500)
stopCluster(cluster)
mpi.exit()
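No job script is shown for the SNOW example; a sketch following the same pattern as the helloR.job file below (node counts and walltime are placeholders to adjust):

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=4
#PBS -l walltime=1:00:00

cd $PBS_O_WORKDIR
# Launch a single R process; Rmpi/SNOW spawn the remaining workers via MPI
mpirun -np 1 Rscript --vanilla Rsnow.R
```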

Rmpi - R with OpenMPI enabled functions

R on Knot has the Rmpi package installed; the main documentation is at http://cran.r-project.org/web/packages/Rmpi/index.html. As in the previous section, make sure your .bashrc sets the openmpi and intel library paths before running Rmpi scripts.

Rmpi has a master (lead) process which handles delegating work to the compute nodes and retrieving results from those worker nodes. A very simple hello.R program in Rmpi looks like this:

library(Rmpi)
# Spawn one worker per remaining MPI slot (needlog = FALSE suppresses log files)
mpi.spawn.Rslaves(needlog = FALSE)
# Have each worker record its rank, the communicator size, and its hostname
mpi.bcast.cmd( id <- mpi.comm.rank() )
mpi.bcast.cmd( np <- mpi.comm.size() )
mpi.bcast.cmd( host <- mpi.get.processor.name() )
# Run paste() on every worker and collect the results on the master
result <- mpi.remote.exec(paste("I am", id, "of", np, "running on", host))
print(unlist(result))
mpi.close.Rslaves(dellog = FALSE)
mpi.exit()

That would be submitted to the queue with a job file like helloR.job:

#!/bin/bash
#PBS -l nodes=2:ppn=4
#PBS -l walltime=5:00:00

cd $PBS_O_WORKDIR
cat $PBS_NODEFILE > nodes
mpirun -np 1 R --no-save < hello.R


A more interesting example is Rtest.R, found at http://www.umbc.edu/hpcf/resources-tara-2010/how-to-run-R.html:

library(Rmpi)
mpi.spawn.Rslaves(needlog = FALSE)
mpi.bcast.cmd( id <- mpi.comm.rank() )
mpi.bcast.cmd( np <- mpi.comm.size() )
mpi.bcast.cmd( host <- mpi.get.processor.name() )
result <- mpi.remote.exec(paste("I am", id, "of", np, "running on", host))
print(unlist(result))
# Sample one normal observation on the master and each slave
x <- rnorm(1)
mpi.bcast.cmd(x <- rnorm(1))
# Gather the entire x vector (by default to process 0, the master)
mpi.bcast.cmd(mpi.gather.Robj(x))
y <- mpi.gather.Robj(x)
print(unlist(y))
# Sum the x vector together, storing the result on process 0 by default
mpi.bcast.cmd(mpi.reduce(x, op = "sum"))
z <- mpi.reduce(x, op = "sum")
print(z)
mpi.close.Rslaves(dellog = FALSE)
mpi.exit()

The above can be run with a job submission script similar to helloR.job.