Using R on Knot, Braid, and Pod

Status: Yes

R on Pod:

Pod has R 3.5.1 built for it. To use it, simply type:

module load R/3.5.1

If you'll be working with R a lot, you might find it easier to put that line in your .bashrc and .bash_profile files so the module loads automatically at login. To install packages, see the information below; if you have trouble installing a package locally, please contact us and we'll see if we can install it globally. To run on the cluster you will need a job submission script, and inside it you run your code with Rscript, as shown in the example below. The command to actually run your R script is simply:

Rscript --vanilla example.R

which goes after all of the #SBATCH directives in your job submission file.
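Putting it together, a minimal Slurm job script for Pod might look like the sketch below. The job name, core count, and walltime are placeholder assumptions; adjust them (and add a partition directive if your site requires one) to suit your job:

```shell
#!/bin/bash
#SBATCH --job-name=example-R   # placeholder job name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=1:00:00         # placeholder walltime

# Run from the directory the job was submitted from
cd $SLURM_SUBMIT_DIR
module load R/3.5.1
Rscript --vanilla example.R
```

You would then submit this with sbatch, e.g. sbatch example.job.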

For Knot and Braid usage:

R 3.4.3 and R 3.5.0 have been built for both Knot and Braid, located in /sw/R/R-3.4.3/bin and /sw/R/R-3.5.0/bin. To use one of them for your job submissions, add export PATH="/sw/R/R-3.4.3/bin:$PATH" to your .bashrc and .bash_profile files (or substitute 3.5.0 for 3.4.3).

Make sure that the following is included in your .bashrc:

export PATH="/sw/bin/:$PATH"

This is for R version 3.0.2. For 3.2.3, use:

export PATH="/sw/csc/R-3.2.3/bin:$PATH"

Then you can launch R simply by typing R at the terminal. You can develop your R code on your local desktop or laptop using RStudio; however, to run on the cluster, you will need Rscript, as shown in the example below.

Installing Packages

To install the packages you want to use, call the install.packages() function. You will be asked whether to create a folder in your home directory where the packages can be installed; say yes. Then choose your favorite CRAN mirror and the installation will proceed.

For example, suppose a job you submit dies because of a missing package (you see an error like "there is no package called 'pracma'" in the .o file). Then start R on the command line on the login node by typing R, and run:

install.packages("pracma")

Say yes to installing in your local directory and choose a mirror (say, the one in CA). Once you've done that, quit R and resubmit your job through the queue; it should now find the new package.
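If you prefer to avoid the interactive prompts, the same installation can be scripted; this is a sketch that assumes the cloud.r-project.org CRAN mirror and your default user library:

```r
# Install pracma without prompting, by naming a CRAN mirror explicitly.
# The package goes into the first directory in .libPaths(), which is
# your user library once it has been created.
install.packages("pracma", repos = "https://cloud.r-project.org")

# Confirm the package loads before resubmitting the job.
library(pracma)
```

You can run this on the login node with Rscript instead of an interactive R session.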

Installing your own version of R

If for some reason you want to install your own version of R in your HOME directory, you can download the source code from CRAN and compile it yourself:

wget https://cran.r-project.org/src/base/R-3/R-3.x.x.tar.gz
tar xvf R-3.x.x.tar.gz
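After unpacking, the usual configure/make sequence applies. This is a sketch: the --prefix path is just an example target under your home directory, and you may need additional configure flags depending on which libraries are available on the cluster:

```shell
cd R-3.x.x
# Install into a directory you own; --with-x=no skips the X11
# requirement, which is usually absent on compute nodes.
./configure --prefix=$HOME/R/R-3.x.x --with-x=no
make
make install
```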

You may also need to follow the instructions on the page below, because newer versions of R require newer support libraries than some cluster nodes provide:

http://pj.freefaculty.org/blog/?p=315

After installation is complete, make sure that you update your $PATH in .bashrc (or .cshrc):

export PATH="/home/username/R/R-3.x.x/bin:$PATH"

Using multiple cores on R

There are several levels of parallelism available in R. The simplest is shared-memory parallelism, using the doMC package. Here is an example script (example.R) that uses doMC:

# Dopar Example
library(doMC)
registerDoMC(cores=12)

# Example random matrix
m <- matrix(rnorm(25), 5, 5)

# Normalize rows of a matrix in parallel
m_norm <- foreach(i=1:nrow(m), .combine = rbind) %dopar%
  (m[i,]/mean(m[i,]))

# Print normalized matrix
cat('Initial random matrix: \n')
m
cat('Row normalized matrix: \n')
m_norm

The %dopar% operator evaluates the loop over the variable i in parallel. To run this script you can use:

Rscript --vanilla example.R 

For jobs that run longer than a few minutes, use the queuing system. An example job submission script for this example looks like:

#!/bin/bash
#PBS -l nodes=1:ppn=12
#PBS -l walltime=2:00:00

cd $PBS_O_WORKDIR
Rscript --vanilla example.R

Rmpi + SNOW

R on Knot has the SNOW package installed, which enables a simple network of workstations to be created on the cluster; read more about SNOW at http://www.sfu.ca/~sblay/R/snow.html . R on Knot also has the Rmpi package installed; the main documentation is at http://cran.r-project.org/web/packages/Rmpi/index.html . In order to run R scripts that use Rmpi directives, you'll need to make sure that your login shell can see the right libraries for OpenMPI and Intel (which OpenMPI was built with). In a Bash shell ( .bashrc ) you'll want the following lines:

export LD_LIBRARY_PATH=/opt/openmpi/lib/:/opt/intel/lib/intel64/:$LD_LIBRARY_PATH
export PATH=$PATH:/opt/openmpi/bin

An example SNOW program, Rsnow.R, looks like this:

library(Rmpi)
library(snow)
# Initialize SNOW using MPI communication.  mpi.universe.size() returns
# the total number of MPI slots the scheduler assigned to us; reserve one
# slot for this master process and spawn workers on the rest.  Everything
# else is standard SNOW.
np <- mpi.universe.size() - 1
cluster <- makeMPIcluster(np)
# Print the hostname for each cluster member
sayhello <- function()
{
      info <- Sys.info()[c("nodename", "machine")]
      paste("Hello from", info[1], "with CPU type", info[2])
}
names <- clusterCall(cluster, sayhello)
print(unlist(names))
# Compute row sums in parallel using all processes,
# then a grand sum at the end on the master process
parallelSum <- function(m, n)
{
A <- matrix(rnorm(m*n), nrow = m, ncol = n)
row.sums <- parApply(cluster, A, 1, sum)
print(sum(row.sums))
}
parallelSum(500, 500)
stopCluster(cluster)
mpi.exit()
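Rsnow.R can be submitted with a PBS job file along the same lines as the other examples on this page; the node counts and walltime below are placeholders:

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=4
#PBS -l walltime=1:00:00

cd $PBS_O_WORKDIR
# Start a single R process; Rmpi/SNOW spawns the worker processes
# across the rest of the MPI allocation.
mpirun -np 1 R --no-save < Rsnow.R
```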

Rmpi - R with OpenMPI enabled functions

R on Knot has the Rmpi package installed; the main documentation is at http://cran.r-project.org/web/packages/Rmpi/index.html . As with SNOW, running scripts that use Rmpi directives requires that your login shell can see the right libraries for OpenMPI and Intel (which OpenMPI was built with), so make sure your .bashrc contains the following lines:

export LD_LIBRARY_PATH=/opt/openmpi/lib/:/opt/intel/lib/intel64/:$LD_LIBRARY_PATH
export PATH=$PATH:/opt/openmpi/bin

Rmpi has a master (lead) process that handles the delegation of work to the compute nodes and the retrieval of results from those worker processes. A very simple hello.R program in Rmpi looks like this:

library(Rmpi)
mpi.spawn.Rslaves(needlog = FALSE)
mpi.bcast.cmd( id <- mpi.comm.rank() )
mpi.bcast.cmd( np <- mpi.comm.size() )
mpi.bcast.cmd( host <- mpi.get.processor.name() )
result <- mpi.remote.exec(paste("I am", id, "of", np, "running on", host)) 
print(unlist(result))
mpi.close.Rslaves(dellog = FALSE)
mpi.exit()

That would be submitted to the queue with a job file like helloR.job:

#!/bin/bash
#PBS -l nodes=2:ppn=4
#PBS -l walltime=5:00:00

cd $PBS_O_WORKDIR
cat $PBS_NODEFILE > nodes
mpirun -np 1 R --no-save < hello.R

A more interesting example is Rtest.R, which I found at http://www.umbc.edu/hpcf/resources-tara-2010/how-to-run-R.html :

library(Rmpi)
mpi.spawn.Rslaves(needlog = FALSE)
mpi.bcast.cmd( id <- mpi.comm.rank() )
mpi.bcast.cmd( np <- mpi.comm.size() )
mpi.bcast.cmd( host <- mpi.get.processor.name() )
result <- mpi.remote.exec(paste("I am", id, "of", np, "running on", host))
print(unlist(result))
# Sample one normal observation on the master and each slave
x <- rnorm(1)
mpi.bcast.cmd(x <- rnorm(1))
# Gather the entire x vector (by default to process 0, the master)
mpi.bcast.cmd(mpi.gather.Robj(x))
y <- mpi.gather.Robj(x)
print(unlist(y))
# Sum the x vector together, storing the result on process 0 by default
mpi.bcast.cmd(mpi.reduce(x, op = "sum"))
z <- mpi.reduce(x, op = "sum")
print(z)
mpi.close.Rslaves(dellog = FALSE)
mpi.exit()

The above can be run with a job submission script similar to the helloR.job file.