Slurm Job Scheduler

The Pod cluster uses the Slurm job scheduler. It is similar to Torque, and the main differences are outlined below. There are also some nice 'cheat sheets' out there for converting the Torque commands you know; one nice one is here

The major differences to be aware of:

  • Queues are known as Partitions. In practice the only change is the submit argument: where you used "-q short" to send a job to the short (or some other) queue, you now use "-p short" (p for partition).
  • You'll need to change the various 'PBS' directives and variables in your script. The common ones are listed below - many others available at the

 

What               Torque                      Slurm
Nodes/Cores        #PBS -l nodes=1:ppn=10      #SBATCH --nodes=1 --ntasks-per-node=10
Walltime           #PBS -l walltime=1:00:00    #SBATCH --time=1:00:00
Mail to user       #PBS -M username@ucsb.edu   #SBATCH --mail-user=username@ucsb.edu
Mail begin/end     #PBS -m be                  #SBATCH --mail-type=BEGIN,END
Working directory  $PBS_O_WORKDIR              $SLURM_SUBMIT_DIR
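
If you have a lot of old Torque scripts, the substitutions in the table can be sketched as a small sed filter. This is only a rough starting point: 'pbs_to_slurm' is a made-up helper name (not part of Slurm), it handles just the rows listed above, and you should still read over the converted script by hand.

```shell
#!/bin/bash
# pbs_to_slurm: hypothetical helper, NOT a Slurm tool.
# It rewrites only the directives and variables from the table above.
# Reads the files given as arguments, or stdin if there are none.
pbs_to_slurm() {
    sed -E \
        -e 's/^#PBS -l nodes=([0-9]+):ppn=([0-9]+)/#SBATCH --nodes=\1 --ntasks-per-node=\2/' \
        -e 's/^#PBS -l walltime=/#SBATCH --time=/' \
        -e 's/^#PBS -M /#SBATCH --mail-user=/' \
        -e 's/^#PBS -m be$/#SBATCH --mail-type=BEGIN,END/' \
        -e 's/\$PBS_O_WORKDIR/$SLURM_SUBMIT_DIR/g' \
        "$@"
}
```

For example, 'pbs_to_slurm test.pbs > test.slurm' (where 'test.pbs' stands in for one of your old scripts) writes a converted copy and leaves the original alone.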

By default, Slurm passes all of the environment variables from your login shell to the job, so if you need a compiler, Matlab, etc., do the 'module load' for it before you submit your job.
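
As a minimal sketch of that order of operations (load first, then submit), something like the following could be pasted into your login shell. 'submit_with_module' is a made-up helper name, and 'matlab' in the usage line is only a placeholder: check 'module avail' for the module names that actually exist on the cluster.

```shell
# submit_with_module MODULE SCRIPT
# Hypothetical helper: load one module, then submit the job script.
# The job inherits the environment that 'module load' set up.
submit_with_module() {
    # stop here if the module cannot be loaded, so nothing is submitted
    module load "$1" || return 1
    sbatch "$2"
}
```

Usage would look like 'submit_with_module matlab test.slurm'.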

The basic command to run a job is 'sbatch' (in Torque it was 'qsub'). For example, suppose you have a file named 'test.slurm' that looks like this (first a serial job, then a parallel one):

#!/bin/bash -l
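# Serial job (1 core on one node)
#SBATCH --nodes=1 --ntasks-per-node=1
# start in the directory the job was submitted from
cd "$SLURM_SUBMIT_DIR"
# time the run; the program's stdout and stderr go to logfile
time ./a.out > logfile 2>&1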

And a simple parallel (MPI) example:

#!/bin/bash -l
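# ask for 16 cores on each of two nodes (32 MPI tasks in total)
#SBATCH --nodes=2 --ntasks-per-node=16

cd "$SLURM_SUBMIT_DIR"

# print the name of the node the batch script itself runs on
/bin/hostname
# SLURM_NTASKS holds the total number of tasks (2 x 16 = 32 here)
mpirun -np $SLURM_NTASKS ./a.out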

 

Notice that the main changes from PBS are a slightly different format for choosing the number of nodes/cores and a different name for the directory variable you cd to. MPI jobs are actually somewhat simpler, in that you don't need to give mpirun a nodes file.

You submit this job with 'sbatch test.slurm' (you used to use 'qsub').

You can check on the status with 'squeue' (formerly 'qstat'), e.g.

squeue -u $USER  (to see only your jobs; plain 'squeue' will show every job on the system)
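
If you have many jobs in the system, a quick summary can be easier to read than the full list. A small sketch using squeue's own output options ('-h' drops the header line and "-o '%T'" prints only the job state); the function name 'my_job_states' is made up for this example:

```shell
# Count your own jobs by state (PENDING, RUNNING, ...).
my_job_states() {
    # one state per line, then group identical states and count them
    squeue -u "$USER" -h -o '%T' | sort | uniq -c
}
```

Calling 'my_job_states' prints one line per state with the number of your jobs in that state.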

 

You can look at the details with 'scontrol show job JOBID', sort of like the old 'qstat -f' command.

 

To kill a job, use 'scancel JOBID' (formerly 'qdel JOBID'). Adding '-i' makes scancel ask for confirmation before it cancels anything.
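
Putting the commands together, here is a rough sketch that submits a script, remembers the job ID, and shows the job's details. It relies on sbatch's '--parsable' option, which prints just the job ID (followed by ';clustername' on multi-cluster setups) instead of the usual message. 'submit_and_show' is a made-up helper name, not a Slurm command.

```shell
# submit_and_show SCRIPT
# Hypothetical helper: submit a job script and print its details.
submit_and_show() {
    # --parsable makes sbatch print only the job ID (and maybe ";cluster")
    jobid=$(sbatch --parsable "$1") || return 1
    # strip the optional ";cluster" suffix
    jobid=${jobid%%;*}
    echo "Submitted job $jobid"
    scontrol show job "$jobid"
}
```

After 'submit_and_show test.slurm', the job ID is left in the shell variable 'jobid', so 'scancel "$jobid"' would cancel that same job if you change your mind.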

 

 

 

 

What      Torque          Slurm
Job name  #PBS -N myjob   #SBATCH -J myjob