Slurm Job Scheduler


The Pod cluster uses the Slurm job scheduler. It is similar to Torque, but we'll outline some differences below. There are also some nice 'cheat sheets' out there for converting from the Torque commands you know; one nice one is here

The major differences to be aware of:

  • Queues are known as partitions. In practice this just means that instead of submitting a job with "-q short" to send it to the short (or some other) queue, you now use "-p short" (p for partition).
  • You'll need to change the various 'PBS' variables in your script. The common ones are listed below - many others are available at the link above.
  • Partitions you can submit to with sbatch: sbatch my.job (standard compute nodes), sbatch -p short my.job (short queue, 1 hour limit - for testing), sbatch -p gpu my.job (GPU nodes), sbatch -p largemem my.job (large-memory 1.5TB nodes). See the example job header after the table below.
What                 Torque                       Slurm
Nodes/Cores          #PBS -l nodes=1:ppn=10       #SBATCH --nodes=1 --ntasks-per-node=10
Walltime             #PBS -l walltime=1:00:00     #SBATCH --time=1:00:00
Mail to user         #PBS -M username@ucsb.edu    #SBATCH --mail-user=username@ucsb.edu
Mail begin/end       #PBS -m be                   #SBATCH --mail-type=BEGIN,END
Working directory    $PBS_O_WORKDIR               $SLURM_SUBMIT_DIR
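
Putting the partition option and the table above together, a complete Slurm job header might look like the sketch below (the job name, email address, and core count are placeholders - adjust them for your own job):

#!/bin/bash -l
#SBATCH --job-name=myjob                  # job name (placeholder)
#SBATCH -p short                          # short partition: 1 hour limit, good for testing
#SBATCH --nodes=1 --ntasks-per-node=10    # 10 cores on one node
#SBATCH --time=1:00:00                    # walltime limit of 1 hour
#SBATCH --mail-user=username@ucsb.edu     # your email address (placeholder)
#SBATCH --mail-type=BEGIN,END             # mail when the job starts and ends

cd $SLURM_SUBMIT_DIR
./a.out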

Slurm will pass along all of the environment variables from your login shell, so if you need a compiler, Matlab, etc., do the 'module load' for it before you submit your job.
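
For example (the module name here is just an illustration - load whatever your code actually needs):

module load MATLAB        # example module name - substitute the module(s) your job requires
sbatch test.slurm         # the submitted job inherits the environment loaded above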

The basic command to run a job is 'sbatch' (in Torque it was 'qsub'). For example, say you have a file named 'test.slurm' that looks like this (first a serial job, then a parallel one):

#!/bin/bash -l
#Serial (1 core on one node) job...
#SBATCH --nodes=1 --ntasks-per-node=1
cd $SLURM_SUBMIT_DIR
time  ./a.out >& logfile

and a simple parallel (MPI) example

#!/bin/bash -l
# ask for 16 cores on each of two nodes (32 tasks total)
#SBATCH --nodes=2 --ntasks-per-node=16

cd $SLURM_SUBMIT_DIR

/bin/hostname
mpirun -np $SLURM_NTASKS ./a.out

Notice that the main changes from PBS are a slightly different format for choosing the number of nodes/cores and a different variable for the directory to cd to. For MPI jobs it's actually somewhat simplified, in that you don't need to give mpirun a nodes file - Slurm sets $SLURM_NTASKS for you (32 in the example above).

You run this job with 'sbatch test.slurm' (you used to use 'qsub')
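
When the job is accepted, sbatch prints the numeric job ID it was assigned; you'll use that ID with the commands below. For example:

sbatch test.slurm         # prints something like: Submitted batch job 123456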

You can check on the status with 'squeue' (formerly 'qstat'), e.g.

squeue -u $USER (to see only your jobs; plain 'squeue' will show every job on the system)
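
A few other handy ways to filter the list (the job ID is just a placeholder):

squeue -p short              # all jobs in the short partition
squeue -j 123456             # one specific job, by job ID (placeholder)
squeue -u $USER -t RUNNING   # only your jobs that are currently running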

You can look at details with 'scontrol show job JOBID', sort of like the old 'qstat -f' command.

To kill a job you use 'scancel JOBID' (formerly 'qdel JOBID'); the '-i' flag ('scancel -i JOBID') asks for confirmation before cancelling.
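
Putting those two together, a typical check-then-cancel sequence looks like this (the job ID is a placeholder taken from squeue):

scontrol show job 123456     # full details: state, node list, resources, reason if pending
scancel -i 123456            # cancel it, with an interactive confirmation prompt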

If you want an interactive session on a node to test things and make sure your job will run, you can get one with

srun -N 1 -p short --ntasks-per-node=4 --pty bash (which asks for 4 cores on a short-queue node, which will run for up to an hour)
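
Once the interactive shell starts on the compute node you can test there and then release it; a quick sketch (the module and executable names are placeholders):

module load openmpi          # example module name - load whatever your code needs
mpirun -np 4 ./a.out         # quick test on the 4 cores you requested
exit                         # leave the shell to release the node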

One more common directive translation:

What        Torque           Slurm
Job name    #PBS -N myjob    #SBATCH -J myjob