Slurm job files for GPU jobs are slightly different from regular ones. A basic example, which simply queries the GPU card, is below:
    #!/bin/bash
    #SBATCH -N 1 --partition=gpu --ntasks-per-node=1
    #SBATCH --gres=gpu:1
    cd $SLURM_SUBMIT_DIR
    /bin/hostname
    srun --gres=gpu:1 /usr/bin/nvidia-smi
    sleep 120
There are a few key things to keep in mind. Essentially, the two SBATCH lines request one GPU on one node. If you leave '--partition=gpu' out of the job file, you can supply it on the submit line instead, e.g. sbatch -p gpu myfile.job
Then, to run the actual GPU job, put 'srun --gres=gpu:1' in front of the command. This ensures that you get exclusive access to the GPU that the system has reserved for you.
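The two points above can be combined in a short sketch. It writes a job file that leaves the partition out, then supplies it on the submit line. The file name gpu_test.job is a hypothetical placeholder, and the check for sbatch is only there so the sketch does nothing harmful on a machine without Slurm; adapt both to your own setup.

```shell
#!/bin/bash
# Sketch only: gpu_test.job is a hypothetical file name, not something pod provides.
job_file="gpu_test.job"

# Write a minimal job file. The partition is deliberately left out here
# and given on the sbatch command line below.
cat > "$job_file" <<'EOF'
#!/bin/bash
#SBATCH -N 1 --ntasks-per-node=1
#SBATCH --gres=gpu:1

cd "$SLURM_SUBMIT_DIR"
/bin/hostname
# Prefix the GPU command with srun --gres=gpu:1 so it runs on the reserved GPU
srun --gres=gpu:1 /usr/bin/nvidia-smi
EOF

# Submit to the gpu partition only if Slurm's sbatch is available on this machine
if command -v sbatch >/dev/null 2>&1; then
    sbatch -p gpu "$job_file"
else
    echo "sbatch not found; wrote $job_file without submitting"
fi
```

To run a real program, replace /usr/bin/nvidia-smi in the job file with your own executable, keeping the srun prefix in front of it.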
Note that there is a development node, so you can test your code before submitting it to the gpu queue. It's called pod-gpu, and the various CUDA versions are installed in /usr/local/ - just
once you're logged in to pod.