To use TensorFlow on pod, it's recommended to let conda resolve your environment. So first, install Anaconda (if you haven't already) from https://www.anaconda.com/download/#linux .
One-time step:
Once it's installed, you'll need to request an interactive GPU session so that conda sees that you've got GPUs available (I'm not positive this is necessary, but it worked for me). To do that, issue:
srun -N 1 -n 1 -p gpu --time=1:00:00 --pty bash -i
That will give you an interactive shell on a GPU node. You can verify that by typing
hostname
and it should return node111, node112, or node113 (and maybe soon node114).
You'll need to know one other thing: the CUDA version on the node. Type
nvcc --version
and it will say something like "Cuda compilation tools, release 9.2, V9.2.88". You need to know the 'release number' (9.2 in this case).
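If you'd rather not read the release number by eye, here is a minimal Python sketch that pulls it out of the version banner. It assumes the banner has the "release X.Y" form quoted above; the function name parse_cuda_release is made up for this example.

```python
import re

def parse_cuda_release(banner_text):
    """Return the CUDA release (e.g. '9.2') found in a version banner, or None."""
    # Assumption: the banner contains 'release <major>.<minor>' as in the sample line.
    match = re.search(r"release\s+(\d+\.\d+)", banner_text)
    return match.group(1) if match else None

banner = "Cuda compilation tools, release 9.2, V9.2.88"
print(parse_cuda_release(banner))  # prints 9.2
```

The returned string is the value that goes after cudatoolkit= in the conda command.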
Next, have conda solve your TensorFlow environment for you by issuing:
conda create --name tf_gpu tensorflow-gpu cudatoolkit=9.2
(Use the release number you found above for cudatoolkit=. If you later get errors about a driver mismatch or similar when running your program, it may be because you specified the wrong cudatoolkit= version.)
That will create an environment named tf_gpu for use with your python scripts. Note that it will take a while for conda to work its magic. Once finished you should exit your interactive node by typing 'exit'. That's it.
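Before you exit the interactive session, it can be worth checking that TensorFlow in the new environment actually sees a GPU. The following is only a sketch: it assumes you have activated the environment (source activate tf_gpu) and saved the code to a file of your choosing. The helper name gpu_visible is hypothetical. It tries the newer tf.config.list_physical_devices call first and falls back to tf.test.is_gpu_available on older releases.

```python
def gpu_visible():
    """Return True/False if TensorFlow can/cannot see a GPU, or None if TensorFlow is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Newer releases provide tf.config.list_physical_devices; older 1.x
    # releases may lack it, so fall back to tf.test.is_gpu_available.
    config = getattr(tf, "config", None)
    lister = getattr(config, "list_physical_devices", None)
    if lister is not None:
        return len(lister("GPU")) > 0
    return bool(tf.test.is_gpu_available())

print("GPU visible to TensorFlow:", gpu_visible())
```

If this prints False on a GPU node, a mismatched cudatoolkit= version is one likely cause.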
Running routine jobs, now that you have your environment:
A job script would look something like this:
#!/bin/bash
#SBATCH -N 1 --partition=gpu --ntasks-per-node=1
#SBATCH --time=30:00
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
cd $SLURM_SUBMIT_DIR
/bin/hostname
source /home/fuz/.bashrc
source activate tf_gpu
srun --gres=gpu:1 python /home/fuz/classify_image.py
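If you run many similar jobs, you may prefer to generate the script rather than edit it by hand. Below is a rough Python sketch of that idea. The function make_job_script and its parameters are invented for illustration, and the directives simply mirror the example job script above (with the duplicate node count written once).

```python
def make_job_script(script_path, env_name="tf_gpu", time_limit="30:00", bashrc="~/.bashrc"):
    """Return the text of a single-GPU Slurm batch script (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks-per-node=1",
        "#SBATCH --partition=gpu",
        "#SBATCH --gres=gpu:1",
        f"#SBATCH --time={time_limit}",
        "cd $SLURM_SUBMIT_DIR",
        "/bin/hostname",                      # record which node ran the job
        f"source {bashrc}",                   # make conda available in the batch shell
        f"source activate {env_name}",        # switch to the conda environment
        f"srun --gres=gpu:1 python {script_path}",
    ]
    return "\n".join(lines) + "\n"

print(make_job_script("/home/fuz/classify_image.py"))
```

Write the returned text to a file (job.sh, say, which is just a placeholder name) and submit it with sbatch.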
The classify_image.py referenced above is available at http://thumper.mrl.ucsb.edu/~fuz/classify_image.py . Note that you'll have to change the path in that file: search for 'CHANGE THE PATH' and you'll see where to modify it. The pic I used is available at http://thumper.mrl.ucsb.edu/~fuz/pic.jpg . You'll also have to create the directory 'imagenet' in your home directory. classify_image.py will download a pre-trained model and then classify the pic.jpg you give it. Note that this is an old example.
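The 'imagenet' directory can also be created from Python, which is handy if you want a setup script that is safe to re-run. This is a small sketch only. The helper ensure_model_dir is hypothetical, and the demo at the bottom uses a throwaway temporary directory so that running the example touches nothing in your home directory.

```python
import tempfile
from pathlib import Path

def ensure_model_dir(base=None, name="imagenet"):
    """Create <base>/<name> if it is missing and return its path (base defaults to your home directory)."""
    base_dir = Path(base) if base is not None else Path.home()
    target = base_dir / name
    target.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return target

# Demo against a temporary directory; call ensure_model_dir() with no
# arguments to create ~/imagenet for real.
with tempfile.TemporaryDirectory() as scratch:
    created = ensure_model_dir(scratch)
    print(created.name, created.is_dir())
```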