To use TensorFlow on Pod, it's recommended to use conda to solve your environment. First, install Anaconda (if you haven't already) from https://www.anaconda.com/download/#linux . Once it's installed, you'll need to request an interactive GPU session so that conda sees that you've got GPUs available (I'm not positive this is necessary, but it worked for me). To do that, run:
srun -N 1 -n 1 -p gpu --pty bash -i
That will give you an interactive shell on a GPU node. You can verify that by typing
hostname
and it should return node111, node112, or node113 (and maybe soon node114). Next, have conda solve your TensorFlow environment for you by running
conda create --name tf_gpu tensorflow-gpu
That will create an environment named tf_gpu for use with your Python scripts. Note that it will take a while for conda to work its magic. Once it has finished, you can exit your interactive node by typing 'exit'. That's it. A job script would look something like...
#!/bin/bash
#SBATCH -N 1 --partition=gpu --ntasks-per-node=1
#SBATCH --time=30:00
#SBATCH --gres=gpu:1
#SBATCH --nodes=1
cd $SLURM_SUBMIT_DIR
/bin/hostname
source /home/fuz/.bashrc
source activate tf_gpu
srun --gres=gpu:1 python /home/fuz/classify_image.py
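Once the job script is saved, it still has to be handed to Slurm. Here is a minimal sketch of submitting it and checking on it. The file name tf_job.sh is an assumption (the text above does not name the script), and the two helper functions are hypothetical conveniences, not part of Slurm. The sketch relies on sbatch printing "Submitted batch job <id>" when a submission succeeds.

```shell
#!/bin/bash
# Hypothetical helpers for submitting the job script above.

# Pull the numeric job ID out of sbatch's "Submitted batch job <id>" line.
parse_job_id() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ {print $4}'
}

# Submit a job script, print its ID, and show its entry in the queue.
submit_job() {
    out=$(sbatch "$1") || return 1
    job_id=$(parse_job_id "$out")
    echo "Submitted as job $job_id"
    squeue -j "$job_id"
}

# Usage (tf_job.sh is an assumed file name for the script above):
#   submit_job tf_job.sh
```

Unless the script says otherwise, Slurm writes the job's output to slurm-<jobid>.out in the directory you submitted from, so that is where the hostname and the classification results should show up.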
The classify_image.py referenced above is available at http://thumper.mrl.ucsb.edu/~fuz/classify_image.py . Note that you'll have to change the path in that file: search for 'CHANGE THE PATH' and you'll see where to modify it. The picture I used is available at http://thumper.mrl.ucsb.edu/~fuz/pic.jpg . You'll have to create the directory 'imagenet' in your home directory. classify_image.py will download a pre-trained model and then classify the pic.jpg you give it. Note that this is an old example.
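The preparation steps above can be sketched as a few shell helpers. This is only a sketch: the function names are made up for illustration, it assumes wget is available on the login node, and it assumes you want the downloaded files in your home directory. You still have to edit the path in classify_image.py by hand; the last helper only shows you where.

```shell
#!/bin/bash
# Hypothetical helpers for setting up the classify_image.py example.

# Create the 'imagenet' directory (under $HOME unless another base is given)
# and print its full path.
make_imagenet_dir() {
    base="${1:-$HOME}"
    mkdir -p "$base/imagenet" && printf '%s\n' "$base/imagenet"
}

# Download the example script and picture (URLs taken from the text above).
fetch_example() {
    dest="${1:-$HOME}"
    wget -P "$dest" http://thumper.mrl.ucsb.edu/~fuz/classify_image.py
    wget -P "$dest" http://thumper.mrl.ucsb.edu/~fuz/pic.jpg
}

# List, with line numbers, every line marked as needing a path edit.
show_path_edits() {
    grep -n 'CHANGE THE PATH' "$1"
}

# Usage:
#   make_imagenet_dir
#   fetch_example
#   show_path_edits "$HOME/classify_image.py"
```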