Using Kubernetes on Nautilus
35,000' - This document aims to explain how to use the abundant single-precision (32-bit) GPUs (and some double-precision-capable 64-bit GPUs) available on the Nautilus cluster ( https://nautilus.optiputer.net ) using Kubernetes.
Caveat Emptor - 1/16/2019 - the documentation that follows relies upon at least 4 different organizations and projects ( Kubernetes, Docker, Google, NVIDIA ). It is quite possible that links and/or specifics will change in the future. It's also quite probable that I refer to some things incorrectly in this doc.
The Nautilus cluster is a cluster of mostly GPU nodes spread across the west coast, built on the Pacific Research Platform ( PRP ). Given the high speed network connectivity, *where* your jobs actually run is unimportant. That's where Kubernetes comes in - it abstracts the underlying nitty-gritty unpleasantness so researchers don't have to worry about it. To view a dashboard of current Nautilus usage visit: https://grafana.nautilus.optiputer.net/d/KMsJWWPiz/cluster-usage?orgId=1 .
If you publish results from your research using the Nautilus cluster, please remember to acknowledge the granting agencies. Suitable text would look like: Nautilus is supported by the Pacific Research Platform (NSF #1541349), CHASE-CI (NSF #1730158), and Towards a National Research Platform (NSF #1826967). Additional funding has been supplied by the University of California Office of the President. And if Fuz and/or Paul help you significantly during your research it's nice to acknowledge them and CSC's funding agencies: We acknowledge support from the Center for Scientific Computing from the CNSI, MRL: an NSF MRSEC (DMR-1720256) and NSF CNS-1725797.
To begin, Kubernetes has some interesting nomenclature. When you sign up for the Nautilus cluster you have to be associated with a 'namespace' in order to do anything. Paul Weakliem ( weakliem@cnsi.ucsb.edu ) is the namespace administrator for UCSB - contact him to either be associated with a namespace or to have him create a namespace for your project. It's recommended that namespaces be associated with projects and not individuals.
Kubernetes is a container system. Running containers are called 'pods'. 'Jobs' are, well, jobs you can submit and walk away from, and which will spawn 'pods' to complete your work and then wind the pods down when the work is finished. You create .yaml files which define your pods, jobs, and storage. Note - .yaml files are terrible - every space, every indent, each letter's case matters and they're *extremely* finicky. The .yaml files specify 'images', which are what your containers are created from - the OS plus the programs you need to do your work. These images are often Docker images as that seems to be all the rage nowadays. You can create semi-persistent storage to write your output files to, but it's recommended to move the results off Nautilus right away as there's no guarantee that your storage won't be nuked in the event of a major disk failure.
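For orientation, once kubectl and your config are set up (the next two steps), these are the commands that list each of those object types in your namespace - shown here with the ucsb-csc namespace used throughout this doc (pvc = persistent volume claims, i.e. the semi-persistent storage):
kubectl get pods -n ucsb-csc
kubectl get jobs -n ucsb-csc
kubectl get pvc -n ucsb-csc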
Kubernetes is controlled by kubectl which you'll need- installation info here: https://kubernetes.io/docs/tasks/tools/install-kubectl/
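For reference, at the time of writing the Linux instructions from that page boiled down to something like the following - double-check the link above, since the exact download URL may have changed:
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
kubectl version --client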
First and foremost you need to get the Kubernetes config on your local machine so that connecting to Nautilus is seamless. When you log in to https://nautilus.optiputer.net there's a tab called 'Get Config'. Click that, download your config, hop into a terminal, mkdir ~/.kube , and then mv Downloads/config ~/.kube/ (kubectl looks for ~/.kube/config in your home directory). That'll give you access to the Nautilus cluster using kubectl from your local machine. The config oauth tokens expire in 6 months or thereabouts, so if suddenly you receive an error like "Unable to connect to the server: failed to refresh token: oauth2: cannot fetch token: 500 Internal Server Error", you'll need to download a new config and replace the one in your ~/.kube/ directory.
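In terminal form, that setup plus a quick sanity check looks roughly like this (adjust the Downloads path to wherever your browser saved the config):
mkdir -p ~/.kube
mv ~/Downloads/config ~/.kube/config
kubectl get pods -n ucsb-csc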
To create semi-persistent storage, you create a .yaml file with the relevant info. Below you can change the name and storage capacity to suit your needs. I call this file storage.yaml . I'm not positive of Kubernetes' exact specifications regarding spacing, but I've been successful with 2 spaces per indent in all my .yaml files. http://research.mrl.ucsb.edu/~fuz/storage.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fuzvol
spec:
  storageClassName: rook-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
To create the storage, replace the namespace ucsb-csc below with your namespace and issue a
kubectl create -n ucsb-csc -f storage.yaml
It should reply with 'persistentvolumeclaim/fuzvol created'. Now we've got some persistent storage we can write to. You can also simply write to the container's own filesystem ( e.g. / ) in your pod, but once the pod is killed, any files you wrote there are deleted.
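You can double-check the claim you just created with (again, swapping in your own namespace):
kubectl get pvc -n ucsb-csc
kubectl describe pvc fuzvol -n ucsb-csc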
You can create a pod and jump into it - an 'interactive' pod so to speak, but remember that you have to explicitly kill the pod once you're done with it. Pods require 'images', and those images actually contain the OS and programs you need to do your work. I think most images are docker images. If you want to create your own image you'd install docker, create your image, and then push it up to docker hub. I'll show that stuff later on. Your images need to be somewhere on the internet so that Nautilus can grab them, which is why docker hub works well.
Here's a pod .yaml file called gpu-pod.yaml which spins up a container with 1 GPU and mounts the persistent storage I created. It grabs my docker image from docker hub under fuzzrence/tf-gpu2 , runs nvidia-smi, and then sleeps. The nvidia-smi output goes to standard out (the pod logs), and the container then simply lingers around forever because of the sleep infinity command. This is actually kind of useful as you can hop into the container. http://research.mrl.ucsb.edu/~fuz/gpu-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
  namespace: ucsb-csc
spec:
  restartPolicy: Never
  containers:
  - name: fs-container
    image: fuzzrence/tf-gpu2
    command: ["/bin/bash"]
    # only one 'args' key is allowed per container, so the two commands are chained
    args: ["-c", "nvidia-smi; sleep infinity"]
    resources:
      limits:
        nvidia.com/gpu: 1
    volumeMounts:
    - name: fuzvol
      mountPath: /fuzvol
  volumes:
  - name: fuzvol
    persistentVolumeClaim:
      claimName: fuzvol
Create the pod with a
kubectl create -f http://thumper.mrl.ucsb.edu/~fuz/gpu-pod.yaml
You can see what's happening by issuing a:
kubectl get pods
and
kubectl describe pod gpu-pod
To hop into the pod, issue a
kubectl exec -it gpu-pod -- /bin/bash
I'm not positive, but if you're a member of many namespaces you may have to add a -n <namespace> to your kubectl commands (a way to set a default namespace is shown after the delete command below). So like I said, having the pod lingering around is useful for getting your workflow perfect. Inside the pod you can run your code and make sure it's doing what you expect. The pod gpu-pod is an Ubuntu 16.04 container with CUDA 9, cuDNN 7, and tensorflow-gpu. You could also do this in docker, of course, but using Nautilus to do this is nice because I, for instance, don't have a Titan GPU in my desktop. Pods have network access through the node they run on, so you can grab files from the web - like python scripts and other stuff. You hop out of the pod with a simple 'exit'. When pods have errors you can view the errors from outside the pod with a:
kubectl logs gpu-pod
And when you've figured out your workflow you then delete your pod with a
kubectl delete pod gpu-pod
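As mentioned above, if you belong to several namespaces you may need -n on every command. One way around that is to set a default namespace on your kubectl context - a sketch, assuming the ucsb-csc namespace:
kubectl config set-context $(kubectl config current-context) --namespace=ucsb-csc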
Pods that don't have a 'sleep infinity' in them will run the commands given in the .yaml file and then complete. A completed pod isn't running or consuming resources, but it will prevent another pod with the same name from being created until you delete it. Also, the standard out (logs) will continue to be viewable.
Jobs are also defined in .yaml files, and they spawn pods to do your work. Here's gpu-job.yaml http://research.mrl.ucsb.edu/~fuz/gpu-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-job
  namespace: ucsb-csc
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: fs-container
        image: fuzzrence/tf-gpu2
        command: ["/bin/bash"]
        # again, only one 'args' key is allowed, so the three commands are chained
        args: ["-c", "chmod 777 /fuzvol; echo 'helloworld' > /fuzvol/test4.txt; nvidia-smi >> /fuzvol/test4.txt"]
        resources:
          limits:
            nvidia.com/gpu: 1
        volumeMounts:
        - name: fuzvol
          mountPath: /fuzvol
      volumes:
      - name: fuzvol
        persistentVolumeClaim:
          claimName: fuzvol
It gets submitted with a
kubectl create -f http://thumper.mrl.ucsb.edu/~fuz/gpu-job.yaml
You'll note by issuing a
kubectl get jobs
kubectl get pods
kubectl describe job gpu-job
that the job spawns a pod with a unique identifier appended to its name, so you can run the job many times, spawning many pods, without any name collisions.
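Once the job finishes you can read its output and clean it up; deleting the job also removes the pods it spawned. For example:
kubectl logs job/gpu-job -n ucsb-csc
kubectl delete job gpu-job -n ucsb-csc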
I'm not positive on the *best* way to get data off your persistent storage, but since the pods have network access there are several possibilities, ranging from traditional unix methods like scp and ftp to other methods like git (or kubectl cp, sketched below).
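One straightforward option is kubectl cp, which copies files out through a running pod that mounts the volume - for example, using the gpu-pod and namespace from above:
kubectl cp ucsb-csc/gpu-pod:/fuzvol/test4.txt ./test4.txt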
If you'll have several pods and you want them to connect to the same storage at the same time, you'll have to use the Ceph file system as described here: https://wiki.nautilus.optiputer.net/wiki/Ceph_shared_filesystem .
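For reference, the shared-filesystem claim ends up looking a lot like storage.yaml above, but with ReadWriteMany access and the CephFS storage class. This is a sketch only - the name fuzsharedvol is made up here, and the storageClassName should be taken from the wiki page rather than from me:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fuzsharedvol
spec:
  storageClassName: rook-cephfs    # check the wiki for the correct class name
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi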
The Nautilus support via Rocket.Chat is good, so be sure to sign up for help at: https://rocket.nautilus.optiputer.net
A basic TensorFlow test (run from a python interpreter inside the pod) would look like...
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess=tf.Session()
2019-01-16 22:29:05.319479: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-01-16 22:29:05.489508: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:09:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-01-16 22:29:05.489562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-16 22:29:05.850885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-16 22:29:05.850940: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-01-16 22:29:05.850948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-01-16 22:29:05.851392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10405 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1)
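If you just want a quick yes/no check that TensorFlow sees the GPU, rather than the interactive session above, this one-liner works with the TF 1.x in this image:
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"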
And then an image classification test ( classify_image.py ) [note I didn't use the Red Panda pic, but a pic of a Fluoromax instrument], which loads cuDNN 7.1.4. I hop into the pod with a
kubectl exec -it gpu-pod -- /bin/bash
And then python classify_image.py yielding...
root@gpu-pod:/fuzvol# python classify_image.py
WARNING:tensorflow:From classify_image.py:142: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
2019-01-16 23:04:45.985222: W tensorflow/core/framework/op_def_util.cc:355] Op BatchNormWithGlobalNormalization is deprecated. It will cease to work in GraphDef version 9. Use tf.nn.batch_normalization().
2019-01-16 23:04:46.131832: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-01-16 23:04:46.309394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:09:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2019-01-16 23:04:46.309425: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-16 23:04:46.628887: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-16 23:04:46.628942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-01-16 23:04:46.628950: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-01-16 23:04:46.629399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10405 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:09:00.0, compute capability: 6.1)
desktop computer (score = 0.34769)
screen, CRT screen (score = 0.06072)
printer (score = 0.04900)
radio, wireless (score = 0.02649)
desk (score = 0.02416)
If you want to make your own images, install docker and sign up for an account at: https://cloud.docker.com . There you can create your repository, build and tag your image locally (roughly as below), and then push it up.
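Before the push you need to build and tag the image (run from the directory containing the Dockerfile shown below) and log in to your docker hub account - roughly:
docker build -t fuzzrence/tf-gpu2 .
docker login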
docker push fuzzrence/tf-gpu2
And here's the Dockerfile I use to create the image
ARG UBUNTU_VERSION=16.04
FROM nvidia/cuda:9.0-base-ubuntu${UBUNTU_VERSION} as base
ENV CONDA_DIR=/opt/conda \
    SHELL=/bin/bash \
    NB_USER=root \
    NB_UID=0 \
    NB_GID=0 \
    LC_ALL=C \
    LANG=en_US.UTF-8 \
    LANGUAGE=en_US.UTF-8
ENV PATH=$CONDA_DIR/bin:$PATH \
    HOME=/home/$NB_USER
# Pick up some TF dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    cuda-command-line-tools-9-0 \
    cuda-cublas-9-0 \
    cuda-cufft-9-0 \
    cuda-curand-9-0 \
    cuda-cusolver-9-0 \
    cuda-cusparse-9-0 \
    libcudnn7=7.1.4.18-1+cuda9.0 \
    libfreetype6-dev \
    libhdf5-serial-dev \
    libpng12-dev \
    libzmq3-dev \
    pkg-config \
    software-properties-common \
    sudo \
    unzip
RUN apt-get update && \
    apt-get install nvinfer-runtime-trt-repo-ubuntu1604-5.0.2-ga-cuda9.0 \
    && apt-get update \
    && apt-get install -y --no-install-recommends libnvinfer5=5.0.2-1+cuda9.0 \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
ARG USE_PYTHON_3_NOT_2
ARG _PY_SUFFIX=${USE_PYTHON_3_NOT_2:+3}
ARG PYTHON=python${_PY_SUFFIX}
ARG PIP=pip${_PY_SUFFIX}
# See http://bugs.python.org/issue19846
# ENV LANG C.UTF-8
# Note: a 'RUN export ...' only lasts for that single build step, so the two
# lines below have no lasting effect; the locale actually comes from the ENV
# lines near the top of the file.
RUN export LC_ALL="en_US.UTF-8"
RUN export LC_CTYPE="en_US.UTF-8"
# RUN sudo dpkg-reconfigure locales
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    wget
RUN python3 -m pip install --upgrade pip
RUN pip --no-cache-dir install --upgrade \
    pip \
    setuptools
# Some TF tools expect a "python" binary
RUN ln -s $(which python3) /usr/local/bin/python
# Options:
# tensorflow
# tensorflow-gpu
# tf-nightly
# tf-nightly-gpu
ARG TF_PACKAGE=tensorflow-gpu
RUN ${PIP} install ${TF_PACKAGE}
USER $NB_USER
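For completeness, the ARGs at the top of the Dockerfile can be overridden at build time; for example, a python3 build of the image (TF_PACKAGE already defaults to tensorflow-gpu, so that flag is shown only for illustration):
docker build --build-arg USE_PYTHON_3_NOT_2=1 --build-arg TF_PACKAGE=tensorflow-gpu -t fuzzrence/tf-gpu2 .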