TensorFlow

TensorFlow is a deep learning library developed by Google, with a user-friendly API that makes it easy to build machine learning models. On Knot, TensorFlow runs in CPU mode only, unless you run interactively on the node knot-gpu2.cnsi.ucsb.edu (you can ssh directly to knot-gpu2). Knot-gpu2 has a Titan V, a GTX 1080 Ti, and a P100. There are no longer any other GPUs on Knot.

We recommend using conda (from Anaconda) to run TensorFlow on knot-gpu2. First, install Anaconda (if you haven't already) from https://www.anaconda.com/download/#linux . Then run

conda create --name tf_gpu tensorflow-gpu

This creates an environment named tf_gpu for use with your Python scripts. Note that conda can take a while to resolve and install everything. When it finishes, activate your TensorFlow environment with

source activate tf_gpu

That's it! (With newer versions of conda, the equivalent command is conda activate tf_gpu.)
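To confirm that the tf_gpu environment actually sees the GPUs on knot-gpu2, you can run a short check after activating it. This is a minimal sketch, not an official tool: the file name check_gpu.py and the function visible_gpus are hypothetical. It uses tf.config.list_physical_devices on TensorFlow 2.1 and later, and falls back to device_lib.list_local_devices on older 1.x releases.

```python
"""check_gpu.py (hypothetical name): list the GPUs TensorFlow can see."""
import importlib


def visible_gpus():
    """Return GPU device names visible to TensorFlow ([] if none, or if TF is not installed)."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return []
    # TF 2.1+ provides tf.config.list_physical_devices; 1.x releases do not.
    config = getattr(tf, "config", None)
    if config is not None and hasattr(config, "list_physical_devices"):
        return [d.name for d in config.list_physical_devices("GPU")]
    # Fallback for TF 1.x: device_lib lists every local device, CPU and GPU.
    from tensorflow.python.client import device_lib
    return [d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"]


if __name__ == "__main__":
    gpus = visible_gpus()
    print("GPUs visible to TensorFlow:", gpus if gpus else "none")
```

Run it with python check_gpu.py on knot-gpu2 after activating tf_gpu; an empty result means TensorFlow will fall back to the CPU.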

On the CPU nodes, TensorFlow runs inside a Singularity container built from an Ubuntu image.

Instructions

To use TensorFlow, add the following lines to your .bashrc (or .profile):

export PATH=/sw/csc/singularity/bin/:$PATH
export LD_LIBRARY_PATH=/sw/csc/singularity/lib/singularity:$LD_LIBRARY_PATH

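That is all the setup you need. The new settings take effect in new login shells; to apply them to your current shell, run source ~/.bashrc.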
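If a job later complains that singularity cannot be found, you can confirm that the two export lines above took effect with a quick check like the following. This is a minimal Python 3 sketch; the helper name missing_entries is hypothetical, and the expected directories are copied from the export lines above.

```python
import os
import shutil

# Directories the .bashrc export lines on this page add (copied verbatim).
EXPECTED = {
    "PATH": "/sw/csc/singularity/bin/",
    "LD_LIBRARY_PATH": "/sw/csc/singularity/lib/singularity",
}


def missing_entries(env=os.environ):
    """Return the variables whose search path lacks the expected directory."""
    missing = []
    for var, directory in EXPECTED.items():
        # Compare without trailing slashes so ".../bin" and ".../bin/" both match.
        entries = [p.rstrip("/") for p in env.get(var, "").split(os.pathsep)]
        if directory.rstrip("/") not in entries:
            missing.append(var)
    return missing


if __name__ == "__main__":
    print("singularity found at:", shutil.which("singularity"))
    print("missing settings:", missing_entries() or "none")
```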

Example

Below is a simple example adapted from A. Damien's repository. It builds a linear regression model using TensorFlow's computational-graph API (TensorFlow 1.x style, with tf.placeholder and tf.Session).
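For reference, the code below fits the line ŷ = Wx + b by minimizing the cost it defines, where n = n_samples = 17 training points. Because the training loop feeds one (x, y) pair at a time, each optimizer step is a per-sample gradient step with learning rate η = learning_rate (derived from the cost expression in the code):

```latex
\hat{y} = W x + b, \qquad
J(W, b) = \frac{1}{2n} \sum_{i=1}^{n} \left( W x_i + b - y_i \right)^2

W \leftarrow W - \frac{\eta}{n} \left( W x_i + b - y_i \right) x_i, \qquad
b \leftarrow b - \frac{\eta}{n} \left( W x_i + b - y_i \right)
```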

from __future__ import print_function

import tensorflow as tf
import numpy
rng = numpy.random

# Parameters
learning_rate = 0.01
training_epochs = 1000
display_step = 50

# Training Data
train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                         7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                         2.827,3.465,1.65,2.904,2.42,2.94,1.3])
n_samples = train_X.shape[0]

# tf Graph Input
X = tf.placeholder("float")
Y = tf.placeholder("float")

# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

# Construct a linear model
pred = tf.add(tf.multiply(X, W), b)

# Mean squared error
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
# Gradient descent
#  Note, minimize() knows to modify W and b because Variable objects are trainable=True by default
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph

with tf.Session() as sess:
    sess.run(init)

    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})

        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c),
                  "W=", sess.run(W), "b=", sess.run(b))

    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')

    # Evaluate the fitted model on a separate test set
    test_X = numpy.asarray([6.83, 4.668, 8.9, 7.91, 5.7, 8.7, 3.1, 2.1])
    test_Y = numpy.asarray([1.84, 2.273, 3.2, 2.831, 2.92, 3.24, 1.35, 1.03])

    print("Testing... (Mean square loss Comparison)")
    testing_cost = sess.run(
        tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * test_X.shape[0]),
        feed_dict={X: test_X, Y: test_Y})  # same function as cost above
    print("Testing cost=", testing_cost)
    print("Absolute mean square loss difference:", abs(
        training_cost - testing_cost))
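To sanity-check the output of linear.py without TensorFlow, you can compute the exact minimizer of the same cost with NumPy. This is a minimal sketch; the function names cost and least_squares_fit are illustrative and not part of the original example. It uses the closed-form ordinary least-squares solution, which minimizes exactly the cost defined above, so the W and b printed by linear.py should be close to these values once training has converged. It runs under both Python 2.7 and Python 3.

```python
import numpy as np

# Same training data as linear.py.
train_X = np.asarray([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
                      7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
train_Y = np.asarray([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                      2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])


def cost(W, b, X, Y):
    """Same loss as the TensorFlow graph: sum((W*X + b - Y)**2) / (2*n)."""
    return np.sum((W * X + b - Y) ** 2) / (2 * X.shape[0])


def least_squares_fit(X, Y):
    """Closed-form minimizer of cost(): W = cov(X, Y) / var(X), b = mean(Y) - W*mean(X)."""
    x_mean, y_mean = X.mean(), Y.mean()
    W = np.sum((X - x_mean) * (Y - y_mean)) / np.sum((X - x_mean) ** 2)
    b = y_mean - W * x_mean
    return W, b


if __name__ == "__main__":
    W, b = least_squares_fit(train_X, train_Y)
    print("closed-form W=%.4f b=%.4f cost=%.6f" % (W, b, cost(W, b, train_X, train_Y)))
```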

Suppose this file is named

linear.py

Then, in the same folder, create a job submission script (here called submit.job):

#!/bin/bash

#PBS -l nodes=1:ppn=12
#PBS -l walltime=1:00:00
#PBS -N TFlinear
#PBS -V

# Change to the directory from which the job was submitted
cd "$PBS_O_WORKDIR"

singularity exec /sw/csc/SingularityImg/ubuntu_w_TFlowKeras.img python linear.py > out.log

A few points require attention:

  • Note that we do not call the host's python; instead, we run python inside the Singularity container we built (ubuntu_w_TFlowKeras.img). The job will fail without it.
  • You cannot use more than one node: this image does not include an MPI-enabled build of TensorFlow (one was released recently, but we have not tested it yet).
  • Note that the container image uses Python 2.7, so your script must be Python 2 compatible.

Then submit your job to the queue with

qsub submit.job