TensorFlow

Status

Yes

TensorFlow is a deep learning library developed by Google whose API makes it straightforward to build machine learning models. On Knot, TensorFlow runs in CPU mode only, unless you run interactively on the node knot-gpu2.cnsi.ucsb.edu (you can ssh directly to knot-gpu2). Knot-gpu2 has a Titan V, a GTX 1080 Ti, and a P100. There are no longer any other GPUs on Knot.

We recommend using conda from Anaconda to run TensorFlow on knot-gpu2. First, install Anaconda (if you haven't already) from https://www.anaconda.com/download/#linux . Then run

conda create --name tf_gpu tensorflow-gpu

That creates an environment named tf_gpu for use with your Python scripts. Note that conda will take a while to resolve and install the packages. After it finishes, activate your TensorFlow environment with

source activate tf_gpu

That's it!
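
Once the environment is active, it can be worth confirming that TensorFlow actually sees a GPU before starting a long run. The following is a minimal sketch rather than an official check: the helper name list_tf_devices is hypothetical, and it assumes that device_lib.list_local_devices() from tensorflow.python.client is available in the installed TensorFlow version. If TensorFlow cannot be imported, the helper simply returns an empty list.

```python
def list_tf_devices():
    """Return the device names TensorFlow reports, or [] if TensorFlow is not importable."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    return [d.name for d in device_lib.list_local_devices()]


devices = list_tf_devices()
# Device names contain "GPU" (or "gpu" in older releases) for GPU devices.
gpus = [name for name in devices if "GPU" in name.upper()]

if not devices:
    print("TensorFlow could not be imported; is the tf_gpu environment active?")
elif gpus:
    print("GPU devices visible to TensorFlow: %s" % ", ".join(gpus))
else:
    print("TensorFlow sees no GPU; it will fall back to the CPU.")
```

If no GPU is listed on knot-gpu2, the environment probably did not activate or the CPU-only package was installed.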

TensorFlow on CPU runs inside a Singularity container with an Ubuntu userland (the container shares the host's kernel).

Instructions

To use TensorFlow, add the following lines to your .bashrc (or .profile):

export PATH=/sw/csc/singularity/bin/:$PATH
export LD_LIBRARY_PATH=/sw/csc/singularity/lib/singularity:$LD_LIBRARY_PATH

That is all the setup you need. The change takes effect the next time you log in.
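
To confirm that the PATH change took effect, a quick check along the following lines may help. It is only a sketch: the helper name find_singularity is hypothetical, it assumes a Python 3 interpreter on the host (shutil.which is not available in Python 2.7), and the expected directory is taken from the export line above.

```python
import os
import shutil


def find_singularity(expected_dir="/sw/csc/singularity/bin"):
    """Return (path, in_expected_dir) for the singularity executable found on PATH.

    path is None when no executable named 'singularity' is on PATH.
    """
    path = shutil.which("singularity")
    if path is None:
        return None, False
    found_dir = os.path.dirname(os.path.abspath(path))
    return path, found_dir == os.path.normpath(expected_dir)


path, in_expected_dir = find_singularity()
if path is None:
    print("singularity is not on PATH; check your .bashrc or .profile")
elif in_expected_dir:
    print("singularity found at %s" % path)
else:
    print("singularity found at %s, which is not the expected location" % path)
```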

Example

Below is a simple example adapted from A. Damien's repository. It builds a linear regression model using TensorFlow's computational-graph scheme. Note that it uses the TensorFlow 1.x API (tf.placeholder, tf.Session):

from __future__ import print_function

import tensorflow as tf
import numpy
rng = numpy.random

# Parameters
learning_rate = 0.01
training_epochs = 1000
display_step = 50

# Training Data
train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                         7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                         2.827,3.465,1.65,2.904,2.42,2.94,1.3])
n_samples = train_X.shape[0]

# tf Graph Input
X = tf.placeholder("float")
Y = tf.placeholder("float")

# Set model weights
W = tf.Variable(rng.randn(), name="weight")
b = tf.Variable(rng.randn(), name="bias")

# Construct a linear model
pred = tf.add(tf.multiply(X, W), b)

# Mean squared error (halved, hence the factor of 2 in the denominator)
cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)
# Gradient descent
# Note: minimize() knows to modify W and b because Variable objects are trainable=True by default
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph

with tf.Session() as sess:
    sess.run(init)

    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            sess.run(optimizer, feed_dict={X: x, Y: y})

        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X: train_X, Y:train_Y})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c),
                  "W=", sess.run(W), "b=", sess.run(b))

    print("Optimization Finished!")
    training_cost = sess.run(cost, feed_dict={X: train_X, Y: train_Y})
    print("Training cost=", training_cost, "W=", sess.run(W), "b=", sess.run(b), '\n')

    # Testing example, as requested (Issue #2)
    test_X = numpy.asarray([6.83, 4.668, 8.9, 7.91, 5.7, 8.7, 3.1, 2.1])
    test_Y = numpy.asarray([1.84, 2.273, 3.2, 2.831, 2.92, 3.24, 1.35, 1.03])

    print("Testing... (Mean square loss Comparison)")
    testing_cost = sess.run(
        tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * test_X.shape[0]),
        feed_dict={X: test_X, Y: test_Y})  # same function as cost above
    print("Testing cost=", testing_cost)
    print("Absolute mean square loss difference:", abs(
        training_cost - testing_cost))

Suppose the name of this file is

linear.py

Next, create a job submission script in the same folder (suppose it is called submit.job):

#!/bin/bash

#PBS -l nodes=1:ppn=12
#PBS -l walltime=1:00:00
#PBS -N TFlinear
#PBS -V

# Make sure that you are in the job submission directory
cd $PBS_O_WORKDIR

singularity exec /sw/csc/SingularityImg/ubuntu_w_TFlowKeras.img python linear.py > out.log

A few points deserve attention:

Notice that we do not call the Python on the host, but rather the one inside the Singularity container we built (ubuntu_w_TFlowKeras.img, as in the script above). The job will fail without it.
We cannot use more than 1 node. This image does not contain the MPI-enabled build of TensorFlow (which was released only recently and which we have not tested yet).
Notice that the container image uses Python 2.7.
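
Because the interpreter inside the container differs from the one on the host, it can be useful to print which Python a script is actually running under. The snippet below is a small sketch written to behave the same under Python 2.7 and Python 3. The helper name interpreter_summary is hypothetical.

```python
import sys


def interpreter_summary():
    """Return the running interpreter's version as 'major.minor.micro'."""
    return "%d.%d.%d" % tuple(sys.version_info[:3])


# A single-argument print() call behaves the same under Python 2.7 and Python 3.
print("Python %s at %s" % (interpreter_summary(), sys.executable))
```

Saved under a name of your choosing, it can be run with the same singularity exec prefix used in submit.job to see the container's interpreter rather than the host's.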

Then submit your job to the queue with

qsub submit.job
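
Once the job has finished, the W and b values reported in out.log can be sanity-checked against the closed-form least-squares fit of the same training data. The sketch below uses only NumPy. The helper names least_squares_line and half_mse are hypothetical, and half_mse mirrors the cost defined in linear.py. The gradient-descent run should approach these reference values, though it will not necessarily match them exactly.

```python
import numpy as np

# Same training data as in linear.py
train_X = np.asarray([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
                      7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
train_Y = np.asarray([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                      2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])


def least_squares_line(x, y):
    """Return (slope, intercept) minimizing sum((slope*x + intercept - y)**2)."""
    slope, intercept = np.polyfit(x, y, 1)
    return float(slope), float(intercept)


def half_mse(x, y, w, b):
    """Same cost as linear.py: sum of squared errors divided by 2*n."""
    return float(np.sum((w * x + b - y) ** 2) / (2 * x.shape[0]))


W_ref, b_ref = least_squares_line(train_X, train_Y)
print("reference W=%.4f b=%.4f cost=%.6f"
      % (W_ref, b_ref, half_mse(train_X, train_Y, W_ref, b_ref)))
```

A training cost in out.log that is noticeably higher than the reference cost suggests that more epochs or a different learning rate may be needed.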
