Using GPU Nodes

Introduction

Knot has twelve NVIDIA Tesla M2050 GPUs, installed in pairs in six nodes (nodes 43, 44, 45, 88, 89, and 90). There is also a GPU head node (node139) for development work. The Tesla M2050 boards have 3 GB of global/device RAM. CUDA 7.5 is installed on Knot. In addition to the CUDA compiler nvcc, several useful libraries are included (e.g. cuBLAS, cuFFT, cuRAND, and cuSPARSE); they are located in /usr/local/cuda-7.5/lib64.

To submit a job to the GPU nodes, you simply need to use:

qsub -q gpuq name_of_your_script
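A minimal PBS script for the gpuq queue might look like the following (this is a sketch; the walltime, resource request, and executable name are illustrative, so adjust them for your job):

#!/bin/bash
#PBS -q gpuq
#PBS -l nodes=1:ppn=1
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR
./hello.x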

It is also possible to login directly to the GPU head node to do development work. To login to the GPU headnode, you can use (while logged into Knot):

ssh knot-gpu

Example CUDA code

CUDA, a parallel computing platform for programming GPUs, is installed on Knot. To use it, add the following to your .bashrc (replace export with setenv if you are using .cshrc):

export PATH="/usr/local/cuda-7.5/bin:$PATH"
export CUDA_HOME=/usr/local/cuda-7.5/
export LD_LIBRARY_PATH="/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH"
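After sourcing the updated .bashrc, you can quickly confirm the toolkit is on your path (the exact version banner may differ from this sketch):

which nvcc
nvcc --version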

The CUDA C compiler is nvcc. Consider the following program, hello_world.cu:

#include <stdio.h>

__device__ const char *STR = "HELLO WORLD!";
const char STR_LENGTH = 12;

/* Kernel: each thread prints one character of the string. */
__global__ void hello()
{
        printf("%c\n", STR[threadIdx.x % STR_LENGTH]);
}

int main(void)
{
        int num_threads = STR_LENGTH;
        int num_blocks = 1;
        hello<<<num_blocks,num_threads>>>();
        cudaDeviceSynchronize();   /* wait for the kernel so its printf output is flushed */

        return 0;
}
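Kernel launches fail silently unless you check for errors. A common sketch (this error-handling snippet is illustrative and not part of the original example) is to call cudaGetLastError() right after the launch:

hello<<<num_blocks,num_threads>>>();
cudaError_t err = cudaGetLastError();
if (err != cudaSuccess) {
        fprintf(stderr, "Kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
}
cudaDeviceSynchronize();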

To compile, just use:

nvcc -o hello.x hello_world.cu

Then you can run it with ./hello.x. The CUDA toolkit is also installed on the GPU nodes. For example, if your code uses cuBLAS, you can compile with:

nvcc -lcublas myProgram.cu
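As a sketch of what such a program might look like, the following computes y = alpha*x + y with cublasSaxpy (the vector size and values are arbitrary; compile as above):

#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main(void)
{
        const int n = 4;
        float x[] = {1, 2, 3, 4};
        float y[] = {10, 20, 30, 40};
        float alpha = 2.0f;

        /* Copy the vectors to device memory. */
        float *d_x, *d_y;
        cudaMalloc(&d_x, n * sizeof(float));
        cudaMalloc(&d_y, n * sizeof(float));
        cudaMemcpy(d_x, x, n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(d_y, y, n * sizeof(float), cudaMemcpyHostToDevice);

        cublasHandle_t handle;
        cublasCreate(&handle);
        cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);   /* y = alpha*x + y */
        cublasDestroy(handle);

        cudaMemcpy(y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
        printf("y[0] = %g\n", y[0]);   /* 2*1 + 10 = 12 */

        cudaFree(d_x);
        cudaFree(d_y);
        return 0;
}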

Portland Compilers

The Portland compilers (pgcc and pgf90) have GPU extensions for compiling your code: see the -acc flag (for programs with OpenACC directives) and -Mcuda (for CUDA Fortran, an extension of Fortran).