CSIRO Linux GPU Cluster

Specialised Graphics Processing Unit computing facility

Getting Started

User Information

Accessing the GPU cluster
Users can access the GPU cluster via ssh.
To access the production cluster, ssh to the production head node, linuxgpu.csiro.au. Users with an account log in with their nexus ident as the username and enter their nexus password when prompted.

To access the test cluster, ssh to the test/development head node, testgpu.csiro.au. Users with an account log in with their nexus ident as the username and enter their nexus password when prompted.
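
From a Linux host, the two logins above can be sketched as a pair of small shell functions. The names gpu_host and gpu_login are illustrative helpers, not commands provided on the cluster, and the ident passed in is your own nexus ident.

```shell
# Map a cluster name to its head node, as named above.
# (gpu_host is an illustrative helper, not a cluster command.)
gpu_host() {
    case "${1:-production}" in
        production) echo "linuxgpu.csiro.au" ;;
        test)       echo "testgpu.csiro.au" ;;
        *)          echo "unknown cluster: $1" >&2; return 1 ;;
    esac
}

# Usage: gpu_login <nexus-ident> [production|test]
# ssh prompts for your nexus password.
gpu_login() {
    local host
    host="$(gpu_host "${2:-production}")" || return 1
    ssh "${1}@${host}"
}
```

For example, `gpu_login abc123 test` (with abc123 standing in for a real ident) would open a session on testgpu.csiro.au.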

If you do not have an account on the GPU cluster, please contact an ASC cluster administrator and ask for your nexus account to be added (assistance email below).

Logging In
To get a console session working from Windows you only need PuTTY; from a Linux host, just ssh to linuxgpu.csiro.au.
You can learn about PuTTY or just install it if it is not already on your machine.

Once it is installed, you can manually add linuxgpu.csiro.au to the list of hosts or just download the pre-configured entries. After downloading the zip, extract the .reg files, then right-click each one you require and select Merge. If the application you will be running on the cluster requires a graphical user interface, see this FAQ item.

Running Jobs on the GPU cluster
Jobs are submitted from the head node, linuxgpu.csiro.au. linuxgpu is also a login node, where code can be compiled in the nvcc/cudacc environment.

Please note:
There are no GPUs attached to the production head node (see the GPU Structural Configuration PDF).

To submit a job, log on to linuxgpu and run:

qsub "jobname"
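
Here "jobname" is the path to a job script. A minimal sketch, assuming the usual Torque #PBS directives apply on this cluster: the file name gpu_hello.pbs, the job name and the walltime are illustrative values, and the GPU request is left out because its syntax is covered in the FAQ item mentioned further down.

```shell
# Write a small job script (file name and contents are illustrative).
cat > gpu_hello.pbs <<'EOF'
#!/bin/bash
#PBS -N gpu_hello
#PBS -l walltime=00:10:00
#PBS -j oe
# Torque starts the job in your home directory; move to where qsub was run.
cd "$PBS_O_WORKDIR"
echo "Running on $(hostname)"
EOF

# Submit only where qsub exists, i.e. on the head node.
if command -v qsub >/dev/null 2>&1; then
    qsub gpu_hello.pbs
else
    echo "qsub not found: run this on linuxgpu.csiro.au" >&2
fi
```

The -j oe directive merges the job's standard error into its standard output file, which keeps a first test run to a single file to inspect.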

Interactive jobs and debugging can be done on the Test, Development and Training system, testgpu.csiro.au.

We will also be setting up a test queue with a small number of test nodes available for users to test their batch jobs before moving the jobs to the production cluster. Once this is set up, users will be able to do:
qsub -q test "jobname"

to have their jobs run on a test node.

There is a guide to using the Torque batch system at: http://www.hpsc.csiro.au/userguides/blade/localguide.php#PBS
It refers to a different cluster run by the ASC group in CSIRO, but the qsub syntax is the same.
One difference is that on the GPU cluster you need to request GPUs explicitly (see the FAQ item).

Using software packages on the GPU Cluster
All the nodes in the GPU cluster currently run the x86_64 version of SUSE Linux Enterprise Server (SLES) 11. We have also made the Software Development Kit for x86_64 SLES 11 available to these nodes. If you require a software package from either x86_64 SLES 11 or its SDK, please let us know the package name so we can install it for you (assistance email below).

We have also installed specific software packages, such as the Intel compilers and OpenMPI, into a shared directory called /tools, and set up modules so that users can have their paths and environment configured to use the software installed in /tools.

To see what software packages in /tools are available under modules, do:
module avail

To load a particular package, do:
module load packagename

If you wish to compile code using the CUDA libraries, load the following modules:
module load cuda cuda-sdk
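
With those modules loaded, a compile step might look like the sketch below. compile_cuda is an illustrative helper name, not a command provided on the cluster, and hello.cu in the usage comment is a hypothetical source file. The function checks for module and nvcc first so that it fails with a message rather than an obscure error when run off the cluster.

```shell
# compile_cuda is an illustrative helper, not a cluster-provided command.
# Usage: compile_cuda hello.cu    (produces ./hello)
compile_cuda() {
    local src="$1"
    if [ ! -f "$src" ]; then
        echo "no such source file: $src" >&2
        return 1
    fi
    # 'module' is only defined where environment modules are initialised.
    if command -v module >/dev/null 2>&1; then
        module load cuda cuda-sdk
    fi
    if ! command -v nvcc >/dev/null 2>&1; then
        echo "nvcc not found: are the cuda modules loaded?" >&2
        return 1
    fi
    # Name the executable after the source file, minus the .cu suffix.
    nvcc -o "${src%.cu}" "$src"
}
```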

Storage, file systems, quotas and backups
The /home, /data and /flush file systems are shared across all nodes in both the production and test Linux GPU clusters.

Currently /home is not backed up, so you must keep your own copy, on another file server, of any files you generate.

When the high performance storage (HNAS) is brought into operation on Thursday 26/11/2009, /home will also be backed up.

/data and /flush are intended to provide users with temporary space for use while running jobs. Neither /data nor /flush will be backed up, and /flush will be set up so that files older than 7 days may be automatically deleted to free up space if /flush fills. All shared file systems will have user quotas implemented on them when the HNAS storage system becomes operational.
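
Because files in /flush may be deleted once they are older than 7 days, it can help to list your own files that have reached that age. The sketch below assumes age is judged by modification time, which is not stated above; stale_files is an illustrative helper name.

```shell
# List files you own under a directory that were last modified
# 7 or more days ago (illustrative helper; age = modification time is assumed).
# Usage: stale_files [dir] [days]
stale_files() {
    local dir="${1:-/flush}"
    local days="${2:-7}"
    # find's "-mtime +N" matches files at least N+1 whole days old,
    # so N = days - 1 selects files that are 'days' or more days old.
    find "$dir" -user "$(id -u)" -type f -mtime +"$((days - 1))" 2>/dev/null
}
```

Running `stale_files /flush` before a long job gives a list of files worth copying elsewhere.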

The following user quotas are set on the shared file systems:

/home 10GB
/data 500GB
/flush 1TB

We have also set a default user quota of 150,000 inodes on each file system, to help protect against runaway jobs. If you need a higher inode quota, please contact us.
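
Each file and directory uses one inode, so a rough count of what you own under a directory shows how close you are to the inode quota. This is only an approximation under stated assumptions: the quota applies to everything you own on the file system, not just one directory tree, and hard-linked files are counted once per name. inode_count and inode_report are illustrative helper names.

```shell
INODE_QUOTA=150000   # default per-file-system inode quota stated above

# Count the files and directories you own under a directory.
inode_count() {
    find "${1:-$HOME}" -user "$(id -u)" 2>/dev/null | wc -l
}

# Print usage against the quota; returns non-zero if the count exceeds it.
inode_report() {
    local dir="${1:-$HOME}"
    local used
    used="$(inode_count "$dir")"
    echo "$dir: $used of $INODE_QUOTA inodes"
    [ "$used" -le "$INODE_QUOTA" ]
}
```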

Help and Resources
For assistance with the GPU cluster, email: gpuhelp@hpsc.csiro.au

Show GPU cluster status: linuxmanage status

Show GPU cluster utilisation: GPU Cluster monitoring


There are various resources available for people wishing to learn more about GPU development. These are available at:

https://wiki.csiro.au/confluence/display/terabyte/GPU+Forum

http://www.nvidia.com/object/cuda_home.html

http://gpgpu.org/