Bulletin 159 - 2006 Jul 26

  1. SX-6 scheduling changes
  2. Exit status of SX MPI programs
  3. Altix Application Tuning Workshop on 11th August
  4. cherax outages (28/07 & 05/08) & software change
  5. tardir utilities
  6. CSIRO: Modules coming, pkgenv going
  7. Matlab available on burnet

1. SX-6 scheduling changes

We have recently changed the scheduling strategy on the SX-6s to reduce the chance that the scheduler allocates a job to a node where it would over-commit memory (and the limited swap area).

We have set the scheduler not to allocate a job to a node where the memory in use plus the candidate job's memory would over-commit the node's memory. We have also set the scheduler to use more conservative estimates of the memory that jobs have in use: it now takes more account of the requested memory than of the actual memory in use, to allow for jobs whose memory usage fluctuates significantly.
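
As a rough illustration of the rule described above (and not the scheduler's actual algorithm), the check can be sketched in sh. The function names, the use of megabytes, the 64000 MB node size and the choice of the larger of requested and measured memory as the per-job estimate are all assumptions made for this sketch.

```shell
#!/bin/sh
# Illustrative only: the function names and the "larger of the two" rule are
# assumptions, not the real scheduler's configuration.

# Conservative per-job estimate: the larger of requested and measured memory (MB).
job_estimate() {
    requested=$1
    measured=$2
    if [ "$requested" -ge "$measured" ]; then
        echo "$requested"
    else
        echo "$measured"
    fi
}

# Succeed only if the node can take the candidate without over-committing memory.
# Arguments: node memory, sum of estimates for running jobs, candidate request (MB).
fits_on_node() {
    node_mem=$1
    committed=$2
    candidate=$3
    [ $((committed + candidate)) -le "$node_mem" ]
}

# Example: a hypothetical 64000 MB node running one job that requested 40000 MB
# but currently uses only 10000 MB. A 30000 MB candidate is refused.
committed=$(job_estimate 40000 10000)
if fits_on_node 64000 "$committed" 30000; then
    echo "allocate"
else
    echo "do not allocate"
fi
```

With these numbers the sketch prints "do not allocate", because the running job is counted at its requested 40000 MB rather than the 10000 MB it happens to be using.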

This makes it even more important for users to put accurate memory limits on their jobs, to get the best throughput for their own and other users' jobs. To see the maximum memory used by a job, look at the accounting information that the epilogue writes to your job output. If the information does not appear, insert the command jobinfo as the last command in your job script.

You can also interrogate the job accounting information directly. For example, for a job run today on node sx666, use the csh commands:

set date=`date '+%Y%m%d'` # in sh just: date=`date '+%Y%m%d'`
rsh sx666 "acctjob -u $USER -itFM /var/sx/adm/jobacct.$date"
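
For sh users, the same query could be wrapped along the following lines. This is only a sketch: the helper names acct_file and acct_command are made up for illustration, and the acctjob options are copied unchanged from the csh commands above. The helper prints the command rather than running it, so that it can be checked first.

```shell
#!/bin/sh
# Path of the job accounting file for a given date (YYYYMMDD); defaults to today.
acct_file() {
    echo "/var/sx/adm/jobacct.${1:-$(date '+%Y%m%d')}"
}

# Hypothetical helper: print (not run) the rsh command for a node and optional date.
acct_command() {
    node=$1
    user=${USER:-$(id -un)}
    echo "rsh $node \"acctjob -u $user -itFM $(acct_file "$2")\""
}

# Show the command for node sx666 and the date of this bulletin.
acct_command sx666 20060726
```

Leaving out the date argument gives the command for today's accounting file, as in the csh example.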

We can produce graphs showing memory use over time for jobs if users would like to see that.


2. Exit status of SX MPI programs

MPI/SX checks the exit status of every MPI process to decide whether an MPI program terminated normally or abnormally. If and only if every MPI process returns 0 as its exit status, MPI/SX treats the program as having terminated successfully. Otherwise the program is treated as having failed. MPI/SX therefore recommends the following programming styles.

  • On normal termination, an MPI process should return zero as its exit status. In a C program, for example, the return value of the main function should be zero.

  • On abnormal termination, an MPI process should return a non-zero value as its exit status. In a C program, for example, a non-zero value should be given as the return value of the main function, as the exit status passed to the exit(2) system call, or as the error code passed to the MPI_ABORT procedure.

  • If an MPI process is started indirectly, for example via a shell script, the script should return the exit status of the process. The following shell script shows an example.

    #!/bin/sh
    /execdir/mpi.exec # starts MPI process
    RC=$? # holds the exit status
    command # non-MPI program/command
    exit $RC # declare the exit status
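
To see why the script above saves the status in RC, the following sketch compares a wrapper that forgets to do so with one that follows the recommended pattern. The functions mpi_exec and cleanup are hypothetical stand-ins (a failing MPI program and a later, successful command) so that the example runs on any system.

```shell
#!/bin/sh
# Hypothetical stand-ins so the sketch runs anywhere.
mpi_exec() { return 3; }   # pretends to be a failing MPI program
cleanup()  { return 0; }   # pretends to be a later, successful command

# Loses the MPI status: a function returns the status of its last command.
naive_wrapper() {
    mpi_exec
    cleanup
}

# Keeps the MPI status, following the pattern in the script above.
careful_wrapper() {
    mpi_exec
    rc=$?
    cleanup
    return $rc
}

naive_wrapper;   echo "naive wrapper status: $?"
careful_wrapper; echo "careful wrapper status: $?"
```

The naive wrapper reports status 0 even though the stand-in MPI program failed, while the careful wrapper reports 3.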

For all the gory details see req #6772 and merges #6774 #6775 #6793 #6812.


3. Altix Application Tuning Workshop on 11th August

Learn to get the best out of your Altix programs at the free Altix Application Tuning Workshop.

where: SGI office, 357 Camberwell Road, Camberwell VIC (phone: +61 3 9963 1900)

when: 11th August, 9:30am - 4pm

Please tell Teresa if you are coming: teresa.curcio@csiro.au


4. cherax outages (28/07 & 05/08) & software change

Cherax will be unavailable from 9:30am to 6:30pm on the next two Saturdays: 28th July and 5th August. After the second of these outages, cherax will be running Novell's SUSE Linux Enterprise Server version 9 (SLES9) instead of a variant of Red Hat Enterprise Linux.

Sometime before we go into production with the new cherax environment, we recommend that users of cherax log on to cherax-1, a test setup for the new cherax. Note that your home directory on cherax-1 is just a remote (NFS) mount of your home directory on cherax. Please make sure that your work functions on cherax-1 so that we have time to fix any problems before the new cherax goes into production on 05/08.

On cherax-1 and on the future cherax you have to use the module command instead of pkgenv. See item 6 below.


5. tardir utilities

Would any remaining users of the tardir utilities (for tarring up directories of files) please contact Rob Bell on 03 9669 8102.

There have been recent updates to the utilities.


6. CSIRO: Modules coming, pkgenv going

CSIRO users of the HPCCC have been making increasing use of Gareth Williams' "pkgenv" system to set up their environment to use particular optional software, such as the sxcross software. However, there is a similar system called "modules" which is somewhat standard, at least at APAC grid-related sites, so we are switching to that.

The module command is available by default on login shells on cherax and burnet. We normally recommend that batch jobs are run as login shells so that they have the same environment as is used during interactive testing. You can achieve this by having the following first line in batch scripts:

     #!/bin/sh -l

The module command is not available by default in all shell script executions on cherax and burnet, nor is it available by default in any scripts on the TX7s or SX6s. To guarantee that module is available, it is always safe to do the following on any HPCCC system:

In a csh/tcsh environment put the line

     source /usr/local/etc/hpccc.cshrc

or in a sh/bash/ksh/pdksh/attksh environment put the line:

     . /usr/local/etc/hpccc.shrc

You can check if the module command is available and find out what subsystems it supports by doing:

     module whatis

If, for example, you want to use the latest SX cross compiler on cherax you can then do (interactively or in your batch script):

     module load sxcross
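
In an sh script, the steps above could be combined along the following lines. This is a sketch only: ensure_module is a made-up helper name, and the check with "command -v" assumes that module is provided as a shell function or alias, which may not hold on every system.

```shell
#!/bin/sh
# Hypothetical helper: make sure the module command exists before using it.
ensure_module() {
    # Already defined (for example in a login shell)? Nothing to do.
    if command -v module >/dev/null 2>&1; then
        return 0
    fi
    # Otherwise source the site setup file named in this bulletin, if readable.
    if [ -r /usr/local/etc/hpccc.shrc ]; then
        . /usr/local/etc/hpccc.shrc
    fi
    # Report whether module is available now.
    command -v module >/dev/null 2>&1
}

if ensure_module; then
    module load sxcross
else
    echo "module command not available" >&2
fi
```

The guard means the same script can be used both in login shells, where module is already defined, and in plain shell scripts, where it is not.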

For more information on modules please consult:
http://intra.hpsc.csiro.au/userguides/faq/env_modules.php.


7. Matlab available on burnet

Matlab is available on burnet. To use it:

  1. Make sure X windows is working from your current session by trying an xclock (or other suitable) command.
  2. Login to burnet.hpsc.csiro.au with "ssh -Y" (if available) or "ssh -X" (if -Y gives an error).
  3. Run: qsub -I -l vmem=1000mb,walltime=1:00:00 -v DISPLAY
     This starts an interactive batch job on one of burnet's nodes, in which you then run matlab.
  4. Do: "module load matlab" (see item on modules above).
  5. Run matlab to bring up a matlab development environment.

It is also possible to use existing matlab M-files in noninteractive batch jobs using

    matlab -nodesktop -r MATLAB_command
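
A noninteractive job script might be generated as in the sketch below. The helper name make_matlab_job and the M-file name myanalysis are placeholders, the resource limits are simply copied from the interactive example above, and it is assumed that the batch system accepts "#PBS" directive lines in the script. The "; exit" appended to the command makes matlab quit when the command has finished, rather than wait at its prompt.

```shell
#!/bin/sh
# Hypothetical helper: print a batch script that runs one MATLAB command.
# The resource limits are copied from the interactive example above.
make_matlab_job() {
    cat <<EOF
#!/bin/sh -l
#PBS -l vmem=1000mb,walltime=1:00:00
module load matlab
matlab -nodesktop -r "$1; exit"
EOF
}

# "myanalysis" is a placeholder for the name of your own M-file.
make_matlab_job myanalysis
```

The output can be redirected to a file, for example "make_matlab_job myanalysis > matlab_job.sh", and then submitted with "qsub matlab_job.sh".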



For urgent help at all times:
  • CSIRO users 0428 108 333
  • Bureau out of hours emergencies are managed through internal policy
HPCCC WWW Site: http://www.hpccc.gov.au/
CSIRO External ASC Site: http://www.hpsc.csiro.au/
CSIRO ASC Users' Site: http://intra.hpsc.csiro.au/

© Copyright 2010, CSIRO Australia