Bulletin 136 - 2005 Apr 08

  1. HPCCC Userguides
  2. Change of home base for the req (wreq) problem reporting system
  3. pkgenv utility on the SX-6/TX7 system
  4. qsubnew command for the SX-6/TX7 system, and -l cpunum_job parameter
  5. Downtime - CSIRO IBM clusters
  6. Fixed dmls command on cherax
  7. Old backups of SX-5 /cs/home area
  8. Backups of SX-6/TX7 file systems
  9. Future HPCbull format
  10. Downtime - Sam (Bureau Users)

1. HPCCC Userguides

The HPCCC Userguides have been updated at

http://intra.hpccc.gov.au/userguides/

or

http://intra.hpsc.csiro.au/userguides/

The Userguides should be available from the "User Documentation" link on the top-level hpccc.gov.au WWW pages within a fortnight.


2. Change of home base for the req (wreq) problem reporting system

The problem reporting system known as wreq or req will move from the webserver www.hpc.csiro.au to intra.hpccc.gov.au at the end of next week.

All email addresses will remain unchanged. The only users affected will be those who use the web interface and have "Bookmarks" pointing to www.hpc.csiro.au. Replacing www.hpc.csiro.au with intra.hpccc.gov.au in such bookmarks should fix the problem. For example, the new URL to get a list of current requests will be

http://intra.hpccc.gov.au/cgi-bin/wreq/req?list

or, alternatively, via the CSIRO address alias

http://intra.hpsc.csiro.au/cgi-bin/wreq/req?list
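For anyone with many bookmarks to update, the substitution can be sketched in Python. This is a hypothetical helper (rewrite_bookmark is not part of wreq); it assumes bookmarks are plain http URLs without an embedded user name or password, and it changes only the host name, leaving the path and query (such as /cgi-bin/wreq/req?list) intact.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "www.hpc.csiro.au"
NEW_HOST = "intra.hpccc.gov.au"


def rewrite_bookmark(url: str) -> str:
    """Return url with the old wreq host replaced by the new one.

    URLs that point at any other host are returned unchanged.
    Assumes no user:password@ part in the URL (it would be dropped).
    """
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url
    # Keep an explicit port number if one was given; only the host changes.
    netloc = NEW_HOST if parts.port is None else f"{NEW_HOST}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


if __name__ == "__main__":
    print(rewrite_bookmark("http://www.hpc.csiro.au/cgi-bin/wreq/req?list"))
```

Bookmarks for other sites pass through untouched, so the helper can be applied to a whole exported bookmark list.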


3. pkgenv utility on the SX-6/TX7 system

The pkgenv utility has been installed on the SX-6/TX7 system.

This utility, which has been available on other HPSC/HPCCC systems, lets users easily select software packages. We have typically used it to augment paths and set environment variables so that users can run add-on applications and newer software versions.

Type pkgenv for help and a list of available packages.

The first package is commonbin, which augments the users' path to include a new common area across the SX-6/TX7 complex.

This area contains new versions of do_tx6 and do_tx7, which support the -n option, mimicking the -n option of rsh commands. These versions will shortly replace the existing versions.
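The path and environment handling described above can be illustrated with a small Python sketch. It is not pkgenv's implementation: it assumes a package's bin directory is prepended to PATH (whether pkgenv prepends or appends is not stated here), and the functions augment_path and package_env, the directory /opt/example/bin and the variable EXAMPLE_HOME are all hypothetical.

```python
import os


def augment_path(path_value: str, new_dir: str) -> str:
    """Return path_value with new_dir first, dropping any later duplicate."""
    parts = [p for p in path_value.split(os.pathsep) if p and p != new_dir]
    return os.pathsep.join([new_dir] + parts)


def package_env(base_env: dict, bin_dir: str, extra_vars: dict) -> dict:
    """Build a new environment for one package without mutating base_env."""
    env = dict(base_env)
    env["PATH"] = augment_path(env.get("PATH", ""), bin_dir)
    env.update(extra_vars)  # package-specific variables, e.g. a *_HOME setting
    return env


if __name__ == "__main__":
    env = package_env(dict(os.environ), "/opt/example/bin",
                      {"EXAMPLE_HOME": "/opt/example"})
    print(env["PATH"].split(os.pathsep)[0])
```

Putting the package directory first means its commands take precedence over older versions of the same names elsewhere on the path.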


4. qsubnew command for the SX-6/TX7 system, and -l cpunum_job parameter

A new version of the local qsub wrapper, called qsubnew, is available for users to test. This version fully incorporates the -l cpunum_job parameter into scheduling. It is installed in the same location as the current qsub wrapper, and will eventually replace it.

Please continue to add the -l cpunum_job=ncpus parameter to your jobs, so that jobs can be placed on nodes more accurately in future as the workload increases.


5. Downtime - CSIRO IBM clusters

The CSIRO clusters will be unavailable on Saturday 9th April (tomorrow) from 08:00 to 13:00 during a software upgrade.


6. Fixed dmls command on cherax

A new version of the DMF user commands will be installed on cherax on Saturday 9th April (tomorrow), to fix a problem that caused dmls commands to abort when NFS problems occurred.


7. Old backups of SX-5 /cs/home area

As previously announced, backups of the SX-5 /cs/home areas will be deleted by the end of May 2005, a year after the close of the SX-5 service.

Copies of users' /cs/home areas at the time of closure remain on cherax in /cs/datastore/SX5userdata (these areas were not moved into users' $HOME area on cherax).


8. Backups of SX-6/TX7 file systems

The file systems listed below are backed up monthly into the Bureau's Bladestore. An additional weekly backup is made with rsync, but each one overwrites the previous week's backup.

/bm/nmoc/keep /bm/home /bm/share /bm/keep

The /cs/home file system is backed up nightly onto cherax using rsync. Only one physical copy of each unique file is kept, but each night's backup area appears as a complete copy of the file system at the time of backup (using hard links).
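The hard-link scheme used for /cs/home can be sketched in Python. This is a minimal illustration, not the script actually run on cherax (which presumably relies on rsync itself, for example its --link-dest option). It assumes a file counts as unchanged when the previous night's copy has the same size and modification time, similar to rsync's default quick check. The function names snapshot and _same and the directory names night1/night2 are hypothetical; symbolic links are followed rather than preserved.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def _same(a: Path, b: Path) -> bool:
    """Size + mtime comparison, like rsync's default quick check."""
    sa, sb = a.stat(), b.stat()
    return sa.st_size == sb.st_size and sa.st_mtime_ns == sb.st_mtime_ns


def snapshot(source: Path, dest: Path, previous: Optional[Path] = None) -> None:
    """Copy the tree at source into dest, hard-linking unchanged files.

    Unchanged files become hard links to the previous night's copy, so
    they use no extra disk space; new or changed files are copied.
    """
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        (dest / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(root) / name
            out = dest / rel / name
            old = previous / rel / name if previous is not None else None
            if old is not None and old.is_file() and _same(src, old):
                os.link(old, out)        # share the existing physical copy
            else:
                shutil.copy2(src, out)   # copy2 keeps mtime for next night


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        home = base / "home"
        home.mkdir()
        (home / "a.txt").write_text("unchanged\n")
        (home / "b.txt").write_text("version 1\n")
        snapshot(home, base / "night1")
        (home / "b.txt").write_text("version 2, longer\n")
        snapshot(home, base / "night2", previous=base / "night1")
        # a.txt is one file shared by both nights; b.txt has its own copy.
        print((base / "night2" / "a.txt").stat().st_nlink)  # 2
        print((base / "night2" / "b.txt").stat().st_nlink)  # 1
```

Because unchanged files are shared, deleting an old night's directory frees only the files that no later snapshot still references.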

This backup regimen is necessary because of a weakness in the Clusterpro failover software, which has often mistaken backup loads for system problems and initiated unnecessary SX-GFS interruptions.

The HPCCC is endeavouring to re-introduce regular backups according to previous policies, as used for the SX-5. At least two possible solutions will be evaluated over the coming months.

Please consider your strategies for protecting important data.


9. Future HPCbull format

The author of the HPCbulls is trying to reduce their length. In future, we will try to put more information into the local Userguides, and simply reference the updates.

We are also considering just sending out notification of the release of each HPCbull by e-mail (without the substance), with a reference to the WWW location.

Your comments on this are sought.


10. Downtime - Sam (Bureau Users)

This coming Wednesday, 13th April 2005: start time 08:15 AEST, finish time 11:00 AEST.

Sam will be unavailable for 2 hours 45 minutes while its file systems are checked, the secondary boot is updated, and preparatory work is carried out for a VxVM upgrade in May.



BoM Solar Help:

CSIRO ASC Help:

For urgent help at all times:
  • CSIRO users 0428 108 333
  • Bureau out of hours emergencies are managed through internal policy
HPCCC WWW Site: http://www.hpccc.gov.au/
CSIRO External ASC Site: http://www.hpsc.csiro.au/
CSIRO ASC Users' Site: http://intra.hpsc.csiro.au/

Comments to:


© Copyright 2010, CSIRO Australia