Bulletin 193 - 2009 October 21

  1. Preparation for departure of SX-6
  2. Cherax: outage
  3. CSIRO Data Store: space quotas
  4. Burnet: return to service
  5. CSIRO ASC: Training session for Accelrys Materials Studio
  6. CSIRO ASC: Software Upgrades and Default Changes

1. Preparation for departure of SX-6

The SX-6s will no longer be available from early next year. The CSIRO deadline for migrating from the SX-6 is the beginning of December; after this time the CSIRO nodes (sx600-sx604) will no longer be available.

The replacement Sun constellation at NCI is available now, and the Sun constellation being installed at BOM will be available soon.

1.1 Porting to Sun: Third Party Software

If you are using any third-party software on the SX-6 or TX-7 and will need it to be available on the new Sun, please let us know.

1.2 CSIRO SX-6 Data

Files will not be copied for users from the SX-6/TX-7 to either of the new Sun Constellation systems.

Files in CSIRO $HOME are backed up and an archive of these files will be kept, but it will be non-trivial to extract files from the archive. If you need to copy files from the SX-6/TX-7, please do so before it becomes unavailable rather than relying on the backups.

Although CSIRO will soon lose access to the SX-6 nodes, the TX-7s will remain accessible for a limited time, so there is still a short window in which to move files. Please clean up after yourself so we have less to archive.

Files in CSIRO $DATADIR and $WORKDIR are not backed up. It is up to each user to ensure that any files they wish to keep are moved from $DATADIR (or $WORKDIR) to cherax, burnet or one of our partner systems (the NCI Sun Constellation is available now).

More than 70% of the data currently in $DATADIR (about 680 of 940 GB) has not been used or modified for at least three months. If you have forgotten data on the SX-6, it is at risk of being lost after December.

Please log on and check your data holdings today, delete any that is no longer needed and back up any that you still need.
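One way to find candidates for clean-up is to list files that have been neither read nor modified for about three months. The following is a minimal sketch using the standard find command. The function name list_stale_files and the 90-day default are illustrative choices, and it assumes the file system records access times.

```shell
# List regular files under a directory that have been neither read nor
# modified for more than a given number of days (default 90, roughly
# three months). Assumes access times are maintained on the file system.
list_stale_files() {
    dir=$1
    days=${2:-90}
    # Requiring both -atime and -mtime matches "not used or modified".
    find "$dir" -type f -atime +"$days" -mtime +"$days" -print
}

# Example: list_stale_files "$DATADIR" 90
```

Review the list before deleting anything, and copy any files you still need to another system first.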

1.3 BOM SX-6 Data

BOM data in SX-6 $HOME and $DATADIR will not be reproduced on the Sun Constellation - users should individually ensure any files they wish to keep are copied over.

The SX-6 filesystems will remain accessible until March 2010.


2. Cherax: outage

Around 11:40 on Friday 16th October, cherax crashed due to a memory failure.

There will be an outage from 16:00 on 28th October to replace the faulty memory chips. Long-running jobs that cannot finish before the outage may not start until after it. In the meantime there will be extra opportunities for short jobs!


3. CSIRO Data Store: space quotas

At present, the only user quotas on the /cs/datastore areas ($HOME) on cherax are for inodes (numbers of files and directories), with a default value of 150,000.
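To get a rough idea of how close you are to the inode quota, you can count the files and directories under your area. The sketch below uses only standard tools. The function names are illustrative, and the count is an estimate only: it includes the top-level directory itself and counts a hard-linked file once per name. The quota system's own figures take precedence.

```shell
INODE_QUOTA=150000    # default quota value stated above

# Print the number of files and directories under a directory tree.
count_entries() {
    # -xdev stops find from crossing into other mounted file systems.
    find "$1" -xdev | wc -l | tr -d ' '
}

# Report the count against the default quota for a directory tree.
report_inode_usage() {
    used=$(count_entries "$1")
    echo "$used of $INODE_QUOTA entries in use under $1"
    if [ "$used" -ge "$INODE_QUOTA" ]; then
        echo "at or over the default inode quota"
    fi
}

# Example: report_inode_usage "$HOME"
```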

Recently, a few very active users consumed most of the 6.6 Tbyte of space in the file system, which then filled up, interrupting service to all users. To recover, most files had to be removed from the cache disk (leaving them resident on tape only), and the system was difficult to work with for an extended time afterwards, until the cache was repopulated.

To avoid this situation repeating, we plan to introduce an on-line space quota of (initially) 2 Tbyte per user on 21st October 2009.

This value may be reviewed over time.

This quota will apply to the on-line storage, not to the total storage managed by DMF for each user.

Some users may reach this quota during periods of heavy usage. The solution is to use the dmput -r command (on recalled files, not on new files please) to release the disc space occupied by files as processing on them completes.

See man dmput and the Data Store Userguide at http://intra.hpsc.csiro.au/userguides/ds/ for more information.
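As an illustration, releasing space after processing recalled files might look like the sketch below. The wrapper function release_files and the DRY_RUN variable are hypothetical conveniences, not part of DMF, and the file names are placeholders. Only dmput -r itself comes from the text above.

```shell
# Release the on-line disc space held by recalled files once processing
# on them is complete. Set DRY_RUN=1 to print the command instead of
# running it.
release_files() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo dmput -r "$@"
    else
        dmput -r "$@"
    fi
}

# Example (placeholder file names):
#   release_files run01.nc run02.nc
```

Running with DRY_RUN=1 first is a simple way to check which files would be released.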

If you use (and recall) files owned by a colleague, this will count towards their usage, and you can't dmput files you don't own. This may make failures harder to diagnose when quotas are reached, but it will be a far better situation than filling the file system.

(On 8th October, data from four users occupied over 0.5 Tbyte each, with the total usage by these users being nearly half the available space. We have seen instances where one user's files occupied about half the space.)


4. Burnet: return to service

The CSIRO burnet cluster is back in operation, after an extended outage.

The problem was with the file servers: the change to using NFS with a higher-performance disc configuration merely exacerbated an underlying problem, which turned out to be a bug in the Linux network driver code. A fix was found and installed. (Although the bug was over two years old, the patch was not in the current distributions.)

At present, only one file server is in use, on temporary hardware. More outages may be needed in the future to get a permanent set-up.

Thanks to the system team, and especially to Daniel Smith, who have spent many days (including weekends) on the problems.


5. CSIRO ASC: A free introductory training session for Accelrys Materials Studio

Accelrys Materials Studio (http://accelrys.com/products/materials-studio/) is a powerful software package for simulating chemicals and materials at atomic and meso-scales. CSIRO has licenses for the software, which can be accessed through IMT's Advanced Scientific Computing (http://intra.hpsc.csiro.au/).

A FREE introductory training session will be held on Thursday 29th October, from 9am to noon, Melbourne time. The session is conducted via the web, so you can join in from anywhere with a phone and computer.

Or you may care to join us in the Rivett Central Seminar Room (2.07) at CMSE Clayton, where morning tea will be provided!

Presented by the Computational and Simulation Sciences and the Advanced Materials TCPs

For more information please contact Kate Nairn or Ming Liu. Please RSVP to Kate Nairn, who will pass on details for web access and training materials.


6. CSIRO ASC: Software Upgrades and Default Changes

The following have been recently installed:

  • Matlab R2009b (32 and 64 bit) (burnet,myhost)

The following software will have the default version changed on or after the 4th November:

  • Intel MKL - default upgraded to 10.0.1.014 (previously 9.1.023)

    The default version of software can be loaded by specifying the software name without the version number in the 'module load' command.

    For example:

      module load intel-mkl
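If you need to stay on the previous version after the default changes, you can usually name the version explicitly. The sketch below assumes the module is called intel-mkl/9.1.023. That name/version form is an assumption, so check the output of 'module avail' for the exact name on your system.

```shell
MKL_VERSION=9.1.023                    # previous default, from this bulletin
MKL_MODULE="intel-mkl/$MKL_VERSION"    # assumed name/version form

if command -v module >/dev/null 2>&1; then
    module avail intel-mkl             # list the versions installed
    module load "$MKL_MODULE"          # load the pinned version
else
    echo "module command not found; would run: module load $MKL_MODULE"
fi
```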
    


BoM Solar Help:

CSIRO ASC Help:

For urgent help at all times:
  • CSIRO users 0428 108 333
  • Bureau out of hours emergencies are managed through internal policy
HPCCC WWW Site: http://www.hpccc.gov.au/
CSIRO External ASC Site: http://www.hpsc.csiro.au/
CSIRO ASC Users' Site: http://intra.hpsc.csiro.au/

© Copyright 2010, CSIRO Australia