Bulletin 127 - 2004 Oct 15

  1. HPCCC SX-6/TX7 extended outage on 14th October.
  2. HPCCC changes to NEC Global File Systems
  3. HPCCC - SX-6/TX7 upgrade update
  4. Data protection - warnings
  5. HPCCC - visits
  6. HPCCC Applications support services
  7. HPCCC - expanded queues on the TX7s
  8. cherax - upgrade
  9. cherax - update on problems

1. HPCCC SX-6/TX7 extended outage on 14th October.

The scheduled outage on the morning of 14th October, to do electrical work for the upgrade, ran into problems caused by other upgrades being attempted at the same time on the console systems.

The authentication services did not come back properly after the outage, and the downtime extended to 11:50 local time.

Also, some of the job services failed to start correctly, and jobs were restricted in the nodes they could reach. These problems were fixed in the afternoon.

We are also reviewing the notification procedures, as not all users received advance notice of the outage.

2. HPCCC changes to NEC Global File Systems

The HPCCC has been evaluating a new feature from NEC which provides greater resiliency for the Global File Systems (GFS) in the event of the failure of a file server. The feature, called SYNC, ensures writes are completed to the disc before continuation of processing. The HPCCC believes it is important to implement this feature for the highest possible integrity of the GFS file systems.

In a previous evaluation in July, the SYNC feature was found to have a serious impact on I/O performance. Since then, NEC has done extensive testing and made a number of tuning changes, so that the feature now works with minimal impact on applications, provided the applications do not perform extensive I/O with small block sizes.

Prior to the upgrade, it is imperative that any small-block I/O is minimised to maintain I/O performance. NEC testing has shown that appropriate use of certain tuning methods can improve application I/O on the current GFS, and that after the upgrade to the SYNC GFS, I/O performance for tuned applications will be within 1-2% of current performance in the majority of cases. Note that for 'un-tuned' applications, I/O could be up to 2.5 times slower with SYNC.
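As a rough illustration of what "small block" means here, the following sketch writes the same amount of data twice with the standard dd utility, once in 512-byte blocks and once in a single 1 Mbyte block. It is not one of the NEC tuning methods (those are described in the earlier bulletins); the file names and sizes are arbitrary assumptions, and the effect on a real application depends on how it issues its writes.

```shell
#!/bin/sh
# Illustration only: write the same 1 Mbyte of data with small and large blocks.
# File names and sizes are arbitrary; the files go into a temporary directory.
tmpdir=$(mktemp -d)

# 2048 writes of 512 bytes each (the small-block pattern to avoid).
dd if=/dev/zero of="$tmpdir/small_blocks.dat" bs=512 count=2048 2>/dev/null

# 1 write of 1048576 bytes (same data volume, far fewer write requests).
dd if=/dev/zero of="$tmpdir/large_block.dat" bs=1048576 count=1 2>/dev/null

# Both files are the same size; only the number of write requests differs.
ls -l "$tmpdir"
```

Prefixing each dd command with time gives a crude comparison on a particular file system. Remove the temporary directory afterwards with rm -r "$tmpdir".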

The HPCCC is planning a phased implementation of the SYNC feature, on a per file system basis, starting from mid-November. The first file systems upgraded will be the CSIRO file systems, followed by the Bureau non-operational file systems, and finally the operational file systems.

In preparation for the introduction of SYNC GFS, users are urged to reread HPCbull 125, item 1, and HPCbull 97, items 6 and 7, available at http://www.hpccc.gov.au/hpccc/user_news_advice/news/index.shtml

HPCCC staff plan to contact major user groups to confirm the state of their applications, and to help users understand and implement the minimal changes necessary.

3. HPCCC - SX-6/TX7 upgrade update

The hardware for the SX-6/TX7 upgrade has been delivered, and is being installed.

HPCCC staff members in conjunction with NEC are planning configuration changes for the new components. In particular, extensions to the queues and file systems are being mapped out.

If you wish to provide suggestions on how the queue and file system structures might be improved to facilitate your work, please contact the HPCCC Help Service. Extended $WORKDIR and $DATADIR areas are being planned.

4. Data protection - warnings

Please note that with the large disc areas on the SX-6/TX7, and no need to flush them for long periods, it is easy for us all to become complacent about protection of files.

The following are the only SX-6/TX7 user areas which are backed up:

Bureau users: $HOME - a weekly backup to ??? disc and tape is done.

CSIRO users: $HOME - an incremental dump is done every night into the cherax data store.

Note that none of the $WORKDIR or $DATADIR areas have any backup done. If you accidentally delete or corrupt a file there, we have no way to help you.

The optimum strategy is to use these areas, but ensure that critical data is stored elsewhere as well - in the data stores.

Flushing will be increasingly invoked on the $WORKDIR areas to ensure large space is always available. However, the $DATADIR areas are not subject to flush, and will become harder to manage as they fill up - we will have to rely on users self-managing the areas with reports available from the HPCCC.

Quotas have been imposed recently on the Bureau $DATADIR and $WORKDIR areas.

(The incremental dump of the CSIRO $HOME areas on the SX-6/TX7 system uses the rsync utility, in a way that allows us to view the state of the CSIRO $HOME file system at various times in the past, but keeps only one physical copy of each file).
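The "one physical copy" behaviour can be understood through hard links: a file that has not changed since the previous snapshot is linked rather than copied. The sketch below shows the idea in plain shell, with made-up directory and file names. rsync offers this through its --link-dest option; whether the HPCCC dump uses that option is an assumption, and this is not the actual dump script.

```shell
#!/bin/sh
# Sketch of hard-link snapshots: unchanged files share one physical copy.
# Directory and file names are made up for the demonstration.
work=$(mktemp -d)
mkdir -p "$work/home" "$work/snap.1" "$work/snap.2"
echo "stable" > "$work/home/keep.txt"
echo "v1"     > "$work/home/edit.txt"

# First snapshot: plain copies.
cp -p "$work/home/"* "$work/snap.1/"

# The user changes one file before the next snapshot.
echo "v2" > "$work/home/edit.txt"

# Second snapshot: link to the previous snapshot when the content is
# identical, otherwise store a fresh copy.
for f in "$work/home/"*; do
    name=$(basename "$f")
    if cmp -s "$f" "$work/snap.1/$name"; then
        ln "$work/snap.1/$name" "$work/snap.2/$name"
    else
        cp -p "$f" "$work/snap.2/$name"
    fi
done

# The same inode number in both snapshots means one physical copy on disc.
ls -li "$work/snap.1" "$work/snap.2"
```

Each snapshot directory looks like a complete copy of $HOME at that time, yet keep.txt occupies disc space only once.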

On cherax, only the $HOME area is subject to backup - daily incremental and weekly full backups. However, these backups cover only un-migrated files and the metadata of migrated files. Backups are effective only for the previous 35 days.

It is good practice to remove all write access from important file holdings, to prevent accidental deletion, for example:

 chmod -R a-w my-jewels
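The following sketch applies the command to a throw-away directory tree and then checks that no write permission remains. The temporary location and the file names under my-jewels are assumptions made for the demonstration.

```shell
#!/bin/sh
# Sketch: protect a directory tree from accidental change, then verify.
# "my-jewels" follows the example above; everything else is illustrative.
demo=$(mktemp -d)
mkdir -p "$demo/my-jewels/src"
echo "important" > "$demo/my-jewels/src/model.f90"

# Remove write permission for user, group and others, recursively.
# Because directories lose write permission too, files inside them
# can no longer be deleted or renamed by an ordinary user.
chmod -R a-w "$demo/my-jewels"

# List anything under the tree that still has a write bit set
# (no output is expected).
find "$demo/my-jewels" \( -perm -200 -o -perm -020 -o -perm -002 \) -print
```

To change the files later, restore write access for yourself only with chmod -R u+w on the same directory. Note that this protects against accidents by ordinary users; it is not a substitute for a backup.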

Note that there is currently no active off-site storage of the files in the CSIRO Data Store - if this is a concern to you, we suggest that critical data such as source code also be stored at local sites.

5. HPCCC - visits

The HPCCC welcomes visits from users to discuss issues regarding the usage of the facilities, and is also happy to host visits by HPCCC users to the Central Computing Facility (CCF) to see the actual machines.

If you would like to visit the CCF please contact the HPCCC Help Service.

6. HPCCC Applications support services

Under the Bureau contract with NEC for the supply of the SX-6/TX7 system, the Bureau has access to NEC's applications support services. Under a separate arrangement, CSIRO has recently arranged for its users to have limited access to such services.

If you wish to use these services for any of the following on the SX-6/TX7 system, please contact the HPCCC Help Service:

  • porting of applications
  • assistance with optimisation
  • assistance with parallelisation of applications

The NEC analysts have provided significant performance and throughput improvements for many applications.

7. HPCCC - expanded queues on the TX7s

We now have extra queues on the TX7s, allowing separation of operational and non-operational work, and separate queues for each TX7, so that if one is unavailable, batch processing can continue on the other.

All users should submit jobs for the TX7s to the queues tx or txrt - these routing queues then direct the work to the appropriate execution queue, as follows:

tx -> txbm
          -> txbm0
          -> txbm1
   -> txcs
          -> txcs1
          -> txcs0

txrt -> txbmrt
               -> txbmrt0
               -> txbmrt1
     -> txcsrt
               -> txcsrt1
               -> txcsrt0

8. cherax - upgrade

The SGI Altix has now been upgraded to 64 1.3 GHz processors. The memory has also been upgraded to 174 Gbyte, shared by all the processors.

This gives the system an interesting capability for very large problems.

9. cherax - update on problems

Since the last HPCbull, we have had continuing problems with cherax.

  • We have had three further crashes. SGI is analysing the dump from one of these.
  • We have had worsening slowdowns. SGI staff from the development centre in Melbourne are giving close support to try to identify the cause. When the system slows, even characters typed in a window are not echoed immediately - we suspect the kernel is locked in some state which prevents most processes from getting access. On the afternoon of 13th October, the system became so slow that we decided at 17:15 to reboot.
  • The /cs/datastore file system filled on at least two occasions. We have implemented a wrapper script to dmget, so that requests to recall large amounts of data are broken into smaller chunks. We have also increased the target amount of free space DMF will keep on the file system from 10% to 15%. We have also implemented a DMF feature called FREE_DUALSTATE_FIRST. This means that when space becomes low, the system will first delete the data blocks for files which are already on tape, rather than seeking to write data to tapes. We hope this will improve responsiveness when the file system is filling.
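The chunking idea behind the dmget wrapper can be sketched as follows. This is not the wrapper installed on cherax: the function name chunked_recall, the DMGET variable, and the choice to split by number of files rather than by amount of data are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: recall files in batches rather than in one huge request.
# DMGET names the command to run; chunked_recall is an illustrative name.
DMGET=${DMGET:-dmget}

chunked_recall() {
    batch=$1
    shift
    # Nothing to do if no file names were given.
    [ "$#" -gt 0 ] || return 0
    # xargs -n starts one $DMGET invocation per group of at most $batch names.
    # File names containing whitespace or quotes would need extra handling.
    printf '%s\n' "$@" | xargs -n "$batch" $DMGET
}

# Dry run: substitute echo for dmget to see how five names are split
# into batches of two.
DMGET=echo
chunked_recall 2 a.dat b.dat c.dat d.dat e.dat
```

With echo standing in for dmget, the dry run prints one line per batch, which makes it easy to check the grouping before pointing the function at real files.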

One consequence of this last change is that you will see a different behaviour in the selection of files to be removed from disc - when seeking to create more free space, the FREE_DUALSTATE_FIRST option selects dual-state files first, sorted by age, and then regular files, sorted by age. When you recall a file with dmget, its access time is not updated, so a recalled file that was last accessed long ago can be among the first candidates for removal when the system is short of free space. Please use the touch -a command before a dmget in these circumstances.
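The two steps can be combined in a small helper, sketched below. The function name refresh_and_recall is hypothetical, and the -c option of touch is an addition not mentioned above: it stops touch from creating an empty file if a name is mistyped.

```shell
#!/bin/sh
# Sketch: refresh access times, then recall.
# refresh_and_recall is a hypothetical helper name.
refresh_and_recall() {
    # touch -a updates only the access time; the modification time is
    # left alone. -c avoids creating files that do not exist.
    touch -a -c "$@"
    # dmget exists only on DMF systems such as cherax; skip it elsewhere.
    if command -v dmget >/dev/null 2>&1; then
        dmget "$@"
    else
        echo "dmget not found; access times updated only" >&2
    fi
}
```

For example, refresh_and_recall run1.nc run2.nc would mark both files as recently accessed and then request their recall, where run1.nc and run2.nc stand for your own file names.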

Also, to improve turnaround for short batch jobs, a new queue called 'short' has been created.

  • jobs in the queue will be limited to 4 running at a time, 10 CPUs total, and 10 Gbyte of memory.
  • each job will be limited to 15 minutes of CPU time, and 10 Gbyte of memory.

New jobs which do not specify a queue will go to queue short or queue batch, depending on the resources requested or picked up as defaults.

To get the most out of the new queues, values should be specified for CPU time and memory limits, avoiding the defaults: e.g.

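#!/bin/tcsh
# CPU time under the 15 minute limit of queue short; 30 minutes of wall-clock time
#PBS -l cput=0:12:00,walltime=0:30:00
# Memory and CPU count, given explicitly rather than left to the defaults
#PBS -l mem=100mb
#PBS -l ncpus=1
# -j oe merges stderr into stdout; -r n marks the job as not re-runnable
#PBS -j oe -r n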


BoM Solar Help:

CSIRO ASC Help:

For urgent help at all times:
  • CSIRO users 0428 108 333
  • Bureau out of hours emergencies are managed through internal policy

HPCCC WWW Site: http://www.hpccc.gov.au/
CSIRO External ASC Site: http://www.hpsc.csiro.au/
CSIRO ASC Users' Site: http://intra.hpsc.csiro.au/
