Bulletin 135 - 2005 Mar 23

  1. HPCCC - NQS II and ERS II upgrades, scheduler and batch changes
  2. New versions of netCDF libraries and utilities
  3. Priority scheme
  4. cherax and CSIRO Data Store updates

1. HPCCC - NQS II and ERS II upgrades, scheduler and batch changes

1.1 New versions of NQS II and ERS II

New versions of NQS II and ERS II were installed on the SX-6/TX7 system and front-ends between 07:30 and 10:30 on Tuesday 22nd March. The new versions support many new features, including the "-l cpunum_job=ncpus" parameter.

Details of the changes can be found at http://www.hpccc.gov.au/hpccc/userdocs/NQS+ERS.upgrade.2005-03.shtml

The output of the erstatj and qstat commands has been altered slightly: there is a new field for the no-migrate flag (see item 1.3 below), and the priority field in the erstatj output has been widened (see item 3 below).

There was one unanticipated problem. For some users, rsh commands from the SX-6 nodes to the TX7s hung.

If you encounter this problem, please contact HPCCC staff for assistance.

We do not yet understand the exact environment that leads to the hangs, but it may be related to the use of the NQS II -S parameter, whose behaviour has changed (see HPCBull 134.2.3). It is preferable not to use this parameter to specify a batch job shell, but instead to put #!/bin/ksh on the first line of the script.

We do know that in some circumstances, adding a -n flag to the SX-6 rsh commands (after the host name) by-passes the problem (by suppressing the reading of standard input).
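The placement of the -n flag can be illustrated with a small wrapper. This is a minimal sketch only: the function name rsh_n_cmd and the host name tx7-host are hypothetical, and the function merely prints the command line it would run, so that the position of -n (after the host name) is visible.

```shell
#!/bin/sh
# Hypothetical helper: build an rsh command line with -n placed after the
# host name, so that rsh does not read standard input.
#   $1     = remote host (placeholder name in the example below)
#   $2...  = command to run on the remote host
rsh_n_cmd() {
    host=$1
    shift
    echo "rsh $host -n $*"
}

# Print the command line for a placeholder host and command.
rsh_n_cmd tx7-host uname -a
```

To use the workaround in a job script, the printed command line would be run directly in place of the original rsh call.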

1.2 "-l cpunum_job=ncpus" parameter

Users should add the "-l cpunum_job=ncpus" parameter to all NQS II jobs by 11th April. Jobs should also continue to specify the parameter "-l cpunum_prc=ncpus", which tells the Gang Scheduler how many CPUs will be used by each executable. For example, ensemble jobs might specify values of 8 and 1 respectively for these parameters.

The qsub wrapper will be updated to allow processing of this parameter for all queues.
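For an ensemble job of the kind mentioned above, the two parameters might be combined as follows. This is a sketch only: the function name cpu_opts and the script name ensemble.ksh are placeholders, the reading of cpunum_job as the CPU count requested for the job as a whole is an assumption based on the parameter name, and the snippet prints the qsub command line rather than submitting anything.

```shell
#!/bin/sh
# Hypothetical helper: print the two CPU-count options for qsub.
#   $1 = value for -l cpunum_job (assumed: CPUs requested for the job)
#   $2 = value for -l cpunum_prc (CPUs used by each executable)
cpu_opts() {
    echo "-l cpunum_job=$1 -l cpunum_prc=$2"
}

# Ensemble example from the text: values of 8 and 1.
# ensemble.ksh is a placeholder script name.
echo "qsub $(cpu_opts 8 1) ensemble.ksh"
```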

1.3 NQS II no-migrate, no-hold/checkpoint and no-rerun flags

There are three facilities that users should consider on all jobs.

They are:

  • the ability of jobs to be re-run if a system failure is encountered
  • the ability of a job to be held and checkpointed (needed if there is an overload on a node, or when a shutdown is scheduled)
  • the ability of a job to be migrated to other nodes (if an overload is detected, or if a node is required to be taken down for maintenance or upgrades).

These abilities can be specified with the qsub parameters

  -r y | n    selects re-runnable or not
  -H y | n    selects holdable or not
  -J y | n    selects migratable or not

The default is "y" for these three options.

We recommend that users select the options -r y -H y -J y. The exceptions are:

  • For users using local file systems on nodes, use -r y -H y -J n (although the -G option for specifying files to be migrated with a job might overcome the restriction).
  • For large memory jobs, use -r y -H n -J n (but consult HPCCC staff before using this, since general jobs that cannot be checkpointed may block urgent operational work).
  • For jobs updating files, use -r n -H y -J y, since the effect of a re-run may be undesirable - think of a rerun of a transaction, such as updating a bank balance.
  • For operational and urgent jobs, use -r y -H n -J n

In general, allowing your job to be migratable should give you better throughput.
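The recommendations above can be summarised as a lookup from job category to flag settings. This is a sketch under stated assumptions: the function name job_flags, the category names and the script name myjob.ksh are invented for illustration, and the snippet only prints a qsub command line.

```shell
#!/bin/sh
# Hypothetical helper: map a job category to the -r/-H/-J settings
# recommended in this item. Category names are invented for this sketch.
job_flags() {
    case $1 in
        localfs)     echo "-r y -H y -J n" ;;  # uses local file systems on nodes
        largemem)    echo "-r y -H n -J n" ;;  # consult HPCCC staff before use
        update)      echo "-r n -H y -J y" ;;  # a re-run could repeat an update
        operational) echo "-r y -H n -J n" ;;  # operational and urgent jobs
        *)           echo "-r y -H y -J y" ;;  # general recommendation
    esac
}

# Print the command line for a job that uses node-local file systems.
# myjob.ksh is a placeholder script name.
echo "qsub $(job_flags localfs) myjob.ksh"
```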


2. New versions of netCDF libraries and utilities

New versions of netCDF libraries and utilities are now available as development versions for evaluation.

The new versions, netCDF 3.6.0p1 libraries and utilities ncdump and ncgen, are installed on all SX-6 nodes and the cross environments gale/eccles/mawson/cherax for execution on the SX-6s.

Note that the default versions have not been changed, but will be at a later date if the new versions are found to be satisfactory.

Documentation for netCDF 3.6 is at the UCAR website:
http://www.unidata.ucar.edu/content/software/netcdf/docs/

The detailed release notes and directory structures for the new versions, prepared by Stephen Leak of NEC, are available on the HPCCC web site, under User Documentation.
http://www.hpccc.gov.au/hpccc/userguides/faq/netcdf-3.6.0p1.php

(However, please contact hpchelp rather than Stephen Leak for assistance.)

NEC will consider the provision of additional netCDF utilities for the SX-6s as required.


3. Priority scheme

The ERS and NQS system on the SX-6s has provision for a priority scheme. This will allow users to specify a priority for a job with the -p parameter.

The HPCCC sees the need for the scheme, to allow users to select the priority of their work - to get fast turnaround for development work, and to defer less time-critical work.

The priority scheme will not alter the over-riding URGENT class jobs, e.g. operational work.

The scheduler can be set to give significant weight to the priority parameter, so that higher-priority work would be more likely to start, and more likely to stay running. The scheduler would also be set up so that higher-priority work would incur a higher 'charge' within ERS, so persistent use of higher priority would result in later work receiving lower priority.

The priority specified on the jobs would also be reflected in the priority of access to processors at execution time, which would take effect when nodes are over-committed.

User feedback on this proposal is sought.


4. cherax and CSIRO Data Store updates

A patch was installed on cherax on Saturday 19th March, to enable better handling of the inode hash table in the kernel, and to enable accounting to be invoked without provoking crashes.

The changes to the inode handling provided dramatic performance improvements, and the long delays in such things as character echo and file listing should be reduced.

For example, we typically saw a delay of 10 to 100 s in the execution of a command (which should have taken 55 s) at least once per day. In the first few days after the patch install, the worst case delay was 2 s, with an average of about 0.1 s.

For system-related commands dependent on file system scans, the improvement has been dramatic - this will enable much more responsive services on the Data Store.

Here are some examples of typical elapsed times for tasks before and after the patch install.

Task                              Before        After
DMF startup dmdaux /backup        up to 1.5 hr  120 s
DMF startup dmdaux /cs/datastore  up to 1.5 hr  380 s
dmfsfree /backup                  55 min        48 s
dmfsfree /cs/datastore            1.5-2 hr      5 min
Dump /cs/datastore scan           ~1 hr         ~2 min
Directory and non-directory data  7-8 hr        ~45 min
The entire dump cycle             ~10 hr        < 2 hr
dmscanfs                          > 45 min      5 min
dmscanfs -r                       > 1.5 hr      16 min
dmhdelete                         ~1.5 hr       6 min
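
As a rough guide to the size of these improvements, the ratio of the before and after times can be worked out for the rows with definite figures. This is a sketch only: the function name speedup is hypothetical, the times are converted to seconds by hand, and the approximate entries (those marked ~) are taken at face value.

```shell
#!/bin/sh
# Hypothetical helper: ratio of "before" to "after" elapsed time, both
# given in seconds, rounded to the nearest whole number.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.0f\n", before / after }'
}

speedup 3300 48     # dmfsfree /backup: 55 min -> 48 s
speedup 5400 360    # dmhdelete: ~1.5 hr -> 6 min
speedup 3600 120    # Dump /cs/datastore scan: ~1 hr -> ~2 min
```

On these figures the three tasks run roughly 69, 15 and 30 times faster respectively.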



BoM Solar Help:

CSIRO ASC Help:

For urgent help at all times:
  • CSIRO users 0428 108 333
  • Bureau out of hours emergencies are managed through internal policy
HPCCC WWW Site: http://www.hpccc.gov.au/
CSIRO External ASC Site: http://www.hpsc.csiro.au/
CSIRO ASC Users' Site: http://intra.hpsc.csiro.au/

© Copyright 2010, CSIRO Australia