Bulletin 154 - 2006 April 13
1. Interruptions to service
There was a fault in the electrical services to the machine room around 23:15 on 2006-03-22. Many services were impacted, including the SX-6/TX7 system, and all the CSIRO systems except the clusters (which have dual power supplies). Services were not restored to cherax until about 11:00 on 23 March. One of the CSIRO servers providing local network and mail services did not recover, and a new system had to be built to handle those services. There was a gap in local mail services, including to the req system.

2. Build up of SX-6 workload
Last week, the workload on the SX-6s built up to high levels. Please note that operational work has the highest priority on the system. Other work will therefore get pushed off nodes and, if you select the no-migrate option, can wait for hours until a particular node becomes free; this was seen several times during the periods of high demand. The delays are even worse for multi-node jobs, because all the selected nodes have to be free for a job to restart. The HPCCC advice is to use GFS and allow job migration up until the final stages of development of operational model runs.

3. Use of rsync command for archiving
Recently, a user reported very slow copies of files from the SX-6/TX7 to the Bureau's SAM-FS archive store. It turned out that the user was transferring files to over-write existing files in SAM-FS. When rsync is used to update a file over a network, it examines the file to be updated to determine how much new (incremental) data it needs to send, in an effort to keep the network usage to a minimum. When the file resides on an HSM file system, this causes the file to be recalled from tape: the rsync copy cannot start until the file has been returned from tape. In this case the effect was severe: a copy command that typically took 15-20 s was taking 15-20 minutes.
Our advice is to always specify the -W (--whole-file) option, which copies files whole and so avoids the file recall. The following flags should never be used unless you know what you are doing:
-c, --checksum
--no-whole-file
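As a minimal sketch of the advice above, the following shell function prints an rsync command line that uses -W. The function name whole_file_copy_cmd, the file name, and the destination host and path are hypothetical placeholders, not actual HPCCC names. The command is only printed, so it can be reviewed before it is run.

```shell
# Sketch only: whole_file_copy_cmd, the file name and the destination are
# hypothetical placeholders, not real HPCCC hosts or paths.

# Print an rsync command that copies SRC to DEST with -W (--whole-file),
# so rsync sends the file whole instead of examining the existing copy
# (which, on an HSM file system, would force a recall from tape).
whole_file_copy_cmd() {
    src=$1
    dest=$2
    printf 'rsync -W %s %s\n' "$src" "$dest"
}

# Show the command for review; paste it into the shell to run it for real.
whole_file_copy_cmd results/run01.dat archivehost:/samfs/project/run01.dat
```

The same pattern should apply to any archive destination on an HSM file system; only the destination argument changes.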
Similar behaviour will occur when archiving to the CSIRO Data Store. See also the req FAQ #39 for techniques for error checking and recovery of transfers.

4. Change of SX-6 share structure for BMRC users
The ERS scheduler on the SX-6 system keeps a database of users and their shares and resources used, to enable it to schedule work according to pre-defined shares of the system. It has been decided to move all the BMRC users into one group, to reduce the work required by HPCCC staff to have the share structure reflect current groups and staff placement. There is no great loss of functionality:
BMRC users may need to change their .acct files on all systems that they submit jobs from, but NOT until contacted by HPCCC staff.

5. Use of the TX7 systems for applications
Since the start of the SX-6/TX7 service, HPCCC staff have been focusing on file system services from the TX7 servers, and have therefore restricted the running of applications on those systems to maximise their reliability for that function. With the current stability of the TX7 systems, the HPCCC will entertain limited execution of applications on the TX7s, particularly those involving data handling functions. However, the impact of application development and execution on SX-6 I/O must be monitored, so each user who wishes to run an application on a TX7 must contact the HPCCC for pre-approval, with details of the application and its runtime constraints. Approved applications will be repetitively executed applications rather than one-off developments, and will be related to either NMOC production, pre-production testing, or a long-term and repetitive research project. Initial applications are envisaged to be file housekeeping tasks that will run more expeditiously on the TX7s than on the SX-6s. Note that reliable synchronisation of SX-6 and TX7 jobs can be non-trivial, and that aspect must be designed for self-recovery or auto-restart. Currently, TX7 queues have fairly short time limits; you can see the limits with a command like:
qstato -Q -f txbm0
See the req FAQ #40 for more details. When an application is approved for porting to TX7, appropriate runtime arrangements will be provided.

6. MPI jobs - output and error file handling
For MPI jobs on the SX-6s, if you change a line like:
mpirun -v -np 1 -max_np 3 oasis3
to:
mpirun -v -np 1 -max_np 3 /usr/lib/mpi/mpisep.sh oasis3
then stderr of each process will be separated into files called stderr.${MPIUNIVERSE}:${MPIRANK}. If you first set the environment variable MPISEPSELECT, then the output will be split up as follows (see the MPI Users Guide, chapter 3.3):
MPISEPSELECT=1 : stdout separated into stdout.$UNIVERSE:$RANK
MPISEPSELECT=2 : stderr separated into stderr.$UNIVERSE:$RANK
MPISEPSELECT=3 : both of the above
MPISEPSELECT=4 : stdout and stderr for each process separated
into std.$UNIVERSE:$RANK
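The mapping above can be sketched as a small shell helper, for example for a post-processing script that needs to know which files to collect after a run. The function name mpisep_files and its arguments are hypothetical; the file name patterns are taken from the list above, and the actual behaviour of mpisep.sh should be checked against the MPI Users Guide.

```shell
# Sketch only: mpisep_files is a hypothetical helper, not part of MPI/SX.
# It prints the per-process file names expected for a given MPISEPSELECT
# value, universe number and rank, following the list above.
mpisep_files() {
    mode=$1
    universe=$2
    rank=$3
    case "$mode" in
        1) echo "stdout.${universe}:${rank}" ;;
        2) echo "stderr.${universe}:${rank}" ;;
        3) echo "stdout.${universe}:${rank} stderr.${universe}:${rank}" ;;
        4) echo "std.${universe}:${rank}" ;;
        *) echo "unknown MPISEPSELECT value: $mode" >&2; return 1 ;;
    esac
}

# Example: files expected for rank 1 of universe 0 when MPISEPSELECT=3
mpisep_files 3 0 1
```

In a job script, the variable would be set (for example, export MPISEPSELECT=3) before the mpirun line that invokes mpisep.sh.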

7. SX-6 multi-node job limits
NEC uses the terminology 'job' to mean that part of an NQSII request running on one node. So, the limits available to queues and jobs, listed as 'Per-job' with the 'qstato -Q -f queuename' command, are really per-node. In particular, settings like
#PBS -T mpisx -b 6
#PBS -l memsz_job=60gb
request 6 nodes and 60 Gbyte of memory per node, and will not run on the HPCCC SX-6s, which have a usable area of about 54 Gbyte per node.

8. Upgrade to the CSIRO Altix (cherax) operating system and DMF
Within the next two months, the HPCCC plans to upgrade the operating system on cherax to SuSE running the 2.6 Linux kernel. This will bring much-needed performance and scalability enhancements; it is hoped that the irritating slowdowns will become much rarer. Unfortunately, it will also mean a changed environment. An upgrade to DMF is also planned, bringing new features such as the ability to have parts of files on-line and other parts off-line. This would allow, for example, the metadata of a file to be always on-line, with the actual data off-line. We plan to have a partition of cherax set up with 16 processors running the new operating system, to allow users to test the new operating system environment prior to any cut-over.

9. cherax outages
There was a failure of the FC switch on cherax at 04:37 Sat 8 April. The root file system subsequently filled. Service was restored by 20:24 Sat. SX-6/TX7 jobs accessing cherax may have failed during this time. There were subsequent problems on Monday 10 April caused by a temporary shortage of tapes: nearly 2 Tbyte of small files were added over a weekend. There is to be an outage of cherax on Wednesday 19 April from about 08:00 to 10:00, to allow for the replacement of a faulty power supply.

10. CSIRO APAC grant and usage information
All CSIRO APAC users should now be able to access the information on CSIRO partner grants and usage at https://nf.apac.edu.au/PARTNERS/CSIRO/ or from http://intra.hpsc.csiro.au/user/usage/apac/
© Copyright 2010, CSIRO Australia. Use of this web site and information available from it is subject to our Legal Notice and Disclaimer and Privacy Statement.